Early signs of steganographic capabilities in frontier llms.arXiv preprint arXiv:2507.02737, 2025

Artur Zolkowski, Kei Nishimura-Gasparian, Robert McCarthy, Roland S Zimmermann, David Lindner · 2025 · arXiv 2507.02737

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

read on arXiv browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

CoT-Guard: Small Models for Strong Monitoring

cs.CR · 2026-05-12 · unverdicted · novelty 5.0

CoT-Guard is a 4B model using SFT and RL that achieves 75% G-mean^2 on hidden objective detection under prompt and code manipulation attacks, outperforming several larger models.

Risk Reporting for Developers' Internal AI Model Use

cs.CY · 2026-04-27 · unverdicted · novelty 4.0

A harmonized risk reporting standard for internal frontier AI model use, structured around autonomous misbehavior and insider threats using means, motive, and opportunity factors.

citing papers explorer

Showing 2 of 2 citing papers.

CoT-Guard: Small Models for Strong Monitoring cs.CR · 2026-05-12 · unverdicted · none · ref 32
CoT-Guard is a 4B model using SFT and RL that achieves 75% G-mean^2 on hidden objective detection under prompt and code manipulation attacks, outperforming several larger models.
Risk Reporting for Developers' Internal AI Model Use cs.CY · 2026-04-27 · unverdicted · none · ref 50
A harmonized risk reporting standard for internal frontier AI model use, structured around autonomous misbehavior and insider threats using means, motive, and opportunity factors.

Early signs of steganographic capabilities in frontier llms.arXiv preprint arXiv:2507.02737, 2025

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer