arXiv preprint arXiv:2507.16407
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- Auditing CoT Answer-Hijack Patches: Source-Control Certificates with Type-I Guarantees
Introduces source-control certificates with Type-I guarantees and a sample-complexity bound for auditing clean-source activation patches on Qwen2.5-7B and Llama3-8B for GSM8K/MATH-500 CoT hijacks.