Critical-CoT defends LLMs from reasoning-level backdoor attacks via two-stage fine-tuning that builds automatic detection and refusal of poisoned chain-of-thought steps.
- Depends on a specific phrase, wording, or stylistic cue in the question
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CR (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Critical-CoT: A Robust Defense Framework against Reasoning-Level Backdoor Attacks in Large Language Models