A two-agent adversarial rewriting framework achieves 20-40% evasion rates against LLM-based misinformation detectors under strict black-box constraints with binary feedback only, far outperforming prior methods and linking success to specific architectural properties.
Available: https://arxiv.org/abs/2410.20940
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Agentic Adversarial Rewriting Exposes Architectural Vulnerabilities in Black-Box NLP Pipelines
A two-agent adversarial rewriting framework achieves 20-40% evasion rates against LLM-based misinformation detectors under strict black-box constraints with binary feedback only, far outperforming prior methods and linking success to specific architectural properties.