PacifAIst: Benchmarking AI agent safety. arXiv preprint arXiv:2508.09762

Herrador, M · arXiv 2508.09762

1 Pith paper cites this work. Polarity classification is still being indexed.

1 Pith paper citing it

representative citing papers

The Compliance Trap: How Structural Constraints Degrade Frontier AI Metacognition Under Adversarial Pressure

cs.AI · 2026-05-04 · unverdicted · novelty 6.0 · 2 refs

Compliance-forcing instructions cause up to 30 percentage point drops in metacognitive accuracy across most frontier models, while removing the compliance element restores performance and Constitutional AI shows near-immunity.

citing papers explorer

Showing 1 of 1 citing paper.

The Compliance Trap: How Structural Constraints Degrade Frontier AI Metacognition Under Adversarial Pressure

cs.AI · 2026-05-04 · unverdicted · none · ref 5 · 2 links
