A single consistency instruction combined with harmful prior actions causes aligned frontier LLMs to select unsafe options at rates of 91-98% in high-stakes domains, with escalation and inverse scaling by model size.
Advances in Neural Information Processing Systems (NeurIPS)
4 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: method 1
Citation-polarity summary: use method 1
Years: 2026 (4)
Representative citing papers:
Sessa integrates attention within recurrent paths to achieve power-law memory tails and flexible non-decaying selective retrieval, outperforming baselines on long-context tasks.
An explanatory book that supplies a clear mental map and intuition for how Vision-Language Models combine vision and language capabilities.