Accessed: 2025-07-12
2 Pith papers cite this work. Both are currently unverdicted; polarity classification is still indexing.
citing papers explorer
- Large Language Models Generate Harmful Content Using a Distinct, Unified Mechanism
  Harmful generation in LLMs relies on a compact, unified set of weights that alignment compresses and that are distinct from benign capabilities, explaining emergent misalignment.
- Prioritizing High-Consequence Biological Capabilities in Evaluations of Artificial Intelligence Models
  AI model evaluations for biological capabilities should prioritize high-consequence risks like pandemics, informed by life sciences dual-use experience, and occur prior to deployment to enable biosafety measures.