ATLAS shows constitutions induce recoverable latent geometry in LLMs that redistributes but remains detectable across models and neural perturbation data via source-defined families and AUC separations.
Abc align: Large language model alignment for safety & accuracy
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
extension 1
citation-polarity summary
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1roles
extension 1polarities
extend 1representative citing papers
citing papers explorer
-
ATLAS: Constitution-Conditioned Latent Geometry and Redistribution Across Language Models and Neural Perturbation Data
ATLAS shows constitutions induce recoverable latent geometry in LLMs that redistributes but remains detectable across models and neural perturbation data via source-defined families and AUC separations.