Representation engineering uses population-level representations in deep neural networks to monitor and manipulate cognitive phenomena like honesty and harmlessness, providing simple effective baselines for LLM safety.
Does this advance safety along with, or as a consequence of, advancing other capabilities or the study of AI? □ E.3 E LABORATIONS AND OTHER CONSIDERATIONS
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2023 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Representation Engineering: A Top-Down Approach to AI Transparency
Representation engineering uses population-level representations in deep neural networks to monitor and manipulate cognitive phenomena like honesty and harmlessness, providing simple effective baselines for LLM safety.