Does this advance safety along with, or as a consequence of, advancing other capabilities or the study of AI? □

E.3 ELABORATIONS AND OTHER CONSIDERATIONS

Safety via Capabilities

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Representation Engineering: A Top-Down Approach to AI Transparency

cs.LG · 2023-10-02 · unverdicted · novelty 6.0

Representation engineering uses population-level representations in deep neural networks to monitor and manipulate cognitive phenomena like honesty and harmlessness, providing simple effective baselines for LLM safety.

citing papers explorer


Representation Engineering: A Top-Down Approach to AI Transparency · cs.LG · 2023-10-02 · unverdicted · none · ref 22
