Does this work advance progress on tasks that have been previously considered the subject of usual capabilities research?

3 Pith papers cite this work. Polarity classification is still indexing.

fields: cs.LG (3)

years: 2024 (2), 2023 (1)

verdicts: UNVERDICTED (3)

roles: other (1)

polarities: unclear (1)

representative citing papers

Representation Engineering: A Top-Down Approach to AI Transparency

cs.LG · 2023-10-02 · unverdicted · novelty 6.0

Representation engineering uses population-level representations in deep neural networks to monitor and manipulate cognitive phenomena like honesty and harmlessness, providing simple effective baselines for LLM safety.
