pith. sign in

Title resolution pending

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

citation-role summary

background 1

citation-polarity summary

fields

cs.LG 2

years

2024 1 2023 1

verdicts

UNVERDICTED 2

roles

background 1

polarities

background 1

representative citing papers

Improving Dictionary Learning with Gated Sparse Autoencoders

cs.LG · 2024-04-24 · unverdicted · novelty 7.0

Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.

Linear Representations of Sentiment in Large Language Models

cs.LG · 2023-10-23 · unverdicted · novelty 6.0

Sentiment is represented as a single linear direction in LLM activation space that is causally relevant across tasks and is summarized at punctuation and names in addition to charged words.

citing papers explorer

Showing 2 of 2 citing papers.

  • Improving Dictionary Learning with Gated Sparse Autoencoders cs.LG · 2024-04-24 · unverdicted · none · ref 91

    Gated SAEs decouple which features to use from how large their activations should be, applying the L1 penalty only to selection and thereby eliminating shrinkage while halving the number of firing features needed for good fidelity.

  • Linear Representations of Sentiment in Large Language Models cs.LG · 2023-10-23 · unverdicted · none · ref 30

    Sentiment is represented as a single linear direction in LLM activation space that is causally relevant across tasks and is summarized at punctuation and names in addition to charged words.