Towards monoseman- ticity: Decomposing language models with dictionary learn- ing.Transformer Circuits Thread, 2023

Trenton Bricken, Adly Templeton, Joshua Batson, Brian Chen, Adam Jermyn, Tom Conerly, Nick Turner, Cem Anil, Carson Denison, Amanda Askell, Robert Lasenby, Yifan Wu, Shauna Kravec, Nicholas Schiefer, Tim Maxwell, Nicholas Joseph, Zac Hatfie · 2023

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Improving Sparse Autoencoder with Dynamic Attention

cs.LG · 2026-04-16 · unverdicted · novelty 7.0

A cross-attention SAE with sparsemax attention achieves lower reconstruction loss and higher-quality concepts than fixed-sparsity baselines by making activation counts data-dependent.

citing papers explorer

Showing 1 of 1 citing paper.

Improving Sparse Autoencoder with Dynamic Attention cs.LG · 2026-04-16 · unverdicted · none · ref 4
A cross-attention SAE with sparsemax attention achieves lower reconstruction loss and higher-quality concepts than fixed-sparsity baselines by making activation counts data-dependent.

Towards monoseman- ticity: Decomposing language models with dictionary learn- ing.Transformer Circuits Thread, 2023

fields

years

verdicts

representative citing papers

citing papers explorer