Kiho Park, Yo Joong Choe, and Victor Veitch

URL: https://arxiv · 2024 · arXiv 2411.13117

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations

cs.LG · 2026-05-27 · conditional · novelty 7.0

SA-GSAE with Bi-Jump-ReLU enables one latent to encode both polarities of anticorrelated features, Pareto-dominating or matching full-width gated SAEs while reducing dead latents by up to 500x on some LLM hookpoints.

The Rate-Distortion-Polysemanticity Tradeoff in SAEs

cs.LG · 2026-05-14 · unverdicted · novelty 6.0

SAEs exhibit a rate-distortion-polysemanticity tradeoff where monosemanticity increases rate and distortion, with optimal polysemanticity set by feature co-occurrence probabilities in the data.

citing papers explorer

Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations

cs.LG · 2026-05-27 · conditional · none · ref 20
