Feature hedging: Correlated features break narrow sparse autoencoders.arXiv preprint arXiv:2505.11756, 2025

David Chanin et al · 2025 · arXiv 2505.11756

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations

cs.LG · 2026-05-27 · conditional · novelty 7.0

SA-GSAE with Bi-Jump-ReLU enables one latent to encode both polarities of anticorrelated features, Pareto-dominating or matching full-width gated SAEs while reducing dead latents by up to 500x on some LLM hookpoints.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Sign-Aware Gated Sparse Autoencoders: Modeling Anticorrelated Features with Bi-Jump-ReLU Activations cs.LG · 2026-05-27 · conditional · none · ref 35
SA-GSAE with Bi-Jump-ReLU enables one latent to encode both polarities of anticorrelated features, Pareto-dominating or matching full-width gated SAEs while reducing dead latents by up to 500x on some LLM hookpoints.

Feature hedging: Correlated features break narrow sparse autoencoders.arXiv preprint arXiv:2505.11756, 2025

fields

years

verdicts

representative citing papers

citing papers explorer