Diab, Virginia Smith, and Kun Zhang

Xiangchen Song, Aashiq Muhamed, Yujia Zheng, Lingjing Kong, Zeyu Tang, Mona T · 2025

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

representative citing papers

Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE)

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

Aligned training reparameterizes SAEs to enforce unit alignment between encoder and decoder directions, yielding Pareto gains on SAEBench while removing dead features and improving stability.

Are Sparse Autoencoder Benchmarks Reliable?

cs.LG · 2026-05-18 · unverdicted · novelty 6.0

An audit of SAEBench reveals that Targeted Probe Perturbation and Spurious Correlation Removal metrics fail reliability tests and should not be used to evaluate sparse autoencoders.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Aligned Training: A Parameter-Free Method to Improve Feature Quality and Stability of Sparse Autoencoders (SAE) cs.LG · 2026-05-18 · unverdicted · none · ref 30
Aligned training reparameterizes SAEs to enforce unit alignment between encoder and decoder directions, yielding Pareto gains on SAEBench while removing dead features and improving stability.
Are Sparse Autoencoder Benchmarks Reliable? cs.LG · 2026-05-18 · unverdicted · none · ref 30
An audit of SAEBench reveals that Targeted Probe Perturbation and Spurious Correlation Removal metrics fail reliability tests and should not be used to evaluate sparse autoencoders.

Diab, Virginia Smith, and Kun Zhang

fields

years

verdicts

representative citing papers

citing papers explorer