Fan RK Chung. Spectral graph theory, volume

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.LG 1

years

2025 1

verdicts

UNVERDICTED 1

representative citing papers

Graph-Regularized Sparse Autoencoders for LLM Safety Steering

cs.LG · 2025-12-07 · unverdicted · novelty 6.0

GSAE improves selective refusal on safety benchmarks by smoothing SAE directions over a co-activation graph and applying them via a two-gate controller, outperforming standard SAEs and baselines on Llama-3 and other models.

citing papers explorer

Showing 1 of 1 citing paper.

  • Graph-Regularized Sparse Autoencoders for LLM Safety Steering cs.LG · 2025-12-07 · unverdicted · none · ref 3
