Transactions on Machine Learning Research

Finding Neurons in a Haystack: Case Studies with Sparse Probing

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Structural Instability of Feature Composition

cs.LG · 2026-04-18 · unverdicted · novelty 7.0

Feature composition in SAEs collapses asymptotically when the Gaussian mean width of the signal cone is exceeded, with ReLU inducing a ratchet-like accumulation of interference from correlations.

LLM Safety From Within: Detecting Harmful Content with Internal Representations

cs.AI · 2026-04-20 · unverdicted · novelty 6.0

SIREN identifies safety neurons via linear probing on internal LLM layers and combines them with adaptive weighting to detect harm, outperforming prior guard models with 250x fewer parameters.

citing papers explorer

Structural Instability of Feature Composition · cs.LG · 2026-04-18 · unverdicted · none · ref 14

LLM Safety From Within: Detecting Harmful Content with Internal Representations · cs.AI · 2026-04-20 · unverdicted · none · ref 25
