Pearce, Michael T and Dooms, Thomas and Rigg, Alice and Oramas, Jose M · 2024 · arXiv 2410.08417

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

WriteSAE: Sparse Autoencoders for Recurrent State

cs.LG · 2026-05-12 · unverdicted · novelty 8.0 · 3 refs

WriteSAE introduces sparse autoencoders with rank-1 matrix atoms for recurrent state updates, allowing replacement tests that outperform deletion on 92.4% of positions and a formula predicting logit changes with R²=0.98.

When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

Tensor similarity is a symmetry-invariant metric that measures functional equivalence between tensor-based networks using a recursive algorithm for cross-layer mechanisms.

Task complexity shapes internal representations and robustness in neural networks

cs.LG · 2025-08-07 · unverdicted · novelty 7.0

Harder classification tasks produce neural representations whose accuracy collapses under binarization and shuffling while easier tasks remain robust, defining task complexity via the performance gap between full-precision and perturbed networks.

citing papers explorer

Showing 3 of 3 citing papers.

WriteSAE: Sparse Autoencoders for Recurrent State · cs.LG · 2026-05-12 · unverdicted · none · ref 7 · 3 links
When Are Two Networks the Same? Tensor Similarity for Mechanistic Interpretability · cs.LG · 2026-05-14 · unverdicted · none · ref 24
Task complexity shapes internal representations and robustness in neural networks · cs.LG · 2025-08-07 · unverdicted · none · ref 49
