Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaranteeing concise human-aligned decompositions.
Martin, Tian Peng, and Michael W
5 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.LG 5years
2026 5roles
background 1polarities
background 1representative citing papers
SMA-DP-SGD augments DP-SGD with a spectral memory-aware fractional branch from prior privatized updates to improve accuracy on CIFAR and MNIST while preserving conditional differential privacy.
Weight decay controls distinct learning regimes in grokking transformers on modular arithmetic, tracked by new cheap attention-based diagnostics with empirical critical value and exponent fits.
Random Matrix Theory detects overfitting via growing Correlation Traps in weight spectra during the anti-grokking phase of neural network training.
DyT improves validation loss 27% at 64M params/1M tokens but worsens it 19% at 118M tokens, with saturation levels predicting the sign of the effect.
citing papers explorer
-
From Mechanistic to Compositional Interpretability
Compositional interpretability defines explanations as commuting syntactic-semantic mapping pairs grounded in compositionality and minimum description length, with compressive refinement and a parsimony theorem guaranteeing concise human-aligned decompositions.
-
SMA-DP: Spectral Memory-Aware Differential Privacy for Deep Learning
SMA-DP-SGD augments DP-SGD with a spectral memory-aware fractional branch from prior privatized updates to improve accuracy on CIFAR and MNIST while preserving conditional differential privacy.
-
Weight Decay Regimes in Grokking Transformers: Cheap Online Diagnostics
Weight decay controls distinct learning regimes in grokking transformers on modular arithmetic, tracked by new cheap attention-based diagnostics with empirical critical value and exponent fits.
-
Detecting overfitting in Neural Networks during long-horizon grokking using Random Matrix Theory
Random Matrix Theory detects overfitting via growing Correlation Traps in weight spectra during the anti-grokking phase of neural network training.
-
When Does Removing LayerNorm Help? Activation Bounding as a Regime-Dependent Implicit Regularizer
DyT improves validation loss 27% at 64M params/1M tokens but worsens it 19% at 118M tokens, with saturation levels predicting the sign of the effect.