Proceedings of the 34th International Conference on Machine Learning, Volume 70
3 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (3)
Verdicts: UNVERDICTED (3)
Representative citing papers
Unlearned language models retain low calibration error but show increased shortcut reliance on the TOFU benchmark, extending the reliability paradox to machine unlearning.
Aerodynamic pressure signals enable real-time, interpretable detection and severity classification of structural damage in elastic beam-like structures via CNN enhanced with physics insights and XAI.
Citing papers explorer
- Every Component is a Lookup: Token Attribution and Composition from a Single Decomposition
  Unpack decomposes transformer credit via a unified backward recursion on the φ(S)U template, recovering known IOI circuits with mode labels and showing consistent duplicate-name suppression across Pythia scales from a single forward pass.
- Calibration vs Decision Making: Revisiting the Reliability Paradox in Unlearned Language Models
  Unlearned language models retain low calibration error but show increased shortcut reliance on the TOFU benchmark, extending the reliability paradox to machine unlearning.
- Towards Interpretable Damage Detection based on Aerodynamic Pressure Measurements
  Aerodynamic pressure signals enable real-time, interpretable detection and severity classification of structural damage in elastic beam-like structures via CNN enhanced with physics insights and XAI.