A mathematical framework for transformer circuits
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
citation-role summary: background 1
citation-polarity summary
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
Coactivation of sparse autoencoder features reveals causal semantic modules for concepts and relations in LLMs that can be ablated or amplified to produce predictable and counterfactual changes in outputs.
citing papers explorer
- Not How Many, But Which: Parameter Placement in Low-Rank Adaptation
  Gradient-informed placement of LoRA parameters recovers full performance under GRPO while random placement does not, due to differences in gradient rank and stability across training regimes.
- Sparse Feature Coactivation Reveals Causal Semantic Modules in Large Language Models
  Coactivation of sparse autoencoder features reveals causal semantic modules for concepts and relations in LLMs that can be ablated or amplified to produce predictable and counterfactual changes in outputs.