Differentiable Faithfulness Alignment (DFA) learns a differentiable alignment that transfers circuits from small to large language models. On Llama-3 1B to 3B it recovers target faithfulness competitive with direct methods, but it degrades with scale and architecture differences.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Differentiable Faithfulness Alignment for Cross-Model Circuit Transfer