
arxiv: 2601.08146 · v3 · pith:6WNSXYFJnew · submitted 2026-01-13 · cs.CL · cs.AI · cs.LG

Beyond Transfer Accuracy: Faithful Circuits for Controlled Low-Resource Adaptation

keywords: adaptation, ct-sft, fine-tuning, accuracy, circuit, circuit-targeted, circuits, discovery

Existing circuit discovery methods rely on templated tasks with clean counterfactuals, limiting their use on diverse natural text. We adapt Contextual Decomposition for Transformers (CD-T) for unstructured settings via label-balanced activation means and task-directional relevance scoring, enabling counterfactual-free circuit discovery. We leverage these circuits for Circuit-Targeted Supervised Fine-Tuning (CT-SFT), restricting parameter updates to task-relevant heads and LayerNorm. Experiments on NusaX cross-lingual sentiment transfer show that CT-SFT is highly competitive for low-resource adaptation. While non-circuit sparse updates and full fine-tuning sometimes match target accuracy through capacity recruitment, CT-SFT uniquely minimizes catastrophic forgetting, preserving source-language and related-task performance. Extensions to XNLI confirm these findings hold across broader tasks and model families, demonstrating that circuit-targeted adaptation provides a safer, causally grounded alternative to global fine-tuning.
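
The two mechanisms named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that a "label-balanced activation mean" is the average of per-label mean activations (so a majority label cannot dominate the baseline), and that "restricting parameter updates" amounts to applying a gradient step only to parameter groups listed as part of the circuit. The function names, the group keys, and the plain SGD update are all hypothetical and chosen for illustration.

```python
from collections import defaultdict


def label_balanced_mean(activations, labels):
    """Average the per-label mean vectors so every label counts equally.

    Assumption: this is one plausible reading of "label-balanced
    activation means"; the paper may define it differently.
    """
    if not activations or len(activations) != len(labels):
        raise ValueError("activations and labels must be non-empty and aligned")
    dim = len(activations[0])
    sums = defaultdict(lambda: [0.0] * dim)
    counts = defaultdict(int)
    for vec, lab in zip(activations, labels):
        for i, value in enumerate(vec):
            sums[lab][i] += value
        counts[lab] += 1
    # One mean vector per label, then an unweighted mean across labels.
    class_means = [[s / counts[lab] for s in sums[lab]] for lab in sums]
    return [sum(m[i] for m in class_means) / len(class_means) for i in range(dim)]


def circuit_masked_step(params, grads, trainable, lr):
    """Apply an SGD step only to groups in `trainable`; copy the rest unchanged.

    `params` and `grads` map a hypothetical group key, e.g. ("head", layer, index)
    or ("layernorm", layer), to a flat list of floats.
    """
    updated = {}
    for key, weights in params.items():
        if key in trainable:
            updated[key] = [w - lr * g for w, g in zip(weights, grads[key])]
        else:
            updated[key] = list(weights)  # frozen: outside the circuit
    return updated


# Three examples of label 0 and one of label 1: a plain mean would be 2.0,
# while the label-balanced mean weights both labels equally.
baseline = label_balanced_mean([[1.0], [1.0], [1.0], [5.0]], [0, 0, 0, 1])

params = {("head", 0, 1): [1.0, 2.0], ("head", 1, 0): [3.0], ("layernorm", 0): [1.0]}
grads = {("head", 0, 1): [1.0, 1.0], ("head", 1, 0): [1.0], ("layernorm", 0): [2.0]}
trainable = {("head", 0, 1), ("layernorm", 0)}
new_params = circuit_masked_step(params, grads, trainable, lr=0.5)
```

In a real framework the same effect is usually obtained by freezing parameters or masking gradients before the optimizer step; the dictionary form here only makes the selection explicit.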
