pith. machine review for the scientific record.

arxiv: 2509.24164 · v3 · submitted 2025-09-29 · 💻 cs.CL

Localizing Task Recognition and Task Learning in In-Context Learning via Attention Head Analysis

Pith reviewed 2026-05-18 13:23 UTC · model grok-4.3

classification 💻 cs.CL
keywords: in-context learning · attention heads · task recognition · task learning · mechanistic interpretability · large language models · hidden state geometry

The pith

Attention heads in large language models localize into separate groups for recognizing tasks from examples and for learning to apply them during in-context learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to reconcile two views of in-context learning by showing that its task-recognition and task-learning parts are carried out by distinct attention heads. It introduces a method to locate those heads and then uses correlation, ablation, and steering tests to confirm that one group aligns representations to the overall task while the other rotates them toward the right output label. A sympathetic reader would care because this supplies a concrete, head-level picture of how models extract and use patterns from a few examples without weight updates. If the account holds, it offers a single framework that absorbs earlier observations about induction heads and task vectors. The result is a more interpretable description of ICL that applies across different tasks and model scales.

Core claim

The central claim is that attention heads can be partitioned into TR heads, which align hidden states with a task subspace, and TL heads, which rotate states inside that subspace toward the correct label; these roles are isolated by Task Subspace Logit Attribution and verified through ablation and geometric steering experiments that also reconcile induction-head and task-vector findings.

What carries the argument

Task Subspace Logit Attribution (TSLA), which scores attention heads by their contribution to logits projected onto the task subspace and thereby isolates heads responsible for task recognition versus task learning.
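
The paper's exact TSLA formula is not reproduced on this page, so the following is only a minimal sketch of the general idea, under stated assumptions. It assumes the task subspace is the span of the candidate label unembedding vectors (as the Figure 1 caption describes) and scores each head output in two illustrative ways: how much of it lies inside that subspace, and how strongly it favors the correct label over the others. The function names, the two scores, and the toy vectors are all hypothetical.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    return math.sqrt(dot(a, a))

def orthonormal_basis(vectors, tol=1e-12):
    """Modified Gram-Schmidt: orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = list(v)
        for b in basis:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]
        n = norm(w)
        if n > tol:
            basis.append([wi / n for wi in w])
    return basis

def project(v, basis):
    """Orthogonal projection of v onto the span of an orthonormal basis."""
    out = [0.0] * len(v)
    for b in basis:
        c = dot(v, b)
        out = [o + c * bi for o, bi in zip(out, b)]
    return out

def head_scores(head_output, label_unembeddings, correct):
    """Return (subspace_norm, label_margin) for one head output.

    Both scores are assumptions made for illustration, not the paper's definitions.
    Requires at least two candidate labels.
    """
    basis = orthonormal_basis(label_unembeddings)
    subspace_norm = norm(project(head_output, basis))
    logits = [dot(head_output, u) for u in label_unembeddings]
    others = [l for i, l in enumerate(logits) if i != correct]
    label_margin = logits[correct] - sum(others) / len(others)
    return subspace_norm, label_margin

# Toy 4-dimensional example with two candidate labels (hypothetical numbers).
label_unembeddings = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
heads = {
    "label_neutral": [1.0, 1.0, 0.0, 0.0],    # inside the subspace, no label preference
    "label_specific": [1.0, -1.0, 0.0, 0.0],  # inside the subspace, favors label 0
    "off_subspace": [0.0, 0.0, 3.0, 0.0],     # orthogonal to the subspace
}
scores = {name: head_scores(out, label_unembeddings, correct=0)
          for name, out in heads.items()}
for name, (subspace_norm, label_margin) in scores.items():
    print(f"{name}: subspace_norm={subspace_norm:.3f} label_margin={label_margin:.3f}")
```

In this toy, a head that writes equally toward every label scores high on the first measure and zero on the second, which is the kind of separation the TR versus TL split relies on.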

If this is right

  • TR heads align hidden states with the task subspace while TL heads rotate states inside the subspace toward the correct label.
  • Ablating the localized TR or TL heads impairs the corresponding component of in-context learning.
  • Earlier observations on induction heads and task vectors fit inside the TR-TL decomposition at the level of individual attention heads.
  • The same framework accounts for in-context learning across varied tasks and model settings.
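
The alignment and rotation effects in the first bullet can be illustrated with a small geometric sketch. It assumes a 3-dimensional hidden state whose task subspace is simply the xy-plane, with the correct label along the x-axis; the vectors `tr_like_output` and `tl_like_output` are hypothetical stand-ins for head outputs, not values from the paper.

```python
import math

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def angle_to_subspace_deg(h):
    """Angle between h and the xy-plane, which plays the role of the task subspace."""
    in_plane = math.hypot(h[0], h[1])
    return math.degrees(math.atan2(abs(h[2]), in_plane))

def in_plane_angle_to_label_deg(h, label):
    """Angle between the in-plane part of h and a unit label direction in the plane."""
    cos = (h[0] * label[0] + h[1] * label[1]) / math.hypot(h[0], h[1])
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

correct_label = (1.0, 0.0)           # assumed unit direction of the correct label
base = [1.0, 1.0, 4.0]               # mostly outside the subspace, no label preference
tr_like_output = [2.0, 2.0, 0.0]     # in-subspace and label-neutral
tl_like_output = [1.5, -1.5, 0.0]    # in-subspace, toward the correct label

after_tr = add(base, tr_like_output)
after_tl = add(after_tr, tl_like_output)

for name, h in [("base", base), ("after_tr", after_tr), ("after_tl", after_tl)]:
    print(name,
          f"angle_to_subspace={angle_to_subspace_deg(h):.1f}",
          f"in_plane_angle_to_label={in_plane_angle_to_label_deg(h, correct_label):.1f}")
```

Adding the label-neutral vector shrinks the angle to the subspace while leaving the in-plane direction unchanged; adding the label-specific vector then turns the in-plane component toward the correct label.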

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Targeted editing of TR or TL heads during inference could selectively enhance or suppress task recognition or label selection.
  • The same localization technique might be applied to study other emergent behaviors such as chain-of-thought reasoning.
  • Repeating the analysis on models of different sizes or architectures could show whether the same heads or different ones carry the TR and TL roles.

Load-bearing premise

Task Subspace Logit Attribution isolates heads whose contributions causally produce the task-recognition versus task-learning split rather than merely correlating with it.

What would settle it

An ablation experiment in which removing the identified TR heads leaves task-recognition accuracy intact or in which steering the TL heads fails to shift predictions toward the correct label.
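
As a rough sketch of what such a test could look like, the toy below zero-ablates named head outputs and checks two things separately: whether the prediction still lands in the label space, and whether it is the correct label. The three-token vocabulary, the head outputs, and both metrics are hypothetical illustrations; the paper's actual evaluation protocol is not given on this page.

```python
# Toy unembedding: two label tokens and one non-label token (hypothetical).
UNEMBED = {
    "positive": [1.0, 0.0, 0.0],
    "negative": [0.0, 1.0, 0.0],
    "the": [0.0, 0.0, 1.0],
}
LABELS = {"positive", "negative"}

# Hypothetical head outputs that sum to the final hidden state.
HEAD_OUTPUTS = {
    "tr_like": [3.0, 3.0, 0.0],   # boosts every label token equally
    "tl_like": [1.0, -1.0, 0.0],  # separates the correct label from the other
    "other": [0.0, 0.5, 2.5],     # everything else, biased toward a non-label token
}

def predict(head_outputs, ablated=()):
    """Sum the non-ablated head outputs and return the highest-logit token."""
    hidden = [0.0, 0.0, 0.0]
    for name, out in head_outputs.items():
        if name not in ablated:
            hidden = [h + o for h, o in zip(hidden, out)]
    logits = {tok: sum(h * u for h, u in zip(hidden, vec))
              for tok, vec in UNEMBED.items()}
    return max(logits, key=logits.get)

def evaluate(pred, gold):
    """Split the outcome into a label-space check and a correctness check."""
    return {"in_label_space": pred in LABELS, "correct": pred == gold}

full = evaluate(predict(HEAD_OUTPUTS), gold="positive")
no_tr = evaluate(predict(HEAD_OUTPUTS, ablated=("tr_like",)), gold="positive")
no_tl = evaluate(predict(HEAD_OUTPUTS, ablated=("tl_like",)), gold="positive")
print("full:", full)
print("ablate tr_like:", no_tr)
print("ablate tl_like:", no_tl)
```

If ablating the TR-like head left `in_label_space` intact, or ablating the TL-like head left `correct` intact, the toy would contradict the claimed division of labor; the same logic applies to the real experiment.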

Figures

Figures reproduced from arXiv: 2509.24164 by Hakaze Cho, Haolin Yang, Naoya Inoue.

Figure 1: (A) Example of how LLMs deduce the label of a final query through ICL, which consists of two components: task recognition (identifying the label space) and task learning (mapping demonstration texts to labels). (B) The outputs of Task Recognition heads align with the task subspace spanned by candidate label unembeddings, thus they can steer the hidden states to align with the subspace by reducing the angle…
Figure 2: Overlap, correlation, and consistency of three at
Figure 3: Distribution of the top 10% TR heads, TL heads, and IHs on SST-2. TR heads occur significantly deeper than TL heads and IHs. Overlaps between TR heads and IHs are also more frequent in deeper layers. Results for other models are in Subsection F.2. Layer-wise distribution of special heads
Figure 4: Pairwise overlap and correlation of TR/TL heads, and IHs identified across datasets. TR heads and IHs are consistent across tasks, while TL heads vary greatly. See Subsection F.3 for other models. We now show that the TR and TL heads identified by our method indeed independently and effectively capture the respective TR and TL functionalities through ablation experiments. Prior studies have mainly measure…
Figure 6: Dataset-average effects of ablating TR and TL heads under input perturbations.
Figure 5: Effects of ablating the top 3% of different
Figure 7: Zero-shot accuracy gains from steering with task vectors constructed from TR, TL, or random heads. TR-based task vectors consistently recover ICL-level accuracy, while TL-based vectors have weaker effects. Task-type dependence of steering effectiveness Note that the relative ineffectiveness of TL heads as task vectors can be partly attributed to the classification datasets we use, where performance is t…
Figure 8: Ratings in the review generation task when steering with TR, TL, or random TVs. TL vectors yield the largest improvements, reflecting their strength in capturing in-context mappings. Geometric effects of TR and TL outputs To understand the significance of TR/TL heads in ICL at a finer level than task vector experiments, we invoke the geometric analysis of hidden states (Kirsanov et al., 2025; Yang et al.…
Figure 9: Geometric effects of TR and TL steering. TR outputs enhance alignment with the task subspace, while TL outputs rotate hidden states toward the correct label unembedding within the subspace. The results in
Figure 10: Layerwise verification of TR and TL geometric effects. TR heads enforce alignment with
Figure 11: Decomposed geometric effects of TR and TL outputs. TR heads align hidden states to the task subspace; TL heads rotate states within the subspace toward correct label directions. TR Heads align to task space, TL heads rotate within it To support our geometric intuition from
Figure 12: Effects of ablating the top 3% TR and TL heads identified using DLA, averaged across
Figure 13: Dataset-averaged Jaccard Coefficient, Kendall’s
Figure 14: Dataset-averaged overlap, correlation, and consistency analyses of TR and TL heads
Figure 15: Results on Llama3-8B: Effects of ablating top 10% TR heads identified using TSLA
Figure 16: Results on Llama3.1-8B: Effects of ablating top 10% TR heads identified using TSLA
Figure 17: Results on Llama3.2-3B: Effects of ablating top 10% TR heads identified using TSLA
Figure 18: Results on Qwen2-7B: Effects of ablating top 10% TR heads identified using TSLA or DLA
Figure 19: Results on Qwen2.5-32B: Effects of ablating top 10% TR heads identified using TSLA
Figure 20: Results on Yi-34B: Effects of ablating top 10% TR heads identified using TSLA or DLA
Figure 21: Results of overlap, correlation, and consistency analysis of attention head types averaged
Figure 22: Results of overlap, correlation, and consistency analysis of attention head types averaged
Figure 23: Results of overlap, correlation, and consistency analysis of attention head types averaged
Figure 24: Results of overlap, correlation, and consistency analysis of attention head types averaged
Figure 25: Results of overlap, correlation, and consistency analysis of attention head types averaged
Figure 26: Distribution of the top 10% TR heads, TL heads, and IHs across layers for the SST-2
Figure 27: Distribution of the top 10% TR heads, TL heads, and IHs across layers for the SST-2
Figure 28: Distribution of the top 10% TR heads, TL heads, and IHs across layers for the SST-2
Figure 29: Distribution of the top 10% TR heads, TL heads, and IHs across layers for the SST-2
Figure 30: Distribution of the top 10% TR heads, TL heads, and IHs across layers for the SST-2
Figure 31: Overlap and correlation of the TR heads, TL heads, and IHs across datasets on Llama3.1-
Figure 32: Overlap and correlation of the TR heads, TL heads, and IHs across datasets on Llama3.2-
Figure 33: Overlap and correlation of the TR heads, TL heads, and IHs across datasets on Qwen2-7B.
Figure 34: Overlap and correlation of the TR heads, TL heads, and IHs across datasets on Qwen2.5-
Figure 35: Overlap and correlation of the TR heads, TL heads, and IHs across datasets on Yi-34B.
Figure 36: Effects of ablating the top 3% of TR, TL, and IH heads across datasets on Llama3.1-8B.
Figure 37: Effects of ablating the top 3% of TR, TL, and IH heads across datasets on Llama3.2-3B.
Figure 38: Effects of ablating the top 3% of TR, TL, and IH heads across datasets on Qwen2-7B.
Figure 39: Effects of ablating the top 3% of TR, TL, and IH heads across datasets on Qwen2.5-32B.
Figure 40: Effects of ablating the top 3% of TR, TL, and IH heads across datasets on Yi-34B.
Figure 41: Effects of ablating TR heads identified on SST-2 when transferred to other datasets using
Figure 42: Effects of ablating TR heads identified on SST-2 when transferred to other datasets using
Figure 43: Effects of ablating TR heads identified on SST-2 when transferred to other datasets using
Figure 44: Effects of ablating TR heads identified on SST-2 when transferred to other datasets using
Figure 45: Effects of ablating TR heads identified on SST-2 when transferred to other datasets using
Figure 46: Effects of ablating TR heads identified on SST-2 when transferred to other datasets using
Figure 47: Effects of ablating TR and TL heads under perturbed ICL inputs on Llama3.1-8B.
Figure 48: Effects of ablating TR and TL heads under perturbed ICL inputs on Llama3.2-3B.
Figure 49: Effects of ablating TR and TL heads under perturbed ICL inputs on Qwen2-7B.
Figure 50: Effects of ablating TR and TL heads under perturbed ICL inputs on Qwen2.5-32B.
Figure 51: Effects of ablating TR and TL heads under perturbed ICL inputs on Yi-34B.
Figure 52: Effects of ablating TR and TL heads on Llama3-8B when demonstration labels are flipped.
Figure 53: Effects of ablating TR and TL heads on Llama3.1-8B when demonstration labels are
Figure 54: Effects of ablating TR and TL heads on Llama3.2-3B when demonstration labels are
Figure 55: Effects of ablating TR and TL heads on Qwen2-7B when demonstration labels are flipped.
Figure 56: Effects of ablating TR and TL heads on Qwen2.5-32B when demonstration labels are
Figure 57: Effects of ablating TR and TL heads on Yi-34B when demonstration labels are flipped.
Figure 58: Steering zero-shot hidden states of Llama3.1-8B using task vectors from TR, TL, or
Figure 59: Steering zero-shot hidden states of Llama3.2-3B using task vectors from TR, TL, or
Figure 60: Steering zero-shot hidden states of Qwen2-7B using task vectors from TR, TL, or random
Figure 61: Steering zero-shot hidden states of Qwen2.5-32B using task vectors from TR, TL, or
Figure 62: Steering zero-shot hidden states of Yi-34B using task vectors from TR, TL, or random
Figure 63: Mean and standard deviation of review ratings with Llama3.1-8B when task vectors from
Figure 64: Mean and standard deviation of review ratings with Llama3.2-3B when task vectors from
Figure 65: Mean and standard deviation of review ratings with Qwen2-7B when task vectors from
Figure 66: Mean and standard deviation of review ratings with Qwen2.5-32B when task vectors from
Figure 67: Mean and standard deviation of review ratings with Yi-34B when task vectors from
Figure 68: Geometric effects of TR and TL head outputs on hidden states in Llama3.1-8B.
Figure 69: Geometric effects of TR and TL head outputs on hidden states in Llama3.2-3B.
Figure 70: Geometric effects of TR and TL head outputs on hidden states in Qwen2-7B.
Figure 71: Geometric effects of TR and TL head outputs on hidden states in Qwen2.5-32B.
Figure 72: Geometric effects of TR and TL head outputs on hidden states in Yi-34B.
Figure 73: Impact of TL and TR head outputs on hidden states w.r.t. task subspace in Llama3.1-8B.
Figure 74: Impact of TL and TR head outputs on hidden states w.r.t. task subspace in Llama3.2-3B.
Figure 75: Impact of TL and TR head outputs on hidden states w.r.t. task subspace in Qwen2-7B.
Figure 76: Impact of TL and TR head outputs on hidden states w.r.t. task subspace in Qwen2.5-32B.
Figure 77: Impact of TL and TR head outputs on hidden states w.r.t. task subspace in Yi-34B.
Figure 78: Layerwise correlation of TR and TL head effects on Llama3.1-8B.
Figure 79: Layerwise correlation of TR and TL head effects on Llama3.2-3B.
Figure 80: Layerwise correlation of TR and TL head effects on Qwen2-7B.
Figure 81: Layerwise correlation of TR and TL head effects on Qwen2.5-32B.
Figure 82: Layerwise correlation of TR and TL head effects on Yi-34B.
Figure 83: Layerwise ablation of TR and TL heads (top 3 per layer) in Llama3-8B.
Figure 84: Layerwise ablation of TR and TL heads (top 3 per layer) in Llama3.1-8B.
Figure 85: Layerwise ablation of TR and TL heads (top 3 per layer) in Llama3.2-3B.
Figure 86: Layerwise ablation of TR and TL heads (top 3 per layer) in Qwen2-7B.
Figure 87: Layerwise ablation of TR and TL heads (top 3 per layer) in Qwen2.5-32B.
Figure 88: Layerwise ablation of TR and TL heads (top 3 per layer) in Yi-34B.
Original abstract

We investigate the mechanistic underpinnings of in-context learning (ICL) in large language models by reconciling two dominant perspectives: the component-level analysis of attention heads and the holistic decomposition of ICL into Task Recognition (TR) and Task Learning (TL). We propose a novel framework based on Task Subspace Logit Attribution (TSLA) to identify attention heads specialized in TR and TL, and demonstrate their distinct yet complementary roles. Through correlation analysis, ablation studies, and input perturbations, we show that the identified TR and TL heads independently and effectively capture the TR and TL components of ICL. Using steering experiments with geometric analysis of hidden states, we reveal that TR heads promote task recognition by aligning hidden states with the task subspace, while TL heads rotate hidden states within the subspace toward the correct label to facilitate prediction. We further show how previous findings on ICL mechanisms, including induction heads and task vectors, can be reconciled with our attention-head-level analysis of the TR-TL decomposition. Our framework thus provides a unified and interpretable account of how large language models execute ICL across diverse tasks and settings.
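
The overlap and correlation analysis mentioned in the abstract can be sketched with two standard measures: the Jaccard coefficient between top-k head sets and a rank correlation between per-head scores. The Figure 13 caption names the Jaccard coefficient and a truncated "Kendall's"; reading that as Kendall's tau is an assumption here, as are the head scores and the choice of the tau-a variant, which ignores ties.

```python
from itertools import combinations

def top_k(scores, k):
    """Set of the k highest-scoring heads; keys are (layer, head) tuples."""
    return set(sorted(scores, key=scores.get, reverse=True)[:k])

def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0

def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs."""
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs

# Hypothetical per-head scores for two head types on a 2-layer, 2-head toy model.
tr_scores = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.8, (1, 1): 0.2}
ih_scores = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.1}

overlap = jaccard(top_k(tr_scores, 2), top_k(ih_scores, 2))
heads = sorted(tr_scores)
tau = kendall_tau_a([tr_scores[h] for h in heads], [ih_scores[h] for h in heads])
print(f"jaccard={overlap:.3f} kendall_tau_a={tau:.3f}")
```

The set-based measure only sees which heads clear the top-k cutoff, while the rank correlation uses the full ordering, so the two can disagree when scores near the cutoff are close.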

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper investigates the mechanistic basis of in-context learning (ICL) by reconciling attention-head component analysis with the Task Recognition (TR) / Task Learning (TL) decomposition. It introduces Task Subspace Logit Attribution (TSLA) to identify specialized TR and TL heads, then uses correlation analysis, ablation, input perturbations, and steering experiments with geometric analysis of hidden states to argue that TR heads align representations to the task subspace while TL heads rotate them within the subspace toward the correct label. The work also reconciles these findings with prior results on induction heads and task vectors.

Significance. If the causal claims hold, the framework offers a unified, head-level account of ICL that bridges two previously separate lines of work and could improve mechanistic interpretability across diverse tasks. The geometric steering results and reconciliation with induction heads and task vectors would be particularly valuable contributions if supported by rigorous controls.

major comments (2)
  1. [Abstract and §3] (TSLA definition): the task subspace used for logit attribution is constructed from model activations that include the very heads under test. This creates a potential circularity risk in which TSLA surfaces heads whose activations co-vary with the subspace rather than causally producing the alignment or rotation. The ablation and steering experiments described in the abstract are consistent with either interpretation; an explicit test that the subspace is estimated from a held-out set of heads or layers is needed to establish causality.
  2. [§4.3] (steering experiments): the geometric analysis claims TR heads promote alignment and TL heads promote rotation toward the correct label, yet no quantitative metrics (e.g., cosine similarity deltas, rotation angles with error bars, or comparison to random-head controls) are referenced in the abstract. Without these numbers it is difficult to judge whether the observed effects are large enough to support the central mechanistic claim.
minor comments (2)
  1. [Introduction] The abstract states that previous findings on induction heads and task vectors are reconciled, but the specific mapping (which heads correspond to which prior mechanism) is not previewed; a brief table or diagram in the introduction would improve readability.
  2. [Methods] Post-hoc head selection procedure is mentioned but not detailed in the abstract; the criteria for declaring a head “TR” versus “TL” (thresholds, statistical tests) should be stated explicitly in the methods.
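
The kind of reporting requested in major comment 2 could look roughly like the sketch below: a mean effect with its standard error, plus a bootstrap interval for the gap between selected heads and a random-head control. The per-example deltas are placeholder numbers chosen for illustration, not results from the paper, and the percentile bootstrap is one possible choice among several.

```python
import math
import random
import statistics

def mean_and_sem(values):
    """Sample mean and standard error of the mean."""
    m = statistics.fmean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return m, sem

def bootstrap_diff_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for mean(treated) - mean(control)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        t = [rng.choice(treated) for _ in treated]
        c = [rng.choice(control) for _ in control]
        diffs.append(statistics.fmean(t) - statistics.fmean(c))
    diffs.sort()
    lower = diffs[int((alpha / 2) * n_boot)]
    upper = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper

# Placeholder cosine-alignment deltas per example (hypothetical, for illustration only).
steered_deltas = [0.30, 0.25, 0.35, 0.28, 0.32, 0.27]
random_head_deltas = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]

steered_mean, steered_sem = mean_and_sem(steered_deltas)
control_mean, control_sem = mean_and_sem(random_head_deltas)
ci_low, ci_high = bootstrap_diff_ci(steered_deltas, random_head_deltas)
print(f"steered: {steered_mean:.3f} +/- {steered_sem:.3f}")
print(f"random control: {control_mean:.3f} +/- {control_sem:.3f}")
print(f"95% bootstrap CI for the difference: [{ci_low:.3f}, {ci_high:.3f}]")
```

An interval for the difference that excludes zero would indicate that the selected heads do more than randomly chosen ones; its width gives the error bar the comment asks for.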

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped us identify areas to strengthen the causal interpretation and quantitative rigor of our work. We address each point below.

Point-by-point responses
  1. Referee: [Abstract and §3] (TSLA definition): the task subspace used for logit attribution is constructed from model activations that include the very heads under test. This creates a potential circularity risk in which TSLA surfaces heads whose activations co-vary with the subspace rather than causally producing the alignment or rotation. The ablation and steering experiments described in the abstract are consistent with either interpretation; an explicit test that the subspace is estimated from a held-out set of heads or layers is needed to establish causality.

    Authors: We appreciate the referee's identification of this potential circularity. To mitigate this concern, we have re-estimated the task subspace using activations from held-out heads and layers, excluding those being evaluated by TSLA. The identified TR and TL heads remain consistent, and the ablation and steering results hold. We will include these additional controls and results in the revised manuscript to establish the causal nature of our findings. revision: yes

  2. Referee: [§4.3] (steering experiments): the geometric analysis claims TR heads promote alignment and TL heads promote rotation toward the correct label, yet no quantitative metrics (e.g., cosine similarity deltas, rotation angles with error bars, or comparison to random-head controls) are referenced in the abstract. Without these numbers it is difficult to judge whether the observed effects are large enough to support the central mechanistic claim.

    Authors: We agree that explicit quantitative metrics would aid in evaluating the effect sizes. We will enhance the abstract and §4.3 with explicit quantitative metrics, including cosine similarity deltas, rotation angles with error bars, and comparisons to random-head controls. These additions will strengthen the support for our central mechanistic claims. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

Full rationale

The paper defines TR/TL heads via TSLA logit attribution onto a task subspace estimated from activations, then validates the claimed alignment/rotation roles through separate correlation, ablation, steering, and geometric analyses. No equation or step reduces the identification or the causal claims to a fitted parameter or self-citation by construction; the subspace derivation and head selection are distinct from the intervention-based tests that support the mechanistic interpretation. The framework therefore does not collapse into its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The framework assumes a task subspace exists in hidden-state space and that logit attribution can isolate causal contributions of individual heads; no free parameters or invented entities are explicitly quantified in the abstract.

axioms (1)
  • domain assumption: A task subspace exists in the model's hidden-state geometry that can be used to separate recognition from learning components.
    Invoked when defining TSLA and when interpreting alignment versus rotation effects.
invented entities (1)
  • Task Subspace Logit Attribution (TSLA) (no independent evidence)
    purpose: To identify attention heads specialized in TR versus TL.
    New attribution technique introduced in the paper.

pith-pipeline@v0.9.0 · 5725 in / 1365 out tokens · 32026 ms · 2026-05-18T13:23:40.319791+00:00 · methodology



Reference graph

Works this paper leans on

51 extracted references · 51 canonical work pages · 8 internal anchors

  2. [2]

    01. AI , Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, and Z...

  3. [3]

    Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, et al. Language models are few-shot le...

  4. [4]

    Iteration head: A mechanistic study of chain-of-thought

    Vivien Cabannes, Charles Arnal, Wassim Bouaziz, Xingyu Yang, Francois Charton, and Julia Kempe. Iteration head: A mechanistic study of chain-of-thought. Advances in Neural Information Processing Systems, 37: 0 109101--109122, 2024

  5. [5]

    Revisiting in-context learning inference circuit in large language models

    Hakaze Cho, Mariko Kato, Yoshihiro Sakai, and Naoya Inoue. Revisiting in-context learning inference circuit in large language models. In The Thirteenth International Conference on Learning Representations, 2025 a . URL https://openreview.net/forum?id=xizpnYNvQq

  6. [6]

    Token-based decision criteria are suboptimal in in-context learning

    Hakaze Cho, Yoshihiro Sakai, Mariko Kato, Kenshiro Tanaka, Akira Ishii, and Naoya Inoue. Token-based decision criteria are suboptimal in in-context learning. In Luis Chiruzzo, Alan Ritter, and Lu Wang (eds.), Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technolog...

  7. [7]

    Summing up the facts: Additive mechanisms behind factual recall in llms

    Bilal Chughtai, Alan Cooney, and Neel Nanda. Summing up the facts: Additive mechanisms behind factual recall in llms. arXiv preprint arXiv:2402.07321, 2024

  8. [8]

    Induction heads as an essential mechanism for pattern matching in in-context learning

    Joy Crosbie and Ekaterina Shutova. Induction heads as an essential mechanism for pattern matching in in-context learning. arXiv preprint arXiv:2407.07011, 2024

  9. [9]

    The pascal recognising textual entailment challenge

    Ido Dagan, Oren Glickman, and Bernardo Magnini. The pascal recognising textual entailment challenge. In Machine learning challenges workshop, pp. 177--190. Springer, 2005. URL https://link.springer.com/chapter/10.1007/11736790_9

  10. [10]

    The commitmentbank: Investigating projection in naturally occurring discourse

    Marie-Catherine De Marneffe, Mandy Simons, and Judith Tonhauser. The commitmentbank: Investigating projection in naturally occurring discourse. In Proceedings of Sinn und Bedeutung, volume 23, pp. 107--124, 2019. URL https://ojs.ub.uni-konstanz.de/sub/index.php/sub/article/view/601

  11. [11]

    A Survey on In-context Learning

    Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Tianyu Liu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. A survey on in-context learning, 2024. URL https://arxiv.org/abs/2301.00234

  12. [12]

    A mathematical framework for transformer circuits

    Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, et al. A mathematical framework for transformer circuits. Transformer Circuits Thread, 1(1):12, 2021. URL https://transformer-circuits.pub/2021/framework/index.html

  13. [13]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, et al. The llama 3 herd of models, 2024. URL https://arxiv.org/abs/2407.21783

  14. [14]

    Overthinking the truth: Understanding how language models process false demonstrations

    Danny Halawi, Jean-Stanislas Denain, and Jacob Steinhardt. Overthinking the truth: Understanding how language models process false demonstrations, 2024. URL https://arxiv.org/abs/2307.09476

  15. [15]

    In-context learning creates task vectors

    Roee Hendel, Mor Geva, and Amir Globerson. In-context learning creates task vectors. In Houda Bouamor, Juan Pino, and Kalika Bali (eds.), Findings of the Association for Computational Linguistics: EMNLP 2023, pp. 9318--9333, Singapore, December 2023. Association for Computational Linguistics. doi:10.18653/v1/2023.findings-emnlp.624. URL https://aclanthol...

  16. [16]

    From compression to expansion: A layerwise analysis of in-context learning

    Jiachen Jiang, Yuxin Dong, Jinxin Zhou, and Zhihui Zhu. From compression to expansion: A layerwise analysis of in-context learning, 2025. URL https://arxiv.org/abs/2505.17322

  17. [17]

    Cutting off the head ends the conflict: A mechanism for interpreting and mitigating knowledge conflicts in language models

    Zhuoran Jin, Pengfei Cao, Hongbang Yuan, Yubo Chen, Jiexin Xu, Huaijun Li, Xiaojian Jiang, Kang Liu, and Jun Zhao. Cutting off the head ends the conflict: A mechanism for interpreting and mitigating knowledge conflicts in language models, 2024. URL https://arxiv.org/abs/2402.18154

  18. [18]

    The atlas of in-context learning: How attention heads shape in-context retrieval augmentation

    Patrick Kahardipraja, Reduan Achtibat, Thomas Wiegand, Wojciech Samek, and Sebastian Lapuschkin. The atlas of in-context learning: How attention heads shape in-context retrieval augmentation. arXiv preprint arXiv:2505.15807, 2025

  19. [19]

    The geometry of prompting: Unveiling distinct mechanisms of task adaptation in language models

    Artem Kirsanov, Chi-Ning Chou, Kyunghyun Cho, and SueYeon Chung. The geometry of prompting: Unveiling distinct mechanisms of task adaptation in language models. In Luis Chiruzzo, Alan Ritter, and Lu Wang (eds.), Findings of the Association for Computational Linguistics: NAACL 2025, pp. 1855--1888, Albuquerque, New Mexico, April 2025. Association for Comp...

  20. [20]

    In-context learning state vector with inner and momentum optimization

    Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, and Min Zhang. In-context learning state vector with inner and momentum optimization, 2024. URL https://arxiv.org/abs/2404.11225

  21. [21]

    Learning question classifiers

    Xin Li and Dan Roth. Learning question classifiers. In COLING 2002: The 19th International Conference on Computational Linguistics, 2002. URL https://www.aclweb.org/anthology/C02-1150

  22. [22]

    Does circuit analysis interpretability scale? evidence from multiple choice capabilities in chinchilla

    Tom Lieberum, Matthew Rahtz, János Kramár, Neel Nanda, Geoffrey Irving, Rohin Shah, and Vladimir Mikulik. Does circuit analysis interpretability scale? evidence from multiple choice capabilities in chinchilla, 2023. URL https://arxiv.org/abs/2307.09458

  23. [23]

    In-context vectors: Making in context learning more effective and controllable through latent space steering

    Sheng Liu, Haotian Ye, Lei Xing, and James Zou. In-context vectors: Making in context learning more effective and controllable through latent space steering, 2024. URL https://arxiv.org/abs/2311.06668

  24. [24]

    Bill MacCartney and Christopher D. Manning. Modeling semantic containment and exclusion in natural language inference. In Donia Scott and Hans Uszkoreit (eds.), Proceedings of the 22nd International Conference on Computational Linguistics (Coling 2008), pp. 521--528, Manchester, UK, August 2008. Coling 2008 Organizing Committee. URL https://aclanthology....

  25. [25]

    The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets

    Samuel Marks and Max Tegmark. The geometry of truth: Emergent linear structure in large language model representations of true/false datasets, 2024. URL https://arxiv.org/abs/2310.06824

  26. [26]

    Language models implement simple word2vec-style vector arithmetic

    Jack Merullo, Carsten Eickhoff, and Ellie Pavlick. Language models implement simple word2vec-style vector arithmetic, 2024. URL https://arxiv.org/abs/2305.16130

  27. [27]

    Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?

    Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? arXiv preprint arXiv:2202.12837, 2022

  28. [28]

    In-context Learning and Induction Heads

    Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Scott Johnston, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, a...

  29. [29]

    GPT-4 Technical Report

    OpenAI, Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, Red Avila, Igor Babuschkin, Suchir Balaji, Valerie Balcom, Paul Baltescu, Haiming Bao, Mohammad Bavarian, Jeff Belgum, Irwan Bello, et al. Gpt-4 technical report, 2024. URL https://arxiv.org/...

  30. [30]

    What in-context learning learns in-context: Disentangling task recognition and task learning

    Jane Pan, Tianyu Gao, Howard Chen, and Danqi Chen. What in-context learning learns in-context: Disentangling task recognition and task learning. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki (eds.), Findings of the Association for Computational Linguistics: ACL 2023, pp. 8298--8319, Toronto, Canada, July 2023. Association for Computational Lingu...

  31. [31]

    Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales

    Bo Pang and Lillian Lee. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. arXiv preprint cs/0506075, 2005

  32. [32]

    Language models are unsupervised multitask learners

    Alec Radford, Jeffrey Wu, Rewon Child, David Luan, Dario Amodei, and Ilya Sutskever. Language models are unsupervised multitask learners. OpenAI blog, 2019. URL https://d4mucfpksywv.cloudfront.net/better-language-models/language-models.pdf

  33. [34]

    Identifying semantic induction heads to understand in-context learning

    Jie Ren, Qipeng Guo, Hang Yan, Dongrui Liu, Quanshi Zhang, Xipeng Qiu, and Dahua Lin. Identifying semantic induction heads to understand in-context learning, 2024b. URL https://arxiv.org/abs/2402.13055

  34. [35]

    Large language models encode semantics in low-dimensional linear subspaces

    Baturay Saglam, Paul Kassianik, Blaine Nelson, Sajana Weerawardhena, Yaron Singer, and Amin Karbasi. Large language models encode semantics in low-dimensional linear subspaces, 2025. URL https://arxiv.org/abs/2507.09709

  35. [36]

    What needs to go right for an induction head? a mechanistic study of in-context learning circuits and their formation

    Aaditya K Singh, Ted Moskovitz, Felix Hill, Stephanie CY Chan, and Andrew M Saxe. What needs to go right for an induction head? a mechanistic study of in-context learning circuits and their formation. In Forty-first International Conference on Machine Learning, 2024. URL https://arxiv.org/abs/2404.07129

  36. [37]

    Recursive deep models for semantic compositionality over a sentiment treebank

    Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. Recursive deep models for semantic compositionality over a sentiment treebank. In David Yarowsky, Timothy Baldwin, Anna Korhonen, Karen Livescu, and Steven Bethard (eds.), Proceedings of the 2013 Conference on Empirical Methods in Natural Langu...

  37. [38]

    Out-of-distribution generalization via composition: a lens through induction heads in transformers

    Jiajun Song, Zhuoyan Xu, and Yiqiao Zhong. Out-of-distribution generalization via composition: a lens through induction heads in transformers. Proceedings of the National Academy of Sciences, 122(6):e2417182122, 2025. URL https://arxiv.org/abs/2408.09503

  38. [39]

    Interpret and improve in-context learning via the lens of input-label mappings

    Chenghao Sun, Zhen Huang, Yonggang Zhang, Le Lu, Houqiang Li, Xinmei Tian, Xu Shen, and Jieping Ye. Interpret and improve in-context learning via the lens of input-label mappings. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 3873--3895, 2025

  39. [40]

    Black-box tuning for language-model-as-a-service

    Tianxiang Sun, Yunfan Shao, Hong Qian, Xuanjing Huang, and Xipeng Qiu. Black-box tuning for language-model-as-a-service. In Proceedings of the 39th International Conference on Machine Learning, pp. 20841--20855, Baltimore, Maryland, USA, 2022. ACM. URL https://proceedings.mlr.press/v162/sun22e/sun22e.pdf

  40. [41]

    Function vectors in large language models

    Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, and David Bau. Function vectors in large language models, 2024. URL https://arxiv.org/abs/2310.15213

  41. [42]

    Baselines and bigrams: Simple, good sentiment and topic classification

    Sida Wang and Christopher Manning. Baselines and bigrams: Simple, good sentiment and topic classification. In Haizhou Li, Chin-Yew Lin, Miles Osborne, Gary Geunbae Lee, and Jong C. Park (eds.), Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 90--94, Jeju Island, Korea, July 2012. Assoc...

  42. [43]

    Larger language models do in-context learning differently

    Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, and Tengyu Ma. Larger language models do in-context learning differently, 2023. URL https://arxiv.org/abs/2303.03846

  43. [44]

    Qwen2 Technical Report

    An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng ...

  44. [45]

    Unifying attention heads and task vectors via hidden state geometry in in-context learning

    Haolin Yang, Hakaze Cho, Yiqiao Zhong, and Naoya Inoue. Unifying attention heads and task vectors via hidden state geometry in in-context learning, 2025. URL https://arxiv.org/abs/2505.18752

  45. [46]

    Which attention heads matter for in-context learning?

    Kayo Yin and Jacob Steinhardt. Which attention heads matter for in-context learning? arXiv preprint arXiv:2502.14010, 2025

  46. [47]

    How do large language models learn in-context? query and key matrices of in-context heads are two towers for metric learning

    Zeping Yu and Sophia Ananiadou. How do large language models learn in-context? query and key matrices of in-context heads are two towers for metric learning. arXiv preprint arXiv:2402.02872, 2024

  47. [48]

    Beyond single concept vector: Modeling concept subspace in llms with gaussian distribution

    Haiyan Zhao, Heng Zhao, Bo Shen, Ali Payani, Fan Yang, and Mengnan Du. Beyond single concept vector: Modeling concept subspace in llms with gaussian distribution, 2025. URL https://arxiv.org/abs/2410.00153

  48. [49]

    Attention heads of large language models: A survey

    Zifan Zheng, Yezhaohui Wang, Yuxin Huang, Shichao Song, Mingchuan Yang, Bo Tang, Feiyu Xiong, and Zhiyu Li. Attention heads of large language models: A survey. arXiv preprint arXiv:2409.03752, 2024
