pith. machine review for the scientific record.

arxiv: 2509.24169 · v3 · submitted 2025-09-29 · 💻 cs.CL

Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight

Pith reviewed 2026-05-18 13:17 UTC · model grok-4.3

classification 💻 cs.CL
keywords task vectors · in-context learning · mechanistic interpretability · attention circuits · transformer models · large language models · linear propagation

The pith

Training task vectors directly yields higher accuracy and more flexible use than extracting them from model states during in-context learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that task vectors can be trained from scratch rather than pulled out of hidden states or outputs after seeing demonstrations. The resulting learned vectors achieve stronger task performance and can be placed at any layer or position, including alongside ordinary in-context prompts. The authors trace how these vectors affect computation: they act mainly through the output-value (OV) circuits of a few key attention heads. At a higher level, the vectors move through the network in a largely linear fashion: vectors injected at early layers are rotated toward directions that raise the logits of correct labels, while vectors injected at late layers are mainly scaled up in magnitude.

Core claim

Directly trained Learned Task Vectors surpass extracted task vectors in accuracy and can be inserted at arbitrary layers, positions, and even together with standard in-context prompts. At the circuit level they steer predictions through attention-head OV circuits, with a small subset of key heads carrying most of the effect. Despite the nonlinearities inside transformers, the vectors propagate largely linearly: early vectors are rotated into task-relevant subspaces that raise the logits of relevant labels, while later vectors are scaled up in magnitude.
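
A minimal sketch of what "training a task vector directly" can look like, under strong simplifying assumptions. The "model" here is a single frozen linear readout `W` rather than a transformer, the vector `theta` is added to a hidden state before the readout, and only `theta` is updated. Every name (`train_task_vector`, `W`, `examples`) is hypothetical and chosen for illustration; the paper's actual objective, optimizer, and hyperparameters are not given on this page.

```python
import math

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def logits(W, h, theta):
    # Frozen readout applied to the hidden state with the task vector added.
    x = [hi + ti for hi, ti in zip(h, theta)]
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def train_task_vector(W, examples, dim, lr=0.5, steps=200):
    # Gradient descent on mean cross-entropy; W stays fixed, only theta moves.
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for h, label in examples:
            p = softmax(logits(W, h, theta))
            for k, row in enumerate(W):
                err = p[k] - (1.0 if k == label else 0.0)
                for j in range(dim):
                    grad[j] += err * row[j] / len(examples)
        theta = [t - lr * g for t, g in zip(theta, grad)]
    return theta

def accuracy(W, examples, theta):
    hits = 0
    for h, label in examples:
        z = logits(W, h, theta)
        hits += int(z.index(max(z)) == label)
    return hits / len(examples)

# Toy setup (assumed): two labels, three-dimensional hidden states.
W = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
examples = [([1.0, 0.2, 0.0], 1), ([0.8, 0.1, 0.3], 1), ([0.9, 0.0, -0.2], 1)]

theta = train_task_vector(W, examples, dim=3)
acc_before = accuracy(W, examples, [0.0, 0.0, 0.0])
acc_after = accuracy(W, examples, theta)
```

In this toy case the unsteered readout picks the wrong label for every example, and the trained vector flips all of them. That illustrates the mechanism only; it says nothing about the accuracy figures reported in the paper.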

What carries the argument

Learned Task Vectors trained end-to-end to steer predictions, operating through attention-head OV circuits and propagating via early rotation toward task subspaces followed by later magnitude scaling.
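
To make "operating through attention-head OV circuits" concrete, here is a small sketch of a single attention head. It assumes the attention pattern is held fixed; in a real model an injected vector can also change the attention scores. Under that assumption, adding a vector `theta` at a source position changes the head's output at the query position by the attention weight times `W_O W_V theta`. The matrices, dimensions, and function names are hypothetical.

```python
def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def head_output(attn_row, W_V, W_O, states):
    # attn_row[j] is the attention weight from the query position to source j.
    mixed = [0.0] * len(W_V)
    for a, x in zip(attn_row, states):
        value = matvec(W_V, x)
        mixed = [m + a * v for m, v in zip(mixed, value)]
    return matvec(W_O, mixed)

def ov_effect(attn_weight, W_V, W_O, theta):
    # Contribution of theta routed through the head's OV circuit.
    return [attn_weight * y for y in matvec(W_O, matvec(W_V, theta))]

# Toy head (assumed values): model width 2, head width 2, two source positions.
W_V = [[1.0, 0.0], [0.0, 2.0]]
W_O = [[0.0, 1.0], [1.0, 0.0]]
attn_row = [0.25, 0.75]
states = [[1.0, 0.0], [0.0, 1.0]]
theta = [1.0, 1.0]

base = head_output(attn_row, W_V, W_O, states)
steered_states = [states[0], [s + t for s, t in zip(states[1], theta)]]
steered = head_output(attn_row, W_V, W_O, steered_states)

delta = [s - b for s, b in zip(steered, base)]
predicted = ov_effect(attn_row[1], W_V, W_O, theta)
```

Here `delta` and `predicted` coincide, which is the sense in which a head's OV circuit carries the injected vector forward once the attention weights are fixed.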

If this is right

  • Learned task vectors remain effective when inserted at any layer or token position.
  • They continue to work when combined with ordinary in-context learning demonstrations.
  • Prediction steering occurs primarily through the OV circuits of a small number of decisive attention heads.
  • Vectors injected at early layers raise the logits of relevant labels by being rotated into task subspaces; vectors injected at late layers do so mainly by being scaled up in magnitude.
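
The rotation-versus-scaling distinction in the last bullet can be illustrated with two simple measurements on a vector before and after a linear map: the angle between input and output (rotation) and the ratio of their norms (scaling). The sketch below uses made-up 2D maps and a made-up label direction; it is not the paper's analysis procedure.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def cosine(u, v):
    return dot(u, v) / (norm(u) * norm(v))

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def describe_map(M, v, label_dir):
    out = matvec(M, v)
    c = max(-1.0, min(1.0, cosine(v, out)))  # clamp against rounding error
    return {
        "angle_deg": math.degrees(math.acos(c)),   # how far the vector is rotated
        "norm_ratio": norm(out) / norm(v),         # how much it is scaled
        "align_before": cosine(v, label_dir),      # alignment with the label direction
        "align_after": cosine(out, label_dir),
    }

label_dir = [0.0, 1.0]                      # assumed direction of the correct label

early_map = [[0.0, -1.0], [1.0, 0.0]]       # pure 90-degree rotation
early = describe_map(early_map, [1.0, 0.0], label_dir)

late_map = [[3.0, 0.0], [0.0, 3.0]]         # pure scaling by 3
late = describe_map(late_map, [0.0, 1.0], label_dir)
```

In this toy case the "early" map changes alignment with the label direction from 0 to 1 without changing the norm, while the "late" map leaves the direction alone and triples the norm.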

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The ability to train vectors separately could let practitioners optimize task representations without retraining the full model.
  • The reported linear propagation may simplify future interventions that aim to edit or interpret task behavior inside transformers.
  • If the pattern generalizes, similar training could be applied to control other emergent capabilities such as reasoning or tool use.

Load-bearing premise

The accuracy gains and the linear propagation pattern observed for trained vectors are not limited to the particular training procedure, model sizes, or tasks used in the experiments.

What would settle it

Retraining learned task vectors on a new collection of tasks or a different model family and finding that extracted vectors match or exceed them in accuracy, or measuring that interventions on nonlinear components substantially change the reported rotation-and-scaling pattern.

Figures

Figures reproduced from arXiv: 2509.24169 by Hakaze Cho, Haolin Yang, Kaize Ding, Naoya Inoue.

Figure 1: (A) We directly train Learned Task Vectors (LTVs) to be injected, which influence model outputs through later layers updates. (B) In the low-level interactions between TVs and later layers, the OV circuits of attention heads are the crucial components interacting with TVs to induce their effects. (C) On a high level, subsequent layer updates act on TVs as a largely linear transformation of rotation and str…
Figure 2: Dataset-average accuracy of injecting the …
Figure 3: (A) Mean and standard deviation of ratings for responses generated with ICL, LTV, FV, and Vanilla TV. (B) An example question and responses across settings. The results in …
Figure 4: Cosine similarity heatmap of LTVs for seven tasks, showing inter-task separation and intra-task clustering. Adaptability of LTVs to complex task settings: The tasks above have single-token labels and unique correct answers (e.g., The capital of China is → Beijing). To evaluate generalizability to a more complex generation task with multi-token responses, where the goal is to elicit a behavioral mode rather t…
Figure 5: Assessing the significance of attention heads in the low-level interactions between TVs …
Figure 6: (A) Key attention heads cluster mainly in layers immediately after the injection (16 for Llama3.1-8B) and secondarily in final layers. (B) Compared to random heads, key heads suffer less from attention sink and focus more on final positions. See Subsection E.4 for other models. Assessing key attention heads leveraging the TV: We further evaluate attention heads by identifying those most reliant on TVs for p…
Figure 7: (A) Metric values of hidden states across layers when the TV is injected at an early or late layer. (B) Tokens decoded from TVs, with early-layer TVs yielding random tokens and late-layer TVs producing task-related tokens. See Subsection F.1 for other models' results. [Adjacent plot: x-axis Layer, y-axis Accuracy, legend includes "Reconstructed"; panel (a) Effect of linearly reconstructed TV. For other models see Sub…]
Figure 8: (A) A reconstructed TV based on modeling θ_l's influence as linear achieves comparable accuracy for most layers. (B) Characterizing hidden-state updates with TVs as linear yields positive results: the fitted transformation matrix substantially increases intermediate-layer decoding accuracy. TV θ_l is injected at an early or late layer l. We set l = L/4 for early and l = 3L/4 for late (8 and 24 for Llama3.1-8…
Figure 9: (A) Applying the rotation to TVs at different layers substantially increases alignment with unembeddings of task-related labels. (B) After rotation, early-layer TVs that originally decode random tokens produce task-related tokens. (C) The rotation effect diminishes for late-layer TVs as the estimated matrix approaches identity. Decomposition of TV's influence mechanism: While TVs injected at different laye…
Figure 10: Visualization of how we reconstruct the aggregate effect of TVs induced through the OV …
Figures 11-14 (same truncated caption for each): Layer sweeping results of injecting the Vanilla TV, FV, and our LTV to the last token …
Figure 15: Myopic dataset: LTV vs. Vanilla TV and FV on Llama2-7B.
Figure 16: Myopic dataset: LTV vs. Vanilla TV and FV on Llama2-13B.
Figure 17: Myopic dataset: LTV vs. Vanilla TV and FV on Llama3-8B.
Figure 18: Myopic dataset: LTV vs. Vanilla TV and FV on Llama3.2-3B.
Figure 19: Myopic dataset: LTV vs. Vanilla TV on Llama3-70B.
Figure 20: Myopic dataset: LTV vs. Vanilla TV on Qwen2.5-32B.
Figure 21: Myopic dataset: LTV vs. Vanilla TV on Yi-34B.
Figure 22: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama3-8B.
Figure 23: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama3.2-3B.
Figure 24: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama3-70B.
Figure 25: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama2-7B.
Figure 26: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama2-13B.
Figure 27: Cosine-similarity heatmap of LTVs trained for seven tasks on Qwen2.5-32B.
Figure 28: Cosine-similarity heatmap of LTVs trained for seven tasks on Yi-34B.
Figure 29: Attention heads and TV on Llama3-8B: OV-circuit reconstruction (left) and ablation of …
Figure 30: Attention heads and TV on Llama3.2-3B: OV-circuit reconstruction (left) and ablation of …
Figure 31: Attention heads and TV on Llama2-7B: OV-circuit reconstruction (left) and ablation of …
Figure 32: Attention heads and TV on Llama2-13B: OV-circuit reconstruction (left) and ablation of …
Figure 33: Attention heads and TV on Llama3-70B: OV-circuit reconstruction (left) and ablation of …
Figure 34: Attention heads and TV on Qwen2.5-32B: OV-circuit reconstruction (left) and ablation of …
Figure 35: Attention heads and TV on Yi-34B: OV-circuit reconstruction (left) and ablation of key …
Figures 36-43 (same truncated caption for each): Effects of the OV circuit reconstruction with or without the TV added to the final layer: …
Figure 44: Key attention heads on Llama3-8B: distribution across layers (left) and attention over …
Figure 45: Key attention heads on Llama3.2-3B: distribution across layers (left) and attention over …
Figure 46: Key attention heads on Llama3-70B: distribution across layers (left) and attention over …
Figure 47: Key attention heads on Llama2-7B: distribution across layers (left) and attention over …
Figure 48: Key attention heads on Llama2-13B: distribution across layers (left) and attention over …
Figure 49: Key attention heads on Qwen2.5-32B: distribution across layers (left) and attention over …
Figure 50: Key attention heads on Yi-34B: distribution across layers (left) and attention over token …
Figure 51: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at an …
Figure 52: Metrics across layers on Llama3.2-3B when the TV is injected into the hidden state at an …
Figure 53: Metrics across layers on Llama3-70B when the TV is injected into the hidden state at an …
Figure 54: Metrics across layers on Llama2-7B when the TV is injected into the hidden state at an …
Figure 55: Metrics across layers on Llama2-13B when the TV is injected into the hidden state at an …
Figure 56: Metrics across layers on Qwen2.5-32B when the TV is injected into the hidden state at an …
Figure 57: Metrics across layers on Yi-34B when the TV is injected into the hidden state at an early …
Figure 58: Linear hypothesis on Llama3-8B: linearly reconstructed TV (left) and linear surrogate for …
Figure 59: Linear hypothesis on Llama3.2-3B: linearly reconstructed TV (left) and linear surrogate …
Figure 60: Linear hypothesis on Llama3-70B: linearly reconstructed TV (left) and linear surrogate …
Figure 61: Linear hypothesis on Llama2-7B: linearly reconstructed TV (left) and linear surrogate for …
Figure 62: Linear hypothesis on Llama2-13B: linearly reconstructed TV (left) and linear surrogate …
Figure 63: Linear hypothesis on Qwen2.5-32B: linearly reconstructed TV (left) and linear surrogate …
Figure 64: Linear hypothesis on Yi-34B: linearly reconstructed TV (left) and linear surrogate for …
Figure 65: Rotation analysis on Llama3-8B: applying the fitted rotation …
Figure 66: Rotation analysis on Llama3.2-3B: applying the fitted rotation …
Figure 67: Rotation analysis on Llama3-70B: applying the fitted rotation …
Figure 68: Rotation analysis on Llama2-7B: applying the fitted rotation …
Figure 69: Rotation analysis on Llama2-13B: applying the fitted rotation …
Figure 70: Rotation analysis on Qwen2.5-32B: applying the fitted rotation …
Figure 71: Rotation analysis on Yi-34B: applying the fitted rotation …
Figure 72: Average attention distribution of Llama3.1-8B on SST-2: proportions of attention weights …
Figure 73: Average attention distribution of Llama3-8B on SST-2: proportions of attention weights …
Figure 74: Average attention distribution of Llama3.2-3B on SST-2: proportions of attention weights …
Figure 75: Average attention distribution of Llama3-70B on SST-2: proportions of attention weights …
Figure 76: Average attention distribution of Llama2-7B on SST-2: proportions of attention weights …
Figure 77: Average attention distribution of Llama2-13B on SST-2: proportions of attention weights …
Figure 78: Average attention distribution of Qwen2.5-32B on SST-2: proportions of attention weights …
Figure 79: Average attention distribution of Yi-34B on SST-2: proportions of attention weights …
Figures 80-95 (same truncated caption for each): Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer …
Original abstract

Large Language Models (LLMs) can perform new tasks from in-context demonstrations, a phenomenon known as in-context learning (ICL). Recent work suggests that these demonstrations are compressed into task vectors (TVs), compact task representations that LLMs exploit for predictions. However, prior studies typically extract TVs from model outputs or hidden states using cumbersome and opaque methods, and they rarely elucidate the mechanisms by which TVs influence computation. In this work, we address both limitations. First, we propose directly training Learned Task Vectors (LTVs), which surpass extracted TVs in accuracy and exhibit superior flexibility, acting effectively at arbitrary layers, positions, and even with ICL prompts. Second, through systematic analysis, we investigate the mechanistic role of TVs, showing that at the low level they steer predictions primarily through attention-head OV circuits, with a small subset of "key heads" most decisive. At a higher level, we find that despite Transformer nonlinearities, TV propagation is largely linear: early TVs are rotated toward task-relevant subspaces to improve logits of relevant labels, while later TVs are predominantly scaled in magnitude. Taken together, LTVs not only provide a practical approach for obtaining effective TVs but also offer a principled lens into the mechanistic foundations of ICL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 2 minor

Summary. The manuscript proposes directly training Learned Task Vectors (LTVs) rather than extracting task vectors (TVs) from LLM hidden states or outputs for in-context learning. It claims LTVs achieve higher accuracy and greater flexibility (effective at arbitrary layers/positions and compatible with ICL prompts). Mechanistically, TVs steer predictions primarily via attention-head OV circuits (with a small subset of key heads decisive), and despite Transformer nonlinearities, propagation is largely linear: early TVs are rotated toward task-relevant subspaces to boost relevant label logits, while later TVs are mainly scaled in magnitude.

Significance. If the empirical and mechanistic claims hold under broader testing, the work supplies both a practical method for obtaining high-quality task representations and interpretable insights into ICL mechanisms. The direct-training approach and the OV-circuit plus linear-propagation findings could inform future control and analysis of transformer behavior.

major comments (4)
  1. [§4] §4 (Experiments): The central claim that trained LTVs surpass extracted TVs in accuracy and flexibility lacks explicit quantitative results, baseline comparisons, statistical significance tests, and controls for training-procedure artifacts in the reported overview; without these the degree of support for superiority cannot be assessed.
  2. [§5.2] §5.2 (Mechanistic analysis): The assertion that steering occurs primarily through attention-head OV circuits with a decisive subset of key heads requires ablation results quantifying performance drop when those heads are masked or removed; otherwise the circuit-level claim remains qualitative.
  3. [§6.1] §6.1 (Propagation analysis): The higher-level claim that TV propagation is 'largely linear' (early rotation to task subspaces, later magnitude scaling) despite nonlinearities needs explicit quantitative metrics such as cosine similarity to linear approximations or residual nonlinear error; absent these, the summary risks overstatement.
  4. [§4.3] §4.3 and §7 (Generalization): The observed performance and mechanistic advantages may be tied to the specific training objective, model family, or narrow task distribution; additional experiments across architectures and broader task sets are required to rule out artifacts and support the broader conclusions.
minor comments (2)
  1. [§2] Notation for extracted TVs versus LTVs should be introduced with a clear table or diagram in §2 to prevent reader confusion.
  2. [Figures] Figures showing key-head attention patterns and layer-wise propagation should include axis labels, error bars, and layer indices for immediate readability.
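
The metrics requested in major comment 3 (cosine similarity to a linear approximation, residual nonlinear error) could be computed along the following lines. This is a sketch under stated assumptions: the "layer" is a made-up 2D function with a small `tanh` nonlinearity, the linear surrogate is an ordinary least-squares fit, and all function names are hypothetical rather than drawn from the paper.

```python
import math

def matvec(M, v):
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def fit_linear_2d(xs, ys):
    # Least-squares A minimising sum ||A x - y||^2 via A = (Y X^T)(X X^T)^-1.
    sxx = [[sum(x[i] * x[j] for x in xs) for j in range(2)] for i in range(2)]
    syx = [[sum(y[i] * x[j] for x, y in zip(xs, ys)) for j in range(2)]
           for i in range(2)]
    det = sxx[0][0] * sxx[1][1] - sxx[0][1] * sxx[1][0]
    inv = [[sxx[1][1] / det, -sxx[0][1] / det],
           [-sxx[1][0] / det, sxx[0][0] / det]]
    return [[sum(syx[i][k] * inv[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def linearity_metrics(xs, ys):
    # Returns (mean cosine between linear prediction and target,
    #          residual energy as a fraction of target energy).
    A = fit_linear_2d(xs, ys)
    cos_sum, residual, total = 0.0, 0.0, 0.0
    for x, y in zip(xs, ys):
        p = matvec(A, x)
        dot = sum(a * b for a, b in zip(p, y))
        cos_sum += dot / (math.hypot(*p) * math.hypot(*y))
        residual += sum((a - b) ** 2 for a, b in zip(p, y))
        total += sum(b * b for b in y)
    return cos_sum / len(xs), residual / total

def toy_layer(x):
    # Assumed stand-in for a layer update: rotate and scale, plus a weak nonlinearity.
    return [-2.0 * x[1] + 0.05 * math.tanh(x[0]),
            2.0 * x[0] + 0.05 * math.tanh(x[1])]

xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 2.0], [2.0, -1.0]]
ys = [toy_layer(x) for x in xs]
mean_cos, rel_err = linearity_metrics(xs, ys)
```

A mean cosine near 1 together with a small relative residual is the kind of evidence that would support calling the map "largely linear"; how small counts as small enough is a judgment the manuscript would need to justify.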

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and describe the revisions we will incorporate to strengthen the empirical and mechanistic claims.

Point-by-point responses
  1. Referee: [§4] §4 (Experiments): The central claim that trained LTVs surpass extracted TVs in accuracy and flexibility lacks explicit quantitative results, baseline comparisons, statistical significance tests, and controls for training-procedure artifacts in the reported overview; without these the degree of support for superiority cannot be assessed.

    Authors: We agree that the main-text overview would benefit from greater explicitness. The manuscript already reports quantitative accuracy comparisons, baseline methods, and some statistical tests in the experimental section and appendices, but we will expand §4 to prominently feature key numerical results, direct baseline contrasts, significance testing, and controls for training-procedure artifacts so that the superiority claims can be evaluated directly from the main text.
    Revision: yes

  2. Referee: [§5.2] §5.2 (Mechanistic analysis): The assertion that steering occurs primarily through attention-head OV circuits with a decisive subset of key heads requires ablation results quantifying performance drop when those heads are masked or removed; otherwise the circuit-level claim remains qualitative.

    Authors: We accept that the current identification of key heads via importance metrics would be strengthened by direct ablation. In the revised manuscript we will add ablation experiments that mask or remove the identified key heads and report the resulting performance drops, thereby providing quantitative support for their decisive role within the OV circuits.
    Revision: yes

  3. Referee: [§6.1] §6.1 (Propagation analysis): The higher-level claim that TV propagation is 'largely linear' (early rotation to task subspaces, later magnitude scaling) despite nonlinearities needs explicit quantitative metrics such as cosine similarity to linear approximations or residual nonlinear error; absent these, the summary risks overstatement.

    Authors: We agree that additional quantitative grounding is warranted. We will augment §6.1 with explicit metrics, including cosine similarity between observed propagation trajectories and their linear approximations as well as measures of residual nonlinear error, to substantiate the characterization of propagation as largely linear.
    Revision: yes

  4. Referee: [§4.3] §4.3 and §7 (Generalization): The observed performance and mechanistic advantages may be tied to the specific training objective, model family, or narrow task distribution; additional experiments across architectures and broader task sets are required to rule out artifacts and support the broader conclusions.

    Authors: We acknowledge the scope limitation. Our experiments prioritized depth of mechanistic analysis on standard models and tasks. In the revision we will add experiments on at least one additional architecture and an expanded task distribution to provide evidence that the reported advantages are not artifacts of the original experimental setting.
    Revision: yes
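
The head-ablation experiment discussed in point 2 above can be sketched as follows. The setup is deliberately simplified and assumed: each head contributes an additive term to the final logits, ablation means dropping that term, and importance is the resulting drop in the correct label's margin. The head names and numbers are invented for illustration and do not come from the paper.

```python
def margin(head_contribs, label, ablated=frozenset()):
    # Correct-label logit minus the best competing logit, skipping ablated heads.
    n_labels = len(next(iter(head_contribs.values())))
    logits = [0.0] * n_labels
    for name, contrib in head_contribs.items():
        if name in ablated:
            continue
        logits = [l + c for l, c in zip(logits, contrib)]
    best_other = max(l for k, l in enumerate(logits) if k != label)
    return logits[label] - best_other

def rank_heads_by_ablation(head_contribs, label):
    # Larger drop in margin after ablation means a more decisive head.
    base = margin(head_contribs, label)
    drops = {
        name: base - margin(head_contribs, label, frozenset([name]))
        for name in head_contribs
    }
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-head contributions to the logits of two labels.
head_contribs = {
    "L16.H3": [0.1, 2.0],
    "L16.H7": [0.2, 0.3],
    "L30.H1": [0.5, 0.4],
}
ranking = rank_heads_by_ablation(head_contribs, label=1)
base_margin = margin(head_contribs, label=1)
```

In this toy example a single head accounts for the whole margin, which mirrors the shape of the "small subset of key heads" claim. In a real transformer, head outputs interact through later layers, so ablation there requires rerunning the forward pass rather than subtracting a term.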

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on training and analysis without definitional reduction

full rationale

The paper's core contribution is the proposal to directly train Learned Task Vectors (LTVs) rather than extract them, followed by empirical comparisons of accuracy and flexibility plus mechanistic dissection via attention OV circuits and observed linear propagation patterns. No equations, self-citations, or uniqueness theorems are invoked in the provided text that would make any performance gain or linearity claim equivalent to its inputs by construction. The derivation chain consists of experimental procedures and internal model analysis that remain independent of the target results; the claims are falsifiable via replication on different models or tasks and do not reduce to fitted parameters renamed as predictions or ansatzes smuggled through prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms, or invented entities are described. Training LTVs presumably involves standard optimization but details are absent.

pith-pipeline@v0.9.0 · 5754 in / 1207 out tokens · 51185 ms · 2026-05-18T13:17:25.001454+00:00 · methodology



Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

  1. [1]

    Yi: Open foundation models by 01.ai, 2024

    01.AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, and Zongho...

  2. [2]

    In-context learning state vector with inner and momentum optimization, 2024

    Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, and Min Zhang. URL https://arxiv.org/abs/2404.11225.

  3. [3]

    Would you like a donut now, or two donuts in an hour?

    Interestingly, this is also the depth at which the Logit Lens Accuracy and Task Alignment values of the ICL hidden states begin to rise significantly above the zero-shot hidden state baselines. This is consistent with previous findings (Yang et al., 2025a), which report that ICL features a distinct transition pattern where hidden states increasingly align...

  4. [6]

    More layers & Pos. P={−5, . . .}, L={0,2, . . .}; 5) ICL prompts

    Vanilla TV: 38.26%, 1.96%, 14.16%, 18.85%, 13.30%, 52.82%
    FV: 51.81%, 1.40%, 28.60%, 47.14%, 20.44%, 73.23%
    LTV: 82.54% (↑30.73%), 79.34% (↑77.38%), 84.60% (↑56.00%), 82.24% (↑35.10%), 51.60% (↑31.16%), 85.16% (↑11.93%)

    [Figure residue: x-axis "Layer of TV Injection", layers 0 to 38; y-axis 0% to 80%, label truncated]

  5. [9]

    More layers & Pos. P={−5, . . .}, L={0,2, . . .}; 5) ICL prompts

    Vanilla TV: 27.67%, 1.84%, 16.42%, 20.46%, 16.07%, 43.84%
    FV: 41.59%, 1.22%, 42.25%, 36.97%, 24.74%, 77.51%
    LTV: 80.33% (↑38.74%), 71.53% (↑69.69%), 87.69% (↑45.44%), 82.25% (↑45.28%), 51.46% (↑26.72%), 84.99% (↑7.48%)

    [Figure residue: x-axis "Layer of TV Injection", layers 0 to 26; y-axis 0% to 80%, label truncated]

  6. [12]

    More layers & Pos. P={−5, . . .}, L={0,2, . . .}; 5) ICL prompts

    Vanilla TV: 31.69%, 2.02%, 1.05%, 26.68%, 0.33%, 75.83%
    FV: 33.28%, 2.93%, 18.38%, 16.95%, 17.72%, 53.93%
    LTV: 76.26% (↑42.98%), 76.22% (↑73.29%), 77.93% (↑59.55%), 83.48% (↑56.80%), 44.82% (↑27.10%), 84.51% (↑8.68%)

    Table 6: Comparison of LTV vs. FV and Vanilla TV across five scenarios on Llama3-8B.

  7. [15]

    More layers & Pos. P={−5, . . .}, L={0,2, . . .}; 5) ICL prompts

    Vanilla TV: 42.61%, 3.07%, 18.73%, 37.05%, 11.33%, 65.38%
    FV: 19.54%, 3.53%, 15.07%, 4.69%, 13.26%, 62.12%
    LTV: 78.65% (↑36.04%), 74.10% (↑70.57%), 78.18% (↑59.45%), 80.43% (↑43.38%), 46.38% (↑33.12%), 82.80% (↑17.42%)

    Table 7: Comparison of LTV vs. FV and Vanilla TV across five scenarios on Llama3.2-3B.

    Header fragment of the following table: Method; Baseline P={−1}, L={40}

  8. [18]

    More layers & Pos. P={−5, . . .}, L={0,2, . . .}; 5) ICL prompts

    LTV: 78.18%, 75.34%, 76.13%, 75.59%, 48.75%, 88.40%

    Table 8: Performance of LTV across settings on Llama3-70B.

    [Figure residue: heatmap over the tasks SST-2, TREC, SNLI, RTE, Capital, Capitalize, Antonym on both axes; color scale "Cosine Similarity", 0.0 to 1.0]
    Figure 26: Cosine-similarity h...

  9. [21]

    More layers & Pos. P={−5, . . .}, L={0,2, . . .}; 5) ICL prompts

    LTV: 75.59%, 36.04%, 75.20%, 87.24%, 53.30%, 87.08%

    Table 9: Performance of LTV across settings on Qwen2.5-32B.

    Header fragment of the following table: Method; Baseline P={−1}, L={30}

  10. [22]

    More Pos. P={−5, . . . ,−1}

  11. [23]

    More layers L={0,4,8, . . .}

  12. [24]

    More layers & Pos. P={−5, . . .}, L={0,2, . . .}; 5) ICL prompts

    LTV: 81.37%, 73.53%, 84.39%, 82.47%, 51.29%, 89.69%

    Table 10: Performance of LTV across settings on Yi-34B.

    [Figure residue: heatmap over the tasks SST-2, TREC, SNLI, RTE, Capital, Capitalize, Antonym on both axes; color scale "Cosine Similarity", 0.2 to 1.0]
    Figure 28: Cosine-similarity heatmap ...