arxiv: 2509.24169 · v3 · submitted 2025-09-29 · 💻 cs.CL

Task Vectors, Learned Not Extracted: Performance Gains and Mechanistic Insight

Haolin Yang , Hakaze Cho , Kaize Ding , Naoya Inoue This is my paper

Pith reviewed 2026-05-18 13:17 UTC · model grok-4.3

classification 💻 cs.CL

keywords task vectorsin-context learningmechanistic interpretabilityattention circuitstransformer modelslarge language modelslinear propagation

0 comments

The pith

Training task vectors directly yields higher accuracy and more flexible use than extracting them from model states during in-context learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows that task vectors can be trained from scratch rather than pulled out of hidden states or outputs after seeing demonstrations. The resulting learned vectors achieve stronger task performance and can be placed at any layer or position, including alongside ordinary in-context prompts. The authors trace how these vectors affect computation: they act mainly through the output-value circuits of a few key attention heads. At a higher level the vectors move through the network in a largely linear fashion, with early ones rotated toward directions that boost correct labels and later ones simply increased in strength.

Core claim

Directly trained Learned Task Vectors surpass extracted task vectors in accuracy and can be inserted at arbitrary layers, positions, and even together with standard in-context prompts. At the circuit level they steer predictions through attention-head OV circuits, with a small subset of key heads carrying most of the effect. Despite the nonlinearities inside transformers, the vectors propagate largely linearly: early vectors are rotated into task-relevant subspaces that raise the logits of relevant labels, while later vectors are scaled up in magnitude.

What carries the argument

Learned Task Vectors trained end-to-end to steer predictions, operating through attention-head OV circuits and propagating via early rotation toward task subspaces followed by later magnitude scaling.

If this is right

Learned task vectors remain effective when inserted at any layer or token position.
They continue to work when combined with ordinary in-context learning demonstrations.
Prediction steering occurs primarily through the OV circuits of a small number of decisive attention heads.
Early vectors improve relevant label logits by rotating into task subspaces; later vectors do so mainly by growing in magnitude.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The ability to train vectors separately could let practitioners optimize task representations without retraining the full model.
The reported linear propagation may simplify future interventions that aim to edit or interpret task behavior inside transformers.
If the pattern generalizes, similar training could be applied to control other emergent capabilities such as reasoning or tool use.

Load-bearing premise

The accuracy gains and the linear propagation pattern observed for trained vectors are not limited to the particular training procedure, model sizes, or tasks used in the experiments.

What would settle it

Retraining learned task vectors on a new collection of tasks or a different model family and finding that extracted vectors match or exceed them in accuracy, or measuring that interventions on nonlinear components substantially change the reported rotation-and-scaling pattern.

Figures

Figures reproduced from arXiv: 2509.24169 by Hakaze Cho, Haolin Yang, Kaize Ding, Naoya Inoue.

**Figure 1.** Figure 1: (A) We directly train Learned Task Vectors (LTVs) to be injected, which influence model outputs through later layers updates. (B) In the low-level interactions between TVs and later layers, the OV circuits of attention heads are the crucial components interacting with TVs to induce their effects. (C) On a high level, subsequent layer updates act on TVs as a largely linear transformation of rotation and str… view at source ↗

**Figure 2.** Figure 2: Dataset-average accuracy of injecting the [PITH_FULL_IMAGE:figures/full_fig_p004_2.png] view at source ↗

**Figure 3.** Figure 3: (A) Mean and standard deviation of ratings for responses generated with ICL, LTV, FV, and Vanilla TV. (B) An example question and responses across settings. The results in [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Cosine similarity heatmap of LTVs for seven tasks, showing inter-task separation and intra-task clustering. Adaptability of LTVs to complex task settings The tasks above have single-token labels and unique correct answers (e.g., The capital of China is → Beijing). To evaluate generalizability to a more complex generation task with multi-token responses—where the goal is to elicit a behavioral mode rather t… view at source ↗

**Figure 5.** Figure 5: Assessing the significance of attention heads in the low-level interactions between TVs [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

**Figure 6.** Figure 6: (A) Key attention heads cluster mainly in layers immediately after the injection (16 for Llama3.1-8B) and secondarily in final layers. (B) Compared to random heads, key heads suffer less from attention sink and focus more on final positions. See Subsection E.4 for other models. Assessing key attention heads leveraging the TV We further evaluate attention heads by identifying those most reliant on TVs for p… view at source ↗

**Figure 7.** Figure 7: (A) Metric values of hidden states across layers when the TV is injected at an early or late layer. (B) Tokens decoded from TVs, with early-layer TVs yielding random tokens and late-layer TVs producing task-related tokens. See Subsection F.1 for other models’ results. 0 4 8 12 16 20 24 28 Layer 0% 25% 50% 75% 100% Accuracy l Reconstructed l (a) Effect of linearly reconstructed TV. For other models see Sub… view at source ↗

**Figure 8.** Figure 8: (A) A reconstructed TV based on modeling θl’s influence as linear achieves comparable accuracy for most layers. (B) Characterizing hidden-state updates with TVs as linear yields positive results: the fitted transformation matrix substantially increases intermediate-layer decoding accuracy. TV θl is injected at an early or late layer l. We set l = L 4 for early and l = 3L 4 for late (8 and 24 for Llama3.1-8… view at source ↗

**Figure 9.** Figure 9: (A) Applying the rotation to TVs at different layers substantially increases alignment with unembeddings of task-related labels. (B) After rotation, early-layer TVs that originally decode random tokens produce task-related tokens. (C) The rotation effect diminishes for late-layer TVs as the estimated matrix approaches identity. Decomposition of TV’s influence mechanism While TVs injected at different laye… view at source ↗

**Figure 10.** Figure 10: Visualization of how we reconstruct the aggregate effect of TVs induced through the OV [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗

**Figure 11.** Figure 11: Layer sweeping results of injecting the Vanilla TV, FV, and our LTV to the last token [PITH_FULL_IMAGE:figures/full_fig_p022_11.png] view at source ↗

**Figure 12.** Figure 12: Layer sweeping results of injecting the Vanilla TV, FV, and our LTV to the last token [PITH_FULL_IMAGE:figures/full_fig_p023_12.png] view at source ↗

**Figure 13.** Figure 13: Layer sweeping results of injecting the Vanilla TV, FV, and our LTV to the last token [PITH_FULL_IMAGE:figures/full_fig_p023_13.png] view at source ↗

**Figure 14.** Figure 14: Layer sweeping results of injecting the Vanilla TV, FV, and our LTV to the last token [PITH_FULL_IMAGE:figures/full_fig_p024_14.png] view at source ↗

**Figure 15.** Figure 15: Myopic dataset: LTV vs. Vanilla TV and FV on Llama2-7B. [PITH_FULL_IMAGE:figures/full_fig_p024_15.png] view at source ↗

**Figure 16.** Figure 16: Myopic dataset: LTV vs. Vanilla TV and FV on Llama2-13B. [PITH_FULL_IMAGE:figures/full_fig_p024_16.png] view at source ↗

**Figure 17.** Figure 17: Myopic dataset: LTV vs. Vanilla TV and FV on Llama3-8B. [PITH_FULL_IMAGE:figures/full_fig_p025_17.png] view at source ↗

**Figure 18.** Figure 18: Myopic dataset: LTV vs. Vanilla TV and FV on Llama3.2-3B. [PITH_FULL_IMAGE:figures/full_fig_p025_18.png] view at source ↗

**Figure 19.** Figure 19: Myopic dataset: LTV vs. Vanilla TV on Llama3-70B. [PITH_FULL_IMAGE:figures/full_fig_p025_19.png] view at source ↗

**Figure 20.** Figure 20: Myopic dataset: LTV vs. Vanilla TV on Qwen2.5-32B. [PITH_FULL_IMAGE:figures/full_fig_p025_20.png] view at source ↗

**Figure 21.** Figure 21: Myopic dataset: LTV vs. Vanilla TV on Yi-34B. [PITH_FULL_IMAGE:figures/full_fig_p026_21.png] view at source ↗

**Figure 22.** Figure 22: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama3-8B. [PITH_FULL_IMAGE:figures/full_fig_p026_22.png] view at source ↗

**Figure 23.** Figure 23: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama3.2-3B. [PITH_FULL_IMAGE:figures/full_fig_p026_23.png] view at source ↗

**Figure 24.** Figure 24: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama3-70B. [PITH_FULL_IMAGE:figures/full_fig_p027_24.png] view at source ↗

**Figure 25.** Figure 25: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama2-7B. [PITH_FULL_IMAGE:figures/full_fig_p027_25.png] view at source ↗

**Figure 26.** Figure 26: Cosine-similarity heatmap of LTVs trained for seven tasks on Llama2-13B. [PITH_FULL_IMAGE:figures/full_fig_p028_26.png] view at source ↗

**Figure 27.** Figure 27: Cosine-similarity heatmap of LTVs trained for seven tasks on Qwen2.5-32B. [PITH_FULL_IMAGE:figures/full_fig_p028_27.png] view at source ↗

**Figure 28.** Figure 28: Cosine-similarity heatmap of LTVs trained for seven tasks on Yi-34B. [PITH_FULL_IMAGE:figures/full_fig_p029_28.png] view at source ↗

**Figure 29.** Figure 29: Attention heads and TV on Llama3-8B: OV-circuit reconstruction (left) and ablation of [PITH_FULL_IMAGE:figures/full_fig_p029_29.png] view at source ↗

**Figure 30.** Figure 30: Attention heads and TV on Llama3.2-3B: OV-circuit reconstruction (left) and ablation of [PITH_FULL_IMAGE:figures/full_fig_p029_30.png] view at source ↗

**Figure 31.** Figure 31: Attention heads and TV on Llama2-7B: OV-circuit reconstruction (left) and ablation of [PITH_FULL_IMAGE:figures/full_fig_p030_31.png] view at source ↗

**Figure 32.** Figure 32: Attention heads and TV on Llama2-13B: OV-circuit reconstruction (left) and ablation of [PITH_FULL_IMAGE:figures/full_fig_p030_32.png] view at source ↗

**Figure 33.** Figure 33: Attention heads and TV on Llama3-70B: OV-circuit reconstruction (left) and ablation of [PITH_FULL_IMAGE:figures/full_fig_p030_33.png] view at source ↗

**Figure 34.** Figure 34: Attention heads and TV on Qwen2.5-32B: OV-circuit reconstruction (left) and ablation of [PITH_FULL_IMAGE:figures/full_fig_p030_34.png] view at source ↗

**Figure 35.** Figure 35: Attention heads and TV on Yi-34B: OV-circuit reconstruction (left) and ablation of key [PITH_FULL_IMAGE:figures/full_fig_p031_35.png] view at source ↗

**Figure 36.** Figure 36: Effects of the OV circuit reconstruction with or without the TV added to the final layer: [PITH_FULL_IMAGE:figures/full_fig_p031_36.png] view at source ↗

**Figure 37.** Figure 37: Effects of the OV circuit reconstruction with or without the TV added to the final layer: [PITH_FULL_IMAGE:figures/full_fig_p031_37.png] view at source ↗

**Figure 38.** Figure 38: Effects of the OV circuit reconstruction with or without the TV added to the final layer: [PITH_FULL_IMAGE:figures/full_fig_p032_38.png] view at source ↗

**Figure 39.** Figure 39: Effects of the OV circuit reconstruction with or without the TV added to the final layer: [PITH_FULL_IMAGE:figures/full_fig_p032_39.png] view at source ↗

**Figure 40.** Figure 40: Effects of the OV circuit reconstruction with or without the TV added to the final layer: [PITH_FULL_IMAGE:figures/full_fig_p032_40.png] view at source ↗

**Figure 41.** Figure 41: Effects of the OV circuit reconstruction with or without the TV added to the final layer: [PITH_FULL_IMAGE:figures/full_fig_p033_41.png] view at source ↗

**Figure 42.** Figure 42: Effects of the OV circuit reconstruction with or without the TV added to the final layer: [PITH_FULL_IMAGE:figures/full_fig_p033_42.png] view at source ↗

**Figure 43.** Figure 43: Effects of the OV circuit reconstruction with or without the TV added to the final layer: [PITH_FULL_IMAGE:figures/full_fig_p033_43.png] view at source ↗

**Figure 44.** Figure 44: Key attention heads on Llama3-8B: distribution across layers (left) and attention over [PITH_FULL_IMAGE:figures/full_fig_p034_44.png] view at source ↗

**Figure 45.** Figure 45: Key attention heads on Llama3.2-3B: distribution across layers (left) and attention over [PITH_FULL_IMAGE:figures/full_fig_p034_45.png] view at source ↗

**Figure 46.** Figure 46: Key attention heads on Llama3-70B: distribution across layers (left) and attention over [PITH_FULL_IMAGE:figures/full_fig_p034_46.png] view at source ↗

**Figure 47.** Figure 47: Key attention heads on Llama2-7B: distribution across layers (left) and attention over [PITH_FULL_IMAGE:figures/full_fig_p034_47.png] view at source ↗

**Figure 48.** Figure 48: Key attention heads on Llama2-13B: distribution across layers (left) and attention over [PITH_FULL_IMAGE:figures/full_fig_p035_48.png] view at source ↗

**Figure 49.** Figure 49: Key attention heads on Qwen2.5-32B: distribution across layers (left) and attention over [PITH_FULL_IMAGE:figures/full_fig_p035_49.png] view at source ↗

**Figure 50.** Figure 50: Key attention heads on Yi-34B: distribution across layers (left) and attention over token [PITH_FULL_IMAGE:figures/full_fig_p035_50.png] view at source ↗

**Figure 51.** Figure 51: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at an [PITH_FULL_IMAGE:figures/full_fig_p035_51.png] view at source ↗

**Figure 52.** Figure 52: Metrics across layers on Llama3.2-3B when the TV is injected into the hidden state at an [PITH_FULL_IMAGE:figures/full_fig_p036_52.png] view at source ↗

**Figure 53.** Figure 53: Metrics across layers on Llama3-70B when the TV is injected into the hidden state at an [PITH_FULL_IMAGE:figures/full_fig_p036_53.png] view at source ↗

**Figure 54.** Figure 54: Metrics across layers on Llama2-7B when the TV is injected into the hidden state at an [PITH_FULL_IMAGE:figures/full_fig_p036_54.png] view at source ↗

**Figure 55.** Figure 55: Metrics across layers on Llama2-13B when the TV is injected into the hidden state at an [PITH_FULL_IMAGE:figures/full_fig_p037_55.png] view at source ↗

**Figure 56.** Figure 56: Metrics across layers on Qwen2.5-32B when the TV is injected into the hidden state at an [PITH_FULL_IMAGE:figures/full_fig_p037_56.png] view at source ↗

**Figure 57.** Figure 57: Metrics across layers on Yi-34B when the TV is injected into the hidden state at an early [PITH_FULL_IMAGE:figures/full_fig_p037_57.png] view at source ↗

**Figure 58.** Figure 58: Linear hypothesis on Llama3-8B: linearly reconstructed TV (left) and linear surrogate for [PITH_FULL_IMAGE:figures/full_fig_p038_58.png] view at source ↗

**Figure 59.** Figure 59: Linear hypothesis on Llama3.2-3B: linearly reconstructed TV (left) and linear surrogate [PITH_FULL_IMAGE:figures/full_fig_p038_59.png] view at source ↗

**Figure 60.** Figure 60: Linear hypothesis on Llama3-70B: linearly reconstructed TV (left) and linear surrogate [PITH_FULL_IMAGE:figures/full_fig_p038_60.png] view at source ↗

**Figure 61.** Figure 61: Linear hypothesis on Llama2-7B: linearly reconstructed TV (left) and linear surrogate for [PITH_FULL_IMAGE:figures/full_fig_p038_61.png] view at source ↗

**Figure 62.** Figure 62: Linear hypothesis on Llama2-13B: linearly reconstructed TV (left) and linear surrogate [PITH_FULL_IMAGE:figures/full_fig_p039_62.png] view at source ↗

**Figure 63.** Figure 63: Linear hypothesis on Qwen2.5-32B: linearly reconstructed TV (left) and linear surrogate [PITH_FULL_IMAGE:figures/full_fig_p039_63.png] view at source ↗

**Figure 64.** Figure 64: Linear hypothesis on Yi-34B: linearly reconstructed TV (left) and linear surrogate for [PITH_FULL_IMAGE:figures/full_fig_p039_64.png] view at source ↗

**Figure 65.** Figure 65: Rotation analysis on Llama3-8B: applying the fitted rotation [PITH_FULL_IMAGE:figures/full_fig_p039_65.png] view at source ↗

**Figure 66.** Figure 66: Rotation analysis on Llama3.2-3B: applying the fitted rotation [PITH_FULL_IMAGE:figures/full_fig_p040_66.png] view at source ↗

**Figure 67.** Figure 67: Rotation analysis on Llama3-70B: applying the fitted rotation [PITH_FULL_IMAGE:figures/full_fig_p040_67.png] view at source ↗

**Figure 68.** Figure 68: Rotation analysis on Llama2-7B: applying the fitted rotation [PITH_FULL_IMAGE:figures/full_fig_p040_68.png] view at source ↗

**Figure 69.** Figure 69: Rotation analysis on Llama2-13B: applying the fitted rotation [PITH_FULL_IMAGE:figures/full_fig_p041_69.png] view at source ↗

**Figure 70.** Figure 70: Rotation analysis on Qwen2.5-32B: applying the fitted rotation [PITH_FULL_IMAGE:figures/full_fig_p041_70.png] view at source ↗

**Figure 71.** Figure 71: Rotation analysis on Yi-34B: applying the fitted rotation [PITH_FULL_IMAGE:figures/full_fig_p041_71.png] view at source ↗

**Figure 72.** Figure 72: Average attention distribution of Llama3.1-8B on SST-2: proportions of attention weights [PITH_FULL_IMAGE:figures/full_fig_p042_72.png] view at source ↗

**Figure 73.** Figure 73: Average attention distribution of Llama3-8B on SST-2: proportions of attention weights [PITH_FULL_IMAGE:figures/full_fig_p042_73.png] view at source ↗

**Figure 74.** Figure 74: Average attention distribution of Llama3.2-3B on SST-2: proportions of attention weights [PITH_FULL_IMAGE:figures/full_fig_p042_74.png] view at source ↗

**Figure 75.** Figure 75: Average attention distribution of Llama3-70B on SST-2: proportions of attention weights [PITH_FULL_IMAGE:figures/full_fig_p043_75.png] view at source ↗

**Figure 76.** Figure 76: Average attention distribution of Llama2-7B on SST-2: proportions of attention weights [PITH_FULL_IMAGE:figures/full_fig_p043_76.png] view at source ↗

**Figure 77.** Figure 77: Average attention distribution of Llama2-13B on SST-2: proportions of attention weights [PITH_FULL_IMAGE:figures/full_fig_p043_77.png] view at source ↗

**Figure 78.** Figure 78: Average attention distribution of Qwen2.5-32B on SST-2: proportions of attention weights [PITH_FULL_IMAGE:figures/full_fig_p044_78.png] view at source ↗

**Figure 79.** Figure 79: Average attention distribution of Yi-34B on SST-2: proportions of attention weights [PITH_FULL_IMAGE:figures/full_fig_p044_79.png] view at source ↗

**Figure 80.** Figure 80: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p044_80.png] view at source ↗

**Figure 81.** Figure 81: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p044_81.png] view at source ↗

**Figure 82.** Figure 82: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p045_82.png] view at source ↗

**Figure 83.** Figure 83: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p045_83.png] view at source ↗

**Figure 84.** Figure 84: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p045_84.png] view at source ↗

**Figure 85.** Figure 85: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p045_85.png] view at source ↗

**Figure 86.** Figure 86: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p046_86.png] view at source ↗

**Figure 87.** Figure 87: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p046_87.png] view at source ↗

**Figure 88.** Figure 88: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p046_88.png] view at source ↗

**Figure 89.** Figure 89: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p046_89.png] view at source ↗

**Figure 90.** Figure 90: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p047_90.png] view at source ↗

**Figure 91.** Figure 91: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p047_91.png] view at source ↗

**Figure 92.** Figure 92: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p047_92.png] view at source ↗

**Figure 93.** Figure 93: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p047_93.png] view at source ↗

**Figure 94.** Figure 94: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p048_94.png] view at source ↗

**Figure 95.** Figure 95: Metrics across layers on Llama3-8B when the TV is injected into the hidden state at layer [PITH_FULL_IMAGE:figures/full_fig_p048_95.png] view at source ↗

read the original abstract

Large Language Models (LLMs) can perform new tasks from in-context demonstrations, a phenomenon known as in-context learning (ICL). Recent work suggests that these demonstrations are compressed into task vectors (TVs), compact task representations that LLMs exploit for predictions. However, prior studies typically extract TVs from model outputs or hidden states using cumbersome and opaque methods, and they rarely elucidate the mechanisms by which TVs influence computation. In this work, we address both limitations. First, we propose directly training Learned Task Vectors (LTVs), which surpass extracted TVs in accuracy and exhibit superior flexibility-acting effectively at arbitrary layers, positions, and even with ICL prompts. Second, through systematic analysis, we investigate the mechanistic role of TVs, showing that at the low level they steer predictions primarily through attention-head OV circuits, with a small subset of "key heads" most decisive. At a higher level, we find that despite Transformer nonlinearities, TV propagation is largely linear: early TVs are rotated toward task-relevant subspaces to improve logits of relevant labels, while later TVs are predominantly scaled in magnitude. Taken together, LTVs not only provide a practical approach for obtaining effective TVs but also offer a principled lens into the mechanistic foundations of ICL.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Directly training task vectors beats extraction for flexibility and accuracy while adding a mechanistic story on OV heads and mostly linear propagation, but the abstract gives no numbers to judge if the gains hold up.

read the letter

Directly training Learned Task Vectors outperforms the usual extraction approach and comes with some mechanistic details on how they steer the model through attention heads in a mostly linear manner. That's the headline from the abstract. The new element is the training procedure itself. Instead of trying to pull task vectors out of hidden states or outputs after running in-context examples, they optimize them directly. This reportedly gives higher accuracy and lets the vectors work at more places in the model, including mixed with regular prompts. The analysis of key heads in the OV circuits and the distinction between early-layer rotation toward relevant subspaces and later-layer magnitude scaling is a step toward understanding the internal mechanics of in-context learning. What the paper does well is address two common complaints about task vector work: the extraction is opaque and limited, and the mechanisms are under-explored. Offering a trainable alternative plus circuit-level and propagation-level accounts is a reasonable way to move forward. The soft spots are mostly around evidence. The abstract states superiority and the linear propagation finding but gives no numbers, no comparison tables, and no mention of statistical significance or controls. This makes it tough to judge how much the training procedure might be driving the results or whether the linearity holds beyond the tested cases. The stress-test concern about artifacts from specific choices seems on point until the full experiments are checked. This paper is for people who work on in-context learning, task adaptation, and transformer interpretability. Someone already following the task vector literature would find the shift to learned vectors and the mechanistic claims relevant. I would recommend sending it for peer review. The ideas are worth testing against real data, and referees can push on the experimental rigor.

Referee Report

4 major / 2 minor

Summary. The manuscript proposes directly training Learned Task Vectors (LTVs) rather than extracting task vectors (TVs) from LLM hidden states or outputs for in-context learning. It claims LTVs achieve higher accuracy and greater flexibility (effective at arbitrary layers/positions and compatible with ICL prompts). Mechanistically, TVs steer predictions primarily via attention-head OV circuits (with a small subset of key heads decisive), and despite Transformer nonlinearities, propagation is largely linear: early TVs are rotated toward task-relevant subspaces to boost relevant label logits, while later TVs are mainly scaled in magnitude.

Significance. If the empirical and mechanistic claims hold under broader testing, the work supplies both a practical method for obtaining high-quality task representations and interpretable insights into ICL mechanisms. The direct-training approach and the OV-circuit plus linear-propagation findings could inform future control and analysis of transformer behavior.

major comments (4)

[§4] §4 (Experiments): The central claim that trained LTVs surpass extracted TVs in accuracy and flexibility lacks explicit quantitative results, baseline comparisons, statistical significance tests, and controls for training-procedure artifacts in the reported overview; without these the degree of support for superiority cannot be assessed.
[§5.2] §5.2 (Mechanistic analysis): The assertion that steering occurs primarily through attention-head OV circuits with a decisive subset of key heads requires ablation results quantifying performance drop when those heads are masked or removed; otherwise the circuit-level claim remains qualitative.
[§6.1] §6.1 (Propagation analysis): The higher-level claim that TV propagation is 'largely linear' (early rotation to task subspaces, later magnitude scaling) despite nonlinearities needs explicit quantitative metrics such as cosine similarity to linear approximations or residual nonlinear error; absent these, the summary risks overstatement.
[§4.3] §4.3 and §7 (Generalization): The observed performance and mechanistic advantages may be tied to the specific training objective, model family, or narrow task distribution; additional experiments across architectures and broader task sets are required to rule out artifacts and support the broader conclusions.

minor comments (2)

[§2] Notation for extracted TVs versus LTVs should be introduced with a clear table or diagram in §2 to prevent reader confusion.
[Figures] Figures showing key-head attention patterns and layer-wise propagation should include axis labels, error bars, and layer indices for immediate readability.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below and describe the revisions we will incorporate to strengthen the empirical and mechanistic claims.

read point-by-point responses

Referee: [§4] §4 (Experiments): The central claim that trained LTVs surpass extracted TVs in accuracy and flexibility lacks explicit quantitative results, baseline comparisons, statistical significance tests, and controls for training-procedure artifacts in the reported overview; without these the degree of support for superiority cannot be assessed.

Authors: We agree that the main-text overview would benefit from greater explicitness. The manuscript already reports quantitative accuracy comparisons, baseline methods, and some statistical tests in the experimental section and appendices, but we will expand §4 to prominently feature key numerical results, direct baseline contrasts, significance testing, and controls for training-procedure artifacts so that the superiority claims can be evaluated directly from the main text. revision: yes
Referee: [§5.2] §5.2 (Mechanistic analysis): The assertion that steering occurs primarily through attention-head OV circuits with a decisive subset of key heads requires ablation results quantifying performance drop when those heads are masked or removed; otherwise the circuit-level claim remains qualitative.

Authors: We accept that the current identification of key heads via importance metrics would be strengthened by direct ablation. In the revised manuscript we will add ablation experiments that mask or remove the identified key heads and report the resulting performance drops, thereby providing quantitative support for their decisive role within the OV circuits. revision: yes
Referee: [§6.1] §6.1 (Propagation analysis): The higher-level claim that TV propagation is 'largely linear' (early rotation to task subspaces, later magnitude scaling) despite nonlinearities needs explicit quantitative metrics such as cosine similarity to linear approximations or residual nonlinear error; absent these, the summary risks overstatement.

Authors: We agree that additional quantitative grounding is warranted. We will augment §6.1 with explicit metrics, including cosine similarity between observed propagation trajectories and their linear approximations as well as measures of residual nonlinear error, to substantiate the characterization of propagation as largely linear. revision: yes
Referee: [§4.3] §4.3 and §7 (Generalization): The observed performance and mechanistic advantages may be tied to the specific training objective, model family, or narrow task distribution; additional experiments across architectures and broader task sets are required to rule out artifacts and support the broader conclusions.

Authors: We acknowledge the scope limitation. Our experiments prioritized depth of mechanistic analysis on standard models and tasks. In the revision we will add experiments on at least one additional architecture and an expanded task distribution to provide evidence that the reported advantages are not artifacts of the original experimental setting. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on training and analysis without definitional reduction

full rationale

The paper's core contribution is the proposal to directly train Learned Task Vectors (LTVs) rather than extract them, followed by empirical comparisons of accuracy and flexibility plus mechanistic dissection via attention OV circuits and observed linear propagation patterns. No equations, self-citations, or uniqueness theorems are invoked in the provided text that would make any performance gain or linearity claim equivalent to its inputs by construction. The derivation chain consists of experimental procedures and internal model analysis that remain independent of the target results; the claims are falsifiable via replication on different models or tasks and do not reduce to fitted parameters renamed as predictions or ansatzes smuggled through prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review based on abstract only; no explicit free parameters, axioms, or invented entities are described. Training LTVs presumably involves standard optimization but details are absent.

pith-pipeline@v0.9.0 · 5754 in / 1207 out tokens · 51185 ms · 2026-05-18T13:17:25.001454+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

despite Transformer nonlinearities, TV propagation is largely linear: early TVs are rotated toward task-relevant subspaces ... later TVs are predominantly scaled in magnitude
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean absolute_floor_iff_bare_distinguishability unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

steer predictions primarily through attention-head OV circuits

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

12 extracted references · 12 canonical work pages

[1]

Yi: Open foundation models by 01.ai, 2024

AI, Alex Young, Bei Chen, Chao Li, Chengen Huang, Ge Zhang, Guanwei Zhang, Heng Li, Jiangcheng Zhu, Jianqun Chen, Jing Chang, Kaidong Yu, Peng Liu, Qiang Liu, Shawn Yue, Senbin Yang, Shiming Yang, Tao Yu, Wen Xie, Wenhao Huang, Xiaohui Hu, Xiaoyi Ren, Xinyao Niu, Pengcheng Nie, Yuchi Xu, Yudong Liu, Yue Wang, Yuxuan Cai, Zhenyu Gu, Zhiyuan Liu, and Zongho...

work page doi:10.5555/3495724.3495883 2024
[2]

Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, and Min Zhang

URLhttps://arxiv.org/abs/2401.01967. Dongfang Li, Zhenyu Liu, Xinshuo Hu, Zetian Sun, Baotian Hu, and Min Zhang. In-context learning state vector with inner and momentum optimization, 2024a. URLhttps://arxiv.org/abs/ 2404.11225. Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, and Martin Wattenberg. Inference- time intervention: Eliciting truthf...

work page arXiv 2007
[3]

Would you like a donut now, or two donuts in an hour?

Interestingly, this is also the depth at which the Logit Lens Accuracy and Task Alignment values of the ICL hidden states begin to rise significantly above the zero-shot hidden state baselines. This is consistent with previous findings (Yang et al., 2025a), which report that ICL features a distinct transition pattern where hidden states increasingly align...

work page 2024
[6]

More layers & Pos. P={−5, . . .},L={0,2, . . .} 5) ICL prompts Vanilla TV 38.26% 1.96% 14.16% 18.85% 13.30% 52.82% FV 51.81% 1.40% 28.60% 47.14% 20.44% 73.23% LTV 82.54%↑30.73% 79.34%↑77.38% 84.60%↑56.00% 82.24%↑35.10% 51.60%↑31.16% 85.16%↑11.93% 22 Preprint. Under Review. 0 2 4 6 8 101214161820222426283032343638 Layer of TV Injection 0% 20% 40% 60% 80%Va...

work page
[9]

More layers & Pos. P={−5, . . .},L={0,2, . . .} 5) ICL prompts Vanilla TV 27.67% 1.84% 16.42% 20.46% 16.07% 43.84% FV 41.59% 1.22% 42.25% 36.97% 24.74% 77.51% LTV 80.33%↑38.74% 71.53%↑69.69% 87.69%↑45.44% 82.25%↑45.28% 51.46%↑26.72% 84.99%↑7.48% 23 Preprint. Under Review. 0 2 4 6 8 10 12 14 16 18 20 22 24 26 Layer of TV Injection 0% 20% 40% 60% 80%Value Z...

work page
[12]

More layers & Pos. P={−5, . . .},L={0,2, . . .} 5) ICL prompts Vanilla TV 31.69% 2.02% 1.05% 26.68% 0.33% 75.83% FV 33.28% 2.93% 18.38% 16.95% 17.72% 53.93% LTV 76.26%↑42.98% 76.22%↑73.29% 77.93%↑59.55% 83.48%↑56.80% 44.82%↑27.10% 84.51%↑8.68% Table 6: Comparison of LTV vs. FV and Vanilla TV across five scenarios on Llama3-8B. 24 Preprint. Under Review. I...

work page
[15]

More layers & Pos. P={−5, . . .},L={0,2, . . .} 5) ICL prompts Vanilla TV 42.61% 3.07% 18.73% 37.05% 11.33% 65.38% FV 19.54% 3.53% 15.07% 4.69% 13.26% 62.12% LTV 78.65%↑36.04% 74.10%↑70.57% 78.18%↑59.45% 80.43%↑43.38% 46.38%↑33.12% 82.80%↑17.42% Table 7: Comparison of LTV vs. FV and Vanilla TV across five scenarios on Llama3.2-3B. Method Baseline P={−1},L={40}

work page
[18]

More layers & Pos. P={−5, . . .},L={0,2, . . .} 5) ICL prompts LTV 78.18% 75.34% 76.13% 75.59% 48.75% 88.40% Table 8: Performance of LTV across settings on Llama3-70B. 27 Preprint. Under Review. SST-2TRECSNLI RTE CapitalCapitalizeAntonym SST-2 TREC SNLI RTE Capital Capitalize Antonym 0.0 0.2 0.4 0.6 0.8 1.0 Cosine Similarity Figure 26: Cosine-similarity h...

work page
[21]

More layers & Pos. P={−5, . . .},L={0,2, . . .} 5) ICL prompts LTV 75.59% 36.04% 75.20% 87.24% 53.30% 87.08% Table 9: Performance of LTV across settings on Qwen2.5-32B. Method Baseline P={−1},L={30}

work page
[22]

More Pos. P={−5, . . . ,−1}

work page
[23]

More layers L={0,4,8, . . .}

work page
[24]

More layers & Pos. P={−5, . . .},L={0,2, . . .} 5) ICL prompts LTV 81.37% 73.53% 84.39% 82.47% 51.29% 89.69% Table 10: Performance of LTV across settings on Yi-34B. 28 Preprint. Under Review. SST-2TRECSNLI RTE CapitalCapitalizeAntonym SST-2 TREC SNLI RTE Capital Capitalize Antonym 0.2 0.4 0.6 0.8 1.0 Cosine Similarity Figure 28: Cosine-similarity heatmap ...

work page