In-Context Learning Operates as Concept Subspace Learning

Fakhri Karray; Lijie Hu; Wei Tang; Xinyan Jiang

arxiv: 2605.18830 · v1 · pith:JG3ZQMY3new · submitted 2026-05-12 · 💻 cs.LG

In-Context Learning Operates as Concept Subspace Learning

Wei Tang , Xinyan Jiang , Fakhri Karray , Lijie Hu This is my paper

Pith reviewed 2026-05-20 21:27 UTC · model grok-4.3

classification 💻 cs.LG

keywords in-context learningconcept subspacesactivation patchingresidual streammechanistic interpretabilityCounterFactLlama-3

0 comments

The pith

In-context learning recovers task predictions from low-dimensional concept subspaces in model activations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether in-context learning induces low-dimensional concept subspaces that carry task information inside high-dimensional activation spaces. It models tasks as varying only along intrinsic concept coordinates while inputs sit in a larger ambient space. For ridge and least-squares proxies of ICL, the prediction splits exactly into concept-coordinate regression plus off-subspace leakage. Under block-diagonal covariance assumptions, the leading estimation and sensitivity terms grow with subspace dimension while cross terms stay controlled. Experiments on Llama-3-8B with multi-relation prompts show that a 68-73 dimensional subspace restores 78.8 percent of the accuracy gap between clean and corrupted inputs, whereas the complementary subspace restores none.

Core claim

The paper claims that recoverable task information in in-context learning concentrates in a low-dimensional, task-aligned activation subspace. On CounterFact-derived multi-relation prompts with Llama-3-8B, a 68-73-dimensional subspace of the 4096-dimensional residual stream restores 78.8 percent of the clean-corrupted accuracy gap, while patching the complementary subspace restores 0 percent. Concept swaps inside this subspace redirect predictions toward the injected relations, and random or cross-task matched-rank controls do not. The same qualitative pattern appears on Qwen2.5-7B and a controlled cross-lingual rule task.

What carries the argument

The concept subspace: the low-dimensional linear directions inside the residual stream activations that align with the task's intrinsic concept coordinates and mediate the exact decomposition of ICL predictions into concept regression and off-subspace leakage.

If this is right

ICL prediction accuracy depends primarily on the dimension of the concept subspace rather than the full ambient dimension.
Targeted interventions inside the identified subspace can steer task behavior without touching the orthogonal complement.
The same low-dimensional concentration pattern holds across Llama-3-8B, Qwen2.5-7B, and controlled cross-lingual tasks.
Concept swaps inside the subspace successfully alter model outputs, confirming that the subspace encodes the relation information.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the claim holds, identifying these subspaces could let practitioners steer ICL behavior with far fewer dimensions than full activation edits.
The decomposition invites tests of whether subspace dimension grows with the number of relations or demonstrations in a task family.
The result suggests ICL may generalize by projecting onto learned concept directions rather than relying on diffuse high-dimensional patterns.

Load-bearing premise

The covariance between concept directions and nuisance directions is block-diagonal or nearly so, which separates the scaling of estimation terms from cross-subspace effects.

What would settle it

If a random subspace of the same size or the complementary high-dimensional complement restores a comparable fraction of the accuracy gap, or if concept swaps fail to redirect predictions while random swaps succeed, the concentration claim would be falsified.

Figures

Figures reproduced from arXiv: 2605.18830 by Fakhri Karray, Lijie Hu, Wei Tang, Xinyan Jiang.

**Figure 2.** Figure 2: Concept-subspace patching and swapping. Left: the learned subspace recovers most of the clean-corrupted gap. Right: subspace swaps redirect predictions toward the injected relation. all subsequent concept-subspace estimation and causal patching experiments. Further layerwise analyses confirm the late-layer emergence of this causal concept bottleneck (see Appendix B.3). This design choice is methodologicall… view at source ↗

**Figure 3.** Figure 3: Few-shot contraction of debiased concept-coordinate estimates. Query-centered concept [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 4.** Figure 4: Dimension stability of concept-subspace estimates. The 98% criterion selects a consistently [PITH_FULL_IMAGE:figures/full_fig_p021_4.png] view at source ↗

**Figure 5.** Figure 5: Subspace-selective sensitivity and few-shot scaling. [PITH_FULL_IMAGE:figures/full_fig_p021_5.png] view at source ↗

**Figure 6.** Figure 6: Layerwise emergence of a causal concept bottleneck. [PITH_FULL_IMAGE:figures/full_fig_p023_6.png] view at source ↗

**Figure 7.** Figure 7: Task–format geometry after concept-subspace projection. PCA of task vectors shows that [PITH_FULL_IMAGE:figures/full_fig_p024_7.png] view at source ↗

**Figure 8.** Figure 8: Format invariance and corruption stability of concept representations. [PITH_FULL_IMAGE:figures/full_fig_p025_8.png] view at source ↗

**Figure 9.** Figure 9: Prompt contamination rotates the inferred concept direction. Alignment in concept space [PITH_FULL_IMAGE:figures/full_fig_p026_9.png] view at source ↗

**Figure 10.** Figure 10: Layerwise localization and low-rank extraction of task-aligned residual directions. [PITH_FULL_IMAGE:figures/full_fig_p027_10.png] view at source ↗

**Figure 11.** Figure 11: Concept-subspace patching and swapping. Left: the learned subspace recovers most of the clean-corrupted gap. Right: subspace swaps redirect predictions toward the injected relation. complementary space reduces accuracy to 21.0%, yielding a negative recovery rate. We therefore do not interpret the complementary intervention as carrying useful task information in this setting. Rather, it provides a strong c… view at source ↗

**Figure 12.** Figure 12: Layerwise localization and low-rank extraction of task-aligned residual directions. [PITH_FULL_IMAGE:figures/full_fig_p028_12.png] view at source ↗

**Figure 13.** Figure 13: Concept-subspace patching and swapping. Left: the learned subspace recovers most of the clean-corrupted gap. Right: subspace swaps redirect predictions toward the injected relation. C Broader Impacts This work is a foundational study of in-context learning mechanisms rather than a deploymentoriented system. Its main potential benefit is to make model behavior more inspectable. By identifying low-dimensio… view at source ↗

read the original abstract

Regression and Bayesian accounts of in-context learning (ICL) explain how demonstrations can induce predictors, while mechanistic analyses often identify compact activation directions that steer prompted behavior. However, it remains unclear whether structured demonstrations induce low-dimensional concept inference. We study this question through a concept-subspace view of ICL, in which tasks vary only along intrinsic concept coordinates, although inputs are observed in a high-dimensional ambient space. For ridge and least-squares ICL proxies, prediction decomposes exactly into concept-coordinate regression and off-subspace leakage. Under block-diagonal or near-block-diagonal covariance assumptions, the leading estimation and nuisance-sensitivity terms scale with the dimension of the concept subspace, while residual effects are controlled by cross-subspace coupling. This separation gives a mechanistic prediction: recoverable task information should concentrate in a low-dimensional, task-aligned activation subspace. On CounterFact-derived multi-relation prompts with Llama-3-8B, a 68--73-dimensional subspace of the 4096-dimensional residual stream restores 78.8% of the clean--corrupted accuracy gap, whereas patching the complementary subspace restores 0%. Concept swaps redirect predictions toward injected relations, while random and cross-task matched-rank controls are largely ineffective. Additional experiments on Qwen2.5-7B and a controlled cross-lingual rule task show the same qualitative pattern. These results support concept subspaces as compact, task-aligned mediators of recoverable ICL behavior in structured task families, without implying full-circuit recovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows ICL performance concentrates in a 68-73 dimensional subspace of activations on their tasks, with a regression-style decomposition to explain why.

read the letter

The main takeaway is that the authors treat in-context learning as regression over a low-dimensional concept subspace embedded in the high-dimensional residual stream. For ridge and least-squares proxies they derive an exact decomposition into concept-coordinate terms and off-subspace leakage. Under block-diagonal covariance the leading terms then scale with subspace dimension, which leads to the prediction that task information should sit in a compact, task-aligned set of directions. On CounterFact multi-relation prompts with Llama-3-8B they patch a 68-73 dimensional subspace and recover 78.8 percent of the clean-to-corrupted accuracy gap while the orthogonal complement recovers none. Concept swaps move predictions toward the injected relation and random or cross-task controls do not. The same pattern appears on Qwen2.5-7B and a controlled cross-lingual rule task.

Referee Report

3 major / 2 minor

Summary. The manuscript develops a concept-subspace view of in-context learning, arguing that structured demonstrations induce low-dimensional concept inference within high-dimensional activation spaces. For ridge and least-squares ICL proxies, it derives an exact decomposition of predictions into concept-coordinate regression and off-subspace leakage. Under block-diagonal or near-block-diagonal covariance assumptions, leading estimation and nuisance terms scale with subspace dimension while residuals are controlled by cross-subspace coupling. Empirically, on CounterFact-derived multi-relation prompts with Llama-3-8B, patching a 68-73 dimensional subspace of the 4096-dimensional residual stream restores 78.8% of the clean-corrupted accuracy gap, whereas the complementary subspace restores 0%; concept swaps redirect predictions while random and cross-task controls do not. The pattern holds on Qwen2.5-7B and a controlled cross-lingual task.

Significance. If the results hold, the work bridges regression accounts of ICL with mechanistic interpretability by identifying compact, task-aligned activation subspaces as mediators of recoverable behavior. The concrete patching numbers, the 0% restoration on the complement, and the qualitative consistency across models and tasks are strengths. The paper supplies falsifiable scaling predictions from the decomposition and reproducible empirical controls, which add value even if the covariance assumptions require further checks.

major comments (3)

[Theory section] Theoretical decomposition (around the ridge/least-squares analysis): the scaling predictions for estimation and nuisance-sensitivity terms rest on block-diagonal or near-block-diagonal covariance. The manuscript provides no direct measurement of cross-subspace coupling or verification that residual-stream activations satisfy this assumption, despite attention and MLP layers plausibly inducing dense correlations. This is load-bearing for the claim that recoverable task information must concentrate in a low-dimensional subspace.
[Results on Llama-3-8B and CounterFact] Empirical patching results: a 68-73 dimensional subspace restores 78.8% of the gap, but the text gives no details on the method used to identify or select this specific dimension, whether it was fixed independently of the accuracy numbers, and no error bars or run-to-run variance. Without these, the result risks appearing post-hoc and weakens the mechanistic interpretation that information concentrates at this scale.
[Discussion of assumptions] Cross-subspace coupling control: the theory states that residual effects are governed by this quantity, yet no empirical bound or estimate is reported for the actual activations. If coupling is not small, the observed concentration could arise from mechanisms outside the assumed decomposition.

minor comments (2)

[Abstract] Abstract and results lack error bars on the 78.8% restoration figure and explicit description of the subspace identification procedure.
[Methods] Notation for the concept subspace and its estimation from activations could be clarified with a short algorithmic outline or pseudocode.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight key areas where additional empirical support and reporting can strengthen the connection between the theoretical decomposition and the experimental results. We address each major comment below and will incorporate the suggested clarifications and analyses in the revised manuscript.

read point-by-point responses

Referee: [Theory section] Theoretical decomposition (around the ridge/least-squares analysis): the scaling predictions for estimation and nuisance-sensitivity terms rest on block-diagonal or near-block-diagonal covariance. The manuscript provides no direct measurement of cross-subspace coupling or verification that residual-stream activations satisfy this assumption, despite attention and MLP layers plausibly inducing dense correlations. This is load-bearing for the claim that recoverable task information must concentrate in a low-dimensional subspace.

Authors: We agree that direct empirical verification of the block-diagonal covariance assumption would strengthen the theoretical claims. In the revised manuscript we will add an analysis that computes the cross-subspace coupling directly from the residual-stream activations collected in the Llama-3-8B experiments. We will report the magnitude of off-block correlations and provide quantitative bounds on the coupling term, thereby testing whether the assumption holds sufficiently well to support the scaling predictions. revision: yes
Referee: [Results on Llama-3-8B and CounterFact] Empirical patching results: a 68-73 dimensional subspace restores 78.8% of the gap, but the text gives no details on the method used to identify or select this specific dimension, whether it was fixed independently of the accuracy numbers, and no error bars or run-to-run variance. Without these, the result risks appearing post-hoc and weakens the mechanistic interpretation that information concentrates at this scale.

Authors: The referee correctly identifies a reporting gap. The 68-73 dimensional range was obtained from the elbow of the cumulative explained-variance curve of task-aligned principal components computed on a held-out subset of activations, prior to any accuracy evaluation on the main test set. In revision we will describe this procedure in full, including the precise selection criterion and any hyperparameters. We will also add error bars and standard deviations computed across five independent runs that vary random seeds and prompt ordering. revision: yes
Referee: [Discussion of assumptions] Cross-subspace coupling control: the theory states that residual effects are governed by this quantity, yet no empirical bound or estimate is reported for the actual activations. If coupling is not small, the observed concentration could arise from mechanisms outside the assumed decomposition.

Authors: This concern is closely related to the first comment. The revised version will include the same empirical estimate of cross-subspace coupling derived from the experimental activations. By reporting a concrete bound on this quantity we will directly address whether residual effects remain controlled under the observed coupling strength, thereby reinforcing the mechanistic interpretation. revision: yes

Circularity Check

0 steps flagged

No significant circularity; theory derives testable prediction from explicit assumptions, validated by independent patching experiments

full rationale

The paper states an explicit mathematical decomposition for ridge/least-squares ICL under block-diagonal covariance assumptions, from which it derives the prediction that task information concentrates in a low-dimensional subspace. This prediction is then tested via patching experiments on Llama-3-8B activations that measure accuracy restoration with controls (concept swaps, random, cross-task). The 68-73 dimensional finding and 78.8% restoration are empirical outcomes, not forced by re-using fitted parameters or self-citations as the load-bearing step. The covariance assumption is presented as an assumption rather than derived from the target accuracy numbers, and the experiments include negative controls that would falsify the claim if the subspace were not specific. No load-bearing step reduces to its own inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The framework rests on modeling choices and assumptions stated in the abstract; full details on any fitted quantities or additional background results are unavailable from the abstract alone.

free parameters (1)

concept subspace dimension
The 68-73 dimensional range is reported as the subspace that captures recoverable task information in the Llama-3-8B experiments.

axioms (2)

domain assumption Tasks vary only along intrinsic concept coordinates although inputs are observed in a high-dimensional ambient space
This is the core premise of the concept-subspace view introduced in the abstract.
domain assumption Block-diagonal or near-block-diagonal covariance structure
Invoked to derive that leading terms scale with concept subspace dimension and residual effects are controlled by cross-subspace coupling.

pith-pipeline@v0.9.0 · 5801 in / 1719 out tokens · 73007 ms · 2026-05-20T21:27:48.290042+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under block-diagonal or near-block-diagonal covariance assumptions, the leading estimation and nuisance-sensitivity terms scale with the dimension of the concept subspace, while residual effects are controlled by cross-subspace coupling.
IndisputableMonolith/Foundation/AlexanderDuality.lean alexander_duality_circle_linking unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

a 68--73-dimensional subspace of the 4096-dimensional residual stream restores 78.8% of the clean--corrupted accuracy gap

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 3 internal anchors

[1]

Brown, Benjamin Mann, Nick Ryder, and et al

Tom B. Brown, Benjamin Mann, Nick Ryder, and et al. Language models are few-shot learners. InAdvances in Neural Information Processing Systems 33, pages 1877–1901, 2020

work page 1901
[2]

A glance at in-context learning.Frontiers of Computer Science, 18(5): 185347, 2024

Xu Yang Yongliang Wu. A glance at in-context learning.Frontiers of Computer Science, 18(5): 185347, 2024

work page 2024
[3]

arXiv preprint arXiv:2303.03846 , year =

Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, et al. Larger language models do in-context learning differently. arXiv preprint arXiv:2303.03846, 2023

work page arXiv 2023
[4]

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11048–11064, 2022

work page 2022
[5]

Stephanie C. Y. Chan, Adam Santoro, Andrew K. Lampinen, Jane X. Wang, Aaditya K. Singh, Pierre H. Richemond, James L. McClelland, and Felix Hill. Data distributional properties drive emergent in-context learning in transformers. InAdvances in Neural Information Processing Systems 35, pages 18878–18891, 2022. 11

work page 2022
[6]

An explanation of in-context learning as implicit bayesian inference

Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit bayesian inference. InProceedings of the 10th International Conference on Learning Representations, 2022

work page 2022
[7]

Transformers as statisticians: Provable in-context learning with in-context algorithm selection

Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, and Song Mei. Transformers as statisticians: Provable in-context learning with in-context algorithm selection. InAdvances in Neural Information Processing Systems 36, pages 57125–57211, 2023

work page 2023
[8]

What in-context learning “learns” in-context: Disentangling task recognition and task learning

Jane Pan, Tianyu Gao, Howard Chen, and Danqi Chen. What in-context learning “learns” in-context: Disentangling task recognition and task learning. InFindings of the Association for Computational Linguistics: ACL 2023, pages 8298–8319, 2023

work page 2023
[9]

Latent concept disentanglement in transformer-based language models

Guan Zhe Hong, Bhavya Vasudeva, Vatsal Sharan, Cyrus Rashtchian, Prabhakar Raghavan, and Rina Panigrahy. Latent concept disentanglement in transformer-based language models. InProceedings of the 14th International Conference on Learning Representations, 2026

work page 2026
[10]

Separating tongue from thought: Activation patching reveals language-agnostic concept repre- sentations in transformers

Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, and Robert West. Separating tongue from thought: Activation patching reveals language-agnostic concept repre- sentations in transformers. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, pages 31822–31841, 2025

work page 2025
[11]

Sparse autoencoders find highly interpretable features in language models

Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models. InProceedings of the 12th International Conference on Learning Representations, 2024

work page 2024
[12]

Independent subspace analysis for unsupervised learning of disentangled representations

Jan Stuehmer, Richard Turner, and Sebastian Nowozin. Independent subspace analysis for unsupervised learning of disentangled representations. InProceedings of the 23rd International Conference on Artificial Intelligence and Statistics, pages 1200–1210, 2020

work page 2020
[13]

Fast multi-instance partial-label learning

Yin-Fang Yang, Wei Tang, and Min-Ling Zhang. Fast multi-instance partial-label learning. InProceedings of the 39th AAAI Conference on Artificial Intelligence, Philadelphia, pages 22038–22046, 2025

work page 2025
[14]

Locating and editing factual associations in GPT

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. InAdvances in Neural Information Processing Systems 35, pages 17359– 17372, 2022

work page 2022
[15]

The Llama 3 Herd of Models

Llama Team. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[16]

Word translation without parallel data

Guillaume Lample, Alexis Conneau, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. Word translation without parallel data. InProceedings of the 6th International Conference on Learning Representations, 2018

work page 2018
[17]

Qwen2.5-Coder Technical Report

An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tingyu X...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[18]

What can transformers learn in-context? a case study of simple function classes

Shivam Garg, Dimitris Tsipras, Percy Liang, and Gregory Valiant. What can transformers learn in-context? a case study of simple function classes. InAdvances in Neural Information Processing Systems 35, pages 30583–30598, 2022

work page 2022
[19]

What learning algorithm is in-context learning? investigations with linear models

Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. What learning algorithm is in-context learning? investigations with linear models. InProceedings of the 11th International Conference on Learning Representations, 2023

work page 2023
[20]

Transformers learn in-context by gradient descent

Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. InProceedings of the 40th International Conference on Machine Learning, pages 35151–35174, 2023

work page 2023
[21]

Why can GPT learn in-context? language models secretly perform gradient descent as meta-optimizers

Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, and Furu Wei. Why can GPT learn in-context? language models secretly perform gradient descent as meta-optimizers. InFindings of the 61st Annual Meeting of the Association for Computational Linguistics, pages 4005–4019, 2023

work page 2023
[22]

Transformers learn to implement preconditioned gradient descent for in-context learning

Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, and Suvrit Sra. Transformers learn to implement preconditioned gradient descent for in-context learning. InAdvances in Neural Information Processing Systems 36, pages 45614–45650, 2023

work page 2023
[23]

Pretraining task diversity and the emergence of non-bayesian in-context learning for regression

Allan Raventós, Mansheej Paul, Feng Chen, and Surya Ganguli. Pretraining task diversity and the emergence of non-bayesian in-context learning for regression. InAdvances in Neural Information Processing Systems 36, pages 14228–14246, 2023

work page 2023
[24]

General-purpose in- context learning by meta-learning transformers.arXiv preprint arXiv:2212.04458, 2022

Louis Kirsch, James Harrison, Jascha Sohl-Dickstein, and Luke Metz. General-purpose in- context learning by meta-learning transformers.arXiv preprint arXiv:2212.04458, 2022

work page arXiv 2022
[25]

The learnability of in-context learning

Noam Wies, Yoav Levine, and Amnon Shashua. The learnability of in-context learning. In Advances in Neural Information Processing Systems 36, pages 36637–36651, 2023

work page 2023
[26]

In-context Learning and Induction Heads

Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads.arXiv preprint arXiv:2209.11895, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[27]

Liu Yang, Ziqian Lin, Kangwook Lee, Dimitris Papailiopoulos, and Robert D. Nowak. Task vectors in in-context learning: Emergence, formation, and benefits. InProceedings of the Second Conference on Language Modeling, 2025

work page 2025
[28]

Concept bottleneck models

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. InProceedings of the 37th International Conference on Machine Learning, pages 5338–5348, 2020

work page 2020
[29]

Editable concept bottleneck models

Lijie Hu, Chenyang Ren, Zhengyu Hu, Hongbin Lin, Cheng-Long Wang, Zhen Tan, Weimin Lyu, Jingfeng Zhang, Hui Xiong, and Di Wang. Editable concept bottleneck models. InProceedings of the 42nd International Conference on Machine Learning, pages 24678–24726, 2025

work page 2025
[30]

Semi-supervised concept bottleneck models

Lijie Hu, Tianhao Huang, Huanyi Xie, Xilin Gong, Chenyang Ren, Zhengyu Hu, Lu Yu, Ping Ma, and Di Wang. Semi-supervised concept bottleneck models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2110–2119, 2025. 13

work page 2025
[31]

Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)

Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viégas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). InProceedings of the 35th International Conference on Machine Learning, pages 2668–2677, 2018

work page 2018
[32]

Concept whitening for interpretable image recognition

Zhi Chen, Yijie Bei, and Cynthia Rudin. Concept whitening for interpretable image recognition. Nature Machine Intelligence, 2(12):772–782, 2020

work page 2020
[33]

Concept embedding models: Beyond the accuracy- explainability trade-off

Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Zohreh Shams, Frederic Precioso, Stefano Melacci, Adrian Weller, Pietro Lió, and Mateja Jamnik. Concept embedding models: Beyond the accuracy- explainability trade-off. InAdvances in Neural Information Processing Systems 35, pages 2140...

work page 2022
[34]

Challenging common assumptions in the unsupervised learning of disentangled representations

Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. InProceedings of the 36th International Conference on Machine Learning, pages 4114–4124, 2019

work page 2019
[35]

Kingma, Ricardo Pio Monti, and Aapo Hyvärinen

Ilyes Khemakhem, Diederik P. Kingma, Ricardo Pio Monti, and Aapo Hyvärinen. Variational autoencoders and nonlinear ICA: A unifying framework. InProceedings of the 23rd International Conference on Artificial Intelligence and Statistics, pages 2207–2217, 2020

work page 2020
[36]

There was never a bottleneck in concept bottleneck models

Antonio Almudévar, José Miguel Hernández-Lobato, and Alfonso Ortega. There was never a bottleneck in concept bottleneck models. InProceedings of the 14th International Conference on Learning Representations, 2026

work page 2026
[37]

Investigating gender bias in language models using causal mediation analysis

Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart Shieber. Investigating gender bias in language models using causal mediation analysis. InAdvances in Neural Information Processing Systems 33, pages 12388–12401, 2020

work page 2020
[38]

Causal abstractions of neural networks

Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. Causal abstractions of neural networks. InAdvances in Neural Information Processing Systems 34, pages 9574–9586, 2021

work page 2021
[39]

Interpretability in the wild: A circuit for indirect object identification in GPT-2 small

Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. Interpretability in the wild: A circuit for indirect object identification in GPT-2 small. In Proceedings of the 11th International Conference on Learning Representations, 2023

work page 2023
[40]

Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso

Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. In Advances in Neural Information Processing Systems 36, pages 16318–16352, 2023

work page 2023
[41]

Towards best practices of activation patching in language models: Metrics and methods

Fred Zhang and Neel Nanda. Towards best practices of activation patching in language models: Metrics and methods. InProceedings of the 12th International Conference on Learning Representations, 2024

work page 2024
[42]

Is this the subspace you are looking for? an interpretability illusion for subspace activation patching

Aleksandar Makelov, Georg Lange, Atticus Geiger, and Neel Nanda. Is this the subspace you are looking for? an interpretability illusion for subspace activation patching. InProceedings of the 12th International Conference on Learning Representations, 2024. 14

work page 2024
[43]

How do transformers learn in-context beyond simple functions? a case study on learning with repre- sentations

Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, and Yu Bai. How do transformers learn in-context beyond simple functions? a case study on learning with repre- sentations. InProceedings of the 12th International Conference on Learning Representations, 2024

work page 2024
[44]

In-context linear regression demystified: Training dynamics and mechanistic interpretability of multi-head softmax attention

Jianliang He, Xintian Pan, Siyu Chen, and Zhuoran Yang. In-context linear regression demystified: Training dynamics and mechanistic interpretability of multi-head softmax attention. InProceedings of the 42nd International Conference on Machine Learning, pages 22686–22742, 2025

work page 2025
[45]

Hoerl and Robert W

Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems.Technometrics, 12(1):55–67, 1970

work page 1970
[46]

Bishop.Pattern Recognition and Machine Learning

Christopher M. Bishop.Pattern Recognition and Machine Learning. Springer, 2006

work page 2006
[47]

Murphy.Machine Learning: A Probabilistic Perspective

Kevin P. Murphy.Machine Learning: A Probabilistic Perspective. MIT Press, 2012

work page 2012
[48]

ProMIPL: A probabilistic generative model for multi-instance partial-label learning

Yin-Fang Yang, Wei Tang, and Min-Ling Zhang. ProMIPL: A probabilistic generative model for multi-instance partial-label learning. InProceedings of the 24th IEEE International Conference on Data Mining, pages 560–569, 2024

work page 2024
[49]

Springer, 2 edition, 2009

Trevor Hastie, Robert Tibshirani, and Jerome Friedman.The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2 edition, 2009

work page 2009
[50]

Wainwright.High-Dimensional Statistics: A Non-Asymptotic Viewpoint

Martin J. Wainwright.High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press, 2019

work page 2019
[51]

In-context learning creates task vectors

Roee Hendel, Mor Geva, and Amir Globerson. In-context learning creates task vectors. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9318–9333, 2023

work page 2023
[52]

Sheng Liu, Haotian Ye, Lei Xing, and James Y. Zou. In-context vectors: Making in context learning more effective and controllable through latent space steering. InProceedings of the 41st International Conference on Machine Learning, volume 235, pages 32287–32307, 2024

work page 2024
[53]

Li, Arnab Sen Sharma, Aaron Mueller, Byron C

Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, and David Bau. Function vectors in large language models. InProceedings of the 12th International Conference on Learning Representations, 2024

work page 2024
[54]

Language models implement simple Word2Vec-style vector arithmetic

Jack Merullo, Carsten Eickhoff, and Ellie Pavlick. Language models implement simple Word2Vec-style vector arithmetic. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5030–5047, 2024

work page 2024
[55]

Understanding task vectors in in-context learning: Emergence, functionality, and limitations

Yuxin Dong, Jiachen Jiang, Zhihui Zhu, and Xia Ning. Understanding task vectors in in-context learning: Emergence, functionality, and limitations. InProceedings of the 14th International Conference on Learning Representations, 2026

work page 2026
[56]

corrupted

Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Qingfu Zhang, Hau-San Wong, and Taiji Suzuki. Provable in-context vector arithmetic via retrieving task concepts. InProceedings of the 42nd International Conference on Machine Learning, pages 5669–5724, 2025. 15 A Additional Theory and Proofs A.1 Noisy labels in concept space We extend Equation (1) to homosce...

work page 2025

[1] [1]

Brown, Benjamin Mann, Nick Ryder, and et al

Tom B. Brown, Benjamin Mann, Nick Ryder, and et al. Language models are few-shot learners. InAdvances in Neural Information Processing Systems 33, pages 1877–1901, 2020

work page 1901

[2] [2]

A glance at in-context learning.Frontiers of Computer Science, 18(5): 185347, 2024

Xu Yang Yongliang Wu. A glance at in-context learning.Frontiers of Computer Science, 18(5): 185347, 2024

work page 2024

[3] [3]

arXiv preprint arXiv:2303.03846 , year =

Jerry Wei, Jason Wei, Yi Tay, Dustin Tran, Albert Webson, Yifeng Lu, Xinyun Chen, Hanxiao Liu, Da Huang, Denny Zhou, et al. Larger language models do in-context learning differently. arXiv preprint arXiv:2303.03846, 2023

work page arXiv 2023

[4] [4]

Sewon Min, Xinxi Lyu, Ari Holtzman, Mikel Artetxe, Mike Lewis, Hannaneh Hajishirzi, and Luke Zettlemoyer. Rethinking the role of demonstrations: What makes in-context learning work? InProceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 11048–11064, 2022

work page 2022

[5] [5]

Stephanie C. Y. Chan, Adam Santoro, Andrew K. Lampinen, Jane X. Wang, Aaditya K. Singh, Pierre H. Richemond, James L. McClelland, and Felix Hill. Data distributional properties drive emergent in-context learning in transformers. InAdvances in Neural Information Processing Systems 35, pages 18878–18891, 2022. 11

work page 2022

[6] [6]

An explanation of in-context learning as implicit bayesian inference

Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit bayesian inference. InProceedings of the 10th International Conference on Learning Representations, 2022

work page 2022

[7] [7]

Transformers as statisticians: Provable in-context learning with in-context algorithm selection

Yu Bai, Fan Chen, Huan Wang, Caiming Xiong, and Song Mei. Transformers as statisticians: Provable in-context learning with in-context algorithm selection. InAdvances in Neural Information Processing Systems 36, pages 57125–57211, 2023

work page 2023

[8] [8]

What in-context learning “learns” in-context: Disentangling task recognition and task learning

Jane Pan, Tianyu Gao, Howard Chen, and Danqi Chen. What in-context learning “learns” in-context: Disentangling task recognition and task learning. InFindings of the Association for Computational Linguistics: ACL 2023, pages 8298–8319, 2023

work page 2023

[9] [9]

Latent concept disentanglement in transformer-based language models

Guan Zhe Hong, Bhavya Vasudeva, Vatsal Sharan, Cyrus Rashtchian, Prabhakar Raghavan, and Rina Panigrahy. Latent concept disentanglement in transformer-based language models. InProceedings of the 14th International Conference on Learning Representations, 2026

work page 2026

[10] [10]

Separating tongue from thought: Activation patching reveals language-agnostic concept repre- sentations in transformers

Clément Dumas, Chris Wendler, Veniamin Veselovsky, Giovanni Monea, and Robert West. Separating tongue from thought: Activation patching reveals language-agnostic concept repre- sentations in transformers. InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics, pages 31822–31841, 2025

work page 2025

[11] [11]

Sparse autoencoders find highly interpretable features in language models

Robert Huben, Hoagy Cunningham, Logan Riggs Smith, Aidan Ewart, and Lee Sharkey. Sparse autoencoders find highly interpretable features in language models. InProceedings of the 12th International Conference on Learning Representations, 2024

work page 2024

[12] [12]

Independent subspace analysis for unsupervised learning of disentangled representations

Jan Stuehmer, Richard Turner, and Sebastian Nowozin. Independent subspace analysis for unsupervised learning of disentangled representations. InProceedings of the 23rd International Conference on Artificial Intelligence and Statistics, pages 1200–1210, 2020

work page 2020

[13] [13]

Fast multi-instance partial-label learning

Yin-Fang Yang, Wei Tang, and Min-Ling Zhang. Fast multi-instance partial-label learning. InProceedings of the 39th AAAI Conference on Artificial Intelligence, Philadelphia, pages 22038–22046, 2025

work page 2025

[14] [14]

Locating and editing factual associations in GPT

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. InAdvances in Neural Information Processing Systems 35, pages 17359– 17372, 2022

work page 2022

[15] [15]

The Llama 3 Herd of Models

Llama Team. The llama 3 herd of models.arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[16] [16]

Word translation without parallel data

Guillaume Lample, Alexis Conneau, Marc’Aurelio Ranzato, Ludovic Denoyer, and Hervé Jégou. Word translation without parallel data. InProceedings of the 6th International Conference on Learning Representations, 2018

work page 2018

[17] [17]

Qwen2.5-Coder Technical Report

An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, Tingyu X...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[18] [18]

What can transformers learn in-context? a case study of simple function classes

Shivam Garg, Dimitris Tsipras, Percy Liang, and Gregory Valiant. What can transformers learn in-context? a case study of simple function classes. InAdvances in Neural Information Processing Systems 35, pages 30583–30598, 2022

work page 2022

[19] [19]

What learning algorithm is in-context learning? investigations with linear models

Ekin Akyürek, Dale Schuurmans, Jacob Andreas, Tengyu Ma, and Denny Zhou. What learning algorithm is in-context learning? investigations with linear models. InProceedings of the 11th International Conference on Learning Representations, 2023

work page 2023

[20] [20]

Transformers learn in-context by gradient descent

Johannes von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. InProceedings of the 40th International Conference on Machine Learning, pages 35151–35174, 2023

work page 2023

[21] [21]

Why can GPT learn in-context? language models secretly perform gradient descent as meta-optimizers

Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, and Furu Wei. Why can GPT learn in-context? language models secretly perform gradient descent as meta-optimizers. InFindings of the 61st Annual Meeting of the Association for Computational Linguistics, pages 4005–4019, 2023

work page 2023

[22] [22]

Transformers learn to implement preconditioned gradient descent for in-context learning

Kwangjun Ahn, Xiang Cheng, Hadi Daneshmand, and Suvrit Sra. Transformers learn to implement preconditioned gradient descent for in-context learning. InAdvances in Neural Information Processing Systems 36, pages 45614–45650, 2023

work page 2023

[23] [23]

Pretraining task diversity and the emergence of non-bayesian in-context learning for regression

Allan Raventós, Mansheej Paul, Feng Chen, and Surya Ganguli. Pretraining task diversity and the emergence of non-bayesian in-context learning for regression. InAdvances in Neural Information Processing Systems 36, pages 14228–14246, 2023

work page 2023

[24] [24]

General-purpose in- context learning by meta-learning transformers.arXiv preprint arXiv:2212.04458, 2022

Louis Kirsch, James Harrison, Jascha Sohl-Dickstein, and Luke Metz. General-purpose in- context learning by meta-learning transformers.arXiv preprint arXiv:2212.04458, 2022

work page arXiv 2022

[25] [25]

The learnability of in-context learning

Noam Wies, Yoav Levine, and Amnon Shashua. The learnability of in-context learning. In Advances in Neural Information Processing Systems 36, pages 36637–36651, 2023

work page 2023

[26] [26]

In-context Learning and Induction Heads

Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, Tom Henighan, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, et al. In-context learning and induction heads.arXiv preprint arXiv:2209.11895, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022

[27] [27]

Liu Yang, Ziqian Lin, Kangwook Lee, Dimitris Papailiopoulos, and Robert D. Nowak. Task vectors in in-context learning: Emergence, formation, and benefits. InProceedings of the Second Conference on Language Modeling, 2025

work page 2025

[28] [28]

Concept bottleneck models

Pang Wei Koh, Thao Nguyen, Yew Siang Tang, Stephen Mussmann, Emma Pierson, Been Kim, and Percy Liang. Concept bottleneck models. InProceedings of the 37th International Conference on Machine Learning, pages 5338–5348, 2020

work page 2020

[29] [29]

Editable concept bottleneck models

Lijie Hu, Chenyang Ren, Zhengyu Hu, Hongbin Lin, Cheng-Long Wang, Zhen Tan, Weimin Lyu, Jingfeng Zhang, Hui Xiong, and Di Wang. Editable concept bottleneck models. InProceedings of the 42nd International Conference on Machine Learning, pages 24678–24726, 2025

work page 2025

[30] [30]

Semi-supervised concept bottleneck models

Lijie Hu, Tianhao Huang, Huanyi Xie, Xilin Gong, Chenyang Ren, Zhengyu Hu, Lu Yu, Ping Ma, and Di Wang. Semi-supervised concept bottleneck models. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 2110–2119, 2025. 13

work page 2025

[31] [31]

Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV)

Been Kim, Martin Wattenberg, Justin Gilmer, Carrie Cai, James Wexler, Fernanda Viégas, and Rory Sayres. Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). InProceedings of the 35th International Conference on Machine Learning, pages 2668–2677, 2018

work page 2018

[32] [32]

Concept whitening for interpretable image recognition

Zhi Chen, Yijie Bei, and Cynthia Rudin. Concept whitening for interpretable image recognition. Nature Machine Intelligence, 2(12):772–782, 2020

work page 2020

[33] [33]

Concept embedding models: Beyond the accuracy- explainability trade-off

Mateo Espinosa Zarlenga, Pietro Barbiero, Gabriele Ciravegna, Giuseppe Marra, Francesco Giannini, Michelangelo Diligenti, Zohreh Shams, Frederic Precioso, Stefano Melacci, Adrian Weller, Pietro Lió, and Mateja Jamnik. Concept embedding models: Beyond the accuracy- explainability trade-off. InAdvances in Neural Information Processing Systems 35, pages 2140...

work page 2022

[34] [34]

Challenging common assumptions in the unsupervised learning of disentangled representations

Francesco Locatello, Stefan Bauer, Mario Lucic, Gunnar Rätsch, Sylvain Gelly, Bernhard Schölkopf, and Olivier Bachem. Challenging common assumptions in the unsupervised learning of disentangled representations. InProceedings of the 36th International Conference on Machine Learning, pages 4114–4124, 2019

work page 2019

[35] [35]

Kingma, Ricardo Pio Monti, and Aapo Hyvärinen

Ilyes Khemakhem, Diederik P. Kingma, Ricardo Pio Monti, and Aapo Hyvärinen. Variational autoencoders and nonlinear ICA: A unifying framework. InProceedings of the 23rd International Conference on Artificial Intelligence and Statistics, pages 2207–2217, 2020

work page 2020

[36] [36]

There was never a bottleneck in concept bottleneck models

Antonio Almudévar, José Miguel Hernández-Lobato, and Alfonso Ortega. There was never a bottleneck in concept bottleneck models. InProceedings of the 14th International Conference on Learning Representations, 2026

work page 2026

[37] [37]

Investigating gender bias in language models using causal mediation analysis

Jesse Vig, Sebastian Gehrmann, Yonatan Belinkov, Sharon Qian, Daniel Nevo, Yaron Singer, and Stuart Shieber. Investigating gender bias in language models using causal mediation analysis. InAdvances in Neural Information Processing Systems 33, pages 12388–12401, 2020

work page 2020

[38] [38]

Causal abstractions of neural networks

Atticus Geiger, Hanson Lu, Thomas Icard, and Christopher Potts. Causal abstractions of neural networks. InAdvances in Neural Information Processing Systems 34, pages 9574–9586, 2021

work page 2021

[39] [39]

Interpretability in the wild: A circuit for indirect object identification in GPT-2 small

Kevin Ro Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, and Jacob Steinhardt. Interpretability in the wild: A circuit for indirect object identification in GPT-2 small. In Proceedings of the 11th International Conference on Learning Representations, 2023

work page 2023

[40] [40]

Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso

Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, and Adrià Garriga-Alonso. Towards automated circuit discovery for mechanistic interpretability. In Advances in Neural Information Processing Systems 36, pages 16318–16352, 2023

work page 2023

[41] [41]

Towards best practices of activation patching in language models: Metrics and methods

Fred Zhang and Neel Nanda. Towards best practices of activation patching in language models: Metrics and methods. InProceedings of the 12th International Conference on Learning Representations, 2024

work page 2024

[42] [42]

Is this the subspace you are looking for? an interpretability illusion for subspace activation patching

Aleksandar Makelov, Georg Lange, Atticus Geiger, and Neel Nanda. Is this the subspace you are looking for? an interpretability illusion for subspace activation patching. InProceedings of the 12th International Conference on Learning Representations, 2024. 14

work page 2024

[43] [43]

How do transformers learn in-context beyond simple functions? a case study on learning with repre- sentations

Tianyu Guo, Wei Hu, Song Mei, Huan Wang, Caiming Xiong, Silvio Savarese, and Yu Bai. How do transformers learn in-context beyond simple functions? a case study on learning with repre- sentations. InProceedings of the 12th International Conference on Learning Representations, 2024

work page 2024

[44] [44]

In-context linear regression demystified: Training dynamics and mechanistic interpretability of multi-head softmax attention

Jianliang He, Xintian Pan, Siyu Chen, and Zhuoran Yang. In-context linear regression demystified: Training dynamics and mechanistic interpretability of multi-head softmax attention. InProceedings of the 42nd International Conference on Machine Learning, pages 22686–22742, 2025

work page 2025

[45] [45]

Hoerl and Robert W

Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems.Technometrics, 12(1):55–67, 1970

work page 1970

[46] [46]

Bishop.Pattern Recognition and Machine Learning

Christopher M. Bishop.Pattern Recognition and Machine Learning. Springer, 2006

work page 2006

[47] [47]

Murphy.Machine Learning: A Probabilistic Perspective

Kevin P. Murphy.Machine Learning: A Probabilistic Perspective. MIT Press, 2012

work page 2012

[48] [48]

ProMIPL: A probabilistic generative model for multi-instance partial-label learning

Yin-Fang Yang, Wei Tang, and Min-Ling Zhang. ProMIPL: A probabilistic generative model for multi-instance partial-label learning. InProceedings of the 24th IEEE International Conference on Data Mining, pages 560–569, 2024

work page 2024

[49] [49]

Springer, 2 edition, 2009

Trevor Hastie, Robert Tibshirani, and Jerome Friedman.The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer, 2 edition, 2009

work page 2009

[50] [50]

Wainwright.High-Dimensional Statistics: A Non-Asymptotic Viewpoint

Martin J. Wainwright.High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge University Press, 2019

work page 2019

[51] [51]

In-context learning creates task vectors

Roee Hendel, Mor Geva, and Amir Globerson. In-context learning creates task vectors. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9318–9333, 2023

work page 2023

[52] [52]

Sheng Liu, Haotian Ye, Lei Xing, and James Y. Zou. In-context vectors: Making in context learning more effective and controllable through latent space steering. InProceedings of the 41st International Conference on Machine Learning, volume 235, pages 32287–32307, 2024

work page 2024

[53] [53]

Li, Arnab Sen Sharma, Aaron Mueller, Byron C

Eric Todd, Millicent L. Li, Arnab Sen Sharma, Aaron Mueller, Byron C. Wallace, and David Bau. Function vectors in large language models. InProceedings of the 12th International Conference on Learning Representations, 2024

work page 2024

[54] [54]

Language models implement simple Word2Vec-style vector arithmetic

Jack Merullo, Carsten Eickhoff, and Ellie Pavlick. Language models implement simple Word2Vec-style vector arithmetic. InProceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 5030–5047, 2024

work page 2024

[55] [55]

Understanding task vectors in in-context learning: Emergence, functionality, and limitations

Yuxin Dong, Jiachen Jiang, Zhihui Zhu, and Xia Ning. Understanding task vectors in in-context learning: Emergence, functionality, and limitations. InProceedings of the 14th International Conference on Learning Representations, 2026

work page 2026

[56] [56]

corrupted

Dake Bu, Wei Huang, Andi Han, Atsushi Nitanda, Qingfu Zhang, Hau-San Wong, and Taiji Suzuki. Provable in-context vector arithmetic via retrieving task concepts. InProceedings of the 42nd International Conference on Machine Learning, pages 5669–5724, 2025. 15 A Additional Theory and Proofs A.1 Noisy labels in concept space We extend Equation (1) to homosce...

work page 2025