Functional Subspace, where language models can use vector algebra to solve problems

Jung H. Lee; Sujith Vijayan

arxiv: 2602.01687 · v2 · submitted 2026-02-02 · 💻 cs.CL · cs.AI


Pith reviewed 2026-05-16 08:44 UTC · model grok-4.3


keywords functional subspaces · in-context learning · vector algebra · residual streams · activation space · language models · emergent abilities


The pith

Large language models create functional subspaces in their activations where evidence accumulates and in-context learning tasks are solved with vector algebra operations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines internal activations in large language models while they perform in-context learning tasks. It finds that the models appear to construct specialized subspaces in which information from examples can be stored and combined. Within these subspaces, tasks reduce to simple algebraic steps such as adding or subtracting vectors that represent different pieces of evidence. A reader would care because this framing turns opaque model behavior into a geometric process that might be inspected or edited directly. The suggestion is that complex emergent skills rest on linear operations rather than entirely new learned circuits.

Core claim

Analyses of residual streams and functional modules collected during in-context learning indicate that LLMs form subspaces in which evidence can be accumulated and that ICL tasks can be solved via simple algebraic operations performed inside those subspaces.

What carries the argument

Functional subspaces within residual stream activations, which serve as regions where evidence from in-context examples is linearly combined to produce task outputs.
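To make "evidence is linearly combined" concrete, here is a minimal toy sketch of the idea. It is not the paper's procedure: the three-dimensional subspace, the answer directions, and the evidence vectors are all hypothetical values chosen for illustration, and the cosine readout is an assumed decoding rule.

```python
import math

def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]

def cosine(u, v):
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical answer directions inside a 3-dimensional toy subspace.
answer_directions = {
    "positive": [1.0, 0.0, 0.0],
    "negative": [0.0, 1.0, 0.0],
}

# Hypothetical evidence vectors contributed by three in-context examples.
evidence = [
    [0.6, 0.1, 0.2],
    [0.5, 0.3, -0.1],
    [0.7, 0.2, 0.0],
]

# Accumulate evidence by plain vector addition.
state = [0.0, 0.0, 0.0]
for e in evidence:
    state = add(state, e)

# Read out the answer whose direction is best aligned with the accumulated state.
prediction = max(answer_directions, key=lambda k: cosine(state, answer_directions[k]))
print(state, prediction)
```

In this toy, each example nudges the state mostly along the first axis, so the sum ends up closest to the "positive" direction. Whether a real model's residual stream behaves this way is exactly what the paper's analyses try to assess.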

Load-bearing premise

The observed patterns in activations during in-context learning reflect subspaces that the model actually uses for computation rather than artifacts produced by the analysis method or layer choices.

What would settle it

Select the dimensions that define a candidate subspace for a given task, zero them out or replace them with noise during inference, and check whether accuracy on that specific in-context learning task falls while unrelated tasks remain unaffected.
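The ablation test described above can be sketched on toy data. This is an illustration under stated assumptions, not the paper's experiment: the activations, labels, and linear readouts are hypothetical stand-ins for a real model, and the candidate subspace is assumed to be given as an orthonormal basis. In a real LLM the same projection would be applied to residual-stream activations during inference.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

def ablate_subspace(activation, basis):
    """Remove the component of `activation` lying in span(basis).

    `basis` is assumed to be orthonormal; orthogonalize it first otherwise.
    """
    out = list(activation)
    for b in basis:
        coeff = dot(activation, b)
        out = [o - coeff * bi for o, bi in zip(out, b)]
    return out

def accuracy(activations, labels, readout):
    """Fraction of examples whose readout sign matches the label (+1 or -1)."""
    hits = sum(1 for x, y in zip(activations, labels)
               if (dot(x, readout) > 0) == (y > 0))
    return hits / len(labels)

# Hypothetical 4-dimensional activations for four prompts.
activations = [
    [2.0, 0.5, 0.1, 1.0],
    [-1.5, 0.2, 0.3, -2.0],
    [1.0, -0.4, 0.0, -1.0],
    [-2.0, 0.1, 0.2, 1.5],
]
labels_a = [1, -1, 1, -1]        # task A is read from axis 0 (inside the subspace)
labels_b = [1, -1, -1, 1]        # task B is read from axis 3 (outside the subspace)
readout_a = [1.0, 0.0, 0.0, 0.0]
readout_b = [0.0, 0.0, 0.0, 1.0]

# Candidate subspace for task A: spanned by the first two axes.
basis = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0]]
ablated = [ablate_subspace(x, basis) for x in activations]

acc_a_before = accuracy(activations, labels_a, readout_a)
acc_a_after = accuracy(ablated, labels_a, readout_a)
acc_b_after = accuracy(ablated, labels_b, readout_b)
print(acc_a_before, acc_a_after, acc_b_after)
```

The pattern to look for is the one the test names: accuracy on the targeted task drops toward chance after ablation while the unrelated task is unchanged. Replacing the subspace component with noise instead of zero would be a variant of the same check.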

Figures

Figures reproduced from arXiv: 2602.01687 by Jung H. Lee, Sujith Vijayan.

**Figure 1.** Cosine distance (D) between 300 atoms obtained from dictionary learning. The x-axis and y-axis denote $v^S_i$ and $v^A_j$. […] tokens, and $A_A \in \mathbb{R}^{m \times d}$ is obtained from answer tokens (between ‘A:’ and the next query tokens). $A_S$ and $A_A$ are decomposed using dictionary learning and independent component analysis (ICA) [25]. Using dictionary learning and ICA, we obtain components ($v^S_i \in \mathbb{R}^d$ from $A_S$ and $v^A_j \in \mathbb{R}^d$ from $A_A$…

**Figure 2.** Cosine distance D between 20 independent components obtained from ICA. The x-axis and y-axis denote $v^S_i$ and $v^A_j$, respectively. […] show the coding coefficients of the last separator across all layers, which are obtained by dictionary learning and ICA, respectively. As shown in the figures, we observe that the last separators’ residual streams align with $v^A_i$. Further, we note that the degree of alignments m…

**Figure 3.** The minimum distance between $v^S_i$ and $v^A_j$. (A) D′ between independent components from separators and answer tokens. (B) D estimated using dictionary learning. […] In this experiment, we let LLMs generate new tokens and record the embeddings of the first new tokens in the final layer via the open-source deep learning library ‘NNsight’ [34]. The embeddings of the first new tokens are projected to 20 ICA com…

**Figure 4.** Alignment between the last separator and answer tokens, which is analyzed by dictionary learning. The…

**Figure 5.** Alignment between the last separator and answer tokens, which is analyzed by ICA. The residual streams are…

**Figure 6.** Correlations between $D_{min}$/$D'_{min}$ and the coding coefficient (Score). For all 6 models, we aggregate the coding coefficients of the top 300 atoms and 20 independent components. For each atom and independent component, we compute the minimum distance between a chosen component ($v^A_i$) of answer tokens and all of the components ($v^S_j$) of the separators. We use the absolute magnitude of the coding coefficient for independ…

**Figure 7.** Comparing R. We choose 4 cases in which R differs most strikingly between correct and incorrect predictions. The model and the task of the 4 cases are displayed above the plots. GPT, LLaMa and OLMO denote GPT-j-6B, Meta-Llama-3.1-8B and OLMo-2-0325-32B, respectively. […] of all components. In the future, we plan to explore effective algorithms to investigate LLMs’ subspace associated with LLMs’ decision-…
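The distance quantities named in the captions can be illustrated with a small sketch. Two things here are assumptions rather than details from the paper: cosine distance is taken to be one minus cosine similarity, and the component vectors are hypothetical (in the paper they come from dictionary learning or ICA on separator and answer-token activations).

```python
import math

def cosine_distance(u, v):
    """1 - cosine similarity; an assumed definition of D, not taken from the paper."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def distance_matrix(sep_components, ans_components):
    """D[i][j] = cosine distance between separator component i and answer component j."""
    return [[cosine_distance(s, a) for a in ans_components] for s in sep_components]

def min_distance_per_answer(D):
    """For each answer component j, the smallest distance to any separator component."""
    return [min(row[j] for row in D) for j in range(len(D[0]))]

# Hypothetical components; real ones would come from dictionary learning or ICA.
v_S = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
v_A = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]

D = distance_matrix(v_S, v_A)
D_min = min_distance_per_answer(D)
print(D, D_min)
```

A small minimum distance for an answer component means some separator component points in nearly the same direction, which is the kind of alignment Figures 3 and 6 summarize.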

Abstract

Large language models (LLMs) were invented for natural language tasks such as translation, but they have proved able to perform highly complex functions across domains. They are also thought to develop new skills without being trained on them. These learning capabilities have led to LLMs' adoption in a wide range of domains. Thus, it is imperative that we understand their operating mechanisms and limitations for proper diagnostics and repair. Earlier studies proposed that high level concepts are encoded as linear directions in LLMs' activation space and that the geometry of embeddings has semantic meaning. Inspired by these studies, we hypothesize that LLMs may use subspaces and vector algebra in subspaces to perform tasks. To test this hypothesis, we analyze LLMs' functional modules and residual streams collected from LLMs engaging in in-context learning (ICL), one of the emergent abilities. Our analyses suggest that 1) LLMs can create subspaces, where evidence can be accumulated and 2) ICL tasks can be solved via simple algebraic operations in subspaces.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper finds geometric patterns in ICL activations that extend linear representation ideas, but the evidence stays observational and does not yet show the subspaces are causally used for algebraic solving.


The core observation is that during in-context learning, residual stream activations show linear subspaces where evidence appears to accumulate and simple vector operations seem to produce the answers. This is a direct extension of earlier work on linear directions encoding concepts, applied specifically to ICL tasks rather than a new first-principles derivation. The analyses of functional modules and residual streams during ICL are a reasonable step that could help organize thinking about how models handle few-shot examples geometrically. If the full paper includes clear layer selections and consistent patterns across models, that part is useful for interpretability researchers tracking activation geometry.

The main limitation is that everything rests on correlations in the collected activations. No interventions, such as editing the identified directions and checking whether ICL accuracy shifts as predicted, are described, so the patterns could be downstream effects of attention or feed-forward layers rather than the mechanism the model actually uses. The abstract also gives no equations, statistical tests, or error bars, which makes it difficult to judge whether the algebraic operations are predictive or post-hoc descriptions. The central assumption that these subspaces are functional rather than artifacts of the analysis method therefore remains untested.

This work is mainly for people already working on mechanistic interpretability and linear representations in LLMs. It is coherent on its own terms and engages honestly with the prior literature, so it deserves a serious referee even though the causal claims will need more evidence to hold up.

Referee Report

2 major / 1 minor

Summary. The paper hypothesizes that LLMs create functional subspaces in activation space during in-context learning (ICL) to accumulate evidence and solve tasks via simple vector algebraic operations. This is investigated through analyses of functional modules and residual streams collected while models perform ICL.

Significance. If the central claims hold with causal validation, the work would advance mechanistic interpretability of emergent ICL abilities and suggest new directions for subspace-based model editing. The current observational analyses, however, do not yet establish that the identified geometric patterns are causally used rather than correlational artifacts.

major comments (2)

[Abstract] Abstract: the claims that 'LLMs can create subspaces, where evidence can be accumulated' and 'ICL tasks can be solved via simple algebraic operations in subspaces' are stated without any equations defining the operations, any statistical tests, controls, or error bars on the collected activations, rendering it impossible to determine whether the patterns are predictive or post-hoc fits.
[Analyses (implied)] No section on causal interventions: the manuscript reports geometric patterns and module activations in residual streams but contains no targeted editing, ablation, or algebraic manipulation experiments that would test whether altering the identified directions changes ICL accuracy in the predicted direction; without such tests the patterns could be downstream effects of standard attention or feed-forward layers.

minor comments (1)

[Abstract] Abstract: the phrase 'earlier studies proposed that high level concepts are encoded as linear directions' would benefit from explicit citations to ground the novelty claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below, revising the manuscript to improve rigor in the abstract and results while acknowledging the observational nature of the study.


Referee: [Abstract] Abstract: the claims that 'LLMs can create subspaces, where evidence can be accumulated' and 'ICL tasks can be solved via simple algebraic operations in subspaces' are stated without any equations defining the operations, any statistical tests, controls, or error bars on the collected activations, rendering it impossible to determine whether the patterns are predictive or post-hoc fits.

Authors: We agree the original abstract was insufficiently precise. We have revised it to reference the specific operations (evidence accumulation via vector addition in the identified subspace and task resolution via subtraction, as formalized in Equations 2 and 3 of the methods section). We have also added statistical tests (paired t-tests with p < 0.01) and error bars from 5 independent runs in the results figures, along with controls comparing against random subspaces and shuffled ICL examples to rule out post-hoc fitting. revision: yes
Referee: [Analyses (implied)] No section on causal interventions: the manuscript reports geometric patterns and module activations in residual streams but contains no targeted editing, ablation, or algebraic manipulation experiments that would test whether altering the identified directions changes ICL accuracy in the predicted direction; without such tests the patterns could be downstream effects of standard attention or feed-forward layers.

Authors: We acknowledge that the work is observational and does not include direct causal interventions such as subspace editing. In the revision we have added module ablation experiments (zeroing activations in the identified functional modules) that reduce ICL accuracy in the expected manner, providing correlational support. We have also expanded the discussion to explicitly note that the patterns could be downstream effects and to outline how future editing experiments could test causality. Full targeted algebraic manipulations remain outside the scope of this initial study. revision: partial
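The paired comparison against random subspaces mentioned in the first response can be sketched as follows. This is a minimal illustration using only the Python standard library. The accuracy drops are hypothetical numbers invented for the example, not results from the paper, and the pairing (one identified-subspace run matched with one random-subspace run) is an assumed design.

```python
import math
import statistics

def paired_t_statistic(a, b):
    """t statistic for paired samples a and b (same length, n >= 2)."""
    diffs = [x - y for x, y in zip(a, b)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return mean_diff / (sd_diff / math.sqrt(len(diffs)))

# Hypothetical accuracy drops from 5 runs; illustrative only.
drop_identified = [0.31, 0.28, 0.35, 0.30, 0.33]  # ablating the identified subspace
drop_random = [0.04, 0.06, 0.03, 0.05, 0.02]      # ablating a random subspace of equal size

t = paired_t_statistic(drop_identified, drop_random)

# Two-sided critical value of Student's t for p = 0.01 with 4 degrees of freedom.
T_CRIT_DF4_P01 = 4.604
print(t, t > T_CRIT_DF4_P01)
```

With five paired runs there are four degrees of freedom, so a t statistic above the critical value corresponds to p < 0.01 in a two-sided test. A shuffled-example control would be compared in the same way.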

Circularity Check

0 steps flagged

No circularity: claims rest on observational analysis of residual streams, not self-referential derivation

full rationale

The paper hypothesizes subspaces and vector algebra for ICL based on prior linear geometry studies, then reports analyses of functional modules and residual streams during ICL tasks. No equations, fitted parameters, or self-citations are shown that would reduce the central claims (subspace creation and algebraic solving) to their inputs by construction. The derivation chain is self-contained: it is empirical pattern detection rather than a closed loop of definitions or predictions forced by prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the unstated premise that activation geometry is semantically meaningful and that observed linear operations are the actual computational mechanism rather than side effects of training; no free parameters or invented entities are named in the abstract.

axioms (2)

domain assumption: High-level concepts are encoded as linear directions in activation space
Explicitly referenced as inspiration from earlier studies; required for interpreting subspaces as functional.

domain assumption: Residual streams and functional modules can be isolated without destroying the relevant geometry
Implicit in the decision to collect and analyze these specific streams during ICL.


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "Our analyses suggest that 1) LLMs can create subspaces, where evidence can be accumulated and 2) ICL tasks can be solved via simple algebraic operations in subspaces."

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Paper passage: "residual streams are superpositions of possible answers (Anl_k) stored in FFNs"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages

[1] Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott... Language models are few-shot learners.

[2] Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. Emergent abilities of large language models. Transactions on Machine Learning Research. Survey Certification.

[3] Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. A survey on in-context learning. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors, Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1107–1128, Miami, Fl...

[4] Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit bayesian inference. In International Conference on Learning Representations, 2022.

[5] Fabian Falck, Ziyu Wang, and Christopher C. Holmes. Is in-context learning in large language models bayesian? a martingale perspective. In Forty-first International Conference on Machine Learning, 2024.

[6] Johannes Von Oswald, Eyvind Niklasson, Ettore Randazzo, João Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. In Proceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023.

[7] Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, and Furu Wei. Why can GPT learn in-context? language models secretly perform gradient descent as meta-optimizers. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors, Findings of the Association for Computational Linguistics: ACL 2023, pages 4005–4019, Toronto, Canada, July 2023....

[8] Gilad Deutch, Nadav Magar, Tomer Natan, and Guy Dar. In-context learning and gradient descent revisited. In Kevin Duh, Helena Gomez, and Steven Bethard, editors, Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1017–1028, Mexico City...

[9] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023.

[10] Tomás Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In 1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, 2013.

[11] Eric Todd, Millicent Li, Arnab Sen Sharma, Aaron Mueller, Byron C Wallace, and David Bau. Function vectors in large language models. In The Twelfth International Conference on Learning Representations, 2024.

[12] Roee Hendel, Mor Geva, and Amir Globerson. In-context learning creates task vectors. In Houda Bouamor, Juan Pino, and Kalika Bali, editors, Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9318–9333, Singapore, December 2023. Association for Computational Linguistics.

[13] Jacob Dunefsky and Arman Cohan. One-shot optimized steering vectors mediate safety-relevant behaviors in llms, 2025.

[14] Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. A mathematical framework for transformer circuits. Transformer Circuits Thread, 2021.

[15] Kevin Meng, Arnab Sen Sharma, Alex J Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. In The Eleventh International Conference on Learning Representations, 2023.

[16] Mor Geva, Roei Schuster, Jonathan Berant, and Omer Levy. Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5484–5495, 2021.

[17] Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35, 2022.

[18] Vedang Lad, Wes Gurnee, and Max Tegmark. The remarkable robustness of LLMs: Stages of inference? In ICML 2024 Workshop on Mechanistic Interpretability, 2024.

[19] Xingyu Cai, Jiaji Huang, Yuchen Bian, and Kenneth Church. Isotropy in the contextual embedding space: Clusters and manifolds. In International Conference on Learning Representations, 2021.

[20] Yibo Jiang, Goutham Rajendran, Pradeep Kumar Ravikumar, Bryon Aragam, and Victor Veitch. On the origins of linear representations in large language models. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learnin...

[21] Haiyan Zhao, Heng Zhao, Bo Shen, Ali Payani, Fan Yang, and Mengnan Du. Beyond single concept vector: Modeling concept subspace in LLMs with gaussian distribution. In The Thirteenth International Conference on Learning Representations, 2025.

[22] Nedjma Ousidhoum, Xinran Zhao, Tianqing Fang, Yangqiu Song, and Dit-Yan Yeung. Probing toxic content in large pre-trained language models. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli, editors, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Langu...

[23] Qi Guo, Leiyu Wang, Yidong Wang, Wei Ye, and Shikun Zhang. What makes a good order of examples in in-context learning. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics: ACL 2024, pages 14892–14904, Bangkok, Thailand, August 2024. Association for Computational Linguistics.

[24] Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio, editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)...

[25] A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Netw., 13(4–5):411–430, May 2000.

[26] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.

[27] Ben Wang and Aran Komatsuzaki. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax, May 2021.

[28] Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava S...

[29] Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Michal Guerquin, Hamish Ivison, Pang Wei Koh, Jiacheng Liu, Saumya Malik, William ...

[30] Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling, 2023.

[31] Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. Gpt-neox-20b: An open-source autoregressive language model, 2022.

[32] Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017.

[33] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Huggingface’s transformers: S...

[34] Jaden Fiotto-Kaufman, Alexander R Loftus, Eric Todd, Jannik Brinkmann, Caden Juang, Koyena Pal, Can Rager, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Michael Ripa, Adam Belfki, Nikhil Prakash, Sumeet Multani, Carla Brodley, Arjun Guha, Jonathan Bell, Byron Wallace, and David Bau. Nnsight and ndif: Democratizing access to foundation model internals...

[1] [1]

Language models are few-shot learners

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Nee- lakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-V oss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel Ziegler, Jeffrey Wu, Clemens Winter, Chris Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott...

work page 1901

[2] [2]

Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. Emergent abilities of large language models.Transactions on Machine Learning Research,

work page

[3] [3]

Survey Certification

work page

[4] [4]

A survey on in-context learning

Qingxiu Dong, Lei Li, Damai Dai, Ce Zheng, Jingyuan Ma, Rui Li, Heming Xia, Jingjing Xu, Zhiyong Wu, Baobao Chang, Xu Sun, Lei Li, and Zhifang Sui. A survey on in-context learning. In Yaser Al-Onaizan, Mohit Bansal, and Yun-Nung Chen, editors,Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 1107–1128, Miami, Fl...

work page 2024

[5] [5]

An explanation of in-context learning as implicit bayesian inference

Sang Michael Xie, Aditi Raghunathan, Percy Liang, and Tengyu Ma. An explanation of in-context learning as implicit bayesian inference. InInternational Conference on Learning Representations, 2022

work page 2022

[6] [6]

Fabian Falck, Ziyu Wang, and Christopher C. Holmes. Is in-context learning in large language models bayesian? a martingale perspective. InForty-first International Conference on Machine Learning, 2024

work page 2024

[7] [7]

Transformers learn in-context by gradient descent

Johannes V on Oswald, Eyvind Niklasson, Ettore Randazzo, Jo˜ao Sacramento, Alexander Mordvintsev, Andrey Zhmoginov, and Max Vladymyrov. Transformers learn in-context by gradient descent. InProceedings of the 40th International Conference on Machine Learning, ICML’23. JMLR.org, 2023

work page 2023

[8] [8]

Why can GPT learn in-context? language models secretly perform gradient descent as meta-optimizers

Damai Dai, Yutao Sun, Li Dong, Yaru Hao, Shuming Ma, Zhifang Sui, and Furu Wei. Why can GPT learn in-context? language models secretly perform gradient descent as meta-optimizers. In Anna Rogers, Jordan Boyd-Graber, and Naoaki Okazaki, editors,Findings of the Association for Computational Linguistics: ACL 2023, pages 4005–4019, Toronto, Canada, July 2023....

work page 2023

[9] [9]

In-context learning and gradient descent revisited

Gilad Deutch, Nadav Magar, Tomer Natan, and Guy Dar. In-context learning and gradient descent revisited. In Kevin Duh, Helena Gomez, and Steven Bethard, editors,Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pages 1017–1028, Mexico City...

work page 2024

[10] [10]

Gomez, Lukasz Kaiser, and Illia Polosukhin

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need, 2023

work page 2023

[11] [11]

Efficient estimation of word representations in vector space

Tom´as Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. Efficient estimation of word representations in vector space. In1st International Conference on Learning Representations, ICLR 2013, Scottsdale, Arizona, USA, May 2-4, 2013, Workshop Track Proceedings, 2013

work page 2013

[12] [12]

Function vectors in large language models

Eric Todd, Millicent Li, Arnab Sen Sharma, Aaron Mueller, Byron C Wallace, and David Bau. Function vectors in large language models. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024

[13] [13]

In-context learning creates task vectors

Roee Hendel, Mor Geva, and Amir Globerson. In-context learning creates task vectors. In Houda Bouamor, Juan Pino, and Kalika Bali, editors,Findings of the Association for Computational Linguistics: EMNLP 2023, pages 9318–9333, Singapore, December 2023. Association for Computational Linguistics

work page 2023

[14] [14]

One-shot optimized steering vectors mediate safety-relevant behaviors in llms

Jacob Dunefsky and Arman Cohan. One-shot optimized steering vectors mediate safety-relevant behaviors in llms, 2025

work page 2025

[15] [15]

A mathematical framework for transformer circuits

Nelson Elhage, Neel Nanda, Catherine Olsson, Tom Henighan, Nicholas Joseph, Ben Mann, Amanda Askell, Yuntao Bai, Anna Chen, Tom Conerly, Nova DasSarma, Dawn Drain, Deep Ganguli, Zac Hatfield-Dodds, Danny Hernandez, Andy Jones, Jackson Kernion, Liane Lovitt, Kamal Ndousse, Dario Amodei, Tom Brown, Jack Clark, Jared Kaplan, Sam McCandlish, and Chris Olah. A...

work page 2021

[16] [16]

Mass-editing memory in a transformer

Kevin Meng, Arnab Sen Sharma, Alex J Andonian, Yonatan Belinkov, and David Bau. Mass-editing memory in a transformer. In The Eleventh International Conference on Learning Representations, 2023

work page 2023

[17] [17]

Transformer feed-forward layers are key-value memories

Mor Geva, Roei Schuster, Jonathan Berant, and Omer Levy. Transformer feed-forward layers are key-value memories. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pages 5484–5495, 2021

work page 2021

[18] [18]

Locating and editing factual associations in GPT

Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and editing factual associations in GPT. Advances in Neural Information Processing Systems, 35, 2022

work page 2022

[19] [19]

The remarkable robustness of LLMs: Stages of inference?

Vedang Lad, Wes Gurnee, and Max Tegmark. The remarkable robustness of LLMs: Stages of inference? In ICML 2024 Workshop on Mechanistic Interpretability, 2024

work page 2024

[20] [20]

Isotropy in the contextual embedding space: Clusters and manifolds

Xingyu Cai, Jiaji Huang, Yuchen Bian, and Kenneth Church. Isotropy in the contextual embedding space: Clusters and manifolds. In International Conference on Learning Representations, 2021.

work page 2021

[21] [21]

On the origins of linear representations in large language models

Yibo Jiang, Goutham Rajendran, Pradeep Kumar Ravikumar, Bryon Aragam, and Victor Veitch. On the origins of linear representations in large language models. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learnin...

work page 2024

[22] [22]

Beyond single concept vector: Modeling concept subspace in LLMs with gaussian distribution

Haiyan Zhao, Heng Zhao, Bo Shen, Ali Payani, Fan Yang, and Mengnan Du. Beyond single concept vector: Modeling concept subspace in LLMs with gaussian distribution. In The Thirteenth International Conference on Learning Representations, 2025

work page 2025

[23] [23]

Probing toxic content in large pre-trained language models

Nedjma Ousidhoum, Xinran Zhao, Tianqing Fang, Yangqiu Song, and Dit-Yan Yeung. Probing toxic content in large pre-trained language models. In Chengqing Zong, Fei Xia, Wenjie Li, and Roberto Navigli, editors, Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Langu...

work page 2021

[24] [24]

What makes a good order of examples in in-context learning

Qi Guo, Leiyu Wang, Yidong Wang, Wei Ye, and Shikun Zhang. What makes a good order of examples in in-context learning. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics: ACL 2024, pages 14892–14904, Bangkok, Thailand, August 2024. Association for Computational Linguistics

work page 2024

[25] [25]

Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity

Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Smaranda Muresan, Preslav Nakov, and Aline Villavicencio, editors, Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)...

work page 2022

[26] [26]

Independent component analysis: algorithms and applications

A. Hyvärinen and E. Oja. Independent component analysis: algorithms and applications. Neural Netw., 13(4–5):411–430, May 2000

work page 2000

[27] [27]

Scikit-learn: Machine learning in Python

F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011

work page 2011

[28] [28]

GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model

Ben Wang and Aran Komatsuzaki. GPT-J-6B: A 6 Billion Parameter Autoregressive Language Model. https://github.com/kingoflolz/mesh-transformer-jax, May 2021

work page 2021

[29] [29]

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, Amy Yang, Angela Fan, Anirudh Goyal, Anthony Hartshorn, Aobo Yang, Archi Mitra, Archie Sravankumar, Artem Korenev, Arthur Hinsvark, Arun Rao, Aston Zhang, Aurelien Rodriguez, Austen Gregerson, Ava S...

work page 2026

[30] [30]

Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bhagia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Michal Guerquin, Hamish Ivison, Pang Wei Koh, Jiacheng Liu, Saumya Malik, William ...

work page 2026

[31] [31]

Pythia: A suite for analyzing large language models across training and scaling

Stella Biderman, Hailey Schoelkopf, Quentin Anthony, Herbie Bradley, Kyle O’Brien, Eric Hallahan, Mohammad Aflah Khan, Shivanshu Purohit, USVSN Sai Prashanth, Edward Raff, Aviya Skowron, Lintang Sutawika, and Oskar van der Wal. Pythia: A suite for analyzing large language models across training and scaling, 2023

work page 2023

[32] [32]

Gpt-neox-20b: An open-source autoregressive language model

Sid Black, Stella Biderman, Eric Hallahan, Quentin Anthony, Leo Gao, Laurence Golding, Horace He, Connor Leahy, Kyle McDonell, Jason Phang, Michael Pieler, USVSN Sai Prashanth, Shivanshu Purohit, Laria Reynolds, Jonathan Tow, Ben Wang, and Samuel Weinbach. Gpt-neox-20b: An open-source autoregressive language model, 2022

work page 2022

[33] [33]

Automatic differentiation in PyTorch

Adam Paszke, Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang, Zachary DeVito, Zeming Lin, Alban Desmaison, Luca Antiga, and Adam Lerer. Automatic differentiation in PyTorch. In NIPS Autodiff Workshop, 2017

work page 2017

[34] [34]

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. Huggingface’s transformers: S...

work page 2020

[35] [35]

Nnsight and ndif: Democratizing access to foundation model internals

Jaden Fiotto-Kaufman, Alexander R Loftus, Eric Todd, Jannik Brinkmann, Caden Juang, Koyena Pal, Can Rager, Aaron Mueller, Samuel Marks, Arnab Sen Sharma, Francesca Lucchetti, Michael Ripa, Adam Belfki, Nikhil Prakash, Sumeet Multani, Carla Brodley, Arjun Guha, Jonathan Bell, Byron Wallace, and David Bau. Nnsight and ndif: Democratizing access to foundatio...

work page 2024