arxiv: 2510.01685 · v2 · submitted 2025-10-02 · 💻 cs.CL · cs.AI

How Do Language Models Compose Functions?

Apoorv Khandelwal, Ellie Pavlick

Pith reviewed 2026-05-18 11:00 UTC · model grok-4.3


keywords: language models · compositionality · mechanistic interpretability · two-hop tasks · embedding geometry · residual stream · factual recall


The pith

Language models solve two-hop tasks either by computing the intermediate result or by direct mapping, with embedding geometry deciding which path is taken.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether language models compose functions like g(f(x)) by actually computing the first function then the second, or by some other route. It confirms that models still show a compositionality gap: knowing how to compute f(x) and g(z) does not guarantee they can compute g(f(x)). By reading out residual stream activations with linear probes, the authors identify two distinct mechanisms: one that computes the intermediate f(x) on the path to the answer and one that reaches the answer without any detectable trace of that intermediate. The geometry of the embedding space, specifically whether the task appears as a simple translation from input to final output, strongly predicts which mechanism the model uses.
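The sketch below makes the compositionality gap concrete. It assumes the conditional definition popularized by Press et al.: among examples where the model answers both single hops correctly, the gap is the fraction where it still fails the composed query. The `Example` record and its field names are hypothetical, and the paper's exact aggregation across tasks and models may differ.

```python
from dataclasses import dataclass


@dataclass
class Example:
    """Per-example correctness flags (hypothetical record layout)."""
    hop1_correct: bool      # model answers z = f(x) correctly
    hop2_correct: bool      # model answers y = g(z) correctly
    composed_correct: bool  # model answers y = g(f(x)) from x alone


def compositionality_gap(examples):
    """Fraction of examples with both hops correct whose composition is wrong.

    Returns NaN when no example has both hops correct. Lower is better.
    """
    both_hops = [e for e in examples if e.hop1_correct and e.hop2_correct]
    if not both_hops:
        return float("nan")
    failures = sum(1 for e in both_hops if not e.composed_correct)
    return failures / len(both_hops)


examples = [
    Example(True, True, True),
    Example(True, True, False),
    Example(True, False, False),  # excluded: the second hop already fails
    Example(True, True, True),
]
print(compositionality_gap(examples))  # 1 failure among 3 eligible examples
```

Conditioning on both hops being correct is what separates a failure to compose from a simple failure to know either fact.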

Core claim

Modern LLMs solve two-hop factual recall tasks expressed as g(f(x)) using either a compositional mechanism that computes the intermediate f(x) along the way or a direct mechanism with no detectable signature of f(x), and embedding space geometry determines which mechanism is employed.

What carries the argument

Linear decoding of residual stream activations to detect or rule out computation of the intermediate variable f(x) during two-hop factual recall.
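The logit lens is one common form of such linear decoding. It projects a layer's residual-stream vector through the model's unembedding matrix and checks where a given token ranks. The numpy sketch below assumes that approach and adds a hypothetical top-k rule for labeling an example. The label is "compositional" if f(x) becomes decodable before g(f(x)) does, and "direct" otherwise. The function names, the threshold `k`, and the rule itself are illustrative assumptions, not the paper's procedure.

The sketch omits the model's final normalization. A positive per-vector rescaling, such as Llama-style RMSNorm without its learned gain, cannot change ranks. A faithful version would still apply the real final norm, gain included.

```python
import numpy as np


def logit_lens_ranks(residuals, W_U, token_id):
    """Rank of `token_id` in the vocabulary logits decoded at each layer.

    residuals: (n_layers, d_model) residual-stream vectors at one position.
    W_U:       (d_model, vocab) unembedding matrix.
    Returns an integer array with one rank per layer (0 = top prediction).
    """
    logits = residuals @ W_U                # (n_layers, vocab)
    target = logits[:, [token_id]]          # (n_layers, 1), kept 2-D to broadcast
    return (logits > target).sum(axis=1)    # count of tokens strictly above target


def classify_mechanism(intermediate_ranks, answer_ranks, k=10):
    """Hypothetical heuristic: 'compositional' if f(x) reaches the top-k at some
    layer strictly before the first layer where g(f(x)) reaches the top-k."""
    answer_ranks = np.asarray(answer_ranks)
    hits = np.flatnonzero(answer_ranks < k)
    first_answer_layer = int(hits[0]) if hits.size else len(answer_ranks)
    early = np.asarray(intermediate_ranks)[:first_answer_layer]
    return "compositional" if (early < k).any() else "direct"
```

Averaging these labels over a task's examples would give a per-task "presence of intermediate variables" score of the kind plotted in Figure 4a. The paper's actual heuristic may differ.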

If this is right

Tasks whose embedding-space representation is a direct translation from x to g(f(x)) tend to be solved by the direct mechanism.
The compositionality gap arises when models can reach the answer without computing the intermediate step.
Models can employ different mechanisms for different tasks depending on how those tasks sit in embedding space.
Factual recall favors the direct route (called "idiomatic" in the abstract) whenever geometry permits it.
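A word2vec-style offset test can make the "translation" geometry in these consequences concrete. Fit a single shared vector d from each x to its target g(f(x)), then check whether x + d lands nearest, by cosine similarity, to the correct candidate. This is a minimal numpy sketch under that assumption. The paper's actual linearity metric, and which embedding or unembedding space it uses, may differ. For brevity the offset is fit in-sample; a leave-one-out fit would be the more careful choice.

```python
import numpy as np


def translation_accuracy(X, target_idx, C):
    """Fraction of inputs for which x + d retrieves the correct output.

    X:          (n, dim) embeddings of the inputs x.
    target_idx: (n,) index into C of each target y = g(f(x)).
    C:          (m, dim) embeddings of all candidate outputs.
    """
    target_idx = np.asarray(target_idx)
    d = (C[target_idx] - X).mean(axis=0)          # one shared translation vector
    pred = X + d
    pred = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    C_unit = C / np.linalg.norm(C, axis=1, keepdims=True)
    nearest = (pred @ C_unit.T).argmax(axis=1)    # cosine nearest neighbour
    return float((nearest == target_idx).mean())


rng = np.random.default_rng(0)
C = rng.normal(size=(5, 16))
offset = rng.normal(size=16)
X = C - offset                                    # every target is exactly x + offset
print(translation_accuracy(X, np.arange(5), C))   # prints 1.0: a perfectly "linear" task
```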

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Changing embedding geometry through training data or fine-tuning might shift models toward more compositional processing on novel inputs.
Direct mechanisms could be more brittle when the model encounters slight variations in how facts are phrased.
The same split between compositional and direct routes may appear in other multi-step reasoning problems such as multi-hop question answering or arithmetic.
More powerful detection methods beyond linear probes could reveal hidden compositional structure even in cases currently classified as direct.

Load-bearing premise

The absence of a linearly decodable f(x) in the residual stream means that f(x) was not computed, rather than that it is present in some non-linear or distributed form that linear probes cannot detect.

What would settle it

A two-hop task where linear probes show no trace of f(x) yet a more complete analysis reveals a non-linear representation of f(x) that the model actually uses to reach the correct output.
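The toy example below illustrates this failure mode. It is entirely synthetic and not drawn from the paper. A binary "intermediate" is stored as sign(x1 * x2), which no linear readout of (x1, x2) can recover but a non-linear readout can. The non-linear probe is ridge regression on fixed random ReLU features. This is our own stand-in for the small MLP probes proposed in the rebuttal. All sizes and the regularization strength are arbitrary choices.

```python
import numpy as np


def _with_bias(F):
    return np.column_stack([F, np.ones(len(F))])


def fit_ridge(F, y, lam=1e-2):
    """Closed-form ridge regression with a bias column appended."""
    F1 = _with_bias(F)
    A = F1.T @ F1 + lam * np.eye(F1.shape[1])
    return np.linalg.solve(A, F1.T @ y)


def probe_accuracy(F_train, y_train, F_test, y_test):
    """Held-out sign accuracy of a ridge probe trained on F_train."""
    w = fit_ridge(F_train, y_train)
    return float((np.sign(_with_bias(F_test) @ w) == y_test).mean())


def compare_probes(seed=0, n_train=2000, n_test=500, n_features=500):
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1, 1, size=(n_train + n_test, 2))
    y = np.sign(X[:, 0] * X[:, 1])        # XOR-like label, invisible to a linear probe
    X_tr, X_te = X[:n_train], X[n_train:]
    y_tr, y_te = y[:n_train], y[n_train:]

    linear = probe_accuracy(X_tr, y_tr, X_te, y_te)

    W = rng.normal(size=(2, n_features))  # fixed random projection
    b = rng.uniform(-1, 1, size=n_features)

    def relu_features(Z):
        return np.maximum(Z @ W + b, 0.0)

    nonlinear = probe_accuracy(relu_features(X_tr), y_tr, relu_features(X_te), y_te)
    return linear, nonlinear


print(compare_probes())  # (linear accuracy, non-linear accuracy) on held-out points
```

A near-chance linear score next to a high non-linear score is exactly the pattern that would move a case from "direct" to "compositional but non-linearly encoded".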

Figures

Figures reproduced from arXiv: 2510.01685 by Apoorv Khandelwal, Ellie Pavlick.

**Figure 2.** Compositionality gap (dashed purple line; lower is better) of various models aggregated …

**Figure 3.** (a–b) Processing signatures aggregated over examples (across all tasks) in which Llama 3 …

**Figure 4.** (a) Strong correlation across tasks between presence of intermediate variables (heuristic …

**Figure 5.** We illustrate the monotonically diminishing improvements to the compositionality gap …

**Figure 6.** Aggregate processing signatures for each of our tasks, in which Llama 3 (3B) correctly …

**Figure 7.** Aggregate processing signatures for each of our tasks, in which Llama 3 (3B) correctly …

**Figure 8.** Aggregate processing signatures for each of our tasks, in which Llama 3 (3B) correctly …

**Figure 9.** Aggregate processing signatures for each of our tasks, in which Llama 3 (3B) correctly …

**Figure 10.** Aggregate processing signatures (using the token identity patchscope) for each of our …

**Figure 11.** Aggregate processing signatures (using the token identity patchscope) for each of our tasks, …

**Figure 12.** Correlation across tasks (r² = 0.35) for embedding space task linearity and presence of intermediate variables. Analogous to Fig. 4a, using the intermediate variable metric from the token identity patchscope.

**Figure 13.** Causal effects on predicted values after patching from …

**Figure 14.** Causal effects on predicted values after patching from …

**Figure 15.** Relationships between presence of intermediate variables and embedding space linearity …

Original abstract

While large language models (LLMs) appear to be increasingly capable of solving compositional tasks, it is an open question whether they do so using compositional mechanisms. In this work, we investigate how feedforward LLMs solve two-hop factual recall tasks, which can be expressed compositionally as $g(f(x))$. We first confirm that modern LLMs continue to suffer from the "compositionality gap", i.e. their ability to compute both $z = f(x)$ and $y = g(z)$ does not entail their ability to compute the composition $y = g(f(x))$. We then decode residual stream representations and identify two processing mechanisms: one which solves tasks $\textit{compositionally}$, computing $f(x)$ along the way to $g(f(x))$, and one which solves them $\textit{directly}$, without any detectable signature of the intermediate variable $f(x)$. Finally, we find that embedding space geometry is strongly related to which mechanism is employed, where the idiomatic mechanism is dominant when tasks are represented by translations from $x$ to $g(f(x))$ in the embedding spaces. We fully release our data and code at: https://github.com/apoorvkh/composing-functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

LLMs handle two-hop recall either by computing the intermediate or by bypassing it, with embedding geometry predicting the choice, though the reliance on linear probes is the main methodological risk.


The main point of this paper is that language models tackle two-hop factual recall tasks in one of two ways. Either they compute the intermediate result f(x) on the way to the final answer g(f(x)), or they skip that step and map directly to the answer, with no sign of the intermediate showing up in the activations. The authors also find that this choice tracks the geometry of the embedding space: when a task looks like a translation from the input to the final output, models favor the direct approach.

Referee Report

2 major / 2 minor

Summary. The paper investigates whether LLMs solve two-hop factual recall tasks of the form g(f(x)) using compositional mechanisms. It first verifies the compositionality gap in modern models. It then uses linear probes on residual stream activations to identify two mechanisms: compositional (where f(x) is computed and linearly decodable en route to g(f(x))) versus direct (no detectable signature of the intermediate f(x)). It reports that embedding-space geometry strongly predicts which mechanism is used, with tasks represented as translations from x to g(f(x)) in embedding space favoring the direct route. Data and code are released.

Significance. If the reported mechanistic distinction and geometry correlation hold under more robust tests, the work would advance mechanistic interpretability by providing an observational taxonomy of how LLMs handle composition and a potential geometric predictor of internal strategy. The public release of code and data is a clear strength that enables follow-up work.

major comments (2)

[§4.2] Probe Analysis: The partition into compositional versus direct mechanisms rests on linear probes failing to recover f(x) in the direct class. This does not rule out non-linear or distributed representations of the intermediate variable, which could still reflect internal composition without triggering the chosen probes and would collapse the reported dichotomy.
[§5] Geometry Correlation: The claimed strong relation between embedding geometry and mechanism choice lacks controls for confounds such as task frequency or lexical overlap; without these, it is unclear whether geometry is causal or merely correlated with the probe outcomes.

minor comments (2)

[§3] Notation for the residual stream positions and layer indices is introduced without a consolidated table, making it hard to track which activations are probed at each step.
[Figure 2] The caption does not specify the exact statistical test or multiple-comparison correction used for the reported significance levels.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comments point by point below, making revisions where we agree that additional analysis or clarification is warranted.


Referee: [§4.2] Probe Analysis: The partition into compositional versus direct mechanisms rests on linear probes failing to recover f(x) in the direct class. This does not rule out non-linear or distributed representations of the intermediate variable, which could still reflect internal composition without triggering the chosen probes and would collapse the reported dichotomy.

Authors: We thank the referee for pointing this out. Our classification into compositional and direct mechanisms is based on the detectability of the intermediate f(x) using linear probes on the residual stream activations, following standard practices in mechanistic interpretability. While this does not preclude the existence of non-linear representations, the absence of a linear signature is a meaningful distinction for our taxonomy. In the revised manuscript, we will expand §4.2 to explicitly discuss this limitation and note that our 'direct' category means no linearly decodable intermediate. We will also add results from non-linear probes (e.g., small MLPs) in the appendix to test for more complex representations, though these may not alter the main conclusions. revision: partial
Referee: [§5] Geometry Correlation: The claimed strong relation between embedding geometry and mechanism choice lacks controls for confounds such as task frequency or lexical overlap; without these, it is unclear whether geometry is causal or merely correlated with the probe outcomes.

Authors: We agree that establishing the relationship between embedding geometry and mechanism choice would benefit from controls for potential confounds. In the original analysis, we focused on the geometric properties as an observational correlate. For the revision, we will introduce controls by subsampling tasks to balance for frequency and lexical overlap, and re-evaluate the correlation strength. This will be added to §5, along with a discussion on whether geometry appears to be a robust predictor independent of these factors. We believe this will strengthen the claim without overclaiming causality. revision: yes
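The rebuttal proposes subsampling tasks to balance frequency and lexical overlap. Partial correlation across tasks is a different technique that targets the same concern. Regress a per-task confound out of both the linearity score and the intermediate-variable score, then recompute r² on the residuals. The sketch below uses synthetic per-task scores and a hypothetical "frequency" confound. It is not the paper's analysis. It also uses many synthetic tasks for a clear demonstration; with only a handful of real tasks, the estimates would be much noisier.

```python
import numpy as np


def r_squared(a, b):
    """Squared Pearson correlation between two 1-D arrays."""
    r = np.corrcoef(a, b)[0, 1]
    return float(r ** 2)


def partial_r_squared(a, b, confound):
    """r^2 between a and b after removing a linear fit on `confound` from each."""
    Z = np.column_stack([np.ones(len(confound)), confound])

    def residualize(v):
        beta, *_ = np.linalg.lstsq(Z, v, rcond=None)
        return v - Z @ beta

    return r_squared(residualize(a), residualize(b))


# Synthetic example: both per-task scores are driven by the confound alone.
rng = np.random.default_rng(0)
frequency = rng.normal(size=200)                        # hypothetical per-task confound
linearity = frequency + 0.1 * rng.normal(size=200)      # embedding-space linearity score
intermediate = frequency + 0.1 * rng.normal(size=200)   # presence of f(x) in activations
print(r_squared(linearity, intermediate))               # high: looks like a strong relation
print(partial_r_squared(linearity, intermediate, frequency))  # near zero once controlled
```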

Circularity Check

0 steps flagged

No significant circularity in observational activation analysis


The paper's central findings rest on empirical experiments: confirming the compositionality gap via task performance, then using linear probes on residual stream activations to classify tasks as compositional (detectable f(x)) or direct (no detectable signature), followed by correlation with embedding geometry. These steps are observational classifications and measurements rather than derivations that reduce to their inputs by construction. No equations or self-referential definitions appear; no fitted parameters are relabeled as predictions; no load-bearing self-citations or uniqueness theorems are invoked. The taxonomy and conclusions follow from the paper's own decoding experiments rather than being forced by the authors' prior work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard mechanistic interpretability assumptions about what residual stream decoding can reveal, with no free parameters or invented entities introduced in the abstract.

axioms (1)

Domain assumption: residual stream activations can be linearly decoded to detect the presence or absence of intermediate variables.
Invoked when distinguishing compositional from direct mechanisms via detectable signatures.

pith-pipeline@v0.9.0 · 5738 in / 1113 out tokens · 62547 ms · 2026-05-18T11:00:37.343619+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tag: unclear

Relation between the paper passage and the cited Recognition theorem.

We identify two processing mechanisms: one which solves tasks compositionally, computing f(x) along the way to g(f(x)), and one which solves them directly, without any detectable signature of the intermediate variable f(x).

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
