pith. the verified trust layer for science.

arxiv: 2510.01685 · v2 · submitted 2025-10-02 · 💻 cs.CL · cs.AI

How Do Language Models Compose Functions?

Pith reviewed 2026-05-18 11:00 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords language models · compositionality · mechanistic interpretability · two-hop tasks · embedding geometry · residual stream · factual recall

The pith

Language models solve two-hop tasks either by computing the intermediate result or by direct mapping, with embedding geometry deciding which path is taken.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether language models compose functions like g(f(x)) by actually computing the first function then the second, or by some other route. It confirms that models still show a compositionality gap: knowing how to compute f(x) and g(z) does not guarantee they can compute g(f(x)). By reading out residual stream activations with linear probes, the authors identify two distinct mechanisms: one that computes the intermediate f(x) on the path to the answer and one that reaches the answer without any detectable trace of that intermediate. The geometry of the embedding space, specifically whether the task appears as a simple translation from input to final output, strongly predicts which mechanism the model uses.
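
As a rough illustration of how the compositionality gap could be tallied, the sketch below assumes per-example correctness flags for each hop and for the composed query. The function name, the flag lists, and the choice to condition on examples where both hops are correct are assumptions made for illustration; this is not the authors' released code.

```python
from typing import Sequence


def compositionality_gap(
    hop1_ok: Sequence[bool],
    hop2_ok: Sequence[bool],
    composed_ok: Sequence[bool],
) -> float:
    """Among examples where both hops are answered correctly,
    return the fraction whose composed query g(f(x)) is answered wrong."""
    if not (len(hop1_ok) == len(hop2_ok) == len(composed_ok)):
        raise ValueError("all inputs must have the same length")
    # Keep the composed-query outcome only where z = f(x) and y = g(z) both succeed.
    eligible = [c for a, b, c in zip(hop1_ok, hop2_ok, composed_ok) if a and b]
    if not eligible:
        return float("nan")
    return sum(1 for c in eligible if not c) / len(eligible)


# Hypothetical flags for five examples (not data from the paper).
hop1 = [True, True, True, False, True]
hop2 = [True, True, False, True, True]
comp = [True, False, False, False, True]
gap = compositionality_gap(hop1, hop2, comp)
print(f"compositionality gap: {gap:.3f}")
```

Three of the five hypothetical examples have both hops correct, and one of those three fails on the composed query, so the sketch reports a gap of one third.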

Core claim

Modern LLMs solve two-hop factual recall tasks expressed as g(f(x)) using either a compositional mechanism that computes the intermediate f(x) along the way or a direct mechanism with no detectable signature of f(x), and embedding space geometry determines which mechanism is employed.

What carries the argument

Linear decoding of residual stream activations to detect or rule out computation of the intermediate variable f(x) during two-hop factual recall.
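
One common form of linear decoding is a logit-lens-style readout, which projects each layer's residual state through the unembedding matrix and checks where a token ranks. The sketch below is a toy version under stated assumptions: the function names, the orthonormal toy unembedding, the token ids, and the omission of the final layer norm are all illustrative choices, and it is not claimed to match the authors' exact procedure.

```python
import numpy as np


def logit_lens_ranks(resid: np.ndarray, unembed: np.ndarray, token_id: int) -> np.ndarray:
    """Rank (0 = top) of `token_id` at each layer.

    resid:   [n_layers, d_model] residual states at one token position
    unembed: [d_model, vocab] unembedding matrix (layer norm omitted here)
    """
    logits = resid @ unembed                  # [n_layers, vocab]
    target = logits[:, [token_id]]            # [n_layers, 1]
    return (logits > target).sum(axis=1)      # number of tokens scoring higher


def has_intermediate_signature(resid, unembed, token_id, top_k=1) -> bool:
    """True if the token is within the top_k readout at any layer."""
    return bool((logit_lens_ranks(resid, unembed, token_id) < top_k).any())


# Toy setup: orthonormal unembedding so each token has its own direction.
rng = np.random.default_rng(0)
d_model = vocab = 16
unembed, _ = np.linalg.qr(rng.normal(size=(d_model, vocab)))
x_tok, z_tok, y_tok = 1, 5, 9                 # hypothetical ids for x, f(x), g(f(x))

# "Compositional" trajectory: x, then f(x), then g(f(x)).
compositional = np.stack([unembed[:, x_tok], unembed[:, z_tok], unembed[:, y_tok]])
# "Direct" trajectory: moves from x to g(f(x)) without passing through f(x).
direct = np.stack([
    unembed[:, x_tok],
    0.5 * (unembed[:, x_tok] + unembed[:, y_tok]),
    unembed[:, y_tok],
])

comp_flag = has_intermediate_signature(compositional, unembed, z_tok)
direct_flag = has_intermediate_signature(direct, unembed, z_tok)
print(comp_flag, direct_flag)
```

In this toy, the intermediate token is flagged only for the first trajectory. A negative result of this kind shows only that the readout found no linear signature, which is the limitation the premise below turns on.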

If this is right

  • Tasks whose embedding-space representation is a direct translation from x to g(f(x)) tend to be solved by the direct mechanism.
  • The compositionality gap arises when models can reach the answer without computing the intermediate step.
  • Models can employ different mechanisms for different tasks depending on how those tasks sit in embedding space.
  • Idiomatic factual recall favors direct solving when geometry permits it.
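
To make "a direct translation from x to g(f(x))" concrete, the sketch below scores how well a single shared offset vector maps input embeddings onto output embeddings. The score (an R²-style ratio), the function name, and the synthetic embeddings are assumptions for illustration; the metric actually used in the paper is not specified on this page.

```python
import numpy as np


def translation_linearity(x_emb: np.ndarray, y_emb: np.ndarray) -> float:
    """R^2-style score for how well y is approximated by x + v,
    where v is one offset shared by every example in the task."""
    x_emb = np.asarray(x_emb, dtype=float)
    y_emb = np.asarray(y_emb, dtype=float)
    # The mean difference is the least-squares optimal shared translation.
    offset = (y_emb - x_emb).mean(axis=0)
    resid = y_emb - (x_emb + offset)
    total = y_emb - y_emb.mean(axis=0)
    return float(1.0 - (resid ** 2).sum() / (total ** 2).sum())


# Synthetic embeddings (not taken from any model).
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 32))
v = rng.normal(size=32)

score_translation = translation_linearity(x, x + v)                      # task is a pure translation
score_unrelated = translation_linearity(x, rng.normal(size=(200, 32)))  # outputs unrelated to inputs
print(score_translation, score_unrelated)
```

A score near 1 means the task looks like a translation in embedding space; low or negative scores mean a single offset explains little of the output geometry.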

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Changing embedding geometry through training data or fine-tuning might shift models toward more compositional processing on novel inputs.
  • Direct mechanisms could be more brittle when the model encounters slight variations in how facts are phrased.
  • The same split between compositional and direct routes may appear in other multi-step reasoning problems such as multi-hop question answering or arithmetic.
  • More powerful detection methods beyond linear probes could reveal hidden compositional structure even in cases currently classified as direct.

Load-bearing premise

Linear probes on residual stream activations are sufficient to detect whether the intermediate f(x) was computed. The premise fails if f(x) is computed but held only in a non-linear or distributed form that such probes miss.

What would settle it

A two-hop task where linear probes show no trace of f(x) yet a more complete analysis reveals a non-linear representation of f(x) that the model actually uses to reach the correct output.

Figures

Figures reproduced from arXiv: 2510.01685 by Apoorv Khandelwal, Ellie Pavlick.

Figure 1: Compositionality gap for Llama 3 (3B) on our tasks. Red bar represents examples for which
Figure 2: Compositionality gap (dashed purple line; lower is better) of various models aggregated
Figure 3: (a–b) Processing signatures aggregated over examples (across all tasks) in which Llama 3
Figure 4: (a) Strong correlation across tasks between presence of intermediate variables (heuristic
Figure 5: We illustrate the monotonically diminishing improvements to the compositionality gap
Figure 6: Aggregate processing signatures for each of our tasks, in which Llama 3 (3B) correctly
Figure 7: Aggregate processing signatures for each of our tasks, in which Llama 3 (3B) correctly
Figure 8: Aggregate processing signatures for each of our tasks, in which Llama 3 (3B) correctly
Figure 9: Aggregate processing signatures for each of our tasks, in which Llama 3 (3B) correctly
Figure 10: Aggregate processing signatures (using the token identity patchscope) for each of our
Figure 11: Aggregate processing signatures (using the token identity patchscope) for each of our tasks,
Figure 12: Correlation across tasks ($r^2 = 0.35$) for embedding space task linearity and presence of intermediate variables. Analogous to Fig. 4a, using the intermediate variable metric from the token identity patchscope.
Figure 13: Causal effects on predicted values after patching from
Figure 14: Causal effects on predicted values after patching from
Figure 15: Relationships between presence of intermediate variables and embedding space linearity
Original abstract

While large language models (LLMs) appear to be increasingly capable of solving compositional tasks, it is an open question whether they do so using compositional mechanisms. In this work, we investigate how feedforward LLMs solve two-hop factual recall tasks, which can be expressed compositionally as $g(f(x))$. We first confirm that modern LLMs continue to suffer from the "compositionality gap", i.e. their ability to compute both $z = f(x)$ and $y = g(z)$ does not entail their ability to compute the composition $y = g(f(x))$. We then decode residual stream representations and identify two processing mechanisms: one which solves tasks $\textit{compositionally}$, computing $f(x)$ along the way to $g(f(x))$, and one which solves them $\textit{directly}$, without any detectable signature of the intermediate variable $f(x)$. Finally, we find that embedding space geometry is strongly related to which mechanism is employed, where the idiomatic mechanism is dominant when tasks are represented by translations from $x$ to $g(f(x))$ in the embedding spaces. We fully release our data and code at: https://github.com/apoorvkh/composing-functions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper investigates whether LLMs solve two-hop factual recall tasks of the form g(f(x)) using compositional mechanisms. It first verifies the compositionality gap in modern models. It then uses linear probes on residual stream activations to identify two mechanisms: compositional (where f(x) is computed and linearly decodable en route to g(f(x))) versus direct (no detectable signature of the intermediate f(x)). It reports that embedding-space geometry strongly predicts which mechanism is used, with idiomatic translations from x to g(f(x)) favoring the direct route. Data and code are released.

Significance. If the reported mechanistic distinction and geometry correlation hold under more robust tests, the work would advance mechanistic interpretability by providing an observational taxonomy of how LLMs handle composition and a potential geometric predictor of internal strategy. The public release of code and data is a clear strength that enables follow-up work.

major comments (2)
  1. [§4.2] §4.2 (Probe Analysis): The partition into compositional versus direct mechanisms rests on linear probes failing to recover f(x) in the direct class. This does not rule out non-linear or distributed representations of the intermediate variable, which could still reflect internal composition without triggering the chosen probes and would collapse the reported dichotomy.
  2. [§5] §5 (Geometry Correlation): The claimed strong relation between embedding geometry and mechanism choice lacks controls for confounds such as task frequency or lexical overlap; without these, it is unclear whether geometry is causal or merely correlated with the probe outcomes.
minor comments (2)
  1. [§3] Notation for the residual stream positions and layer indices is introduced without a consolidated table, making it hard to track which activations are probed at each step.
  2. [Figure 2] Figure 2 caption does not specify the exact statistical test or multiple-comparison correction used for the reported significance levels.
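
The concern in major comment 1 can be illustrated with a toy case in which a variable is fully present in the activations yet invisible to a linear probe. In the sketch below the "intermediate" is an XOR-like product of two activation dimensions. The data, the least-squares probe, and the added product feature (standing in for a non-linear probe) are all assumptions for illustration, not an analysis from the paper.

```python
import numpy as np


def fit_probe(features: np.ndarray, targets: np.ndarray) -> np.ndarray:
    """Least-squares linear probe with a bias term."""
    design = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(design, targets, rcond=None)
    return weights


def probe_accuracy(features: np.ndarray, targets: np.ndarray, weights: np.ndarray) -> float:
    """Accuracy of thresholding the probe output at zero (labels are +1 / -1)."""
    design = np.column_stack([features, np.ones(len(features))])
    preds = np.where(design @ weights > 0, 1.0, -1.0)
    return float((preds == targets).mean())


# Toy "activations": two dimensions taking values in {-1, +1}, perfectly balanced.
corners = np.array([[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0]])
acts = np.tile(corners, (50, 1))
z = acts[:, 0] * acts[:, 1]          # XOR-like intermediate, encoded non-linearly

# Linear probe on the raw activations: no linear threshold separates XOR,
# so accuracy on the balanced corners can be at most 0.75.
linear_acc = probe_accuracy(acts, z, fit_probe(acts, z))

# Same probe after adding one product feature, a minimal non-linear readout.
expanded = np.column_stack([acts, acts[:, 0] * acts[:, 1]])
nonlinear_acc = probe_accuracy(expanded, z, fit_probe(expanded, z))
print(linear_acc, nonlinear_acc)
```

The linear probe cannot exceed 75% accuracy here while the expanded probe recovers the variable exactly, which is why a null result from linear decoding alone does not rule out internal composition.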

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the major comments point by point below, making revisions where we agree that additional analysis or clarification is warranted.

Point-by-point responses
  1. Referee: [§4.2] §4.2 (Probe Analysis): The partition into compositional versus direct mechanisms rests on linear probes failing to recover f(x) in the direct class. This does not rule out non-linear or distributed representations of the intermediate variable, which could still reflect internal composition without triggering the chosen probes and would collapse the reported dichotomy.

    Authors: We thank the referee for pointing this out. Our classification into compositional and direct mechanisms is based on the detectability of the intermediate f(x) using linear probes on the residual stream activations, following standard practices in mechanistic interpretability. While this does not preclude the existence of non-linear representations, the absence of a linear signature is a meaningful distinction for our taxonomy. In the revised manuscript, we will expand §4.2 to explicitly discuss this limitation and note that our 'direct' category means no linearly decodable intermediate. We will also add results from non-linear probes (e.g., small MLPs) in the appendix to test for more complex representations, though these may not alter the main conclusions.
    Revision: partial

  2. Referee: [§5] §5 (Geometry Correlation): The claimed strong relation between embedding geometry and mechanism choice lacks controls for confounds such as task frequency or lexical overlap; without these, it is unclear whether geometry is causal or merely correlated with the probe outcomes.

    Authors: We agree that establishing the relationship between embedding geometry and mechanism choice would benefit from controls for potential confounds. In the original analysis, we focused on the geometric properties as an observational correlate. For the revision, we will introduce controls by subsampling tasks to balance for frequency and lexical overlap, and re-evaluate the correlation strength. This will be added to §5, along with a discussion on whether geometry appears to be a robust predictor independent of these factors. We believe this will strengthen the claim without overclaiming causality.
    Revision: yes
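
The second response proposes subsampling to balance confounds. A related and simpler check, swapped in here for illustration, is a partial correlation: regress the confound out of both variables and correlate the residuals. The sketch below uses purely synthetic numbers in which a hypothetical "task frequency" drives both quantities; the function name, variable names, and data are assumptions, not results from the paper.

```python
import numpy as np


def partial_corr(x, y, confounds) -> float:
    """Pearson correlation between x and y after linearly regressing out
    the confounds (an intercept is included automatically)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(x)), confounds])
    # Residuals of each variable after removing what the confounds explain.
    rx = x - design @ np.linalg.lstsq(design, x, rcond=None)[0]
    ry = y - design @ np.linalg.lstsq(design, y, rcond=None)[0]
    return float(rx @ ry / np.sqrt((rx @ rx) * (ry @ ry)))


# Synthetic per-task measurements where frequency alone creates the relationship.
rng = np.random.default_rng(0)
n_tasks = 500
frequency = rng.normal(size=n_tasks)                          # hypothetical confound
linearity = frequency + 0.1 * rng.normal(size=n_tasks)        # hypothetical geometry score
intermediate = -frequency + 0.1 * rng.normal(size=n_tasks)    # hypothetical probe score

raw_r = float(np.corrcoef(linearity, intermediate)[0, 1])
controlled_r = partial_corr(linearity, intermediate, frequency)
print(raw_r, controlled_r)
```

In this synthetic setup the raw correlation is strong while the controlled one is close to zero. If the geometry effect in the paper is robust, the controlled correlation should instead remain clearly non-zero.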

Circularity Check

0 steps flagged

No significant circularity in observational activation analysis

Full rationale

The paper's central findings rest on empirical experiments: confirming the compositionality gap via task performance, then using linear probes on residual stream activations to classify tasks as compositional (detectable f(x)) or direct (no detectable signature), followed by correlation with embedding geometry. These steps are observational classifications and measurements rather than derivations that reduce to inputs by construction. No equations or self-referential definitions appear; no fitted parameters are relabeled as predictions; no load-bearing self-citations or uniqueness theorems are invoked. The analysis is self-contained against external benchmarks such as probe-based decoding experiments and does not rely on prior author work to force its taxonomy or conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard mechanistic interpretability assumptions about what residual stream decoding can reveal, with no free parameters or invented entities introduced in the abstract.

axioms (1)
  • Domain assumption: residual stream activations can be linearly decoded to detect the presence or absence of intermediate variables.
    Invoked when distinguishing compositional from direct mechanisms via detectable signatures.

pith-pipeline@v0.9.0 · 5738 in / 1113 out tokens · 62547 ms · 2026-05-18T11:00:37.343619+00:00 · methodology


