pith. machine review for the scientific record.

arxiv: 2602.07794 · v3 · submitted 2026-02-08 · 💻 cs.CL · cs.AI

Recognition: 2 theorem links


Emergent Structured Representations Support Flexible In-Context Inference in Large Language Models

Authors on Pith: no claims yet

Pith reviewed 2026-05-16 06:48 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords large language models · in-context inference · conceptual subspace · causal mediation · structured representations · attention heads · internal processing · emergent behaviors

The pith

Large language models construct a persistent conceptual subspace in middle-to-late layers that causally drives in-context inference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines the internal computations of LLMs during in-context inference on varied tasks. It identifies a conceptual subspace that forms in middle-to-late layers and keeps the same structure no matter the input context. Causal mediation analyses establish that this subspace is not incidental but actively shapes the model's output predictions. The construction process unfolds layer by layer, with early attention heads gathering contextual details to form the subspace and later layers drawing on it to produce answers. If accurate, the finding indicates that LLMs achieve flexible task adaptation through these internal structured representations rather than surface-level pattern recall.
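
Before anything layer-wise can be claimed, one needs the per-layer hidden states themselves. Below is a minimal sketch of that measurement: collecting last-token hidden states at every layer of a HuggingFace causal LM for an in-context prompt. The model name is illustrative, and the demonstrations are drawn from the reverse-dictionary examples the paper uses; this is a reading aid, not the authors' pipeline.

```python
# Sketch: collect last-token hidden states at every layer for an in-context
# prompt. Model name and prompt formatting are illustrative assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.1-8B"  # assumption: any causal LM with exposed hidden states
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
model.eval()

# A few description -> word demonstrations plus a query description
# (examples drawn from the paper's reverse-dictionary task).
prompt = (
    "a small very thin pancake -> crepe\n"
    "dried grape -> raisin\n"
    "a small guitar having four strings ->"
)

inputs = tok(prompt, return_tensors="pt")
with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

# out.hidden_states is a tuple of (num_layers + 1) tensors of shape
# (batch, seq_len, hidden_dim); index 0 is the embedding layer.
last_token_states = [h[0, -1, :].float() for h in out.hidden_states]
print(len(last_token_states), last_token_states[0].shape)
```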

Core claim

LLMs dynamically build a conceptual subspace in middle-to-late layers whose representational structure remains stable across contexts. Attention heads in early-to-middle layers integrate contextual cues to construct and refine the subspace. Later layers then leverage this subspace to generate predictions. Causal mediation analyses confirm the subspace plays a direct functional role in inference rather than arising as a byproduct.
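
To make the head-level half of this claim concrete, one can ask how much a single attention head's output lands inside the conceptual subspace. The sketch below uses two plausible stand-ins for the contribution strength (α) and directional alignment reported in the paper's figures; the exact definitions, and the placeholder `V` and `head_out`, are assumptions rather than the authors' formulas.

```python
# Sketch: plausible stand-ins for a head's contribution to a subspace.
# V: orthonormal basis (hidden_dim x k) for the conceptual subspace.
# head_out: the head's output vector at the final token position.
import torch

def head_vs_subspace(head_out: torch.Tensor, V: torch.Tensor):
    proj = V @ (V.T @ head_out)                          # component inside the subspace
    alpha = proj.norm().item()                           # how much the head writes into the subspace
    alignment = (proj.norm() / head_out.norm()).item()   # cosine between head_out and its projection
    return alpha, alignment

# Usage with random placeholders standing in for a real head and subspace.
hidden_dim, k = 4096, 1200
V, _ = torch.linalg.qr(torch.randn(hidden_dim, k))       # orthonormal basis stand-in
head_out = torch.randn(hidden_dim)
print(head_vs_subspace(head_out, V))
```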

What carries the argument

The conceptual subspace, a persistent structured representation in middle-to-late layers that encodes contextual information and supports inference.
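
The figure captions describe the subspace operationally: an SVD basis explaining 95% of the variance of hidden states, compared across layers via the mean squared cosine of principal angles. A minimal sketch of both computations follows; the mean-centering step and the placeholder data are assumptions, and the paper's preprocessing may differ.

```python
# Sketch: per-layer subspace from last-token hidden states, plus the
# overlap metric (mean squared cosine of principal angles).
import numpy as np

def svd_subspace(X: np.ndarray, var_explained: float = 0.95) -> np.ndarray:
    """X: (n_prompts, hidden_dim). Returns an orthonormal basis (hidden_dim, k)."""
    Xc = X - X.mean(axis=0, keepdims=True)
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    cum_var = np.cumsum(S**2) / np.sum(S**2)
    k = int(np.searchsorted(cum_var, var_explained)) + 1
    return Vt[:k].T

def subspace_overlap(A: np.ndarray, B: np.ndarray) -> float:
    """Mean squared cosine of principal angles between the column spaces of A and B."""
    # For orthonormal bases, the singular values of A^T B are the cosines
    # of the principal angles between the two subspaces.
    s = np.linalg.svd(A.T @ B, compute_uv=False)
    return float(np.mean(s**2))

# Usage with placeholder data standing in for two layers' hidden states.
rng = np.random.default_rng(0)
X_a = rng.standard_normal((200, 512))
X_b = X_a + 0.1 * rng.standard_normal((200, 512))
print(subspace_overlap(svd_subspace(X_a), svd_subspace(X_b)))
```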

If this is right

  • Early-to-middle attention heads integrate contextual cues to build the subspace.
  • Later layers depend on the subspace to generate final predictions.
  • Altering activity in the subspace changes model outputs on inference tasks.
  • LLMs achieve flexible adaptation by constructing and reusing these latent structures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same subspace mechanism may support other emergent behaviors such as multi-step reasoning.
  • Focusing interpretability tools on this subspace could enable more targeted editing of model behavior.
  • The layer-wise progression offers a concrete target for testing whether smaller models exhibit the same internal structure.
  • Extending the analysis to non-text modalities could show whether the subspace construction is modality-specific.

Load-bearing premise

The chosen causal mediation interventions isolate the subspace's contribution without creating artifacts or overlooking other pathways in the model's computation.
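
For readers unfamiliar with the machinery, here is a minimal sketch of subspace-restricted activation patching and the normalized causal indirect effect (CIE) it yields. The Llama-style module path, the hidden-state indexing, the assumption that clean and corrupted prompts share a token length, and the choice of a single answer token id are conventions of this sketch, not necessarily the paper's exact procedure.

```python
# Sketch: patch only the conceptual-subspace component of the last-token
# hidden state at one layer from a clean run into a corrupted run, then
# report the normalized causal indirect effect on the answer logit.
import torch

def subspace_patching_cie(model, tok, clean_prompt, corrupt_prompt,
                          layer_idx, V, answer_id):
    """V: (hidden_dim, k) orthonormal basis. layer_idx indexes hidden_states
    (0 = embeddings), so the hooked module is model.model.layers[layer_idx - 1]."""
    def run(prompt):
        with torch.no_grad():
            return model(**tok(prompt, return_tensors="pt"), output_hidden_states=True)

    clean_out = run(clean_prompt)
    corrupt_out = run(corrupt_prompt)
    h_clean = clean_out.hidden_states[layer_idx][0, -1, :]

    def hook(module, inputs, output):
        hs = output[0] if isinstance(output, tuple) else output
        Vd = V.to(hs.dtype)
        h = hs[0, -1, :]
        # Keep everything orthogonal to V; swap in the clean component along V.
        hs[0, -1, :] = h - Vd @ (Vd.T @ h) + Vd @ (Vd.T @ h_clean.to(hs.dtype))
        return output

    handle = model.model.layers[layer_idx - 1].register_forward_hook(hook)
    with torch.no_grad():
        patched_out = model(**tok(corrupt_prompt, return_tensors="pt"))
    handle.remove()

    clean_logit = clean_out.logits[0, -1, answer_id]
    corrupt_logit = corrupt_out.logits[0, -1, answer_id]
    patched_logit = patched_out.logits[0, -1, answer_id]
    # 1.0: the patch fully restores the clean answer logit; 0.0: no effect.
    return ((patched_logit - corrupt_logit) / (clean_logit - corrupt_logit)).item()
```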

What would settle it

Targeted disruption of the identified subspace, through ablation or activation patching, that produces no measurable change in the model's accuracy on in-context inference tasks; such a null result would overturn the causal claim.
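
A minimal sketch of that necessity test, under the same Llama-style assumptions as the patching sketch above: project the conceptual-subspace component out of the last-token residual stream at the selected layers and compare in-context accuracy with and without the ablation. `V`, the layer indices, the prompts, and the first-token accuracy criterion are placeholders.

```python
# Sketch: ablate span(V) from the last-token hidden state at chosen layers
# and measure exact-match accuracy on the predicted next token.
import torch

def ablate_subspace_hook(V):
    def hook(module, inputs, output):
        hs = output[0] if isinstance(output, tuple) else output
        Vd = V.to(hs.dtype)
        h = hs[0, -1, :]
        hs[0, -1, :] = h - Vd @ (Vd.T @ h)   # keep only the orthogonal complement
        return output
    return hook

def accuracy_with_ablation(model, tok, prompts, answers, layers, V):
    # Assumption: Llama-style module path model.model.layers[i].
    handles = [model.model.layers[i].register_forward_hook(ablate_subspace_hook(V))
               for i in layers]
    correct = 0
    with torch.no_grad():
        for prompt, answer in zip(prompts, answers):
            logits = model(**tok(prompt, return_tensors="pt")).logits[0, -1]
            pred = tok.decode(logits.argmax()).strip()   # first predicted token only
            correct += int(pred == answer)
    for h in handles:
        h.remove()
    return correct / len(prompts)
```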

Figures

Figures reproduced from arXiv: 2602.07794 by Ningyu Xu, Qi Zhang, Xipeng Qiu, Xuanjing Huang.

Figure 1: Illustration of how an emergent conceptual subspace supports in-context inference in LLMs. Given a small set of description–word demonstrations and a query description, a Transformer-based LLM integrates contextual information across layers to form a shared conceptual subspace in the middle-to-late layers. Hidden states can be projected into this subspace, where the relational structure among representati…
Figure 2: A shared conceptual subspace emerges in the middle to late layers of Llama-3.1 70B. a, Layer-wise similarity of hidden states, measured as subspace overlap (mean squared cosine of principal angles) between SVD subspaces explaining 95% variance; averaged over five runs with 24 demonstrations. Axes index layers. b, Number of principal components (PCs) needed to explain 95% variance across layers, increasing …
Figure 3: The conceptual subspace causally mediates model inference. a–c, Activation patching with N demonstrations under three corruption conditions: description (a), label (b) and query (c). Patching the conceptual subspace (blue) is compared against a random-subspace baseline (red). The x-axis indexes layers, and the y-axis shows the normalized causal indirect effect (CIE). d–e, Subspace necessity and sufficiency…
Figure 4: Attention patterns of heads with statistically significant causal indirect effects (CIEs) identified under description (a), label (b), and query corruption (c). Within each layer, attention patterns were averaged over heads with statistically significant CIEs. Layers with no significant heads were assigned zero values. The x-axis indicates layer index, and the y-axis shows attended token spans grouped by s…
Figure 5: Contribution of attention heads to the conceptual subspace in Llama-3.1 70B. a, Contribution strength (α) of attention heads to the conceptual subspace. b, Directional alignment (cosine similarity) between attention head outputs and the conceptual subspace. Bordered cells highlight attention heads with statistically significant causal indirect effects (CIEs).
Figure 6: Illustration of causal mediation analysis under three corruption conditions: description (a), label (b), and query (c) corruption. In the source context, the prompt contains correct description–word pairs as demonstrations, together with a query description. In the target context, the corrupted field is replaced with mismatched content of the same token length. We then patch the conceptual subspace from th…
Figure 7: Performance of various LLMs on the reverse dictionary task, evaluated on the THINGS data and measured through exact match accuracy. The models were presented with N demonstrations sampled from the training set and evaluated on an independent test set. Shaded areas denote 95% confidence intervals, calculated from 10,000 resamples across five independent runs.
Figure 8: Identifying shared subspaces across layers in Llama-3.1 70B at varying dimensionalities using GCCA. Titles indicate the retained subspace dimensionality. All results are computed with 24 in-context demonstrations and averaged over five runs. a–f, Alignment of representational geometry between GCCA-derived subspace across selected layers, measured by RSA, with both axes indexing layers. g–l, Overlap between…
Figure 9: Causal intervention results for Llama-3.1 70B with GCCA-derived subspaces of varying dimensionality. All results are obtained with 24 in-context demonstrations. Results for the GCCA-derived subspace (blue) are compared with dimension-matched random-subspace baselines (red). Black lines denote the permutation-selected dimensionality used in the main text. a–c, Activation patching under three corruption cond…
Figure 10: Emergence of conceptual subspaces in Llama-3.1 70B across four in-context inference tasks. a–d, Layer-wise similarity of hidden states, measured as subspace overlap (mean squared cosine of principal angles) between SVD subspaces explaining 95% variance. Results are averaged over five runs, each with 24 demonstrations. Axes index layers. e–h, Number of principal components (PCs) needed to explain 95% of th…
Figure 11: Causal intervention results for Llama-3.1 70B on two in-context inference tasks (Antonym and Country–Capital). a–c, g–i, Activation patching with 24 demonstrations under three corruption conditions: description (a, g), label (b, h) and query (c, i). The conceptual subspace (blue) is compared against a random-subspace baseline (red). The x-axis indexes layers, and the y-axis shows the normalized causal ind…
Figure 12: Causal intervention results for Llama-3.1 70B on two in-context inference tasks (Landmark–Country and National Parks). a–c, g–i, Activation patching with 24 demonstrations under three corruption conditions: description (a, g), label (b, h) and query (c, i). The conceptual subspace (blue) is compared against a random-subspace baseline (red). The x-axis indexes layers, and the y-axis shows the normalized ca…
Figure 13: Emergence of conceptual subspaces in Llama-3.1 8B (a–e) and Llama-3 8B (f–j). a, f, Layer-wise similarity of hidden states, measured as subspace overlap (mean squared cosine of principal angles) between SVD subspaces explaining 95% variance; results are averaged over five runs with 24 demonstrations. Axes index layers. b, g, Number of principal components (PCs) required to explain 95% of the variance acro…
Figure 14: Emergence of conceptual subspaces in Qwen2.5 32B (a–e), Qwen2.5 7B (f–j), and Qwen2.5 3B (k–o). a, f, k, Layer-wise similarity of hidden states, measured as subspace overlap (mean squared cosine of principal angles) between SVD subspaces explaining 95% variance; results are averaged over five runs with 24 demonstrations. Axes index layers. b, g, l, Number of principal components (PCs) required to explain …
Figure 15: Causal intervention results for Llama-3.1 8B (a–f) and Llama-3 8B (g–l). a–c, g–i, Activation patching with N demonstrations under three corruption conditions: description (a, g), label (b, h) and query (c, i). Patching the conceptual subspace (blue) is compared against a random-subspace baseline (red). The x-axis indexes layers, and the y-axis shows the normalized causal indirect effect (CIE). d–e, j–k, …
Figure 16: Causal intervention results for Qwen2.5 32B (a–f) and Qwen2.5 7B (g–l). a–c, g–i, Activation patching with N demonstrations under three corruption conditions: description (a, g), label (b, h) and query (c, i). Patching the conceptual subspace (blue) is compared against a random-subspace baseline (red). The x-axis indexes layers, and the y-axis shows the normalized causal indirect effect (CIE). d–e, j–k, S…
Figure 17: Causal intervention results for Qwen2.5 3B. a–c, Activation patching with N demonstrations under three corruption conditions: description (a), label (b) and query (c). Patching the conceptual subspace (blue) is compared against a random-subspace baseline (red). The x-axis indexes layers, and the y-axis shows the normalized causal indirect effect (CIE). d–e, Subspace necessity and sufficiency tested by abl…
Figure 18: Causal indirect effects (CIEs) of attention heads across Llama-3.1 70B (a–c), Llama-3.1 8B (d–f), and Llama-3 8B (g–i). Within each row, columns represent CIEs under description, label, and query corruption, respectively. The x-axis indexes model layers, and the y-axis denotes attention head indices. Bordered cells highlight attention heads with statistically significant effects under the respective corru…
Figure 19: Causal indirect effects (CIEs) of attention heads across Qwen2.5 32B (a–c), Qwen2.5 7B (d–f), and Qwen2.5 3B (g–i). Within each row, columns represent CIEs under description, label, and query corruption, respectively. The x-axis indexes model layers, and the y-axis denotes attention head indices. Bordered cells highlight attention heads with statistically significant effects under the respective corruptio…
Figure 20: Attention patterns of heads with statistically significant causal indirect effects (CIEs) in Llama-3.1 70B (a–c), Llama-3.1 8B (d–f), Llama-3 8B (g–i), Qwen2.5 32B (j–l), Qwen2.5 7B (m–o), and Qwen2.5 3B (p–r). Within each model (row), columns show the attention patterns of heads identified under description, label, and query corruption, respectively. Within each layer, attention patterns were averaged ov…
Figure 21: Attention patterns of heads with statistically significant causal indirect effects (CIEs) in Llama-3.1 70B (a–c), Llama-3.1 8B (d–f), Llama-3 8B (g–i), Qwen2.5 32B (j–l), Qwen2.5 7B (m–o), and Qwen2.5 3B (p–r). Within each model (row), columns show the attention patterns of heads identified under description, label, and query corruption, respectively. The x-axis denotes attention heads by their (layer, he…
Figure 22: Contribution of attention heads to the conceptual subspace across Llama-3.1 8B (a–b), Llama-3 8B (c–d), Qwen2.5 32B (e–f), Qwen2.5 7B (g–h), and Qwen2.5 3B (i–j). For each model, the left column reports the contribution strength (α) of each head to the conceptual subspace, and the right column shows the directional alignment (cosine similarity) between head outputs and the subspace. Bordered cells highlig…
Figure 23: Contribution–alignment relationship for attention heads across Llama-3.1 70B (a), Llama-3.1 8B (b), Llama-3 8B (c), Qwen2.5 32B (d), Qwen2.5 7B (e), and Qwen2.5 3B (f). Each point corresponds to an attention head, with the x-axis denoting its contribution strength to the conceptual subspace (α) and the y-axis showing its directional alignment (cosine similarity) with that subspace. Points are colored by l…
read the original abstract

Large language models (LLMs) exhibit emergent behaviors suggestive of human-like reasoning. While recent work has identified structured conceptual representations within these models, it remains unclear whether they functionally rely on such representations for reasoning. Here we investigate the internal processing of LLMs during in-context inference across diverse tasks. Our results reveal a conceptual subspace emerging in middle to late layers, whose representational structure persists across contexts. Using causal mediation analyses, we demonstrate that this subspace is not merely an epiphenomenon but is functionally central to model predictions, establishing its causal role in inference. We further identify a layer-wise progression where attention heads in early-to-middle layers integrate contextual cues to construct and refine the subspace, which is subsequently leveraged by later layers to generate predictions. Together, these findings provide evidence that LLMs dynamically construct and use structured latent representations in context for inference, offering insights into the computational processes underlying flexible adaptation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that LLMs dynamically construct a persistent conceptual subspace in middle-to-late layers during in-context inference across tasks. Attention heads in early-to-middle layers integrate contextual cues to build and refine this subspace, which later layers then leverage for predictions. Causal mediation analyses are used to argue that the subspace is not epiphenomenal but plays a functionally causal role in model outputs.

Significance. If the causal mediation results hold under rigorous controls for parallel pathways, the work would offer a mechanistic account of how structured latent representations enable flexible in-context adaptation in transformers, strengthening links between interpretability findings and emergent reasoning behaviors.

major comments (2)
  1. [Abstract and Methods] The causal mediation claim that the subspace is 'functionally central to model predictions' rests on interventions whose ability to isolate the subspace is not demonstrated. In a transformer, the subspace is built via attention integration in middle layers; without explicit controls that also ablate or orthogonalize residual streams and unpatched attention heads, measured effects on logits could be carried by confounding pathways rather than the target subspace.
  2. [Results] Layer-wise progression: The description of the progression from contextual integration to subspace readout lacks quantitative metrics (e.g., intervention effect sizes, ablation baselines, or statistical controls) showing that the identified subspace accounts for the bulk of the predictive signal once parallel routes are blocked.
minor comments (1)
  1. [Abstract] The term 'conceptual subspace' is introduced without a precise operational definition (e.g., how it is extracted from activations or what dimensionality it occupies), which could be clarified for readers.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

Thank you for the constructive feedback on our manuscript. We address each major comment below, agreeing that additional controls and quantitative metrics will strengthen the causal claims. We propose revisions accordingly.

read point-by-point responses
  1. Referee: [Abstract and Methods] The causal mediation claim that the subspace is 'functionally central to model predictions' rests on interventions whose ability to isolate the subspace is not demonstrated. In a transformer, the subspace is built via attention integration in middle layers; without explicit controls that also ablate or orthogonalize residual streams and unpatched attention heads, measured effects on logits could be carried by confounding pathways rather than the target subspace.

    Authors: We agree that rigorous isolation from parallel pathways is essential to support the causal claim. Our mediation interventions target subspace activations while preserving other components, but we acknowledge the potential for confounding. In the revision, we will add explicit controls that ablate or orthogonalize residual streams and unpatched attention heads to demonstrate that measured logit effects are attributable to the subspace rather than alternative routes. revision: yes

  2. Referee: [Results] The description of the progression from contextual integration to subspace readout lacks quantitative metrics (e.g., intervention effect sizes, ablation baselines, or statistical controls) showing that the identified subspace accounts for the bulk of the predictive signal once parallel routes are blocked.

    Authors: We will strengthen this section with quantitative support. The revised manuscript will include intervention effect sizes, ablation baselines (comparing subspace interventions to full-model and parallel-pathway-ablated performance), and statistical controls to show the subspace accounts for the majority of the predictive signal after blocking parallel routes. revision: yes
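
A sketch of the kind of dimension-matched random-subspace baseline this response refers to, reusing the subspace patching sketch given earlier on this page; the construction and the comparison criterion are illustrative assumptions rather than the authors' protocol.

```python
# Sketch: build a random orthonormal subspace with the same dimensionality
# as the conceptual subspace V and patch it in exactly the same way, so that
# effects of merely moving that much activation mass are controlled for.
import torch

def random_subspace_like(V: torch.Tensor, seed: int = 0) -> torch.Tensor:
    g = torch.Generator().manual_seed(seed)
    Q, _ = torch.linalg.qr(torch.randn(V.shape[0], V.shape[1], generator=g))
    return Q

# cie_concept = subspace_patching_cie(model, tok, clean, corrupt, layer, V, answer_id)
# cie_random  = subspace_patching_cie(model, tok, clean, corrupt, layer,
#                                     random_subspace_like(V), answer_id)
# The causal claim needs cie_concept to exceed cie_random by a clear margin,
# with uncertainty estimated over many prompt pairs, layers, and random seeds.
```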

Circularity Check

0 steps flagged

No circularity: empirical causal mediation stands independent of inputs

full rationale

The paper's core argument rests on experimental identification of a conceptual subspace via activation analysis followed by causal mediation interventions to test functional necessity. No equations, fitted parameters, or derivations are presented that reduce to the inputs by construction. The subspace is located observationally in middle-to-late layers and its causal role is assessed through interventions on model activations; these steps are falsifiable against held-out data and do not rely on self-referential definitions, self-citation chains, or renaming of known results. The provided abstract and skeptic notes contain no load-bearing self-citations or ansatz smuggling that would collapse the claim into its own premises.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The claim rests on the validity of causal mediation analysis applied to transformer activations and on the assumption that the identified subspace captures the relevant computational variable.

axioms (1)
  • domain assumption: Causal mediation analysis can isolate the functional role of specific subspaces in transformer forward passes.
    Invoked when the paper concludes the subspace is causally central based on intervention results.
invented entities (1)
  • conceptual subspace (no independent evidence)
    purpose: Structured latent representation that persists across contexts and supports inference
    Emergent from model activations; no independent falsifiable prediction outside the reported analyses is given.

pith-pipeline@v0.9.0 · 5456 in / 1227 out tokens · 22744 ms · 2026-05-16T06:48:09.981347+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

    ( https://osf.io/jum2f/); data for the remaining four tasks were taken from Todd et al. (2024) (https://github.com/ericwtodd/function_vectors). Task Examples Source Concept Inference a small very thin pancake⇒crepe a small guitar having four strings⇒ukulele dried grape⇒raisin Hebart et al. (2019) Antonym true⇒false difficult⇒easy proceed⇒halt Nguyen et al...