Belief or Circuitry? Causal Evidence for In-Context Graph Learning
Pith reviewed 2026-05-12 00:47 UTC · model grok-4.3
The pith
Large language models maintain competing graph topologies in orthogonal internal subspaces, and causal edits to late-layer activations can rewrite their graph-family preference.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
At intermediate mixtures of two graph families, PCA of the residual stream recovers two orthogonal subspaces each carrying one graph's global topology. Late-layer residual-stream patching from a clean run of one graph into a run of the other transfers nearly the full graph preference. Linear steering along the graph-difference direction shifts next-token predictions toward the intended family, while norm-matched random vectors and label-shuffled controls produce no such shift. These observations are taken to indicate that genuine structure-inference circuits and local-induction circuits operate in parallel.
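The orthogonality claim can be checked directly: fit PCA to activations from each graph family separately, then measure the principal angles between the two top subspaces. A minimal numpy sketch on synthetic stand-in data — the disjoint-coordinate structure is an assumption for illustration, not the paper's actual activations:

```python
import numpy as np

rng = np.random.default_rng(0)

def top_pcs(X, k):
    """Orthonormal basis for the top-k principal directions of X."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[:k].T                                   # shape (d, k)

def principal_angles(Qa, Qb):
    """Angles (degrees) between two subspaces: the singular values of
    Qa^T Qb are the cosines of the principal angles, so angles near
    90 degrees mean the subspaces are (near-)orthogonal."""
    cos = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Synthetic residual-stream stand-in: graph-A trials vary mostly along
# coordinates 0-1, graph-B trials along 2-3, plus small isotropic noise.
d = 32
acts_A = rng.normal(0, 0.05, (400, d))
acts_A[:, 0:2] += rng.normal(0, 1, (400, 2))
acts_B = rng.normal(0, 0.05, (400, d))
acts_B[:, 2:4] += rng.normal(0, 1, (400, 2))

angles = principal_angles(top_pcs(acts_A, 2), top_pcs(acts_B, 2))
print(angles)   # both angles close to 90 degrees here by construction
```

On real activations, the interesting case is the one the paper reports: angles near 90 degrees at intermediate mixture ratios, where neither family dominates the variance.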
What carries the argument
A toy graph random-walk task that forces a choice between global topology and local transition copying, combined with PCA subspace separation, residual-stream activation patching, and graph-difference steering.
If this is right
- The model keeps multiple global structures active simultaneously instead of committing to one local pattern.
- The graph-family signal is localized enough in late layers that targeted patching can rewrite which structure the model follows.
- Linear directions in activation space specifically encode graph identity rather than generic output statistics.
- In-context learning on structured tasks combines explicit structure inference with induction circuits rather than relying on one alone.
Where Pith is reading between the lines
- The same orthogonal-encoding pattern may appear in other tasks that require tracking latent rules, such as planning or causal reasoning.
- If the dual system persists with scale, training methods could be designed to strengthen or suppress one circuit without erasing the other.
- The ability to steer graph preference with a single vector suggests that similar low-dimensional interventions could edit other inferred structures at inference time.
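The single-vector steering logic above can be sketched in a few lines: take the difference of mean activations between the two families, add it to activations from the other family, and compare against a norm-matched random direction. Everything here (the coordinate-0 signal, the linear readout) is a toy assumption, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 64

# Hypothetical cached activations for prompts from each graph family;
# by construction the family signal lives along coordinate 0 (toy).
e0 = np.eye(d)[0]
acts_A = rng.normal(0, 1, (200, d)) + 2.0 * e0
acts_B = rng.normal(0, 1, (200, d)) - 2.0 * e0

steer = acts_A.mean(axis=0) - acts_B.mean(axis=0)      # graph-difference vector
rand = rng.normal(0, 1, d)
rand *= np.linalg.norm(steer) / np.linalg.norm(rand)   # norm-matched control

# Stand-in readout: positive score = "predict family A". In the real
# setup this would be the logit difference after a steered forward pass.
score = lambda X: X @ e0

shift_steer = score(acts_B + steer).mean() - score(acts_B).mean()
shift_rand = score(acts_B + rand).mean() - score(acts_B).mean()
print(shift_steer, shift_rand)   # large positive shift vs. a much smaller one
```

The control matters: in high dimensions a random vector of the same norm has only a small expected component along the readout direction, which is what makes the norm-matched baseline informative.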
Load-bearing premise
The toy graph task and its mixture ratios isolate global topology tracking from local copying without other confounding signals that could produce the observed orthogonal subspaces or patching effects.
What would settle it
An experiment showing that the two graph topologies no longer occupy orthogonal PCA directions at any mixture ratio, or that patching the same late-layer activations fails to transfer graph preference above chance.
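The patching criterion ("transfer above chance") is usually quantified with a normalized effect: 0 means the patch did nothing, 1 means it fully transferred the clean preference. A linear toy makes the bookkeeping concrete — the one-stage "model" and family offsets are illustrative assumptions; real patching would hook an LLM's residual stream:

```python
import numpy as np

rng = np.random.default_rng(2)
d = 32

# Toy stand-in for a model: input -> "late residual" -> scalar readout.
W = rng.normal(0, 1, (d, d)) / np.sqrt(d)
w_out = rng.normal(0, 1, d)

def run(x, patched_resid=None):
    resid = x @ W if patched_resid is None else patched_resid
    return float(resid @ w_out)          # "family-A vs family-B" score

x_clean = rng.normal(0, 1, d) + 3.0      # prompt from graph family A (toy)
x_corrupt = rng.normal(0, 1, d) - 3.0    # prompt from graph family B (toy)

clean = run(x_clean)
corrupt = run(x_corrupt)
patched = run(x_corrupt, patched_resid=x_clean @ W)  # overwrite with clean cache

# Normalized patching effect: 1.0 = full transfer, 0.0 = no effect.
effect = (patched - corrupt) / (clean - corrupt)
print(effect)   # 1.0 in this linear toy, since the overwrite is total
```

In an LLM the effect is below 1 because later components recompute part of the signal; the paper's claim is that late-layer patching pushes it close to 1, which is what "nearly full transfer" cashes out as.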
read the original abstract
How do LLMs learn in-context? Is it by pattern-matching recent tokens, or by inferring latent structure? We probe this question using a toy graph random-walk across two competing graph structures. This task's answer is, in principle, decidable: either the model tracks global topology, or it copies local transitions. We present two lines of evidence that neither account alone is sufficient. First, reconstructing the internal representation structure via PCA reveals that at intermediate mixture ratios, both graph topologies are encoded in orthogonal principal subspaces simultaneously. This pattern is difficult to reconcile with purely local transition copying. Second, residual-stream activation patching and graph-difference steering causally intervene on this graph-family signal: late-layer patching almost fully transfers the clean graph preference, while linear steering moves predictions in the intended direction and fails under norm-matched and label-shuffled controls. Taken together, our findings are most consistent with a dual-mechanism account in which genuine structure inference and induction circuits operate in parallel.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper investigates whether LLMs perform in-context learning via local pattern-matching or by inferring latent global structure, using a controlled toy task of random walks on mixtures of two competing graph topologies. It reports that PCA of internal activations reveals orthogonal subspaces simultaneously encoding both graph families at intermediate mixture ratios, and that causal interventions (late-layer residual-stream patching and linear graph-difference steering) transfer or shift graph preferences in a manner inconsistent with pure local copying, supporting a dual-mechanism account of parallel structure inference and induction circuits.
Significance. If the central results hold, the work supplies direct causal evidence distinguishing structure inference from local transition copying in in-context learning, with falsifiable predictions via interventions. The combination of representation analysis and targeted patching/steering on a decidable task is a strength, though the toy scope limits immediate generalization to naturalistic settings.
major comments (1)
- [Task definition and §3] The separation between global topology tracking and local transition copying (abstract and task definition) is load-bearing for the dual-mechanism claim, yet the manuscript does not report explicit controls for walk length, node-degree distributions, or higher-order transition statistics that could leak global information into local copying at intermediate mixtures.
minor comments (2)
- [Figure 2] Figure 2 (PCA plots): axis labels and variance-explained percentages are missing; adding these would clarify the scale of the orthogonal subspaces.
- [§4.3] The steering experiments report directional effects but do not include the full set of norm-matched and label-shuffled control statistics in the main text (only referenced); moving a summary table to the main paper would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive review and recommendation for minor revision. We address the major comment below and have incorporated additional controls to strengthen the separation between global and local mechanisms.
read point-by-point responses
- Referee: [Task definition and §3] The separation between global topology tracking and local transition copying (abstract and task definition) is load-bearing for the dual-mechanism claim, yet the manuscript does not report explicit controls for walk length, node-degree distributions, or higher-order transition statistics that could leak global information into local copying at intermediate mixtures.
  Authors: We agree that explicit verification of these controls is valuable for substantiating the load-bearing distinction. The task construction in §3 fixes walk lengths across conditions and balances node degrees by design while ensuring that the two graph families produce identical first-order transition probabilities at the intermediate mixture ratios under study. To directly address the referee's point, the revised manuscript adds Appendix C, which reports the full degree distributions, walk-length histograms, and higher-order (2- and 3-step) transition matrices for each mixture ratio. These statistics confirm that no distinguishing global information is available to a purely local copying mechanism. The reported PCA subspaces and causal intervention results remain unchanged under these controls, reinforcing that both structure inference and induction circuits operate in parallel.
  revision: partial
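The rebuttal's point about higher-order transition statistics can be made concrete: two graphs can look identical to a first-order (local-copying) observer yet separate cleanly at three steps. A small numpy sketch with two 2-regular graphs — the specific graphs are illustrative, not the paper's stimuli:

```python
import numpy as np

def walk_matrix(adj):
    """Row-normalize an adjacency matrix into random-walk transition probs."""
    return adj / adj.sum(axis=1, keepdims=True)

# Two 2-regular graphs on 6 nodes: one 6-cycle vs. two disjoint triangles.
# Locally (degree, uniform 1-step choice) they look alike; globally they differ.
cycle = np.zeros((6, 6))
for i in range(6):
    cycle[i, (i + 1) % 6] = cycle[i, (i - 1) % 6] = 1

triangles = np.zeros((6, 6))
for a, b in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]:
    triangles[a, b] = triangles[b, a] = 1

Pc, Pt = walk_matrix(cycle), walk_matrix(triangles)

# 1-step: every node transitions uniformly to exactly 2 neighbors in both.
print(sorted(Pc[0]), sorted(Pt[0]))          # same per-row profile

# 3-step return probability separates the topologies: the even cycle is
# bipartite (no odd-length returns), the triangle is not.
print(np.linalg.matrix_power(Pc, 3)[0, 0])   # 0.0 on the 6-cycle
print(np.linalg.matrix_power(Pt, 3)[0, 0])   # 0.25 inside a triangle
```

This is exactly the kind of leak the referee worries about at intermediate mixtures: if k-step statistics differ, a sufficiently deep local mechanism could exploit them, so reporting the 2- and 3-step matrices (as the rebuttal's Appendix C does) is the right control.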
Circularity Check
No significant circularity
full rationale
The paper presents an empirical investigation using a toy random-walk task on graphs, PCA-based representation analysis, activation patching, and steering interventions. No mathematical derivations, fitted parameters renamed as predictions, or self-referential definitions appear in the provided text. The central dual-mechanism claim rests on observable model behaviors (orthogonal subspaces at intermediate mixtures, causal transfer via patching) that are externally verifiable through the described experiments rather than reducing to inputs by construction. Self-citations, if present, are not load-bearing for the core findings.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The graph random-walk task answer is in principle decidable between global topology and local transitions.
- domain assumption: PCA subspaces and patching effects reflect distinct mechanisms rather than artifacts of the mixture ratios.