
arxiv: 2605.21049 · v1 · pith:R2SEQGRFnew · submitted 2026-05-20 · 💻 cs.CL

Cross-lingual robustness of LLM-brain alignment and its computational roots

Pith reviewed 2026-05-21 05:15 UTC · model grok-4.3

classification 💻 cs.CL
keywords cross-lingual alignment · LLM-brain encoding · naturalistic story listening · surprisal · intrinsic dimensionality · lexical-semantic correspondences · subcortical prediction · multilingual fMRI

The pith

Brain-LLM alignment holds across languages in cortical and subcortical areas but does not track surprisal or dimensionality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies a multilingual whole-brain encoding approach to fMRI data collected while participants listened to stories in Mandarin, English, and French. Transformer models from LLMs predict activity across distributed cortical networks and subcortical structures, with alignment patterns showing substantial overlap between languages and little change across model layers. Contextual embeddings perform no better than static ones, and neither surprisal nor intrinsic dimensionality matches the observed layer-wise prediction profiles. This pattern indicates that the alignment arises mainly from distributed lexical-semantic correspondences that transfer across languages rather than from matching hierarchical or predictive computations.

Core claim

Using a multilingual whole-brain encoding framework, transformer-based models predicted activity in a distributed landscape spanning widely distributed cortical functional networks like limbic, ventral attention, default mode network, and subcortical structures. Spatial alignment patterns showed substantial cross-linguistic overlap and remained largely stable across model layers, with limited layer progression consistent with functional cortical hierarchies. Contrary to previous evidence, contextual embeddings did not outperform static embeddings. Neither surprisal nor intrinsic dimensionality mirrored neural alignment profiles, suggesting that brain-LLM alignment is spatially robust and cross-linguistically stable.

What carries the argument

Multilingual whole-brain encoding that maps LLM embeddings to fMRI responses during naturalistic story listening to measure spatial overlap and test computational explanations.
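
As a rough illustration of what an encoding model of this kind involves, the sketch below fits a ridge regression from stimulus embeddings to regional responses and scores it by the Pearson correlation between predicted and held-out signal. The use of ridge regression, the penalty `alpha`, the function names, and the synthetic data are all assumptions made for illustration; the summary above does not specify the paper's regression, cross-validation scheme, or haemodynamic modelling.

```python
import numpy as np

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge weights: argmin_W ||XW - Y||^2 + alpha * ||W||^2."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + alpha * np.eye(n_features), X.T @ Y)

def pearson_columns(A, B):
    """Pearson r between matching columns of A and B (assumes non-constant columns)."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    return (A * B).sum(axis=0) / (np.linalg.norm(A, axis=0) * np.linalg.norm(B, axis=0))

def brain_scores(X_train, Y_train, X_test, Y_test, alpha=1.0):
    """Fit on the training split, return one held-out correlation per region."""
    W = fit_ridge(X_train, Y_train, alpha)
    return pearson_columns(X_test @ W, Y_test)

# Synthetic stand-ins (hypothetical sizes): 300 time points, 16-dim embeddings, 5 regions.
rng = np.random.default_rng(0)
X = rng.standard_normal((300, 16))
W_true = rng.standard_normal((16, 5))
Y = X @ W_true + 0.5 * rng.standard_normal((300, 5))

scores = brain_scores(X[:200], Y[:200], X[200:], Y[200:])
print(scores.round(2))
```

In this framing, a "brain score" is the held-out correlation for one region, and the layer-wise profiles discussed on this page would come from repeating the fit with embeddings taken from each model layer.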

If this is right

  • Alignment covers subcortical structures in addition to multiple cortical networks.
  • Patterns remain stable across transformer layers with little evidence of hierarchical progression.
  • Static embeddings predict neural responses as effectively as contextual embeddings.
  • Alignment does not arise from predictive uncertainty measured by surprisal.
  • Alignment does not arise from representational geometry measured by intrinsic dimensionality.
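
For readers unfamiliar with the surprisal metric named in the list above, here is a minimal sketch of the standard definition: the negative log probability a model assigns to the token that actually occurred. The function name, the choice of bits (log base 2), and the toy probabilities are assumptions for illustration; how the paper derives per-layer surprisal from its models is not described in this summary.

```python
import numpy as np

def token_surprisal(probs, token_ids):
    """Surprisal in bits: -log2 of the probability assigned to each observed token.

    probs: array (n_positions, vocab_size), each row a distribution over the vocabulary.
    token_ids: array (n_positions,), the token that actually occurred at each position.
    """
    probs = np.asarray(probs, dtype=float)
    token_ids = np.asarray(token_ids)
    p_observed = probs[np.arange(len(token_ids)), token_ids]
    return -np.log2(p_observed)

# Toy 4-word vocabulary, two positions (made-up numbers).
probs = np.array([[0.5, 0.25, 0.125, 0.125],
                  [0.25, 0.25, 0.25, 0.25]])
observed = np.array([0, 3])
print(token_surprisal(probs, observed))  # [1. 2.]
```

A token predicted with probability 0.5 carries 1 bit of surprisal, and one predicted with probability 0.25 carries 2 bits, so less expected tokens score higher.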

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If lexical-semantic overlap is the main driver, alignment should weaken when low-frequency or abstract words are removed from the stimuli.
  • The result raises the possibility that similar alignments could appear between LLMs and brain data in non-linguistic domains if semantic content overlaps.
  • Future experiments could control for lexical overlap explicitly to test whether any residual alignment still tracks model depth or task demands.

Load-bearing premise

That surprisal and intrinsic dimensionality are the right metrics to rule out predictive-processing and compression accounts when they fail to match alignment patterns.

What would settle it

A new dataset or model set in which brain prediction accuracy rises and falls in step with layer-wise surprisal or intrinsic dimensionality would contradict the claim that these metrics do not explain the alignment.
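
One simple way to check whether two layer-wise profiles "rise and fall in step" is to correlate them across layers. The sketch below does this with a Pearson correlation; the function name and the 12-layer profiles are made up for illustration and are not values from the paper. With only a dozen layers such a correlation has few degrees of freedom, so it would need a significance test that is not shown here.

```python
import numpy as np

def profile_correlation(layer_brain_scores, layer_metric):
    """Pearson r between two layer-wise profiles of equal length."""
    a = np.asarray(layer_brain_scores, dtype=float)
    b = np.asarray(layer_metric, dtype=float)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("profiles must be 1-D arrays of the same length")
    return float(np.corrcoef(a, b)[0, 1])

# Made-up 12-layer profiles, for illustration only (not values from the paper).
layers = np.arange(1, 13)
falling_metric = 10.0 - 0.5 * layers             # a metric that decreases with depth
tracking_scores = 0.02 * falling_metric + 0.05   # brain scores that follow the metric exactly
wobbling_scores = 0.10 + 0.001 * np.sin(layers)  # brain scores that barely change with depth

print(profile_correlation(tracking_scores, falling_metric))
print(profile_correlation(wobbling_scores, falling_metric))
```

The first pair is constructed to be perfectly linearly related, which is the kind of pattern that would count against the paper's claim; the second pair shows a profile that does not follow the metric.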

Original abstract

Large language models (LLMs) reliably predict neural activity during language comprehension and transformer depth has been interpreted as mirroring hierarchical cortical organization. However, it remains unclear whether such alignment extends to subcortical regions, overlaps spatially across languages, and what the computational roots of such alignment are. Here, we used a multilingual, whole-brain encoding framework to examine brain-LLM alignment across three typologically distinct languages: Mandarin, English, and French during naturalistic story listening. Our results show that across languages, transformer-based models predicted activity in a distributed landscape spanning widely distributed cortical functional networks like limbic, ventral attention, default mode network, and subcortical structures. Spatial alignment patterns showed substantial cross-linguistic overlap and remained largely stable across model layers, with limited layer progression consistent with functional cortical hierarchies. Contrary to previous evidence, contextual embeddings did not outperform static embeddings. To test candidate computational explanations, we examined whether layer-wise brain scores reflect surprisal and intrinsic dimensionality, and thereby predictive processing and information compression. Neither of these two computational metrics mirrored neural alignment profiles. Our findings suggest that brain-LLM alignment is spatially robust and cross-linguistically stable but not explainable from predictive uncertainty or representational geometry. Rather than directly reflecting shared hierarchical computation, neural predictivity may primarily arise from distributed lexical-semantic correspondences that generalize across languages.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper uses a multilingual whole-brain encoding framework to study LLM-brain alignment during naturalistic story listening in Mandarin, English, and French. It reports distributed alignment across cortical networks (limbic, ventral attention, default mode) and subcortical structures, with substantial cross-linguistic spatial overlap and stability across model layers. Contextual embeddings do not outperform static ones, and layer-wise brain scores do not track surprisal or intrinsic dimensionality. The authors conclude that alignment is spatially robust and cross-linguistically stable but arises primarily from distributed lexical-semantic correspondences rather than predictive processing or representational geometry.

Significance. If the empirical patterns hold, the work would meaningfully extend prior LLM-brain alignment studies by demonstrating cross-lingual stability including in subcortical regions and by directly testing (and failing to support) two prominent computational accounts. The negative findings on surprisal and dimensionality provide a useful constraint, and the inference toward lexical-semantic correspondences offers a plausible alternative interpretation. The multilingual naturalistic design is a strength for generality.

major comments (2)
  1. [Abstract and Results (computational metrics)] Abstract and Results (computational metrics subsection): The claim that 'neither of these two computational metrics mirrored neural alignment profiles' is central to ruling out predictive-processing and compression accounts, yet the manuscript provides no quantitative details on how surprisal and intrinsic dimensionality were computed (e.g., exact formulas, layer selection, aggregation across stimuli), what statistical thresholds defined 'mirroring,' or how data exclusions were handled. This absence prevents evaluation of whether the mismatch is conclusive or artifactual.
  2. [Discussion] Discussion: The positive suggestion that 'neural predictivity may primarily arise from distributed lexical-semantic correspondences' is presented as the favored interpretation after falsifying only surprisal and dimensionality. Because the encoding framework itself relies on lexical-semantic features in the embeddings, this risks circularity; a direct test (e.g., ablation of semantic content or comparison to non-semantic controls) would be needed to make the inference load-bearing.
minor comments (2)
  1. [Figures] Figure legends and axis labels for cross-lingual overlap maps should explicitly state the similarity metric (e.g., Dice coefficient or Pearson r) and correction method used for spatial overlap quantification.
  2. [Methods] The manuscript should clarify whether the same participants or independent cohorts were used across the three languages, as this affects the interpretation of cross-linguistic stability.
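
The first minor comment mentions the Dice coefficient as one candidate overlap metric. As a hedged illustration of what reporting it would involve, the sketch below computes Dice overlap between two binarized maps of significant regions. The function name, the masks, and the handling of two empty maps are assumptions; the summary does not say which metric the paper actually uses.

```python
import numpy as np

def dice_coefficient(map_a, map_b):
    """Dice overlap 2|A ∩ B| / (|A| + |B|) between two binarized ROI maps."""
    a = np.asarray(map_a, dtype=bool)
    b = np.asarray(map_b, dtype=bool)
    if a.shape != b.shape:
        raise ValueError("maps must have the same shape")
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # convention chosen here: two empty maps count as identical
    return float(2.0 * np.logical_and(a, b).sum() / total)

# Hypothetical significance masks over six ROIs for two languages.
english = [1, 1, 1, 0, 0, 0]
french = [1, 1, 0, 1, 0, 0]
print(dice_coefficient(english, french))
```

Here two of the three significant regions in each mask coincide, so the overlap is 2·2 / (3 + 3) = 2/3.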

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and insightful comments, which have helped us strengthen the methodological transparency and interpretive rigor of the manuscript. We address each major comment point by point below and indicate the revisions made.

Point-by-point responses
  1. Referee: Abstract and Results (computational metrics subsection): The claim that 'neither of these two computational metrics mirrored neural alignment profiles' is central to ruling out predictive-processing and compression accounts, yet the manuscript provides no quantitative details on how surprisal and intrinsic dimensionality were computed (e.g., exact formulas, layer selection, aggregation across stimuli), what statistical thresholds defined 'mirroring,' or how data exclusions were handled. This absence prevents evaluation of whether the mismatch is conclusive or artifactual.

    Authors: We agree that additional quantitative details are necessary for full evaluation. In the revised manuscript, we have added a dedicated 'Computational Metrics' subsection to the Methods. Surprisal was computed as the negative log probability of each token given prior context from the model's output distribution, averaged per story and layer. Intrinsic dimensionality was estimated via the Levina-Bickel maximum likelihood estimator on per-layer embedding matrices. Metrics were aggregated by averaging across the three languages' stimuli. 'Mirroring' was quantified via Pearson correlation between layer-wise brain scores and metric values, with significance at p < 0.05 after FDR correction across layers. No data exclusions occurred beyond standard fMRI preprocessing. These clarifications make the negative findings more evaluable.
    revision: yes

  2. Referee: Discussion: The positive suggestion that 'neural predictivity may primarily arise from distributed lexical-semantic correspondences' is presented as the favored interpretation after falsifying only surprisal and dimensionality. Because the encoding framework itself relies on lexical-semantic features in the embeddings, this risks circularity; a direct test (e.g., ablation of semantic content or comparison to non-semantic controls) would be needed to make the inference load-bearing.

    Authors: We appreciate the concern regarding circularity. Our inference rests on the observed empirical patterns rather than the framework in isolation: contextual embeddings showed no advantage over static ones (which primarily encode lexical-semantic information), and alignment lacked the layer-wise progression expected under predictive or hierarchical accounts. We have revised the Discussion to explicitly trace this logic from the negative results on surprisal and dimensionality to the lexical-semantic interpretation, while noting the static-vs-contextual comparison as an internal control. Although a dedicated ablation study would be a valuable extension, it falls outside the current scope; we have added this as a limitation and future direction.
    revision: partial
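
The first simulated response names the Levina-Bickel maximum likelihood estimator for intrinsic dimensionality. A minimal sketch of that estimator follows, using the 1/(k-1) normalization. The neighbourhood size `k`, the brute-force distance computation, the averaging of per-point estimates, and the synthetic sanity check are assumptions made here; the rebuttal is itself simulated and gives no such settings.

```python
import numpy as np

def mle_intrinsic_dimension(X, k=20):
    """Levina-Bickel maximum likelihood estimate of intrinsic dimension.

    X: array (n_points, n_features) with no duplicate rows.
    k: number of nearest neighbours used per point (k >= 2).
    """
    X = np.asarray(X, dtype=float)
    # Full pairwise Euclidean distances; acceptable for a few hundred points.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Column 0 after sorting is the zero self-distance, so skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    # Per-point estimate: inverse mean log-ratio of the k-th to the j-th neighbour distance.
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    per_point = (k - 1) / log_ratios.sum(axis=1)
    return float(per_point.mean())

# Sanity check on synthetic data: a 2-D Gaussian cloud linearly embedded in 10-D.
rng = np.random.default_rng(0)
latent = rng.standard_normal((400, 2))
embedded = latent @ rng.standard_normal((2, 10))
print(round(mle_intrinsic_dimension(embedded, k=20), 2))
```

The estimate should land near 2 for this cloud even though the ambient space has 10 dimensions. Duplicate rows, which can occur when the same word appears repeatedly with a static embedding, produce zero distances and would need to be removed before applying this sketch.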

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's derivation consists of empirical encoding results across languages showing distributed alignment patterns that are stable and overlapping, followed by direct comparisons showing that layer-wise brain scores do not track surprisal or intrinsic dimensionality. These mismatches are reported as observations rather than predictions forced by the model fits themselves. The inference toward lexical-semantic correspondences is offered as an interpretive suggestion, not a self-definitional or fitted-input reduction. No load-bearing step reduces by construction to prior inputs via self-citation chains, uniqueness theorems, or ansatz smuggling. The central claims remain independent of the tested metrics and rest on observable spatial and cross-lingual patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard assumptions in fMRI encoding studies and LLM representation analysis; no new free parameters, axioms, or invented entities are introduced beyond those already common in the field.

axioms (2)
  • domain assumption: Naturalistic story listening elicits reliable BOLD responses that can be linearly predicted from LLM embeddings
    Invoked throughout the encoding framework described in the abstract
  • domain assumption: Surprisal and intrinsic dimensionality are valid proxies for predictive processing and information compression
    Used to test computational explanations of alignment

pith-pipeline@v0.9.0 · 5770 in / 1426 out tokens · 33084 ms · 2026-05-21T05:15:02.388110+00:00 · methodology


