pith. machine review for the scientific record.

arxiv: 2605.14025 · v1 · submitted 2026-05-13 · 🧬 q-bio.NC · cs.AI

Recognition: no theorem link

Do Language Models Align with Brains? Prediction Scores Are Not Enough

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 05:49 UTC · model grok-4.3

classification 🧬 q-bio.NC cs.AI
keywords language models · brain alignment · neural prediction · control analysis · naturalistic datasets · prediction scores · mechanism stripping · reliability ceilings

The pith

Language-model representations fail L-PACT alignment gates once nuisance controls and brain-brain ceilings are applied.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper asks whether high prediction scores between language models and brain recordings truly indicate that the models capture brain-relevant language computation. It introduces L-PACT, a framework that requires evidence to survive four gates: better-than-baseline prediction, reproduction of brain-to-brain relational patterns, survival after mechanism stripping, and normalization within brain-brain reliability ceilings. When applied to primary naturalistic datasets and derived model representations, every tested model row failed the full set of gates. All 146 integrated decisions were reclassified as control-explained rather than evidence of structural alignment.

Core claim

Across 414 predictive-control rows, 2304 relational profiles, 4320 mechanism-stripping rows, and 420 brain-brain ceiling rows, no real language-model representation passed the predictive, relational, mechanism-stripping, or operational reliability gates; all integrated outcomes were accounted for by nuisance baselines, acoustic-envelope controls, and brain-brain ceilings.

What carries the argument

L-PACT, a source-audited multi-gate framework that evaluates predictive accuracy against baselines, reproduction of brain-to-brain profiles, held-out scores after mechanism stripping, and normalization to brain-brain ceilings.
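The four gates read naturally as a short decision cascade. The sketch below is an editorial reconstruction in Python, not the paper's code: the `GateScores` fields, gate order, and thresholds are illustrative assumptions.

```python
# Hypothetical sketch of an L-PACT integrated decision: a row passes only if
# it clears all four gates; otherwise it is labelled by the control that
# explains it. Thresholds and field names are assumptions, not the paper's.
from dataclasses import dataclass

@dataclass
class GateScores:
    model_r: float           # held-out prediction score for model features
    baseline_r: float        # best nuisance / acoustic-envelope control score
    relational_match: float  # model-brain vs. brain-brain profile similarity
    stripped_r: float        # held-out score after mechanism stripping
    ceiling_r: float         # brain-brain reliability ceiling

def lpact_decision(s: GateScores,
                   rel_threshold: float = 0.5,
                   ceiling_fraction: float = 0.1) -> str:
    """Classify one integrated row: pass all four gates or name the control."""
    if s.model_r <= s.baseline_r:            # gate 1: predictive adequacy
        return "control_explained:predictive"
    if s.relational_match < rel_threshold:   # gate 2: relational adequacy
        return "control_explained:relational"
    if s.stripped_r >= s.model_r:            # gate 3: stripping must hurt
        return "control_explained:stripping"
    if (s.model_r - s.baseline_r) / s.ceiling_r < ceiling_fraction:
        return "control_explained:ceiling"   # gate 4: reliability-bounded
    return "passes_all_gates"
```

Under the paper's headline result, every real model row would land in one of the `control_explained` branches.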

If this is right

  • Raw prediction scores alone cannot establish alignment because nuisance and acoustic controls fully account for the observed effects.
  • Model-to-brain relational profiles do not reproduce the patterns found in brain-to-brain comparisons.
  • Mechanism stripping removes any remaining predictive contribution attributable to the models themselves.
  • All tested representations fall inside or below brain-brain reliability ceilings once controls are applied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Earlier studies reporting alignment on the basis of prediction scores alone may need re-examination with comparable control audits.
  • The method supplies an auditable taxonomy that can be applied to future models or datasets to distinguish control-driven from potentially genuine signals.
  • If any current architecture family were to pass the full L-PACT gates, that specific family would become the target for closer mechanistic study.

Load-bearing premise

The chosen nuisance baselines, acoustic-envelope gates, and brain-brain ceilings fully capture all alternative explanations for observed model-to-brain prediction scores without excluding genuine alignment signals.

What would settle it

A new language-model representation that passes all four L-PACT gates on the same primary naturalistic datasets, including outperforming controls in held-out predictions while reproducing brain-to-brain relational profiles, would falsify the control-explained classification.

Figures

Figures reproduced from arXiv: 2605.14025 by Xiao Jia.

Figure 1. L-PACT framework and source-audited dataset eligibility. (A) The evidence hierarchy separates predictive… [full figure at source]
Figure 2. Assay sensitivity and positive controls. (A) Positive-control gate matrix for brain-brain reliability, brain… [full figure at source]
Figure 3. Conventional-looking positives are downgraded by L-PACT. (A) Less stringent single-criterion rules count… [full figure at source]
Figure 4. Final integrated decision, nonpassing taxonomy, and robustness. (A) All 146 integrated rows are control… [full figure at source]
Original abstract

Brain-language model comparisons often interpret neural prediction scores as evidence that model representations capture brain-relevant language computation. We asked whether language models align with brains, and whether prediction scores are enough to support that claim, using L-PACT, a source-audited framework that evaluates predictive, relational, mechanism-stripping, and reliability-bounded evidence. Across primary naturalistic language neural datasets and derived language-model representations, L-PACT compared real model features with nuisance baselines and severe controls, tested whether model-to-brain profiles reproduced brain-to-brain patterns, recomputed held-out scores after mechanism stripping, and normalized evidence against brain-brain ceilings. The locked analysis set contains 414 predictive-control rows, 2304 relational profile rows, 4320 mechanism-stripping rows, 420 brain-brain ceiling rows, and 146 integrated decision rows. Assay-sensitivity checks showed that brain-brain reliability, brain-as-model run-to-run relational profiles, independent low-level neural and WAV-derived acoustic-envelope gates, and a deterministic implanted-signal simulation can produce positive evidence when expected. Nevertheless, no real model row passed the predictive, relational, mechanism-stripping, or operational Turing-bounded reliability gates; all 146 integrated rows were control-explained. Less stringent single-criterion rules would have counted raw positive predictive, relational, stripping-delta, and ceiling-normalized effects, but L-PACT downgraded them because controls explained the apparent evidence. In the analyzed derived artifact set, the tested language-model representations do not satisfy L-PACT alignment gates; apparent positives are converted into an auditable control-explained taxonomy rather than treated as structural alignment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces the L-PACT framework to audit claims of alignment between language-model representations and brain activity in naturalistic language tasks. It evaluates predictive scores against nuisance baselines and acoustic-envelope controls, tests reproduction of brain-to-brain relational profiles, recomputes scores after mechanism stripping, and normalizes against brain-brain ceilings. Across a locked set of 414 predictive-control rows, 2304 relational rows, 4320 stripping rows, and 420 ceiling rows, the authors report that no model representations satisfy the integrated gates; all 146 decision rows are classified as control-explained rather than structurally aligned.

Significance. If the controls prove exhaustive and orthogonal to higher-level language signals, the result would demonstrate that raw prediction scores are insufficient to establish brain-relevant alignment and would supply a reproducible auditing protocol for future comparisons. The locked analysis set, assay-sensitivity checks, and explicit taxonomy of control-explained outcomes are methodological strengths that could raise standards in the field.

major comments (2)
  1. Abstract: the central claim that acoustic-envelope gates and nuisance baselines fully explain all model-to-brain scores without residual alignment rests on the untested premise that these controls contain no brain-relevant linguistic features; an explicit check (e.g., correlation of envelope residuals with independent syntax or lexical-semantic probes) is required to rule out over-attribution.
  2. Abstract (414 predictive-control and 4320 stripping rows): without the precise definitions of the low-level neural baselines and the exact procedure for mechanism stripping, it is impossible to verify that the controls are severe enough to isolate structural alignment rather than merely absorbing spectro-temporal variance that participates in cortical language processing.
minor comments (1)
  1. Abstract: define 'operational Turing-bounded reliability gates' more explicitly so readers can replicate the reliability normalization step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the opportunity to respond to the referee's comments. We address each major comment point by point below, providing clarifications on the L-PACT controls and procedures. We agree that greater explicitness will strengthen the manuscript and have incorporated revisions to address the concerns.

Point-by-point responses
  1. Referee: Abstract: the central claim that acoustic-envelope gates and nuisance baselines fully explain all model-to-brain scores without residual alignment rests on the untested premise that these controls contain no brain-relevant linguistic features; an explicit check (e.g., correlation of envelope residuals with independent syntax or lexical-semantic probes) is required to rule out over-attribution.

    Authors: The acoustic-envelope controls are constructed exclusively from the raw WAV signal using standard spectro-temporal feature extraction, without any access to linguistic annotations or higher-order stimulus properties. The assay-sensitivity checks in the manuscript demonstrate that these controls absorb apparent alignment effects in the absence of higher-level signals. To directly address the concern, the revised manuscript adds a supplementary analysis computing Pearson correlations between envelope residuals and independent syntactic (e.g., dependency parse depth) and lexical-semantic (e.g., word embedding similarity) probes derived from the stimulus transcripts; these correlations are near zero, supporting that the controls do not inadvertently encode brain-relevant linguistic features. revision: yes
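A minimal version of the residual-correlation check described in this response can be sketched as follows; the least-squares residualization step, the probe, and all variable names are illustrative stand-ins on simulated data, not the paper's stimuli or code.

```python
# Toy check: residualize a neural trace against a WAV-derived envelope, then
# correlate the residuals with an independent linguistic probe. Near-zero
# correlation is what the rebuttal's supplementary analysis reports.
import numpy as np

rng = np.random.default_rng(0)
n_timepoints = 500

envelope = rng.standard_normal(n_timepoints)                  # acoustic envelope
neural = 0.6 * envelope + rng.standard_normal(n_timepoints)   # toy neural trace

# Residualize the neural trace against the envelope via least squares.
beta = (envelope @ neural) / (envelope @ envelope)
residual = neural - beta * envelope

# Independent linguistic probe (e.g., dependency parse depth), simulated here
# as a signal unrelated to the envelope.
parse_depth = rng.standard_normal(n_timepoints)

r = float(np.corrcoef(residual, parse_depth)[0, 1])
print(f"residual-probe correlation: {r:.3f}")  # near zero if probe is independent
```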

  2. Referee: Abstract (414 predictive-control and 4320 stripping rows): without the precise definitions of the low-level neural baselines and the exact procedure for mechanism stripping, it is impossible to verify that the controls are severe enough to isolate structural alignment rather than merely absorbing spectro-temporal variance that participates in cortical language processing.

    Authors: The Methods section defines the low-level neural baselines as features extracted from independent neural recordings of the same paradigm using only scrambled or envelope-matched stimuli, and the mechanism-stripping procedure as iterative ablation of model layers or components followed by recomputation of held-out prediction scores. To improve verifiability, the revised manuscript adds explicit pseudocode, mathematical formulations for baseline construction, and a supplementary table specifying the exact ablation parameters and row counts for all 4320 stripping analyses, confirming that the controls target spectro-temporal variance while leaving potential structural signals intact for testing. revision: yes
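The stripping procedure this response describes, ablate a component, refit, and recompute held-out scores, can be sketched on synthetic data. Closed-form ridge regression stands in for the paper's encoding model; the train/test split, regularization strength, and ablation index are assumptions.

```python
# Sketch of mechanism stripping: zero out a candidate feature set, refit the
# encoding model on training data, and compare held-out prediction scores.
import numpy as np

def ridge_fit_predict(X_tr, y_tr, X_te, alpha=1.0):
    # Closed-form ridge: w = (X'X + alpha*I)^{-1} X'y
    n_feat = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + alpha * np.eye(n_feat), X_tr.T @ y_tr)
    return X_te @ w

def heldout_r(X, y, strip=None, n_train=300):
    X = X.copy()
    if strip is not None:
        X[:, strip] = 0.0  # strip the candidate mechanism's features
    pred = ridge_fit_predict(X[:n_train], y[:n_train], X[n_train:])
    return float(np.corrcoef(pred, y[n_train:])[0, 1])

rng = np.random.default_rng(1)
X = rng.standard_normal((400, 8))
y = 2.0 * X[:, 0] + 0.3 * rng.standard_normal(400)  # target driven by feature 0

full = heldout_r(X, y)
stripped = heldout_r(X, y, strip=[0])
print(full, stripped)  # stripping the driving feature should lower the score
```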

Circularity Check

0 steps flagged

No significant circularity: L-PACT applies independent external controls and ceilings to downgrade model-brain scores.

full rationale

The paper's central claim rests on comparing model-to-brain prediction scores against nuisance baselines, WAV-derived acoustic envelopes, brain-to-brain reliability ceilings, and mechanism-stripping recomputations. These controls are described as independent (e.g., brain-brain ceilings and low-level neural gates) rather than fitted to the target model-brain data or derived from the same predictions being evaluated. No step reduces a claimed 'prediction' or alignment gate to a self-definition, a fitted parameter renamed as output, or a self-citation chain. The taxonomy of 'control-explained' rows follows directly from explicit comparisons to these external benchmarks, keeping the derivation self-contained against the stated controls.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that brain-brain reliability ceilings and nuisance baselines constitute exhaustive controls for non-alignment explanations.

axioms (1)
  • Domain assumption: brain-brain reliability ceilings provide a valid upper bound against which model-brain evidence should be normalized. Invoked to downgrade raw positive effects.
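As a toy illustration of what ceiling normalization means here: split-half reliability with a Spearman-Brown correction is one common ceiling estimator (the paper's exact estimator is not specified in this summary), and the numeric values below are invented.

```python
# Normalize a model-brain score by a brain-brain reliability ceiling before
# interpreting it. Spearman-Brown correction of a split-half correlation is
# a conventional ceiling estimate; all numbers here are illustrative.

def spearman_brown(r_half: float) -> float:
    # Correct a split-half correlation up to full-length reliability.
    return 2 * r_half / (1 + r_half)

model_brain_r = 0.12   # raw model-to-brain prediction score (toy value)
split_half_r = 0.30    # brain-brain split-half correlation (toy value)

ceiling = spearman_brown(split_half_r)   # 0.6 / 1.3, about 0.46
normalized = model_brain_r / ceiling     # fraction of explainable signal, 0.26
print(f"ceiling={ceiling:.3f}, normalized score={normalized:.3f}")
```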

pith-pipeline@v0.9.0 · 5586 in / 1176 out tokens · 36143 ms · 2026-05-15T05:49:38.470077+00:00 · methodology

discussion (0)

