arxiv: 2604.24942 · v1 · submitted 2026-04-27 · 💻 cs.CL · q-bio.NC

Independent-Component-Based Encoding Models of Brain Activity During Story Comprehension

Pith reviewed 2026-05-08 03:32 UTC · model grok-4.3

classification 💻 cs.CL q-bio.NC
keywords fMRI · independent component analysis · encoding models · story comprehension · large language models · functional networks · naturalistic stimuli

The pith

Independent-component encoding models predict functional brain network activity from language model features during story listening.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors decompose fMRI data recorded while people listen to stories into independent components, using one portion of the recordings. Encoding models are then trained on the remaining data to predict the time series of these components from large language model representations of the story text. Some components are strongly and consistently predictable across listeners and align with known auditory and language networks, while artifact-related components are not. This method shifts the analysis from noisy individual voxels to more stable network-level signals. It matters because the same brain functions sit in slightly different places in each person's brain; working at the network level makes cross-subject comparisons more straightforward and the results more interpretable.

Core claim

We decompose continuous fMRI data from naturalistic story listening into ICs using one subset of the data, and train encoding models on independent data to predict IC time series from large language model representations of linguistic input. Across subjects, a subset of ICs exhibited consistently high predictivity. These ICs were spatially and temporally consistent across subjects and included cognitive networks known to respond during story listening (auditory and language). Auditory component time series were strongly correlated with acoustic stimulus features. Components identified as noise or motion-related artifacts showed uniformly poor predictive performance, confirming that highly p

What carries the argument

Independent component (IC)-based encoding framework that decomposes fMRI into ICs on one data subset and predicts their time series from LLM features on held-out data.
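
The framework described above can be sketched end to end on synthetic data. This is a minimal illustration under stated assumptions, not the authors' pipeline: the hand-written FastICA, the projection of held-out data onto the spatial maps with a pseudoinverse, the closed-form ridge regression, and all names (`spatial_ica`, `ic_encoding_scores`, `simulate`) are hypothetical stand-ins, and hemodynamic delays are ignored.

```python
import numpy as np


def sym_decorrelate(W):
    """Symmetric decorrelation: W <- (W W^T)^(-1/2) W."""
    vals, vecs = np.linalg.eigh(W @ W.T)
    return vecs @ np.diag(1.0 / np.sqrt(vals)) @ vecs.T @ W


def spatial_ica(X, k, n_iter=500, tol=1e-8, seed=0):
    """Minimal FastICA (tanh nonlinearity) on X with shape (time, voxels).

    Returns k spatial maps with shape (k, voxels).
    """
    X = X - X.mean(axis=1, keepdims=True)        # centre each volume across voxels
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    Z = Vt[:k] * np.sqrt(X.shape[1])             # PCA-whitened data, (k, voxels)
    rng = np.random.default_rng(seed)
    W = sym_decorrelate(rng.standard_normal((k, k)))
    for _ in range(n_iter):
        G = np.tanh(W @ Z)
        W_new = G @ Z.T / Z.shape[1] - np.diag((1.0 - G**2).mean(axis=1)) @ W
        W_new = sym_decorrelate(W_new)
        change = np.max(np.abs(np.abs(np.diag(W_new @ W.T)) - 1.0))
        W = W_new
        if change < tol:
            break
    return W @ Z


def ridge_fit(F, Y, alpha):
    """Closed-form ridge weights mapping features F (time, d) to targets Y (time, k)."""
    return np.linalg.solve(F.T @ F + alpha * np.eye(F.shape[1]), F.T @ Y)


def columnwise_pearson(A, B):
    """Pearson r between matching columns of A and B."""
    A = (A - A.mean(0)) / A.std(0)
    B = (B - B.mean(0)) / B.std(0)
    return (A * B).mean(0)


def ic_encoding_scores(X_ica, X_train, F_train, X_test, F_test, k, alpha=1.0):
    maps = spatial_ica(X_ica, k)                 # decomposition uses its own data subset
    proj = np.linalg.pinv(maps)                  # (voxels, k): data -> IC time series
    Y_train, Y_test = X_train @ proj, X_test @ proj
    W = ridge_fit(F_train, Y_train, alpha)       # encoding model on independent data
    return columnwise_pearson(F_test @ W, Y_test), maps


# Synthetic demo (hypothetical data): true IC 0 is stimulus-driven, ICs 1-2 are noise.
rng = np.random.default_rng(0)
n_voxels, k, n_feat = 2000, 3, 5
true_maps = rng.laplace(size=(k, n_voxels))
beta = rng.standard_normal(n_feat)
beta /= np.linalg.norm(beta)


def simulate(n_trs):
    F = rng.standard_normal((n_trs, n_feat))     # stand-in for LLM features
    tc = rng.standard_normal((n_trs, k))
    tc[:, 0] = F @ beta + 0.3 * rng.standard_normal(n_trs)
    X = tc @ true_maps + 0.1 * rng.standard_normal((n_trs, n_voxels))
    return X, F


X_ica, _ = simulate(200)                         # used only for the decomposition
X_train, F_train = simulate(400)
X_test, F_test = simulate(150)
scores, maps = ic_encoding_scores(X_ica, X_train, F_train, X_test, F_test, k)
print(np.round(np.sort(scores), 2))
```

In this toy setting one component should be well predicted and the others should not, which mirrors the signal-versus-artifact contrast the paper reports. In practice a library implementation such as scikit-learn's `FastICA` would likely replace the hand-written routine.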

Load-bearing premise

The assumption that independent components derived from one part of the fMRI dataset capture real stimulus-related brain signals that can be accurately predicted by language model features in the other part, separate from confounds like motion or scanner noise.

What would settle it

Finding that the time series of the highly predictable independent components do not correlate with any measurable features of the auditory stories, such as sound amplitude or word meanings, would indicate they do not reflect stimulus-driven activity.
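
One way to run such a check is to correlate a component's time series with the sound amplitude of the story at a range of delays. The sketch below is an assumption-laden illustration: the RMS envelope per TR, the simple lag scan in place of a hemodynamic response model, the toy sampling rate, and the function names are all hypothetical choices, and the data are synthetic.

```python
import numpy as np


def rms_envelope_per_tr(audio, sample_rate, tr, n_trs):
    """Root-mean-square amplitude of the waveform within each TR window."""
    samples_per_tr = int(round(sample_rate * tr))
    windows = audio[: samples_per_tr * n_trs].reshape(n_trs, samples_per_tr)
    return np.sqrt((windows ** 2).mean(axis=1))


def lagged_pearson(stimulus, response, max_lag):
    """Pearson r between stimulus[t] and response[t + lag] for lag = 0..max_lag."""
    r = []
    for lag in range(max_lag + 1):
        a = stimulus[: len(stimulus) - lag]
        b = response[lag:]
        r.append(np.corrcoef(a, b)[0, 1])
    return np.array(r)


# Synthetic demo (hypothetical data): the component follows loudness 2 TRs later.
rng = np.random.default_rng(0)
n_trs, tr, sample_rate = 200, 2.0, 100           # toy sampling rate, for speed
samples_per_tr = int(sample_rate * tr)
loudness = rng.uniform(0.1, 1.0, n_trs)
audio = np.repeat(loudness, samples_per_tr) * rng.standard_normal(n_trs * samples_per_tr)

envelope = rms_envelope_per_tr(audio, sample_rate, tr, n_trs)
ic_ts = np.roll(envelope, 2) + 0.1 * rng.standard_normal(n_trs)

r_by_lag = lagged_pearson(envelope, ic_ts, max_lag=5)
print("best lag (TRs):", int(np.argmax(r_by_lag)))
```

A component whose correlation stays near zero at every plausible lag, for amplitude and for other stimulus features, would be the kind of evidence described above.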

Figures

Figures reproduced from arXiv: 2604.24942 by Anna A. Ivanova, Cory Shain, Jin Li, Kamya Hari, Taha Binhuraib.

Figure 1: Overview of the component-wise encoding model framework. fMRI data are preprocessed and decom…
Figure 2: Test story predictivity (correlation) across ICs
Figure 3: Top 5 most predicted ICs for representative subjects in the test dataset. Spatial weight maps (IC sources)…
Figure 4: Predictivity of Auditory, Language and Visual components across subjects. Functional networks are chosen…
Figure 5: Temporal vs. spatial component matching strategies. Top row (Temporal-first): Components matched by maximizing temporal correlation, showing strong temporal agreement (left) and moderate spatial agreement (right) of matched components. Bottom row (Spatial-first): Components matched by maximizing spatial correlation, showing strong spatial agreement (left) and moderate temporal agreement (right) of matc…
Figure 8: Cross-validation predictivity of components for…
Figure 7: Mean predictivity across 5-fold cross-validation, across all subjects. Results show that the observed patterns are consistent across different train/test splits, indicating that the findings are not driven by idiosyncratic properties of a single held-out story. Error bars represent ±1 standard deviation across folds
Figure 9: Predictivity scores of networks most spatially correlated to the networks found in DU atlases for each…
Figure 10: Top 5 most predicted ICs for representative subjects in the test dataset. Spatial weight maps (IC sources)…
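
The two matching strategies named in the Figure 5 caption can be illustrated with a small sketch. The caption does not say which assignment algorithm was used, so the greedy one-to-one matching here is an assumption, as are the function names; the sketch also assumes both subjects heard the same stimulus and that their spatial maps are in a common space. Absolute correlations are used for matching because ICA component signs are arbitrary.

```python
import numpy as np


def cross_corr(A, B):
    """Pearson r between every row of A and every row of B."""
    A = (A - A.mean(1, keepdims=True)) / A.std(1, keepdims=True)
    B = (B - B.mean(1, keepdims=True)) / B.std(1, keepdims=True)
    return A @ B.T / A.shape[1]


def greedy_match(similarity):
    """One-to-one matching: repeatedly take the largest remaining |similarity|."""
    sim = np.abs(similarity).astype(float)
    pairs = {}
    for _ in range(min(sim.shape)):
        i, j = np.unravel_index(np.argmax(sim), sim.shape)
        pairs[int(i)] = int(j)
        sim[i, :] = -1.0                          # remove the matched row and column
        sim[:, j] = -1.0
    return pairs


def match_components(ts_a, maps_a, ts_b, maps_b, first="temporal"):
    """Match on one similarity, then report both for every matched pair."""
    temporal = cross_corr(ts_a, ts_b)
    spatial = cross_corr(maps_a, maps_b)
    pairs = greedy_match(temporal if first == "temporal" else spatial)
    return [(i, j, temporal[i, j], spatial[i, j]) for i, j in sorted(pairs.items())]


# Synthetic demo (hypothetical data): subject B holds shuffled, noisier copies of A.
rng = np.random.default_rng(1)
ts_a = rng.standard_normal((3, 300))              # (components, TRs)
maps_a = rng.standard_normal((3, 500))            # (components, voxels)
perm = [2, 0, 1]                                  # B component j is A component perm[j]
ts_b = ts_a[perm] + 0.5 * rng.standard_normal((3, 300))
maps_b = maps_a[perm] + 1.0 * rng.standard_normal((3, 500))

for i, j, r_time, r_space in match_components(ts_a, maps_a, ts_b, maps_b):
    print(f"A{i} <-> B{j}: temporal r={r_time:.2f}, spatial r={r_space:.2f}")
```

An optimal assignment, for example with SciPy's `linear_sum_assignment`, is a common alternative to the greedy rule when several candidates have similar correlations.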
Abstract

Encoding models provide a powerful framework for linking continuous stimulus features to neural activity; however, traditional voxelwise approaches are limited by measurement noise, inter-subject variability, and redundancy arising from spatially correlated voxels encoding overlapping neural signals. Here, we propose an independent component (IC)-based encoding framework that dissociates stimulus-driven and noise-driven signals in fMRI data. We decompose continuous fMRI data from naturalistic story listening into ICs using one subset of the data, and train encoding models on independent data to predict IC time series from large language model representations of linguistic input. Across subjects, a subset of ICs exhibited consistently high predictivity. These ICs were spatially and temporally consistent across subjects and included cognitive networks known to respond during story listening (auditory and language). Auditory component time series were strongly correlated with acoustic stimulus features, highlighting the interpretability of identified component time series. Components identified as noise or motion-related artifacts by ICA-AROMA showed uniformly poor predictive performance, confirming that highly predicted components reflect genuine stimulus-related neural signals rather than confounds. Overall, IC-based encoding models enable analyses at the level of functional networks, accommodating the variability in network locations across individuals and providing interpretable results that are easy to compare across subjects.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents an independent component (IC)-based encoding model for fMRI data during story comprehension. fMRI time series are decomposed into ICs using ICA on one data subset. Encoding models are trained on independent data to predict the time series of these ICs from large language model (LLM) representations of the story stimulus. The authors find that certain ICs, corresponding to auditory and language networks, show high predictivity and consistency across subjects, while noise and motion components identified by ICA-AROMA show poor predictivity. This framework is claimed to enable analyses at the functional network level, accommodating inter-subject variability in network locations and yielding interpretable, comparable results across subjects.

Significance. Should the quantitative results bear out the qualitative descriptions, the work would offer a useful methodological advance for linking linguistic features to brain activity. Traditional voxel-wise encoding models suffer from noise and inter-subject alignment issues; the IC approach mitigates these by operating on data-driven networks. The explicit separation of ICA decomposition and encoding training, combined with the AROMA-based validation that noise components are not predictable, provides a strong internal control against artifactual findings. This could improve the biological interpretability of encoding models and facilitate cross-subject comparisons without requiring spatial normalization of individual networks.

major comments (2)
  1. Abstract: The abstract reports that 'a subset of ICs exhibited consistently high predictivity' and 'were spatially and temporally consistent across subjects' but supplies no numerical values, error bars, statistical tests, or effect sizes. These metrics are load-bearing for the central claims of improved interpretability and cross-subject comparability; the results section must report mean predictivity correlations (with SE), spatial/temporal consistency measures (e.g., Dice overlap or ICC), and direct statistical comparisons between signal and noise ICs.
  2. Methods: Although data splitting is used to separate ICA decomposition from encoding-model training and prediction, the manuscript does not specify the exact fraction of data allocated to each stage, the number of runs or time points per stage, or the cross-validation procedure. These details are required to confirm that the reported differential predictivity between stimulus-driven and AROMA noise components is not an artifact of the split.
minor comments (2)
  1. Abstract: Specify the particular LLM (e.g., GPT-2, LLaMA) and the exact feature extraction (layer, pooling) used to generate linguistic representations, as this choice directly affects the encoding-model results and replicability.
  2. Results: Consider adding a supplementary table listing per-subject or average predictivity values for the auditory/language ICs versus AROMA components to make the 'uniformly poor' claim for noise components quantitatively verifiable.
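
The quantities requested in these comments are straightforward to compute once per-component predictivity values and thresholded spatial maps are available. The following sketch shows one possible set of definitions; the permutation test is an assumed choice of statistical comparison, the function names are hypothetical, and every number in the demo is an illustrative placeholder rather than a result from the paper.

```python
import numpy as np


def mean_and_se(values):
    """Mean and standard error of the mean."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(len(values))


def dice(mask_a, mask_b):
    """Dice overlap between two boolean masks (NaN if both are empty)."""
    mask_a, mask_b = np.asarray(mask_a, bool), np.asarray(mask_b, bool)
    total = mask_a.sum() + mask_b.sum()
    if total == 0:
        return float("nan")
    return 2.0 * np.logical_and(mask_a, mask_b).sum() / total


def permutation_pvalue(signal, noise, n_perm=5000, seed=0):
    """One-sided p-value for mean(signal) > mean(noise) under label shuffling."""
    rng = np.random.default_rng(seed)
    signal, noise = np.asarray(signal, float), np.asarray(noise, float)
    observed = signal.mean() - noise.mean()
    pooled = np.concatenate([signal, noise])
    hits = 0
    for _ in range(n_perm):
        shuffled = rng.permutation(pooled)
        diff = shuffled[: len(signal)].mean() - shuffled[len(signal):].mean()
        hits += diff >= observed
    return (hits + 1) / (n_perm + 1)


# Placeholder values for illustration only; they are NOT taken from the paper.
signal_r = [0.42, 0.38, 0.45, 0.40, 0.36, 0.44]   # e.g. one language IC per subject
noise_r = [0.02, -0.01, 0.03, 0.00, 0.01, -0.02]  # e.g. one AROMA-flagged IC per subject

m, se = mean_and_se(signal_r)
p_value = permutation_pvalue(signal_r, noise_r)
overlap = dice([True, True, False, False], [True, False, True, False])
print(f"signal r = {m:.3f} ± {se:.3f} (SE), p = {p_value:.4f}, Dice = {overlap:.2f}")
```

For real spatial maps the masks would come from thresholding, for instance `np.abs(ic_map) > threshold`, and the threshold itself would need to be reported.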

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback and positive assessment of the potential methodological contribution of our work. We address each major comment below and will revise the manuscript accordingly to improve quantitative reporting and methodological transparency.

  1. Referee: Abstract: The abstract reports that 'a subset of ICs exhibited consistently high predictivity' and 'were spatially and temporally consistent across subjects' but supplies no numerical values, error bars, statistical tests, or effect sizes. These metrics are load-bearing for the central claims of improved interpretability and cross-subject comparability; the results section must report mean predictivity correlations (with SE), spatial/temporal consistency measures (e.g., Dice overlap or ICC), and direct statistical comparisons between signal and noise ICs.

    Authors: We agree that the abstract should include key quantitative metrics to support the central claims. In the revised manuscript we will update the abstract to report mean predictivity correlations with standard errors, spatial and temporal consistency measures (including Dice overlap or ICC), and direct statistical comparisons between signal and noise ICs. We will also review the results section to ensure all requested metrics, error bars, and comparisons are explicitly presented with appropriate statistical tests. revision: yes

  2. Referee: Methods: Although data splitting is used to separate ICA decomposition from encoding-model training and prediction, the manuscript does not specify the exact fraction of data allocated to each stage, the number of runs or time points per stage, or the cross-validation procedure. These details are required to confirm that the reported differential predictivity between stimulus-driven and AROMA noise components is not an artifact of the split.

    Authors: We thank the referee for highlighting the need for greater precision on the data-splitting protocol. In the revised manuscript we will explicitly state the fraction of data allocated to ICA decomposition versus encoding-model training, the number of runs and time points per stage, and the full cross-validation procedure. These additions will allow readers to verify that the differential predictivity between stimulus-driven and noise components is not an artifact of the split. revision: yes
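
To make concrete what a fully specified cross-validation procedure could look like, here is a minimal sketch of story-wise folds, in which every time point of a story lands in the same test fold. This is an assumption about a reasonable protocol, not a description of what the authors did; the fold assignment rule, the ridge penalty, and the names `story_folds` and `cv_predictivity` are hypothetical.

```python
import numpy as np


def story_folds(story_ids, n_folds=5):
    """Yield (train_idx, test_idx) with each story wholly inside one test fold."""
    story_ids = np.asarray(story_ids)
    stories = np.unique(story_ids)
    for fold in range(n_folds):
        held_out = stories[fold::n_folds]         # every n_folds-th story
        test_mask = np.isin(story_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


def cv_predictivity(F, y, story_ids, n_folds=5, alpha=1.0):
    """Ridge encoding model per fold; returns mean, SD across folds, and all scores."""
    scores = []
    for train, test in story_folds(story_ids, n_folds):
        A = F[train].T @ F[train] + alpha * np.eye(F.shape[1])
        w = np.linalg.solve(A, F[train].T @ y[train])
        scores.append(np.corrcoef(F[test] @ w, y[test])[0, 1])
    scores = np.array(scores)
    return scores.mean(), scores.std(ddof=1), scores


# Synthetic demo (hypothetical data): 10 stories of 40 TRs, one predictable component.
rng = np.random.default_rng(0)
story_ids = np.repeat(np.arange(10), 40)
F = rng.standard_normal((400, 4))                 # stand-in for LLM features
y = F @ np.array([0.8, -0.5, 0.3, 0.0]) + 0.5 * rng.standard_normal(400)

mean_r, sd_r, fold_scores = cv_predictivity(F, y, story_ids)
print(f"predictivity = {mean_r:.2f} ± {sd_r:.2f} (SD across folds)")
```

Keeping whole stories together avoids leakage between temporally adjacent samples; scikit-learn's `GroupKFold` offers a comparable grouping mechanism.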

Circularity Check

0 steps flagged

No significant circularity detected

The paper's core derivation relies on an explicit train/test split: ICA decomposition is performed on one data subset to extract independent components, after which encoding models are trained and evaluated on fully held-out data to predict component time series from LLM features. This separation prevents any reported predictivity from reducing to a fitted parameter by construction. Internal validation via AROMA noise components (showing uniformly low predictivity) and cross-subject consistency checks further confirm that selected components reflect stimulus-driven signals rather than artifacts or tautological fits. No load-bearing self-citations, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation are present in the described methodology. The framework does not rename known empirical patterns as novel derivations but instead applies a standard ICA-plus-encoding pipeline with data partitioning to accommodate inter-subject variability.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Central claim rests on standard domain assumptions about ICA signal separation and the ability of LLM features to capture stimulus-driven variance in neural components; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption: ICA can dissociate stimulus-driven neural signals from noise and motion artifacts in continuous fMRI data during naturalistic story listening.
    Invoked to justify the decomposition step and the claim that high-predictivity ICs reflect genuine signals.
  • domain assumption: LLM representations of linguistic input contain information sufficient to predict time series of stimulus-related independent components on held-out data.
    Core premise of the encoding model training and evaluation.

pith-pipeline@v0.9.0 · 5530 in / 1406 out tokens · 73654 ms · 2026-05-08T03:32:10.648082+00:00 · methodology

