Neuroprobe: Evaluating Intracranial Brain Responses to Naturalistic Stimuli
Pith reviewed 2026-05-18 13:32 UTC · model grok-4.3
The pith
Neuroprobe introduces decoding tasks to map when and where the brain computes language features from intracranial recordings.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Neuroprobe is a suite of decoding tasks built on intracranial EEG recordings from subjects engaged in naturalistic movie viewing. By assessing how well each feature can be decoded across time and electrode sites, it lets researchers systematically map when and where each aspect of multimodal language processing occurs. It also serves as a benchmark for comparing model architectures.
What carries the argument
Neuroprobe, a collection of decoding tasks that quantify how decodable auditory features (such as pitch and volume) and linguistic features (such as part of speech) are from intracranial EEG signals, across time and across all electrode locations.
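To make "decodability across time and electrode locations" concrete, here is a minimal sketch. It is not Neuroprobe's actual pipeline, and it rests on these assumptions:

- one scalar feature (for example, band power) per trial, electrode, and time bin;
- a binary label per trial;
- a simple nearest-class-mean decoder;
- a 2-fold split of the trials into contiguous halves.

The function names `nearest_mean_accuracy` and `decodability_map` are hypothetical.

```python
import random
from statistics import mean


def nearest_mean_accuracy(train, test):
    """Fraction of test items assigned to the correct class, where each item goes
    to the class whose training mean is closer. Items are (value, label), labels 0/1."""
    m0 = mean(v for v, y in train if y == 0)
    m1 = mean(v for v, y in train if y == 1)
    correct = sum((abs(v - m1) < abs(v - m0)) == (y == 1) for v, y in test)
    return correct / len(test)


def decodability_map(X, labels):
    """X[trial][electrode][time_bin] holds a scalar feature.
    Returns acc[electrode][time_bin]: accuracy averaged over two folds that
    split the trials into contiguous halves (no shuffling)."""
    n = len(X)
    half = n // 2
    first, second = list(range(half)), list(range(half, n))
    folds = [(first, second), (second, first)]
    n_elec, n_time = len(X[0]), len(X[0][0])
    acc = [[0.0] * n_time for _ in range(n_elec)]
    for e in range(n_elec):
        for t in range(n_time):
            pairs = [(X[i][e][t], labels[i]) for i in range(n)]
            scores = [
                nearest_mean_accuracy([pairs[i] for i in tr], [pairs[i] for i in te])
                for tr, te in folds
            ]
            acc[e][t] = mean(scores)
    return acc


# Synthetic data: only electrode 0 at time bin 1 carries the label.
random.seed(0)
labels = [i % 2 for i in range(200)]
X = [
    [
        [random.gauss(0, 1) + (4.0 * labels[i] if (e, t) == (0, 1) else 0.0) for t in range(4)]
        for e in range(3)
    ]
    for i in range(200)
]
acc_map = decodability_map(X, labels)
```

A real analysis would use richer multichannel features, a stronger decoder, and a metric such as AUROC. The nearest-mean rule only keeps the sketch dependency-free.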
If this is right
- Information can be shown flowing from language and audio sites in the superior temporal gyrus to locations in the prefrontal cortex.
- Processing advances over time from simple auditory properties to more abstract linguistic properties in a data-driven way.
- Different model architectures and training methods for neural data can be compared on the same standardized tasks.
- Neuroscience questions about the spatial and temporal organization of language computations can be addressed directly from the labeled recordings.
Where Pith is reading between the lines
- The same decoding approach could be extended to test whether the observed flow patterns hold during other everyday activities besides movie watching.
- Reliable decoding of these features might identify candidate sites for future brain-computer interface applications that target language.
- Results from this intracranial data could be compared with non-invasive recordings to check which processing stages are visible at larger scales.
Load-bearing premise
Brain responses collected during movie viewing represent general multi-modal language processing rather than being shaped mainly by movie-specific content or labeling inaccuracies.
What would settle it
Decoding accuracy for complex language features such as part of speech falls to chance level across subjects, time windows, and electrode locations once basic auditory confounds are controlled for.
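Testing whether decoding is "at chance level" needs a null distribution. Below is a minimal permutation-test sketch. It assumes a single scalar feature, binary labels, and the same 2-fold nearest-mean decoder as above. It builds the null by shuffling labels; this is an illustrative choice, not the paper's stated procedure.

```python
import random
from statistics import mean


def two_fold_accuracy(values, labels):
    """Nearest-class-mean accuracy averaged over two contiguous folds."""
    n = len(values)
    half = n // 2
    folds = [(range(half), range(half, n)), (range(half, n), range(half))]
    scores = []
    for tr, te in folds:
        m0 = mean(values[i] for i in tr if labels[i] == 0)
        m1 = mean(values[i] for i in tr if labels[i] == 1)
        correct = sum(
            (abs(values[i] - m1) < abs(values[i] - m0)) == (labels[i] == 1) for i in te
        )
        scores.append(correct / len(te))
    return mean(scores)


def permutation_p_value(values, labels, n_perm=200, seed=0):
    """Return (observed accuracy, one-sided p-value) against a label-shuffling null."""
    rng = random.Random(seed)
    observed = two_fold_accuracy(values, labels)
    shuffled = list(labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if two_fold_accuracy(values, shuffled) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


rng = random.Random(1)
labels = [i % 2 for i in range(200)]
informative = [rng.gauss(0, 1) + 3.0 * y for y in labels]
noise = [rng.gauss(0, 1) for _ in labels]
acc_inf, p_inf = permutation_p_value(informative, labels)
acc_noise, p_noise = permutation_p_value(noise, labels)
```

Shuffling treats trials as exchangeable. Naturalistic movie features are autocorrelated in time, so a circular-shift or block-permutation null may be a more conservative choice.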
Original abstract
High-resolution neural datasets enable foundation models for the next generation of brain-computer interfaces and neurological treatments. The community requires rigorous benchmarks to discriminate between competing modeling approaches, yet no standardized evaluation frameworks exist for intracranial EEG (iEEG) recordings. To address this gap, we present Neuroprobe: a suite of decoding tasks for studying multi-modal language processing in the brain. Unlike scalp EEG, intracranial EEG requires invasive surgery to implant electrodes that record neural activity directly from the brain with minimal signal distortion. Neuroprobe is built on the BrainTreebank dataset, which consists of over 40 hours of iEEG recordings from 10 human subjects performing a naturalistic movie viewing task. Neuroprobe serves two critical functions. First, it is a source from which neuroscience insights can be drawn. The high temporal and spatial resolution of the labeled iEEG allows researchers to systematically determine when and where computations for each aspect of language processing occur in the brain by measuring the decodability of each feature across time and all electrode locations. Using Neuroprobe, we visualize how information flows from key language and audio processing sites in the superior temporal gyrus to sites in the prefrontal cortex. We also demonstrate the time evolution of processing from simple auditory features (e.g., pitch and volume) to more complex language features (e.g., part of speech) in a purely data-driven manner. Second, as the field moves toward neural foundation models trained on large-scale datasets, Neuroprobe provides a rigorous framework for comparing competing architectures and training protocols. We make the code for Neuroprobe openly available, aiming to enable rapid progress in the field of iEEG foundation models. Public leaderboard: https://neuroprobe.dev/
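Claims about information "flowing" from one region to another, or about auditory features preceding linguistic ones, can be summarized by comparing when decodability first rises. The sketch below is a simplification, not Neuroprobe's method. It takes the first time bin at which each decodability curve crosses a fixed threshold, then reports the median onset per group. The curves, region labels, and threshold are hypothetical.

```python
from statistics import median


def onset_latency(times_ms, curve, threshold):
    """First time at which a decodability curve reaches the threshold, else None."""
    for t, value in zip(times_ms, curve):
        if value >= threshold:
            return t
    return None


def median_onset_by_group(times_ms, curves, groups, threshold):
    """Median onset latency per group (e.g., brain region or stimulus feature),
    ignoring curves that never cross the threshold."""
    onsets = {}
    for curve, group in zip(curves, groups):
        t = onset_latency(times_ms, curve, threshold)
        if t is not None:
            onsets.setdefault(group, []).append(t)
    return {g: median(ts) for g, ts in onsets.items()}


# Hypothetical decodability curves (e.g., AUROC) for four electrodes.
times_ms = [0, 50, 100, 150, 200]
curves = [
    [0.50, 0.70, 0.80, 0.70, 0.60],  # STG electrode
    [0.50, 0.62, 0.75, 0.70, 0.65],  # STG electrode
    [0.50, 0.50, 0.55, 0.65, 0.70],  # PFC electrode
    [0.50, 0.52, 0.58, 0.61, 0.72],  # PFC electrode
]
regions = ["STG", "STG", "PFC", "PFC"]
result = median_onset_by_group(times_ms, curves, regions, threshold=0.6)
```

Grouping by feature (for example "pitch" versus "part of speech") instead of by region gives the corresponding feature timeline. An earlier onset in one region is consistent with, but does not by itself establish, directed information flow.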
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents Neuroprobe, a suite of decoding tasks for intracranial EEG (iEEG) recordings from the BrainTreebank dataset (over 40 hours from 10 subjects during naturalistic movie viewing). It functions as both a resource for neuroscience insights—via decodability analyses to determine when and where multi-modal language computations occur—and a benchmark framework for comparing neural foundation models and training protocols. The authors report visualizations of information flow from superior temporal gyrus sites to prefrontal cortex and a data-driven time evolution from simple auditory features (pitch, volume) to complex linguistic features (part of speech), with code and a public leaderboard made available.
Significance. If the central demonstrations hold after addressing potential confounds, Neuroprobe would provide a valuable open benchmark and high-resolution dataset for advancing iEEG-based brain-computer interfaces and foundation models. The public leaderboard and open code release are strengths that could accelerate community progress in model evaluation. The data-driven approach to mapping neural processing hierarchies in naturalistic settings offers potential for novel insights into brain computations, though its impact depends on robustness to stimulus correlations.
major comments (2)
- [Abstract] The claim to demonstrate 'the time evolution of processing from simple auditory features (e.g., pitch and volume) to more complex language features (e.g., part of speech) in a purely data-driven manner' is load-bearing for the neuroscience contribution but lacks support against stimulus confounds. In naturalistic audiovisual movies, low-level acoustic features statistically co-vary with linguistic features (e.g., pitch contours with syntactic boundaries or lexical stress), so later decodability of complex features could reflect leakage of simpler variance rather than a genuine neural hierarchy. The manuscript should report stimulus-feature correlation matrices and apply residualization or partial-correlation controls to validate the timeline.
- [Abstract] The described visualizations of information flow and time evolution lack accompanying quantitative metrics, error bars, statistical tests, and full methods details (e.g., decoding procedures, feature extraction, cross-validation), as noted in the soundness assessment. This makes it difficult to evaluate the robustness of the reported findings.
minor comments (2)
- The manuscript would benefit from a dedicated methods section or supplementary table explicitly listing all decoding features, their extraction pipelines, and any preprocessing to improve reproducibility.
- Verify and prominently display all links to code, data, and the leaderboard (https://neuroprobe.dev/) to ensure accessibility.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed feedback. We have carefully reviewed the concerns regarding stimulus confounds and the presentation of quantitative details. Below we respond point by point and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
- Referee: [Abstract] The claim to demonstrate 'the time evolution of processing from simple auditory features (e.g., pitch and volume) to more complex language features (e.g., part of speech) in a purely data-driven manner' is load-bearing for the neuroscience contribution but lacks support against stimulus confounds. In naturalistic audiovisual movies, low-level acoustic features statistically co-vary with linguistic features (e.g., pitch contours with syntactic boundaries or lexical stress), so later decodability of complex features could reflect leakage of simpler variance rather than a genuine neural hierarchy. The manuscript should report stimulus-feature correlation matrices and apply residualization or partial-correlation controls to validate the timeline.
Authors: We agree that potential correlations between low-level acoustic and higher-level linguistic features in naturalistic stimuli represent an important interpretational concern. Our decoding analyses measure the temporal profile of each feature's decodability independently, which already provides a data-driven characterization of when information becomes available in the neural signals. To further isolate neural contributions from stimulus statistics, we will add stimulus-feature correlation matrices and perform residualization (or partial-correlation) controls in the revised manuscript. These additions will allow us to test whether the reported temporal progression persists after removing shared variance with simpler features. revision: yes
- Referee: [Abstract] The described visualizations of information flow and time evolution lack accompanying quantitative metrics, error bars, statistical tests, and full methods details (e.g., decoding procedures, feature extraction, cross-validation), as noted in the soundness assessment. This makes it difficult to evaluate the robustness of the reported findings.
Authors: The full manuscript contains a dedicated Methods section that details the decoding procedures, feature extraction pipelines, cross-validation scheme, and statistical testing approach. To improve clarity and address the referee's concern directly, we will revise the relevant results figures and accompanying text to include quantitative metrics, error bars, and statistical significance indicators for the information-flow and time-evolution visualizations. We will also add concise methodological summaries to the figure captions and ensure the abstract references the full methods. revision: yes
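Error bars across subjects are one way to deliver the promised quantitative metrics. The sketch below computes a percentile bootstrap confidence interval for the mean of per-subject decoding scores. It assumes the subject is the resampling unit, which is not stated in the source. The ten scores are made-up placeholders, not results from the paper.

```python
import random
from statistics import mean


def bootstrap_ci(values, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)
    boots = sorted(mean(rng.choices(values, k=len(values))) for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical per-subject AUROC values for one feature (placeholders only).
scores = [0.61, 0.58, 0.66, 0.55, 0.63, 0.59, 0.70, 0.57, 0.62, 0.60]
lo, hi = bootstrap_ci(scores)
```

With only ten subjects the interval is approximate. Subject-level significance tests, such as a one-sample test against chance, would complement it.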
Circularity Check
The Neuroprobe benchmark is a self-contained data release with no derivation chain
full rationale
The manuscript defines a suite of decoding tasks on the BrainTreebank iEEG dataset and reports empirical visualizations of information flow and feature decodability timelines. These results are obtained directly by applying standard decoding methods to the provided stimulus features and neural recordings. No equations, fitted parameters, or self-citations are invoked to derive or justify the core claims; the work contains no mathematical derivation that could reduce to its inputs by construction. The central contributions are the task definitions and data-driven observations, which remain externally falsifiable against the released dataset and code.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: The BrainTreebank iEEG recordings and stimulus labels are sufficiently accurate and representative for multi-modal language processing studies.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
unclear: Relation between the paper passage and the cited Recognition theorem.
we visualize how information flows from key language and audio processing sites in the superior temporal gyrus to sites in the prefrontal cortex. We also demonstrate the time evolution of processing from simple auditory features (e.g., pitch and volume) to more complex language features (e.g., part of speech)
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
unclear: Relation between the paper passage and the cited Recognition theorem.
Neuroprobe is built on the BrainTreebank dataset, which consists of over 40 hours of iEEG recordings from 10 human subjects performing a naturalistic movie viewing task
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
A. Splits (appendix excerpt)
Neuroprobe includes 3 different types of splits. Within-Session: in this split, models are trained and tested within the same subject and the same movie session. To avoid temporal data leakage, we use 2-fold cross-validation using no...
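The excerpt is truncated right where the split rule is described, so the exact scheme is not known from this page. The sketch below makes two assumptions about a within-session split. First, folds are contiguous, unshuffled blocks of time. Second, an optional `buffer` drops training samples that sit right next to the test block; `buffer` is a hypothetical parameter, not something the excerpt states.

```python
def contiguous_kfold(n_samples, k=2, buffer=0):
    """Yield (train_idx, test_idx) for k contiguous test blocks.
    Training samples within `buffer` positions of the test block are dropped,
    so temporally adjacent samples do not straddle the train/test boundary."""
    fold_size = n_samples // k
    for f in range(k):
        start = f * fold_size
        stop = n_samples if f == k - 1 else start + fold_size
        test = list(range(start, stop))
        train = [i for i in range(n_samples) if i < start - buffer or i >= stop + buffer]
        yield train, test


splits = list(contiguous_kfold(10, k=2))
buffered = list(contiguous_kfold(10, k=2, buffer=1))
```

Keeping folds contiguous avoids putting near-duplicate neighboring time windows into both train and test. A random shuffle would do exactly that and inflate accuracy.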