Retrieval-Based Brain Decoding by Alignment, not Complexity

Matteo Ciferri; Matteo Ferrante; Nicola Toschi

arxiv: 2606.19081 · v1 · pith:2P77Y643new · submitted 2026-06-17 · 🧬 q-bio.NC · cs.HC


Pith reviewed 2026-06-26 18:34 UTC · model grok-4.3


keywords: brain decoding, fMRI, contrastive learning, foundation models, linear models, embeddings, multimodal, retrieval


The pith

Linear contrastive decoders map fMRI activity to foundation model embeddings more accurately than ridge regression or non-linear alternatives across images, text, and sound.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines how to map fMRI brain activity onto the high-dimensional embedding spaces used by foundation models for vision, language, and audio. It argues that although brain computations are non-linear at fine scales, fMRI averaging and noise produce an effectively linear observable signal. Experiments show that training a linear decoder with a contrastive objective yields better retrieval performance than either ridge regression or standard non-linear networks. This pattern holds consistently across multiple datasets and sensory modalities. A reader would care because the finding implies that decoding success depends more on the alignment objective than on added model complexity.

Core claim

Although neural computations are highly non-linear at the microscale, fMRI measurements average signals across space and time, further smoothed by noise, effectively linearizing the observable representation. As a result, linear contrastive decoders consistently outperform ridge regression and standard non-linear alternatives when mapping fMRI activity to the embedding spaces of foundation models, and these results generalize across images, text, and sound.

What carries the argument

Linear contrastive decoder that aligns fMRI activity vectors with foundation-model embeddings via a contrastive training objective.
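
As a rough illustration of what such a decoder optimizes, the sketch below computes an InfoNCE-style loss for a linear map from fMRI vectors to stimulus embeddings. It is a minimal sketch under stated assumptions, not the authors' implementation: the one-directional loss (brain to embedding), the temperature value, and the names `linear_map`, `l2_normalize`, and `info_nce_loss` are all hypothetical, and the code uses only the standard library where real code would likely use NumPy or PyTorch.

```python
import math

def linear_map(W, x):
    """Apply a linear decoder: returns W @ x for a matrix W given as a list of rows."""
    return [sum(w * c for w, c in zip(row, x)) for row in W]

def l2_normalize(v, eps=1e-12):
    """Scale a vector to (approximately) unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return [c / (norm + eps) for c in v]

def info_nce_loss(brain_batch, embed_batch, W, temperature=0.1):
    """Mean cross-entropy of matching each decoded fMRI vector to its own
    stimulus embedding against all embeddings in the batch.
    The temperature is an assumed value, not one taken from the paper."""
    preds = [l2_normalize(linear_map(W, x)) for x in brain_batch]
    targets = [l2_normalize(e) for e in embed_batch]
    total = 0.0
    for i, p in enumerate(preds):
        # Cosine similarity to every candidate, scaled by the temperature.
        logits = [sum(a * b for a, b in zip(p, t)) / temperature for t in targets]
        # Numerically stable log-sum-exp over the batch.
        m = max(logits)
        log_denominator = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_denominator - logits[i]
    return total / len(preds)
```

The loss is low when each decoded vector is closer to its own embedding than to the other embeddings in the batch, which is the property retrieval relies on. A regression loss instead penalizes distance to the target without reference to the other candidates.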

If this is right

Decoding gains arise more from the choice of training objective than from architectural complexity.
Contrastive-linear models constitute a principled strategy for brain decoding.
The same linear-contrastive approach succeeds for vision, language, and audio stimuli.
Performance improvements are expected to generalize across additional fMRI datasets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Retrieval-based decoding may extend naturally to other coarse-grained neuroimaging signals such as EEG or MEG.
Semantic embedding spaces appear to capture a large fraction of the variance observable in averaged fMRI responses.
Linear alignment could serve as a baseline for testing whether finer-scale recordings require non-linear mappings.

Load-bearing premise

fMRI measurements average signals across space and time and are smoothed by noise, which linearizes the observable brain representation.
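
A toy numerical illustration of this premise, which is not taken from the paper: pooling many units with step-function responses and evenly spread thresholds yields an aggregate that tracks the stimulus almost linearly. The sketch ignores hemodynamics and noise entirely, and the names `unit_response` and `pooled_response` are hypothetical.

```python
def unit_response(stimulus, threshold):
    """A maximally non-linear toy unit: outputs 1.0 above threshold, else 0.0."""
    return 1.0 if stimulus > threshold else 0.0

def pooled_response(stimulus, n_units=1000):
    """Average of n_units step units whose thresholds are evenly spread over (0, 1)."""
    thresholds = [(k + 0.5) / n_units for k in range(n_units)]
    return sum(unit_response(stimulus, t) for t in thresholds) / n_units

# Worst-case deviation of the pooled signal from the straight line y = x
# over a grid of stimulus intensities in [0, 1].
grid = [i / 200 for i in range(201)]
max_deviation = max(abs(pooled_response(s) - s) for s in grid)
```

Each unit on its own is far from linear, yet the pooled signal stays within about `0.5 / n_units` of the identity line. Whether real voxel responses behave this way is the empirical question the premise rests on.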

What would settle it

Non-linear models trained with the identical contrastive objective would need to produce reliably higher retrieval accuracy than the linear version on held-out fMRI data from at least two modalities.
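
Retrieval accuracy of this kind can be computed from cosine similarities between predicted and candidate embeddings. The following is a minimal sketch, assuming that the true stimulus for prediction `i` sits at index `i` of the candidate set; the function names and the top-k formulation are illustrative and not confirmed as the paper's exact metric.

```python
import math

def cosine_similarity(u, v, eps=1e-12):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)

def top_k_accuracy(predicted, candidates, k=1):
    """Fraction of predictions whose true embedding (same index in
    `candidates`) is among the k most similar candidates."""
    hits = 0
    for i, pred in enumerate(predicted):
        sims = [cosine_similarity(pred, cand) for cand in candidates]
        ranked = sorted(range(len(candidates)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(predicted)
```

Running the same function on the held-out predictions of a linear and a non-linear contrastive decoder, per modality, would give the paired numbers the test above calls for.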

Figures

Figures reproduced from arXiv: 2606.19081 by Matteo Ciferri, Matteo Ferrante, Nicola Toschi.

**Figure 1.** The same linear contrastive model is employed across three experimental conditions, differing only in the stimulus modality (audio, textual, or visual). For each modality, neural responses from fMRI are aligned through subject-specific linear transformations and mapped into the corresponding stimulus embedding space (obtained from a pretrained foundation model such as CLIP for images, CLAP for audios, or…

**Figure 2.** (a) Each heatmap represents the cosine similarity between predicted and ground-truth stimulus embeddings, computed sample-by-sample. Results are shown for three datasets (NSD, HUTH, GTZAN) and two models: a linear Ridge regression (left column) and the best contrastive learning model (the linear one, right column). The diagonal reflects correct predictions with high similarity between corresponding stimul…

**Figure 3.** Quantitative bar charts to visualize decoding results. Stars above the bars reveal significance, according to the table in the Appendix. Double stars indicate p-value lower than 1e-10.

**Figure 4.** Random samples of brain decoding results. For each panel, the target column shows the ground-truth stimulus (music track, image, or sentence, depending on the modality), while the neighbor columns display the top retrieved candidates from the model’s latent space based on cosine similarity. (a) Retrieval of images viewed by participants. (b) Retrieval of text/sentences corresponding to the neural response…

read the original abstract

A prominent theory in cognitive science suggests that concepts in the brain are organized as high-dimensional vectors, with semantic meaning captured by directions and relative angles in this space. Brain decoding is the effort of reconstructing or retrieving stimuli (or their representations) from neural activity and involves finding a function that approximates how the brain represents concepts. This motivates the investigation of contrastive objectives as biologically plausible candidates to reverse the brain loss function. In this work, we study how functional MRI (fMRI) activity can generally be mapped with the embedding spaces of foundation models in vision, language, and audio. Although neural computations are highly non-linear at the microscale, fMRI measurements average signals across space and time, further smoothed by noise, effectively linearizing the observable representation. Consistent with these views, our experiments across multiple datasets demonstrate that linear contrastive decoders consistently outperform ridge regression and standard non-linear alternatives, and that these results generalize across images, text, and sound. These findings indicate that decoding gains arise more from the choice of training objective than from architectural complexity, pointing to contrastive-linear models as a principled strategy for brain decoding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Linear contrastive decoders beat the listed baselines on fMRI retrieval across modalities, but the paper does not isolate whether the objective or the model class drives the gains.


The main thing here is that linear models trained with contrastive loss map fMRI activity to vision, language, and audio embeddings better than ridge regression or the standard non-linear alternatives the authors tested. The result holds across several datasets, which gives it some breadth.

The work applies an existing contrastive approach to brain decoding and shows that the linear version is enough. The motivation that fMRI averaging plus noise effectively linearizes the signal is reasonable and explains why they expected simple models to work. That part is straightforward and ties the experiments together.

The soft spot is the interpretation that gains come from the objective rather than complexity. The abstract calls the non-linear models "standard," which usually means they were trained with regression losses. If the authors did not also train non-linear models with the same contrastive objective, the comparison does not separate the two factors. The stress-test note correctly flags this gap. Without seeing the methods section it is impossible to know whether that control was run, but the provided summary gives no indication it was.

This is for people who build or use brain decoders and want practical, low-complexity options. It engages the literature on alignment without overclaiming. The paper deserves a serious referee to verify the exact baseline setups, dataset sizes, and any statistical controls. I would send it to review.

Referee Report

1 major / 2 minor

Summary. The paper claims that fMRI-based brain decoding to foundation model embeddings (vision, language, audio) is best performed by linear contrastive decoders, which outperform both ridge regression and standard non-linear alternatives across multiple datasets. It attributes the gains primarily to the choice of contrastive training objective rather than architectural complexity, and argues that fMRI's spatial/temporal averaging and noise effectively linearize the observable neural representations, making contrastive-linear models a principled decoding strategy.

Significance. If the central comparison is shown to isolate objective from architecture, the result would be significant: it supplies cross-modal evidence that simple linear models suffice for retrieval-based decoding when trained contrastively, offers a computationally efficient alternative to complex non-linear decoders, and aligns with the view that contrastive objectives are biologically plausible for approximating brain representations. The generalization across images, text, and sound is a clear strength of the reported experiments.

major comments (1)

[Abstract] The central interpretation that 'decoding gains arise more from the choice of training objective than from architectural complexity' requires that the 'standard non-linear alternatives' were trained under the identical contrastive objective used for the linear decoders. The provided description gives no indication that non-linear contrastive variants were evaluated; without this control the results remain consistent with either objective or model-class effects and therefore do not support the claimed separation of factors.
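
The control at issue can be framed as a 2x2 design crossing training objective with architecture. The sketch below is illustrative only: the `evaluated` set encodes what this report and the simulated rebuttal take to have been run, not a verified reading of the paper's methods section, and all names are hypothetical.

```python
from itertools import product

OBJECTIVES = ("regression", "contrastive")
ARCHITECTURES = ("linear", "non_linear")

# Conditions this report assumes were evaluated (an assumption, not a
# verified list from the paper).
evaluated = {
    ("regression", "linear"),       # ridge regression
    ("regression", "non_linear"),   # non-linear models with regression losses
    ("contrastive", "linear"),      # linear contrastive decoder
}

def missing_cells(evaluated_cells):
    """Return the objective/architecture pairs absent from a full 2x2 design."""
    full_design = set(product(OBJECTIVES, ARCHITECTURES))
    return sorted(full_design - set(evaluated_cells))
```

Under these assumptions the only missing cell is the non-linear model trained contrastively, which is the comparison needed to attribute gains to the objective rather than to the model class.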

minor comments (2)

Methods and results sections should report dataset sizes, number of subjects, exact statistical tests, and p-values supporting the outperformance claims.
The precise architectures, loss functions, and hyper-parameter regimes for the non-linear baselines need explicit description to allow replication of the comparison.
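
One way to make the statistical comparison explicit is an exact paired sign-flip permutation test on per-subject scores. This is a sketch of one possible test, not the test the paper reports, and the function name `paired_sign_flip_p_value` is hypothetical.

```python
from itertools import product

def paired_sign_flip_p_value(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired differences.
    Enumerates all 2**n sign assignments, so it suits small n (e.g. subjects)."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    count = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        stat = abs(sum(s * d for s, d in zip(signs, diffs)))
        # Small tolerance guards against floating-point ties.
        if stat >= observed - 1e-12:
            count += 1
        total += 1
    return count / total
```

With n paired scores the smallest attainable two-sided p-value is 2 / 2**n, so reporting the number of subjects alongside the p-value matters for interpreting it.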

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful and constructive comment. The observation correctly identifies a limitation in how the abstract frames the separation between training objective and model complexity. We address the point directly below and will revise the manuscript to ensure the claims accurately reflect the experiments performed.


Referee: [Abstract] The central interpretation that 'decoding gains arise more from the choice of training objective than from architectural complexity' requires that the 'standard non-linear alternatives' were trained under the identical contrastive objective used for the linear decoders. The provided description gives no indication that non-linear contrastive variants were evaluated; without this control the results remain consistent with either objective or model-class effects and therefore do not support the claimed separation of factors.

Authors: We agree with the referee's assessment. The non-linear alternatives evaluated in the manuscript were trained with standard regression objectives (primarily mean-squared error), not the contrastive loss used for the linear decoders. Consequently, the reported results show that linear contrastive models outperform both ridge regression and typical non-linear regression models, but they do not isolate the contribution of the objective from architectural capacity. We will revise the abstract and the corresponding discussion sections to remove the stronger claim of separation and instead state that contrastive training yields strong retrieval performance even when restricted to linear mappings, outperforming more complex non-linear models trained under conventional regression losses. This revision will be made in the next version of the manuscript.

Revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison with no derivation chain


The paper reports experimental results comparing linear contrastive decoders against ridge regression and non-linear baselines on fMRI datasets for image, text, and audio modalities. No equations, first-principles derivations, or load-bearing self-citations are present in the provided text that reduce any claimed result to its own inputs by construction. The central claim rests on observed performance differences, which are falsifiable via replication on held-out data and do not involve fitted parameters renamed as predictions or ansatzes smuggled through citations. This is a standard empirical study; the derivation chain is absent, so circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim depends on the domain assumption that fMRI effectively linearizes neural representations; no free parameters or invented entities are introduced in the abstract.

axioms (1)

Domain assumption: fMRI measurements average signals across space and time, further smoothed by noise, effectively linearizing the observable representation.
Invoked to justify why linear models are appropriate despite non-linear microscale computations.



[1] [1]

Why concepts are (probably) vectors

Steven T Piantadosi, Dyana CY Muller, Joshua S Rule, Karthikeya Kaushik, Mark Gorenstein, Elena R Leib, and Emily Sanford. Why concepts are (probably) vectors. Trends in Cognitive Sciences , 28(9):844–856, 2024

2024

[2] [2]

Bassett, et al

Erfan Nozari, Dani S. Bassett, et al. Macroscopic resting-state brain dynamics are best described by linear models. Nature Biomedical Engineering , 8:7–8, 2024

2024

[3] [3]

Reconstructing visual experiences from brain activity evoked by natural movies

Shinji Nishimoto, An T Vu, Thomas Naselaris, Yuval Benjamini, Bin Yu, and Jack L Gallant. Reconstructing visual experiences from brain activity evoked by natural movies. Current biology , 21(19):1641–1646, 2011

2011

[4] [4]

Huth, Willem A

Alexander G. Huth, Willem A. de Heer, Thomas L. Griffiths, et al. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600):453–458, 2016

2016

[5] [5]

Through their eyes: Multi- subject brain decoding with simple alignment techniques

Matteo Ferrante, Tommaso Boccato, Furkan Ozcelik, Rufin VanRullen, and Nicola Toschi. Through their eyes: Multi- subject brain decoding with simple alignment techniques. Imaging Neuroscience, 2:1–21, 05 2024

2024

[6] [6]

Scaling laws for de- coding images from brain activity, 2025

Hubert Banville, Y ohann Benchetrit, Stéphane d’Ascoli, Jérémy Rapin, and Jean-Rémi King. Scaling laws for de- coding images from brain activity, 2025

2025

[7] [7]

Natural scene reconstruction from fmri signals using generative latent diffu- sion

Furkan Ozcelik and Rufin VanRullen. Natural scene reconstruction from fmri signals using generative latent diffu- sion. Scientific Reports , 13(1):15666, 2023

2023

[8] [8]

Mind reader: Reconstructing complex images from brain activities

Sikun Lin, Thomas Sprague, and Ambuj K Singh. Mind reader: Reconstructing complex images from brain activities. Advances in Neural Information Processing Systems , 35:29624–29636, 2022

2022

[9] [9]

Reconstructing the mind’s eye: fmri-to-image with con- trastive learning and diffusion priors

Paul Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Aidan Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth Norman, et al. Reconstructing the mind’s eye: fmri-to-image with con- trastive learning and diffusion priors. Advances in Neural Information Processing Systems , 36:24705–24728, 2023

2023

[10] [10]

Dream: Visual decoding from reversing human visual system

Weihao Xia, Raoul De Charette, Cengiz Oztireli, and Jing-Hao Xue. Dream: Visual decoding from reversing human visual system. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision , pages 8226– 8235, 2024

2024

[11] [11]

Cinematic mindscapes: High-quality video reconstruction from brain activity

Zijiao Chen, Jiaxin Qing, and Juan Helen Zhou. Cinematic mindscapes: High-quality video reconstruction from brain activity. Advances in Neural Information Processing Systems , 36:24841–24858, 2023

2023

[12] [12]

T ang, A

J. T ang, A. LeBel, S. Jain, et al. Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, 26:858–866, 2023

2023

[13] [13]

Optimizing fmri data acquisition for decoding natural speech with limited participants

Louis Jalouzot, Alexis Thual, Y air Lakretz, Christophe Pallier, and Bertrand Thirion. Optimizing fmri data acquisition for decoding natural speech with limited participants. arXiv preprint arXiv:2505.21304 , 2025

arXiv 2025

[14] [14]

Denk, Yu T akagi, T akuya Matsuyama, Andrea Agostinelli, Tomoya Nakai, Christian Frank, and Shinji Nishimoto

Timo I. Denk, Yu T akagi, T akuya Matsuyama, Andrea Agostinelli, Tomoya Nakai, Christian Frank, and Shinji Nishimoto. Brain2music: Reconstructing music from human brain activity, 2023. 8

2023

[15] [15]

R&b-rhythm and brain: Cross-subject decoding of music from human brain activity

Matteo Ciferri, Matteo Ferrante, and Nicola Toschi. R&b-rhythm and brain: Cross-subject decoding of music from human brain activity. Neural Networks , page 109195, 2026

2026

[16] [16]

Generative language reconstruction from brain recordings

Ziyi Y e, Qingyao Ai, Yiqun Liu, Maarten de Rijke, Min Zhang, Christina Lioma, and Tuukka Ruotsalo. Generative language reconstruction from brain recordings. Communications Biology , 8(1):346, 2025

2025

[17] [17]

Brainclip: Bridging brain and visual-linguistic representation via clip for generic natural visual stimulus decoding

Yulong Liu, Y ongqiang Ma, Wei Zhou, Guibo Zhu, and Nanning Zheng. Brainclip: Bridging brain and visual-linguistic representation via clip for generic natural visual stimulus decoding. arXiv preprint arXiv:2302.12971 , 2023

arXiv 2023

[18] [18]

Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data

Paul S Scotti, Mihir Tripathy, Cesar Kadir Torrico Villanueva, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A Norman, et al. Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data. arXiv preprint arXiv:2403.11207 , 2024

arXiv 2024

[19] [19]

Umbrae: Unified multimodal brain decoding

Weihao Xia, Raoul de Charette, Cengiz Oztireli, and Jing-Hao Xue. Umbrae: Unified multimodal brain decoding. In European Conference on Computer Vision , pages 242–259. Springer, 2024

2024

[20] [20]

Dynadiff: Single-stage decoding of images from contin- uously evolving fmri

Marlène Careil, Y ohann Benchetrit, and Jean-Rémi King. Dynadiff: Single-stage decoding of images from contin- uously evolving fmri. arXiv preprint arXiv:2505.14556 , 2025

arXiv 2025

[21] [21]

Mapping whisper representa- tions to human ecog responses with interpretable time-resolved neural encoding

Matteo Ciferri, Tommaso Boccato, Michal Olak, Matteo Ferrante, and Nicola Toschi. Mapping whisper representa- tions to human ecog responses with interpretable time-resolved neural encoding. arXiv preprint arXiv:2606.02305, 2026

Pith/arXiv arXiv 2026

[22] [22]

Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding

Zijiao Chen, Jiaxin Qing, Tiange Xiang, Wan Lin Yue, and Juan Helen Zhou. Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 22710–22720, 2023

2023

[23] [23]

A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence

Emily J Allen, Ghislain St-Yves, Yihan Wu, Jesse L Breedlove, Jacob S Prince, Logan T Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, et al. A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence. Nature neuroscience, 25(1):116–126, 2022

2022

[24] [24]

Glmdenoise: a fast, automated technique for denoising task-based fmri data

Kendrick Kay, Ariel Rokem, Jonathan Winawer, Robert Dougherty, and Brian Wandell. Glmdenoise: a fast, automated technique for denoising task-based fmri data. Frontiers in Neuroscience, 7, 2013

2013

[25] [25]

Improving the accuracy of single-trial fmri response estimates using glmsingle

Jacob S Prince, Ian Charest, Jan W Kurzawski, John A Pyles, Michael J Tarr, and Kendrick N Kay. Improving the accuracy of single-trial fmri response estimates using glmsingle. eLife, 11:e77599, nov 2022

2022

[26] [26]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021

2021

[27] [27]

A natural language fmri dataset for voxelwise encoding models

A. LeBel, L. Wagner, S. Jain, et al. A natural language fmri dataset for voxelwise encoding models. Scientific Data, 10:555, 2023

2023

[28] [28]

The llama 3 herd of models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv e-prints, pages arXiv–2407, 2024

2024

[29] [29]

Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain)

Mariya Toneva and Leila Wehbe. Interpreting and improving natural-language processing (in machines) with natural language-processing (in the brain). Advances in neural information processing systems, 32, 2019

2019

[30] [30]

Music genre neuroimaging dataset

Tomoya Nakai, Naoko Koide-Majima, and Shinji Nishimoto. Music genre neuroimaging dataset. Data in Brief, 40:107675, 2022

2022

[31] [31]

Clap learning audio concepts from natural language supervision

Benjamin Elizalde, Soham Deshmukh, Mahmoud Al Ismail, and Huaming Wang. Clap learning audio concepts from natural language supervision. In ICASSP 2023-2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1–5. IEEE, 2023

2023

[32] [32]

Evidence for compositionality in fmri visual representations via brain algebra

Matteo Ferrante, Tommaso Boccato, Nicola Toschi, and Rufin VanRullen. Evidence for compositionality in fmri visual representations via brain algebra. Communications Biology, 8(1):1263, 2025

2025

[33] [33]

A simple framework for contrastive learning of visual representations

Ting Chen, Simon Kornblith, Mohammad Norouzi, and Geoffrey Hinton. A simple framework for contrastive learning of visual representations. In International conference on machine learning, pages 1597–1607. PMLR, 2020

2020

[34] [34]

Different scaling of linear models and deep learning in ukbiobank brain images versus machine-learning datasets

Marc-Andre Schulz, BT Thomas Yeo, Joshua T Vogelstein, Janaina Mourao-Miranda, Jakob N Kather, Konrad Kording, Blake Richards, and Danilo Bzdok. Different scaling of linear models and deep learning in ukbiobank brain images versus machine-learning datasets. Nature communications, 11(1):4238, 2020

2020

[35] [35]

Semantic language decoding across participants and stimulus modalities

Jerry Tang and Alexander G Huth. Semantic language decoding across participants and stimulus modalities. Current Biology, 35(5):1023–1032, 2025

2025

[36] [36]

Tribe: Trimodal brain encoder for whole-brain fmri response prediction

Stéphane d’Ascoli, Jérémy Rapin, Yohann Benchetrit, Hubert Banville, and Jean-Rémi King. Tribe: Trimodal brain encoder for whole-brain fmri response prediction. arXiv preprint arXiv:2507.22229, 2025

arXiv 2025

[37] [37]

Across-subject ensemble-learning alleviates the need for large samples for fmri decoding

Himanshu Aggarwal, Liza Al-Shikhley, and Bertrand Thirion. Across-subject ensemble-learning alleviates the need for large samples for fmri decoding. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pages 35–45. Springer, 2024

2024

[38] [38]

Aligning brain functions boosts the decoding of visual semantics in novel subjects

Alexis Thual, Yohann Benchetrit, Felix Geilert, Jérémy Rapin, Iurii Makarov, Hubert Banville, and Jean-Rémi King. Aligning brain functions boosts the decoding of visual semantics in novel subjects. arXiv preprint arXiv:2312.06467, 2023

arXiv 2023

[39] [39]

Four ethical priorities for neurotechnologies and ai

Rafael Yuste, Sara Goering, Blaise Agüera Y Arcas, Guoqiang Bi, Jose M Carmena, Adrian Carter, Joseph J Fins, Phoebe Friesen, Jack Gallant, Jane E Huggins, et al. Four ethical priorities for neurotechnologies and ai. Nature, 551(7679):159–163, 2017.

A Statements

Ethics Statement

This study uses only publicly available datasets. No new human-subject data...

2017