Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation

Hubert Banville; Jean-R\'emi King; Marl\`ene Careil; Simon Dahan; St\'ephane d'Ascoli; Yohann Benchetrit

arxiv: 2606.06345 · v1 · pith:RQ3YF372new · submitted 2026-06-04 · 💻 cs.AI · cs.LG· q-bio.NC

Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation

Yohann Benchetrit , Marl\`ene Careil , Simon Dahan , Hubert Banville , St\'ephane d'Ascoli , Jean-R\'emi King This is my paper

Pith reviewed 2026-06-28 01:29 UTC · model grok-4.3

classification 💻 cs.AI cs.LGq-bio.NC

keywords brain decodingfMRIdata augmentationsynthetic dataimage retrievalneural encodingzero-shot decoding

0 comments

The pith

Augmenting fMRI datasets with synthetic data from TRIBE v2 improves brain-to-image decoding accuracy by up to 68 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether synthetic fMRI responses generated by a pretrained model can supplement scarce real neural recordings for image decoding tasks. It systematically varies the mix of real and synthetic data when training decoders on two public fMRI datasets and measures changes in image retrieval accuracy. Performance rises reliably with added synthetic examples, and in some cases a decoder trained only on synthetic responses still exceeds chance. The work shows that large-scale models of brain responses to stimuli can reduce the labeled data needed for effective brain-to-image mapping.

Core claim

Training image decoders on mixtures of real fMRI and synthetic responses produced by TRIBE v2 yields up to 68 percent higher Top-10 retrieval accuracy than training on real data alone; the best mixing ratio depends on the dataset, and decoders trained exclusively on synthetic data can still exceed chance performance in some conditions.

What carries the argument

TRIBE v2, a large encoding model pretrained on more than 1000 hours of fMRI responses to video, audio and language stimuli, which generates synthetic fMRI activity patterns for data augmentation.

If this is right

Image decoders require fewer real scans to reach a target accuracy once appropriate amounts of synthetic data are included.
The fraction of synthetic data that maximizes performance varies with the source of the real recordings.
Zero-shot brain-to-image decoding becomes feasible in limited settings when only synthetic responses are available.
Large pretrained models of multi-modal brain responses can serve as a general foundation for improving data efficiency across decoding tasks.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same augmentation strategy could be tested on other recording modalities such as EEG if comparable pretrained generators become available.
Subject-specific fine-tuning of TRIBE v2 might further improve transfer when real data are extremely limited.
This method may help scale brain decoding to clinical populations where collecting large labeled datasets is impractical.
Performance gains could be measured not only in retrieval accuracy but also in reconstruction quality or semantic alignment metrics.

Load-bearing premise

The synthetic fMRI responses match the statistical structure of real neural activity closely enough that mixing them does not add biases that hurt decoder performance.

What would settle it

Train a decoder only on TRIBE v2 synthetic responses and test it on held-out real fMRI from a new subject or stimulus set; if accuracy falls to chance or below, the claim that synthetic data transfers usefully is falsified.

read the original abstract

Brain decoding is limited by the availability of labeled neural data, and remains challenging in low-data regimes. To address this issue, we investigate whether and when brain decoding can be boosted by augmenting small fMRI datasets with synthetic data generated by a pretrained model of fMRI responses to stimuli. We use TRIBE v2, a large encoding model pretrained on more than 1000 hours of fMRI responses to video, audio and language. For each dataset, we evaluate systematic grids that show how the performance of image decoders varies with the amount of synthetic data used for training. Our results, based on two datasets (the 7T fMRI Natural Scenes Dataset and 3T fMRI BOLD5000), show up to 68% improvement in Top-10 image-retrieval accuracy compared to decoders trained only on real data. Importantly, the proportion of augmented data required to reach a given image decoding performance needs to be adjusted depending on the data source. Surprisingly, image decoders trained exclusively on synthetic fMRI can perform above chance in some settings, suggesting that TRIBE v2 can support zero-shot brain-to-image decoding. Together, these results show how large-scale models of the fMRI responses to sight, sound and language may provide a foundation to improve the data efficiency for image decoding.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper shows TRIBE v2 synthetics can lift fMRI image decoding and enable some zero-shot cases, but the gains rest on an untested match between synthetic and real response distributions.

read the letter

The main thing to know is that adding synthetic fMRI from TRIBE v2 to small real datasets improves top-10 image retrieval accuracy by up to 68 percent on NSD and BOLD5000, with some above-chance results even when training only on the synthetics.

The authors apply a large pretrained encoding model to generate extra training examples and then run grids that track decoder performance as the mix of real and synthetic data changes. Testing the same approach on both a 7T and a 3T dataset gives a bit of breadth. The zero-shot observation is the part that stands out most if it survives closer inspection.

The soft spot is the missing evidence that the synthetic responses actually line up with the target subjects' real voxel patterns. TRIBE v2 was trained on over 1000 hours that include video, audio, and language across many subjects and scanners, while the test sets are image-only and use specific field strengths. Without reported checks such as per-voxel correlations or representational similarity between real and generated responses for the same stimuli, the reported lifts could partly reflect distribution shift rather than added signal. The abstract does not mention these diagnostics, so the full paper needs to show them clearly.

The work is aimed at groups trying to stretch limited fMRI collections for visual decoding. A reader already running similar augmentation experiments would find the grids useful to compare against.

I would send it for peer review. The experimental design is straightforward and the claims are falsifiable once the synthetic-data fidelity checks are in place.

Referee Report

2 major / 2 minor

Summary. The paper claims that augmenting limited real fMRI datasets (NSD at 7T and BOLD5000 at 3T) with synthetic fMRI responses generated by the pretrained TRIBE v2 encoding model (trained on >1000 hours of video/audio/language data) improves image decoding, yielding up to 68% gains in Top-10 retrieval accuracy; systematic grids over synthetic data proportions are evaluated, and decoders trained only on synthetic data can exceed chance in some cases, supporting zero-shot brain-to-image decoding.

Significance. If the central assumption holds, the work demonstrates a practical route to data-efficient brain decoding by leveraging large-scale multimodal pretrained models, with potential to generalize across scanners and modalities; the empirical grids and zero-shot result are falsifiable and directly address the low-data regime problem.

major comments (2)

[Results (grids and zero-shot experiments)] The headline performance claims (68% Top-10 gain and above-chance zero-shot) rest on the untested premise that TRIBE v2 synthetic responses are statistically interchangeable with subject-specific real responses in NSD/BOLD5000; no voxel-wise correlation, noise-structure match, or representational-similarity analysis between real and synthetic data is described, leaving open the possibility that gains arise from distribution shift rather than signal augmentation.
[Methods and Results] The abstract and described evaluation provide no details on dataset splits, subject-wise cross-validation, error bars, or statistical tests for the reported accuracy improvements; without these, it is impossible to assess whether the gains are robust or driven by particular subjects/scanners.

minor comments (2)

[Evaluation metrics] Clarify the exact definition of 'Top-10 image-retrieval accuracy' and how the retrieval pool is constructed (e.g., within-subject or across all images).
[Results] The statement that 'the proportion of augmented data required... needs to be adjusted depending on the data source' should be supported by explicit per-dataset curves rather than left as a qualitative observation.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below and will revise the manuscript accordingly.

read point-by-point responses

Referee: The headline performance claims (68% Top-10 gain and above-chance zero-shot) rest on the untested premise that TRIBE v2 synthetic responses are statistically interchangeable with subject-specific real responses in NSD/BOLD5000; no voxel-wise correlation, noise-structure match, or representational-similarity analysis between real and synthetic data is described, leaving open the possibility that gains arise from distribution shift rather than signal augmentation.

Authors: We agree that the manuscript would be strengthened by direct quantitative comparisons between real and synthetic responses. Our current results show that synthetic augmentation improves performance when tested on held-out real fMRI data and that purely synthetic training can exceed chance, but these do not constitute a direct test of distributional equivalence. In the revision we will add voxel-wise Pearson correlations, noise ceiling comparisons, and RSA between real and TRIBE v2 synthetic responses on the NSD and BOLD5000 subjects to address this concern. revision: yes
Referee: The abstract and described evaluation provide no details on dataset splits, subject-wise cross-validation, error bars, or statistical tests for the reported accuracy improvements; without these, it is impossible to assess whether the gains are robust or driven by particular subjects/scanners.

Authors: We acknowledge the omission. The full manuscript uses the standard NSD and BOLD5000 train/test partitions and reports results across multiple subjects, but does not explicitly describe subject-wise cross-validation, error bars, or statistical testing. In the revised version we will expand the Methods and Results sections to specify the exact splits, include subject-wise or fold-wise error bars, and report appropriate statistical tests (paired t-tests or permutation tests) for the accuracy gains. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results on held-out data

full rationale

The paper presents an empirical study comparing image decoders trained on real fMRI data versus real-plus-synthetic data from a pretrained TRIBE v2 model, with performance measured by Top-10 retrieval accuracy on held-out test sets from NSD and BOLD5000. No equations, parameter-fitting procedures, or self-citation chains are described that would reduce the reported gains to quantities defined by the same inputs or by construction. The augmentation experiments vary the proportion of synthetic data and evaluate on independent test splits, rendering the central claims self-contained against external benchmarks rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated in the provided text.

axioms (1)

domain assumption Synthetic fMRI from TRIBE v2 is distributionally close enough to real target data for useful augmentation
Required for the reported gains to be meaningful; location implicit in the method description.

pith-pipeline@v0.9.1-grok · 5797 in / 1104 out tokens · 27405 ms · 2026-06-28T01:29:36.322869+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

34 extracted references · 4 canonical work pages · 1 internal anchor

[1]

Allen, E. J. and St-Yves, G. and Wu, Y. and Breedlove, J. L. and Prince, J. S. and Dowdle, L. T. and Nau, M. and Caron, B. and Pestilli, F. and Charest, I. and others , title =. Nature Neuroscience , volume =
[2]

and Kornblith, S

Azizi, S. and Kornblith, S. and Saharia, C. and Norouzi, M. and Fleet, D. , title =. Transactions on Machine Learning Research , year =
[3]

arXiv preprint arXiv:2501.15322 , year=

Scaling laws for decoding images from brain activity , author=. arXiv preprint arXiv:2501.15322 , year=

work page arXiv
[4]

Dynadiff: Single-stage decoding of images from contin- uously evolving fmri

Careil, M. and Benchetrit, Y. and King, J.-R. , title =. arXiv preprint arXiv:2505.14556 , year =

work page arXiv
[5]

and Pyles, J

Chang, N. and Pyles, J. A. and Marcus, A. and Gupta, A. and Tarr, M. J. and Aminoff, E. M. , title =. Scientific Data , volume =. 2019 , doi =

2019
[6]

and Qing, J

Chen, Z. and Qing, J. and Xiang, T. and Yue, W. L. and Zhou, J. H. , title =. CVPR , year =
[7]

and Rapin, J

d'Ascoli, S. and Rapin, J. and Benchetrit, Y. and Brookes, T. and Begany, K. and Raugel, J. and Banville, H. and King, J.-R. , title =. arXiv preprint , year =
[8]

and Prince, J

Conwell, C. and Prince, J. S. and Kay, K. N. and Alvarez, G. A. and Konkle, T. , title =. bioRxiv , year =
[9]

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Eldan, R. and Li, Y. , title =. arXiv preprint arXiv:2305.07759 , year =

work page internal anchor Pith review Pith/arXiv arXiv
[10]

and Pinaya, W

Fernandez, V. and Pinaya, W. H. and Borges, P. and Tudosiu, P.-D. and Graham, M. S. and Vercauteren, T. and Cardoso, M. J. , title =. MICCAI Workshop on Simulation and Synthesis in Medical Imaging , year =
[11]

and Wen, H

Han, K. and Wen, H. and Shi, J. and Lu, K.-H. and Zhang, Y. and Fu, D. and Liu, Z. , title =. NeuroImage , volume =. 2019 , doi =

2019
[12]

and Sun, S

He, R. and Sun, S. and Yu, X. and Xue, C. and Zhang, W. and Torr, P. and Bai, S. and Qi, X. , title =. ICLR , year =
[13]

and Tong, F

Kamitani, Y. and Tong, F. , title =. Nature Neuroscience , volume =
[14]

Kay, K. N. and Naselaris, T. and Prenger, R. J. and Gallant, J. L. , title =. Nature , volume =
[15]

and Dwivedi, K

Lahner, B. and Dwivedi, K. and Iamshchinina, P. and Graumann, M. and Lascelles, A. and Roig, G. and Oliva, A. and Cichy, R. M. , title =. Nature Communications , volume =
[16]

and Prenger, R

Naselaris, T. and Prenger, R. J. and Kay, K. N. and Oliver, M. and Gallant, J. L. , title =. Neuron , volume =
[17]

and Kay, K

Naselaris, T. and Kay, K. N. and Nishimoto, S. and Gallant, J. L. , title =. NeuroImage , volume =
[18]

and Darcet, T

Oquab, M. and Darcet, T. and Moutakanni, T. and Vo, H. and Szafraniec, M. and Khalidov, V. and Fernandez, P. and Haziza, D. and Massa, F. and El-Nouby, A. and others , title =. Transactions on Machine Learning Research , year =
[19]

and VanRullen, R

Ozcelik, F. and VanRullen, R. , title =. Scientific Reports , volume =
[20]

2023 , publisher=

Nguyen, Kevin P and Raval, Vyom and Minhajuddin, Abu and Carmody, Thomas and Trivedi, Madhukar H and Dewey Jr, Richard B and Montillo, Albert A , journal=. 2023 , publisher=

2023
[21]

2019 , publisher=

Esteban, Oscar and Markiewicz, Christopher J and Blair, Ross W and Moodie, Craig A and Isik, A Ilkay and Erramuzpe, Asier and Kent, James D and Goncalves, Mathias and DuPre, Elizabeth and Snyder, Madeleine and others , journal=. 2019 , publisher=

2019
[22]

2024 , booktitle=

Brain decoding: toward real-time reconstruction of visual perception , author=. 2024 , booktitle=

2024
[23]

Assran, Mido and Bardes, Adrien and Fan, David and Garrido, Quentin and Howes, Russell and Muckley, Matthew and Rizvi, Ammar and Roberts, Claire and Sinha, Koustuv and Zholus, Artem and others , journal=
[24]

Human brain mapping , volume=

High-resolution intersubject averaging and a coordinate system for the cortical surface , author=. Human brain mapping , volume=. 1999 , publisher=

1999
[25]

Learning sequential information in task-Based

Wang, Jiyao and Dvornek, Nicha C and Staib, Lawrence H and Duncan, James S , booktitle=. Learning sequential information in task-Based. 2023 , organization=

2023
[26]

and Kim, J

Radford, A. and Kim, J. W. and Hallacy, C. and Ramesh, A. and Goh, G. and Agarwal, S. and Sastry, G. and Askell, A. and Mishkin, P. and Clark, J. and others , title =. ICML , year =
[27]

2023 , eprint=

Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors , author=. 2023 , eprint=

2023
[28]

2024 , eprint=

MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data , author=. 2024 , eprint=

2024
[29]

and Allen, E

St-Yves, G. and Allen, E. J. and Wu, Y. and Kay, K. and Naselaris, T. , title =. Nature Communications , volume =. 2023 , doi =

2023
[30]

and LeBel, A

Tang, J. and LeBel, A. and Jain, S. and Huth, A. G. , title =. Nature Neuroscience , volume =
[31]

Thomas, A. W. and R\'. NeurIPS , year =
[32]

and Maire, M

Lin, T.-Y. and Maire, M. and Belongie, S. and Hays, J. and Perona, P. and Ramanan, D. and Doll\'. ECCV , year =
[33]

and Dwivedi, K

Shen, G. and Dwivedi, K. and Majima, K. and Horikawa, T. and Kamitani, Y. , title =. Frontiers in Computational Neuroscience , volume =
[34]

Hebart, Oliver Contier, Lina Teichmann, Adam H

Hebart, Martin N and Contier, Oliver and Teichmann, Lina and Rockter, Adam H and Zheng, Charles Y and Kidder, Alexis and Corriveau, Anna and Vaziri-Pashkam, Maryam and Baker, Chris I , editor =. eLife , issn =. doi:10.7554/eLife.82580 , url =

work page doi:10.7554/elife.82580

[1] [1]

Allen, E. J. and St-Yves, G. and Wu, Y. and Breedlove, J. L. and Prince, J. S. and Dowdle, L. T. and Nau, M. and Caron, B. and Pestilli, F. and Charest, I. and others , title =. Nature Neuroscience , volume =

[2] [2]

and Kornblith, S

Azizi, S. and Kornblith, S. and Saharia, C. and Norouzi, M. and Fleet, D. , title =. Transactions on Machine Learning Research , year =

[3] [3]

arXiv preprint arXiv:2501.15322 , year=

Scaling laws for decoding images from brain activity , author=. arXiv preprint arXiv:2501.15322 , year=

work page arXiv

[4] [4]

Dynadiff: Single-stage decoding of images from contin- uously evolving fmri

Careil, M. and Benchetrit, Y. and King, J.-R. , title =. arXiv preprint arXiv:2505.14556 , year =

work page arXiv

[5] [5]

and Pyles, J

Chang, N. and Pyles, J. A. and Marcus, A. and Gupta, A. and Tarr, M. J. and Aminoff, E. M. , title =. Scientific Data , volume =. 2019 , doi =

2019

[6] [6]

and Qing, J

Chen, Z. and Qing, J. and Xiang, T. and Yue, W. L. and Zhou, J. H. , title =. CVPR , year =

[7] [7]

and Rapin, J

d'Ascoli, S. and Rapin, J. and Benchetrit, Y. and Brookes, T. and Begany, K. and Raugel, J. and Banville, H. and King, J.-R. , title =. arXiv preprint , year =

[8] [8]

and Prince, J

Conwell, C. and Prince, J. S. and Kay, K. N. and Alvarez, G. A. and Konkle, T. , title =. bioRxiv , year =

[9] [9]

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?

Eldan, R. and Li, Y. , title =. arXiv preprint arXiv:2305.07759 , year =

work page internal anchor Pith review Pith/arXiv arXiv

[10] [10]

and Pinaya, W

Fernandez, V. and Pinaya, W. H. and Borges, P. and Tudosiu, P.-D. and Graham, M. S. and Vercauteren, T. and Cardoso, M. J. , title =. MICCAI Workshop on Simulation and Synthesis in Medical Imaging , year =

[11] [11]

and Wen, H

Han, K. and Wen, H. and Shi, J. and Lu, K.-H. and Zhang, Y. and Fu, D. and Liu, Z. , title =. NeuroImage , volume =. 2019 , doi =

2019

[12] [12]

and Sun, S

He, R. and Sun, S. and Yu, X. and Xue, C. and Zhang, W. and Torr, P. and Bai, S. and Qi, X. , title =. ICLR , year =

[13] [13]

and Tong, F

Kamitani, Y. and Tong, F. , title =. Nature Neuroscience , volume =

[14] [14]

Kay, K. N. and Naselaris, T. and Prenger, R. J. and Gallant, J. L. , title =. Nature , volume =

[15] [15]

and Dwivedi, K

Lahner, B. and Dwivedi, K. and Iamshchinina, P. and Graumann, M. and Lascelles, A. and Roig, G. and Oliva, A. and Cichy, R. M. , title =. Nature Communications , volume =

[16] [16]

and Prenger, R

Naselaris, T. and Prenger, R. J. and Kay, K. N. and Oliver, M. and Gallant, J. L. , title =. Neuron , volume =

[17] [17]

and Kay, K

Naselaris, T. and Kay, K. N. and Nishimoto, S. and Gallant, J. L. , title =. NeuroImage , volume =

[18] [18]

and Darcet, T

Oquab, M. and Darcet, T. and Moutakanni, T. and Vo, H. and Szafraniec, M. and Khalidov, V. and Fernandez, P. and Haziza, D. and Massa, F. and El-Nouby, A. and others , title =. Transactions on Machine Learning Research , year =

[19] [19]

and VanRullen, R

Ozcelik, F. and VanRullen, R. , title =. Scientific Reports , volume =

[20] [20]

2023 , publisher=

Nguyen, Kevin P and Raval, Vyom and Minhajuddin, Abu and Carmody, Thomas and Trivedi, Madhukar H and Dewey Jr, Richard B and Montillo, Albert A , journal=. 2023 , publisher=

2023

[21] [21]

2019 , publisher=

Esteban, Oscar and Markiewicz, Christopher J and Blair, Ross W and Moodie, Craig A and Isik, A Ilkay and Erramuzpe, Asier and Kent, James D and Goncalves, Mathias and DuPre, Elizabeth and Snyder, Madeleine and others , journal=. 2019 , publisher=

2019

[22] [22]

2024 , booktitle=

Brain decoding: toward real-time reconstruction of visual perception , author=. 2024 , booktitle=

2024

[23] [23]

Assran, Mido and Bardes, Adrien and Fan, David and Garrido, Quentin and Howes, Russell and Muckley, Matthew and Rizvi, Ammar and Roberts, Claire and Sinha, Koustuv and Zholus, Artem and others , journal=

[24] [24]

Human brain mapping , volume=

High-resolution intersubject averaging and a coordinate system for the cortical surface , author=. Human brain mapping , volume=. 1999 , publisher=

1999

[25] [25]

Learning sequential information in task-Based

Wang, Jiyao and Dvornek, Nicha C and Staib, Lawrence H and Duncan, James S , booktitle=. Learning sequential information in task-Based. 2023 , organization=

2023

[26] [26]

and Kim, J

Radford, A. and Kim, J. W. and Hallacy, C. and Ramesh, A. and Goh, G. and Agarwal, S. and Sastry, G. and Askell, A. and Mishkin, P. and Clark, J. and others , title =. ICML , year =

[27] [27]

2023 , eprint=

Reconstructing the Mind's Eye: fMRI-to-Image with Contrastive Learning and Diffusion Priors , author=. 2023 , eprint=

2023

[28] [28]

2024 , eprint=

MindEye2: Shared-Subject Models Enable fMRI-To-Image With 1 Hour of Data , author=. 2024 , eprint=

2024

[29] [29]

and Allen, E

St-Yves, G. and Allen, E. J. and Wu, Y. and Kay, K. and Naselaris, T. , title =. Nature Communications , volume =. 2023 , doi =

2023

[30] [30]

and LeBel, A

Tang, J. and LeBel, A. and Jain, S. and Huth, A. G. , title =. Nature Neuroscience , volume =

[31] [31]

Thomas, A. W. and R\'. NeurIPS , year =

[32] [32]

and Maire, M

Lin, T.-Y. and Maire, M. and Belongie, S. and Hays, J. and Perona, P. and Ramanan, D. and Doll\'. ECCV , year =

[33] [33]

and Dwivedi, K

Shen, G. and Dwivedi, K. and Majima, K. and Horikawa, T. and Kamitani, Y. , title =. Frontiers in Computational Neuroscience , volume =

[34] [34]

Hebart, Oliver Contier, Lina Teichmann, Adam H

Hebart, Martin N and Contier, Oliver and Teichmann, Lina and Rockter, Adam H and Zheng, Charles Y and Kidder, Alexis and Corriveau, Anna and Vaziri-Pashkam, Maryam and Baker, Chris I , editor =. eLife , issn =. doi:10.7554/eLife.82580 , url =

work page doi:10.7554/elife.82580