StableMind: Source-Free Cross-Subject fMRI Decoding with Regularized Adaptation

Hairong Zheng; Jian Zhang; Jintao Guo; Lin Wang; Luyang Cao; Shumeng Li; Yinghuan Shi; Yulin Zhou

arxiv: 2605.02586 · v1 · submitted 2026-05-04 · 💻 cs.CV

Jintao Guo, Lin Wang, Shumeng Li, Jian Zhang, Yulin Zhou, Luyang Cao, Hairong Zheng, Yinghuan Shi

Pith reviewed 2026-05-08 18:33 UTC · model grok-4.3

classification 💻 cs.CV

keywords fMRI decoding · cross-subject adaptation · source-free learning · brain retrieval · image retrieval · regularized adaptation · neuroimaging

The pith

A regularized adaptation method reuses ridge projections and applies difficulty-aware image blur to stabilize brain representations and refine supervision for source-free fMRI decoding with limited new-subject data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that cross-subject fMRI decoding remains viable when raw data from prior subjects cannot be accessed and new-subject scans are restricted to roughly one hour. It traces performance drops to unstable brain-side representations caused by individual fMRI differences and to unreliable image supervision from fine visual details that limited signals cannot support. The proposed framework counters these by constraining adaptation with pretrained ridge projections, adding Fourier-based brain augmentations, and blurring images in a difficulty-aware manner to retain only reliably supported visual structure. If successful, this would make brain decoding more deployable in settings where data collection is expensive and privacy rules block sharing of earlier scans.

Core claim

StableMind is a regularized adaptation framework that reuses ridge projections from the pretrained model as priors to constrain limited-data adaptation on a new subject and applies Fourier-based feature-level brain augmentation to improve robustness to individual variability; it further introduces difficulty-aware image blur to align brain and image features by down-weighting fine-grained details weakly supported by limited fMRI signals while keeping stable visual structure.

What carries the argument

Regularized adaptation framework that treats reused ridge projections as adaptation priors and employs difficulty-aware blurring to enforce reliable brain-image alignment
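
One way to read "ridge projections as adaptation priors" is as a ridge regression that shrinks toward pretrained weights instead of toward zero. The block below is a sketch under that assumption, not the paper's formulation. The symbols are illustrative: $X$ stands for new-subject fMRI features, $Y$ for the target latents, $W_0$ for a pretrained projection brought to the same shape as $W$, and $\lambda$ for the prior strength.

```latex
\begin{aligned}
\hat{W} &= \arg\min_{W}\; \lVert XW - Y \rVert_F^2 + \lambda \lVert W - W_0 \rVert_F^2, \\
0 &= 2X^\top (X\hat{W} - Y) + 2\lambda(\hat{W} - W_0), \\
\hat{W} &= (X^\top X + \lambda I)^{-1}\,(X^\top Y + \lambda W_0).
\end{aligned}
```

Under this reading, a large $\lambda$ keeps $\hat{W}$ close to $W_0$, while $\lambda \to 0$ recovers ordinary least squares on the limited new-subject data (when $X^\top X$ is invertible). The referee's concern about a biased prior corresponds to the case where $W_0$ is far from the weights the new subject actually needs.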

If this is right

Adaptation succeeds with only one hour of paired fMRI-image data from the new subject and without any raw data from prior subjects.
Brain retrieval accuracy improves by 5.71 percentage points over the prior state-of-the-art method while using fewer trainable adaptation parameters.
Image retrieval accuracy reaches levels comparable to or above supervised cross-subject baselines under the unified protocol.
The combination of projection reuse and targeted blurring directly mitigates both representation instability and supervision noise.
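
The two retrieval metrics named above can be illustrated with a small sketch. It assumes the common convention that image retrieval uses a brain embedding as the query over image candidates, and brain retrieval the reverse, with top-1 cosine similarity as the matching rule. The function names and toy embeddings are hypothetical and are not taken from the paper or its code.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def top1_retrieval(queries, candidates):
    """Fraction of queries whose most similar candidate has the same index."""
    hits = 0
    for i, query in enumerate(queries):
        sims = [cosine(query, cand) for cand in candidates]
        best = max(range(len(sims)), key=sims.__getitem__)
        hits += int(best == i)
    return hits / len(queries)

# Toy paired embeddings (made up): row i of brain_emb pairs with row i of image_emb.
image_emb = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
brain_emb = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1], [0.6, 0.1, 0.5]]

image_retrieval = top1_retrieval(brain_emb, image_emb)  # brain query -> image
brain_retrieval = top1_retrieval(image_emb, brain_emb)  # image query -> brain
print(image_retrieval, brain_retrieval)
```

On this toy data the third brain embedding is pulled toward the first image, so image retrieval scores 2/3 while brain retrieval scores 1.0, which shows why the two directions are reported separately.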

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The reliance on precomputed ridge projections as stable anchors could be tested on other neuroimaging modalities such as EEG or MEG where subject variability is similarly high.
If the priors prove robust, the approach suggests a general strategy for privacy-preserving transfer in any domain where raw source data must stay inaccessible.
Extending the difficulty-aware blur mechanism to other paired modalities might reduce the impact of noisy supervision signals beyond fMRI.
The method's performance under a fixed one-hour budget implies that further gains may come from optimizing the duration or selection of adaptation samples rather than increasing data volume.

Load-bearing premise

Ridge projections computed on source subjects remain effective and unbiased priors for constraining adaptation on a new subject whose fMRI responses may differ substantially in distribution, and difficulty-aware blurring preserves sufficient visual structure without discarding information the limited fMRI can actually support.

What would settle it

Collect fMRI responses from a new subject under the same 1-hour protocol but with deliberately introduced distribution shifts that break the effectiveness of source ridge projections, then measure whether image and brain retrieval accuracies fall below those of non-regularized baselines.

Figures

Figures reproduced from arXiv: 2605.02586 by Hairong Zheng, Jian Zhang, Jintao Guo, Lin Wang, Luyang Cao, Shumeng Li, Yinghuan Shi, Yulin Zhou.

**Figure 1.** Two key challenges in cross-subject brain decoding. The experiments …

**Figure 2.** Ridge-level feature distributions visualized by t-SNE on Subject 1. We …

**Figure 3.** Overview of StableMind. StableMind targets cross-subject fMRI-to-image decoding under limited new-subject data. (a) A multi-subject pretrained …

**Figure 4.** Parameter sensitivity of StableMind to the weight …

**Figure 5.** The voxel-wise ridge weights on the NSD cortical flat map. For each …

**Figure 6.** Qualitative fMRI-to-image reconstructions of our StableMind and other representative methods, including MindEye2 …

**Figure 7.** The t-SNE visualization of MindEye2 and StableMind finetuned on …

Original abstract

Existing cross-subject fMRI decoding methods typically train a model on multiple scanned subjects and then adapt it to a new subject using substantial paired fMRI-image data. However, in realistic scenarios, new-subject fMRI data are often limited due to costly data acquisition, and raw data from previous subjects may be inaccessible, leading existing methods to suffer performance degradation during new-subject adaptation. In this paper, we identify that this degradation stems from two key issues: brain-side instability caused by large subject differences in fMRI responses, and image-side supervision unreliability caused by fine-grained visual details that are not reliably supported by limited fMRI signals. To address these challenges, we propose StableMind, a regularized adaptation framework designed to improve brain-side representation stability and image-side supervision reliability. (1) To stabilize brain representations, StableMind reuses ridge projections from the pretrained model as adaptation priors to constrain limited-data new-subject adaptation, and applies Fourier-based feature-level brain augmentation to improve robustness to individual variability. (2) To improve image supervision reliability, StableMind introduces difficulty-aware image blur for brain-image alignment, reducing the influence of fine-grained visual details that are weakly supported by limited fMRI signals while preserving stable visual structure. Experiments on the Natural Scenes Dataset under a unified 1-hour adaptation protocol demonstrate that StableMind achieves 84.02% image retrieval accuracy and 81.66% brain retrieval accuracy averaged over four subjects, surpassing the state-of-the-art method by 5.71% brain retrieval accuracy with fewer trainable adaptation parameters. Our code is available at https://github.com/lingeringlight/StableMind.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

StableMind pairs reused ridge priors with Fourier augmentation and difficulty-aware blur for source-free fMRI adaptation and reports clear gains on NSD, but the abstract lacks stats and the prior-bias risk needs direct checks.

The paper's main move is to stabilize cross-subject fMRI decoding when source data are unavailable and new-subject scans are limited to roughly one hour. It reuses ridge projections from the source-pretrained model as adaptation priors, adds Fourier feature augmentation on the brain side, and applies difficulty-aware image blur to avoid over-relying on fine visual details that limited fMRI cannot reliably support. The combination is new even if the pieces are familiar from domain adaptation work. On the Natural Scenes Dataset it reaches 84.02% image retrieval and 81.66% brain retrieval averaged across four subjects, beating the prior method by 5.71 points on brain retrieval while using fewer trainable parameters. Code release helps reproducibility.

These numbers address a practical pain point: cutting acquisition costs and avoiding privacy issues with raw source scans. The approach is straightforward and the protocol is unified, which makes the comparison cleaner than many adaptation papers.

The soft spots are the usual ones for an abstract-only view. No error bars, no statistical tests, and no ablation tables are shown, so the size of the improvement is hard to judge. The stress-test concern about ridge projections pulling the new-subject model into the wrong subspace if its fMRI distribution falls outside the source span is real and needs explicit testing in the full results; the Fourier term is meant to help, but the paper must demonstrate that it actually does. If those checks hold, the method is useful.

This is for labs doing brain decoding or BCI work who want lighter adaptation pipelines. It is worth sending to peer review because the problem is concrete, the claims are testable, and the code is public, even though the current evidence is preliminary.

Referee Report

2 major / 2 minor

Summary. The paper introduces StableMind, a source-free regularized adaptation framework for cross-subject fMRI decoding. It targets two issues in limited-data new-subject adaptation: brain-side instability from inter-subject fMRI variability (addressed via reuse of pretrained ridge projections as adaptation priors plus Fourier feature augmentation) and image-side supervision unreliability from fine-grained details weakly supported by limited fMRI (addressed via difficulty-aware image blurring). On the Natural Scenes Dataset under a unified 1-hour adaptation protocol, it reports average accuracies of 84.02% image retrieval and 81.66% brain retrieval across four subjects, outperforming prior SOTA by 5.71% in brain retrieval while using fewer trainable parameters.

Significance. If the empirical gains hold under rigorous validation, the work would meaningfully advance practical fMRI decoding by enabling privacy-preserving, low-data adaptation without source-subject data access. The regularization strategy directly tackles known inter-subject variability and limited-signal supervision challenges, potentially reducing the data acquisition burden in neuroimaging applications. The source-free constraint and parameter efficiency are particularly relevant for real-world deployment.

major comments (2)

[Method description and Experiments] The central performance claim (84.02% image / 81.66% brain retrieval, +5.71% over SOTA) rests on the assumption that ridge projections computed on source subjects remain effective, unbiased priors for new-subject adaptation despite large inter-subject fMRI distribution shifts. If a new subject's responses lie outside the linear span of these source-derived directions, the regularization term risks pulling the model toward an incorrect subspace rather than stabilizing it. The manuscript should include an explicit analysis (e.g., subspace alignment metrics or ablation removing the ridge prior) to test this assumption under the 1-hour protocol.
[Experiments] The reported accuracies lack error bars, statistical significance tests (e.g., paired t-tests or Wilcoxon across subjects), or ablation tables isolating the contribution of ridge-projection reuse versus Fourier augmentation versus difficulty-aware blur. Without these, it is impossible to determine whether the 5.71% delta is robust to subject variability or protocol details, weakening the cross-subject generalization claim.

minor comments (2)

[Method] The abstract and method sections would benefit from a concise equation or pseudocode block showing how the ridge projection reuse is formulated as a regularization term during adaptation (e.g., the exact form of the prior constraint loss).
[Experiments] Clarify the precise definition of the '1-hour adaptation protocol' (number of fMRI-image pairs, scanning time per subject, train/val/test split) to allow reproducibility and comparison with future work.
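
The "subspace alignment metrics" requested in the first major comment could be sketched as the mean cosine of the principal angles between the top-k PCA subspaces of source and target features. This is an illustration only. It assumes both feature sets live in a shared space of equal dimension (raw voxel spaces differ across subjects) and that source principal directions were stored before the source data became inaccessible. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def top_k_basis(features, k):
    """Orthonormal (d x k) basis of the top-k principal directions of an (n x d) matrix."""
    centered = features - features.mean(axis=0, keepdims=True)
    # Rows of vt are principal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k].T

def subspace_alignment(source_feats, target_feats, k):
    """Mean cosine of the principal angles between two top-k PCA subspaces (1.0 = identical)."""
    basis_s = top_k_basis(source_feats, k)
    basis_t = top_k_basis(target_feats, k)
    # Singular values of Bs^T Bt are the cosines of the principal angles.
    cosines = np.linalg.svd(basis_s.T @ basis_t, compute_uv=False)
    return float(np.clip(cosines, 0.0, 1.0).mean())

# Synthetic data (made up): 3 latent factors mixed into 16 feature dimensions.
rng = np.random.default_rng(0)
mixing = rng.normal(size=(3, 16))
source = rng.normal(size=(200, 3)) @ mixing + 0.01 * rng.normal(size=(200, 16))
# "aligned" target shares the source mixing; "shifted" target uses an unrelated one.
aligned = rng.normal(size=(60, 3)) @ mixing + 0.01 * rng.normal(size=(60, 16))
shifted = rng.normal(size=(60, 3)) @ rng.normal(size=(3, 16)) + 0.01 * rng.normal(size=(60, 16))

score_aligned = subspace_alignment(source, aligned, k=3)
score_shifted = subspace_alignment(source, shifted, k=3)
print(score_aligned, score_shifted)
```

A score near 1.0 would indicate that the new subject's dominant feature directions fall inside the span the source prior encodes; a low score is the failure case the referee describes.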

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback on our manuscript. We address each of the major comments below and have made revisions to incorporate the suggested analyses and statistical validations.

Referee: The central performance claim (84.02% image / 81.66% brain retrieval, +5.71% over SOTA) rests on the assumption that ridge projections computed on source subjects remain effective, unbiased priors for new-subject adaptation despite large inter-subject fMRI distribution shifts. If a new subject's responses lie outside the linear span of these source-derived directions, the regularization term risks pulling the model toward an incorrect subspace rather than stabilizing it. The manuscript should include an explicit analysis (e.g., subspace alignment metrics or ablation removing the ridge prior) to test this assumption under the 1-hour protocol.

Authors: We agree that validating the ridge-projection priors is crucial. In the revised version, we will add an ablation study that removes the ridge-projection regularization to isolate its contribution under the 1-hour protocol. We will also include a subspace alignment analysis, computing metrics such as the average cosine similarity between the top principal components of source and target fMRI features across subjects, to test whether target responses align with the source subspace. High alignment would support the effectiveness of the priors without significant bias.
Revision: yes

Referee: The reported accuracies lack error bars, statistical significance tests (e.g., paired t-tests or Wilcoxon across subjects), or ablation tables isolating the contribution of ridge-projection reuse versus Fourier augmentation versus difficulty-aware blur. Without these, it is impossible to determine whether the 5.71% delta is robust to subject variability or protocol details, weakening the cross-subject generalization claim.

Authors: We acknowledge the need for more rigorous statistical reporting. The revised manuscript will include error bars (standard deviation across the four subjects) for all accuracy metrics. We will report results of paired statistical tests, specifically the Wilcoxon signed-rank test across subjects, to establish the significance of the 5.71% improvement. Additionally, we will provide a detailed ablation table breaking down the performance contributions of ridge-projection reuse, Fourier augmentation, and difficulty-aware image blur individually and in combination.
Revision: yes
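
One caveat on the promised Wilcoxon signed-rank test across four subjects: with n = 4 paired differences, the exact two-sided p-value cannot fall below 2 / 2^4 = 0.125, however large the gain. The sketch below enumerates the exact null distribution to show this. The function name and the per-subject gains are made up for illustration and are not results from the paper.

```python
from itertools import product

def wilcoxon_exact_two_sided(diffs):
    """Exact two-sided Wilcoxon signed-rank p-value.

    Assumes no zero differences and no ties among the absolute differences.
    """
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0] * len(diffs)
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    total = sum(ranks)
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    observed = min(w_plus, total - w_plus)
    # Under the null, every sign pattern over the ranks is equally likely.
    extreme = 0
    for signs in product((0, 1), repeat=len(diffs)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if min(w, total - w) <= observed:
            extreme += 1
    return extreme / 2 ** len(diffs)

# Hypothetical per-subject gains, all positive: the most favorable case for n = 4.
hypothetical_gains = [1.0, 2.0, 3.0, 4.0]
p_value = wilcoxon_exact_two_sided(hypothetical_gains)
print(p_value)  # 0.125
```

Because even a unanimous improvement yields p = 0.125, a subject-level Wilcoxon test alone cannot establish significance at the 0.05 level here; trial-level or bootstrap analyses would be needed alongside it.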

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

The paper presents an empirical regularized adaptation framework for source-free cross-subject fMRI decoding, identifying two issues (brain instability from inter-subject variability and unreliable image supervision from limited signals) and addressing them via reuse of ridge projections as priors, Fourier augmentation, and difficulty-aware blurring. No equations, derivations, or self-referential steps are shown that reduce the reported retrieval accuracies (84.02% image, 81.66% brain) to quantities defined solely by fitted parameters or prior outputs by construction. Performance claims rest on experimental results under a fixed 1-hour protocol rather than tautological predictions, and any self-citations (if present in the full text) are not load-bearing for the central method or results.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The framework rests on domain assumptions about the stability of pretrained ridge projections across subjects and the utility of selective blurring for supervision; no new physical entities are introduced, but several regularization hyperparameters are implied though not quantified in the abstract.

free parameters (2)

ridge projection reuse weight
Controls how strongly the pretrained projections constrain new-subject adaptation; value not reported in abstract.
Fourier augmentation parameters
Define the scale and type of feature-level variations added for robustness; specifics absent from abstract.

axioms (2)

domain assumption: Ridge projections from a source-trained model provide useful priors that stabilize limited-data adaptation without introducing harmful bias from source distributions.
Directly invoked in the brain-side stabilization step described in the abstract.
domain assumption: Fine-grained visual details in images are not reliably supported by limited fMRI signals and can be safely down-weighted via blurring.
Underlies the image-side difficulty-aware blur component.
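
The second assumption can be made concrete with a rough sketch of difficulty-aware blurring: the harder a sample, the wider the Gaussian blur applied to its target image. Everything here is an assumption for illustration. The abstract does not say how difficulty is scored or how it maps to blur strength, so `difficulty` is passed in as a number in [0, 1], `max_sigma` is a hypothetical cap, and the image is a 2-D grayscale array.

```python
import numpy as np

def gaussian_kernel(sigma):
    """Normalized 1-D Gaussian kernel; identity kernel when sigma is effectively zero."""
    if sigma < 1e-6:
        return np.array([1.0])
    radius = int(np.ceil(3 * sigma))
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()

def blur(image, sigma):
    """Separable Gaussian blur of a 2-D array with edge padding (output keeps the input shape)."""
    kernel = gaussian_kernel(sigma)
    pad = len(kernel) // 2
    padded = np.pad(image, pad, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def difficulty_aware_blur(image, difficulty, max_sigma=2.0):
    """Blur more strongly for harder samples (difficulty clipped to [0, 1])."""
    return blur(image, max_sigma * float(np.clip(difficulty, 0.0, 1.0)))

# Checkerboard: an image made entirely of fine detail.
image = np.indices((8, 8)).sum(axis=0) % 2.0
easy = difficulty_aware_blur(image, difficulty=0.0)  # left untouched
hard = difficulty_aware_blur(image, difficulty=1.0)  # fine detail suppressed
print(image.std(), hard.std())
```

The toy case shows the trade-off the assumption depends on: blurring removes detail the fMRI signal may not support, but it would equally remove detail the signal does support if the difficulty score is wrong.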

pith-pipeline@v0.9.0 · 5616 in / 1682 out tokens · 69579 ms · 2026-05-08T18:33:27.919729+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

Cost.FunctionalEquation / Foundation.AlphaCoordinateFixation · washburn_uniqueness_aczel · relation: unclear (no analog used)
Paper passage: "StableMind reuses ridge projections from the pretrained model as adaptation priors to constrain limited-data new-subject adaptation, and applies Fourier-based feature-level brain augmentation... perturbs amplitude-related statistics of intermediate brain features while preserving their structural phase information"

Foundation.BranchSelection · RCLCombiner_isCoupling_iff · relation: unclear (unrelated; this is statistical augmentation, not a coupling combiner argument)
Paper passage: "we adopt Gaussian resampling... $\tilde{\mu}(A_i) \sim \mathcal{N}(\mu(A_i), \Sigma_\mu^2)$, $\tilde{\sigma}(A_i) \sim \mathcal{N}(\sigma(A_i), \Sigma_\sigma^2)$"

Foundation (whole forcing chain) · reality_from_one_distinction · relation: unclear (paper has many tunable hyperparameters; RS chain has zero adjustable parameters)
Paper passage: "the source-prior fusion weight α in Eq. (4) is set to 0.1, the momentum m... is set to 0.85, the temperature T... 0.028, the global radius scaling factor s is set to 0.92, β_h is 0.18, and λ_α in Eq. (22) is 3"
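
The Fourier augmentation passages quoted in this section (amplitude statistics perturbed, phase preserved, Gaussian resampling of the amplitude mean and standard deviation) could be sketched as follows. This is a guess at the mechanics, not the authors' implementation. It assumes the transform runs along the feature dimension, that Σ_μ and Σ_σ are the batch-level spread of the per-sample statistics, and that the amplitude is re-standardized with the resampled statistics. The function name is hypothetical.

```python
import numpy as np

def fourier_amplitude_augment(feats, rng, eps=1e-6):
    """Resample per-sample amplitude statistics of (batch, dim) features; keep the phase."""
    spec = np.fft.rfft(feats, axis=1)
    amp, phase = np.abs(spec), np.angle(spec)

    mu = amp.mean(axis=1, keepdims=True)           # mu(A_i)
    sigma = amp.std(axis=1, keepdims=True) + eps   # sigma(A_i)
    # Assumption: Sigma_mu and Sigma_sigma are the spread of these statistics over the batch.
    spread_mu = mu.std(axis=0, keepdims=True)
    spread_sigma = sigma.std(axis=0, keepdims=True)

    mu_new = rng.normal(mu, spread_mu)
    sigma_new = np.abs(rng.normal(sigma, spread_sigma))  # keep the scale non-negative

    # Re-standardize the amplitude with the resampled statistics; clip so it stays valid.
    amp_new = np.clip((amp - mu) / sigma * sigma_new + mu_new, 0.0, None)
    spec_new = amp_new * np.exp(1j * phase)
    return np.fft.irfft(spec_new, n=feats.shape[1], axis=1)

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 32))
aug = fourier_amplitude_augment(feats, rng)
print(feats.shape, aug.shape)
```

Only the amplitude spectrum changes, so the augmented features keep the phase of the originals wherever the new amplitude is non-zero.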

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

57 extracted references · 3 canonical work pages

[1]

Generic decoding of seen and imagined objects using hierarchical visual features,

T. Horikawa and Y . Kamitani, “Generic decoding of seen and imagined objects using hierarchical visual features,”Nature communications,
[2]

Binless kernel machine: Modeling spike train transformation for cognitive neural prostheses,

C. Qian, X. Sun, Y . Wang, X. Zheng, Y . Wang, and G. Pan, “Binless kernel machine: Modeling spike train transformation for cognitive neural prostheses,”Neural Computation, 2020. 1

2020
[3]

Dcnn-gan: Reconstructing realistic image from fmri,

Y . Lin, J. Li, and H. Wang, “Dcnn-gan: Reconstructing realistic image from fmri,” inMVA, 2019. 1

2019
[4]

From voxels to pixels and back: Self-supervision in natural-image reconstruction from fmri,

R. Beliy, G. Gaziv, A. Hoogi, F. Strappini, T. Golan, and M. Irani, “From voxels to pixels and back: Self-supervision in natural-image reconstruction from fmri,” inNeurIPS, 2019. 1

2019
[5]

UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity, August 2023

W. Mai and Z. Zhang, “Unibrain: Unify image reconstruction and captioning all in one diffusion model from human brain activity,”arXiv preprint arXiv:2308.07428, 2023. 1, 2

work page arXiv 2023
[6]

Dream: Visual decoding from reversing human visual system,

W. Xia, R. De Charette, C. Oztireli, and J.-H. Xue, “Dream: Visual decoding from reversing human visual system,” inWACV, 2024. 1, 2

2024
[7]

Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding,

Z. Chen, J. Qing, T. Xiang, W. L. Yue, and J. H. Zhou, “Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding,” inCVPR, 2023. 1, 2

2023
[8]

Reconstructing the mind’s eye: fmri-to-image with contrastive learning and diffusion priors,

P. Scotti, A. Banerjee, J. Goode, S. Shabalin, A. Nguyen, A. Dempster, N. Verlinde, E. Yundler, D. Weisberg, K. Normanet al., “Reconstructing the mind’s eye: fmri-to-image with contrastive learning and diffusion priors,” inNeurIPS, 2023. 1, 2, 3

2023
[9]

Learning transferable visual models from natural language supervision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervision,” inICML, 2021. 1, 2, 7

2021
[10]

High- resolution image synthesis with latent diffusion models,

R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High- resolution image synthesis with latent diffusion models,” inCVPR, 2022. 1

2022
[11]

Mindtuner: Cross-subject visual decoding with visual fingerprint and semantic correction,

Z. Gong, Q. Zhang, G. Bao, L. Zhu, R. Xu, K. Liu, L. Hu, and D. Miao, “Mindtuner: Cross-subject visual decoding with visual fingerprint and semantic correction,” inAAAI, 2025. 1, 2, 3, 5, 6, 7, 8, 11

2025
[12]

Mindbridge: A cross-subject brain decoding framework,

S. Wang, S. Liu, Z. Tan, and X. Wang, “Mindbridge: A cross-subject brain decoding framework,” inCVPR, 2024. 1, 2, 3, 5, 7 13

2024
[13]

High-resolution image reconstruction with latent diffusion models from human brain activity,

Y . Takagi and S. Nishimoto, “High-resolution image reconstruction with latent diffusion models from human brain activity,” inCVPR, 2023. 2, 3

2023
[14]

Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data,

P. S. Scotti, M. Tripathy, C. K. T. Villanueva, R. Kneeland, T. Chen, A. Narang, C. Santhirasegaran, J. Xu, T. Naselaris, K. A. Normanet al., “Mindeye2: Shared-subject models enable fmri-to-image with 1 hour of data,” inICML, 2024. 2, 3, 5, 6, 7, 11, 12

2024
[15]

Mindaligner: Explicit brain func- tional alignment for cross-subject visual decoding from lim- ited fmri data.arXiv preprint arXiv:2502.05034, 2025

Y . Dai, Z. Yao, C. Song, Q. Zheng, W. Mai, K. Peng, S. Lu, W. Ouyang, J. Yang, and J. Wu, “Mindaligner: Explicit brain functional alignment for cross-subject visual decoding from limited fmri data,”arXiv preprint arXiv:2502.05034, 2025. 2, 3, 6, 7, 11, 12

work page arXiv 2025
[16]

Encoding and decoding in fmri,

T. Naselaris, K. N. Kay, S. Nishimoto, and J. L. Gallant, “Encoding and decoding in fmri,”Neuroimage, 2011. 2

2011
[17]

Brain decoding of spontaneous thought: Predictive modeling of self-relevance and valence using personal narratives,

H. J. Kim, B. K. Lux, E. Lee, E. S. Finn, and C.-W. Woo, “Brain decoding of spontaneous thought: Predictive modeling of self-relevance and valence using personal narratives,”Proceedings of the National Academy of Sciences, 2024. 2

2024
[18]

Decoding the brain: From neural representations to mechanistic mod- els,

M. W. Mathis, A. P. Rotondo, E. F. Chang, A. S. Tolias, and A. Mathis, “Decoding the brain: From neural representations to mechanistic mod- els,”Cell, 2024. 2

2024
[19]

Deep image reconstruction from human brain activity,

G. Shen, T. Horikawa, K. Majima, and Y . Kamitani, “Deep image reconstruction from human brain activity,”PLoS computational biology,
[20]

Generative adversarial networks,

I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y . Bengio, “Generative adversarial networks,” Communications of the ACM, 2020. 2

2020
[21]

Denoising diffusion probabilistic models,

J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” inNeurIPS, 2020. 2

2020
[22]

Mind-3d: Reconstruct high-quality 3d objects in human brain,

J. Gao, Y . Fu, Y . Wang, X. Qian, J. Feng, and Y . Fu, “Mind-3d: Reconstruct high-quality 3d objects in human brain,” inECCV, 2024. 2

2024
[23]

Mind-3d++: Advancing fmri-based 3d reconstruction with high-quality textured mesh generation and a comprehensive dataset,

J. Gao, Y . Fu, Y . Fu, Y . Wang, X. Qian, and J. Feng, “Mind-3d++: Advancing fmri-based 3d reconstruction with high-quality textured mesh generation and a comprehensive dataset,”IEEE TPAMI, 2025. 2

2025
[24]

Wills aligner: Multi-subject collaborative brain visual decoding,

G. Bao, Q. Zhang, Z. Gong, J. Zhou, W. Fan, K. Yi, U. Naseem, L. Hu, and D. Miao, “Wills aligner: Multi-subject collaborative brain visual decoding,” inAAAI, 2025. 2

2025
[25]

Mind reader: Reconstructing complex images from brain activities,

S. Lin, T. Sprague, and A. K. Singh, “Mind reader: Reconstructing complex images from brain activities,” inNeurIPS, 2022. 2

2022
[26]

Versatile diffusion: Text, images and variations all in one diffusion model,

X. Xu, Z. Wang, G. Zhang, K. Wang, and H. Shi, “Versatile diffusion: Text, images and variations all in one diffusion model,” inICCV, 2023. 2

2023
[27]

Psychometry: An omnifit model for image reconstruction from human brain activity,

R. Quan, W. Wang, Z. Tian, F. Ma, and Y . Yang, “Psychometry: An omnifit model for image reconstruction from human brain activity,” in CVPR, 2024. 2

2024
[28]

arXiv preprint arXiv:2412.19487 (2024) 14 Ren et al

Z. Wang, Z. Zhao, L. Zhou, and P. Nachev, “Unibrain: A unified model for cross-subject brain decoding,”arXiv preprint arXiv:2412.19487,

work page arXiv
[29]

Umbrae: Unified multimodal brain decoding,

W. Xia, R. de Charette, C. Oztireli, and J.-H. Xue, “Umbrae: Unified multimodal brain decoding,” inECCV, 2024. 2

2024
[30]

Neuro-vision to language: Enhancing brain recording-based visual reconstruction and language interaction,

G. Shen, D. Zhao, X. He, L. Feng, Y . Dong, J. Wang, Q. Zhang, and Y . Zeng, “Neuro-vision to language: Enhancing brain recording-based visual reconstruction and language interaction,” inNeurIPS, 2024. 2

2024
[31]

Duala: Dual- level alignment of subjects and stimuli for cross-subject fmri decoding,

S. Li, J. Guo, J. Zhang, Y . Zhou, L. Cao, and Y . Shi, “Duala: Dual- level alignment of subjects and stimuli for cross-subject fmri decoding,”
[32]

Natural scene reconstruction from fmri signals using generative latent diffusion,

F. Ozcelik and R. VanRullen, “Natural scene reconstruction from fmri signals using generative latent diffusion,”Scientific Reports, 2023. 3

2023
[33]

See through their minds: Learning transferable brain decoding models from cross-subject fmri,

Y . Liu, Y . Ma, G. Zhu, H. Jing, and N. Zheng, “See through their minds: Learning transferable brain decoding models from cross-subject fmri,” inAAAI, 2025. 3

2025
[34]

Can brain state be manipulated to emphasize individual differences in functional connectivity?

E. S. Finn, D. Scheinost, D. M. Finn, X. Shen, X. Papademetris, and R. T. Constable, “Can brain state be manipulated to emphasize individual differences in functional connectivity?”NeuroImage, 2017. 3

2017
[35]

A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence,

E. J. Allen, G. St-Yves, Y . Wu, J. L. Breedlove, J. S. Prince, L. T. Dowdle, M. Nau, B. Caron, F. Pestilli, I. Charestet al., “A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence,” Nature neuroscience, 2022. 3, 6

2022
[36]

Brain decoding of the human connectome project tasks in a dense individual fmri dataset,

S. Rastegarnia, M. St-Laurent, E. DuPre, B. Pinsard, and P. Bellec, “Brain decoding of the human connectome project tasks in a dense individual fmri dataset,”NeuroImage, 2023. 3

2023
[37]

Through their eyes: multi-subject brain decoding with simple alignment techniques,

M. Ferrante, T. Boccato, F. Ozcelik, R. VanRullen, and N. Toschi, “Through their eyes: multi-subject brain decoding with simple alignment techniques,”Imaging Neuroscience, 2024. 3

2024
[38]

Hastie, R

T. Hastie, R. Tibshirani, and J. Friedman,The Elements of Statistical Learning. Springer, 2009. 4

2009
[39]

Sch ¨olkopf and A

B. Sch ¨olkopf and A. J. Smola,Learning with Kernels. MIT Press,
[40]

Analysis of representations for domain adaptation,

S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, “Analysis of representations for domain adaptation,” inNeurIPS, 2007. 5

2007
[41]

Domain adaptation with multiple sources,

Y . Mansour, M. Mohri, and A. Rostamizadeh, “Domain adaptation with multiple sources,” inNeurIPS, 2009. 5

2009
[42]

Aloft: A lightweight mlp- like architecture with dynamic low-frequency transform for domain generalization,


[1] T. Horikawa and Y. Kamitani, “Generic decoding of seen and imagined objects using hierarchical visual features,” Nature Communications,

[2] C. Qian, X. Sun, Y. Wang, X. Zheng, Y. Wang, and G. Pan, “Binless kernel machine: Modeling spike train transformation for cognitive neural prostheses,” Neural Computation, 2020.

[3] Y. Lin, J. Li, and H. Wang, “DCNN-GAN: Reconstructing realistic image from fMRI,” in MVA, 2019.

[4] R. Beliy, G. Gaziv, A. Hoogi, F. Strappini, T. Golan, and M. Irani, “From voxels to pixels and back: Self-supervision in natural-image reconstruction from fMRI,” in NeurIPS, 2019.

[5] W. Mai and Z. Zhang, “UniBrain: Unify image reconstruction and captioning all in one diffusion model from human brain activity,” arXiv preprint arXiv:2308.07428, 2023.

[6] W. Xia, R. De Charette, C. Oztireli, and J.-H. Xue, “Dream: Visual decoding from reversing human visual system,” in WACV, 2024.

[7] Z. Chen, J. Qing, T. Xiang, W. L. Yue, and J. H. Zhou, “Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding,” in CVPR, 2023.

[8] P. Scotti, A. Banerjee, J. Goode, S. Shabalin, A. Nguyen, A. Dempster, N. Verlinde, E. Yundler, D. Weisberg, K. Norman et al., “Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors,” in NeurIPS, 2023.

[9] A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clark et al., “Learning transferable visual models from natural language supervision,” in ICML, 2021.

[10] R. Rombach, A. Blattmann, D. Lorenz, P. Esser, and B. Ommer, “High-resolution image synthesis with latent diffusion models,” in CVPR, 2022.

[11] Z. Gong, Q. Zhang, G. Bao, L. Zhu, R. Xu, K. Liu, L. Hu, and D. Miao, “MindTuner: Cross-subject visual decoding with visual fingerprint and semantic correction,” in AAAI, 2025.

[12] S. Wang, S. Liu, Z. Tan, and X. Wang, “MindBridge: A cross-subject brain decoding framework,” in CVPR, 2024.

[13] Y. Takagi and S. Nishimoto, “High-resolution image reconstruction with latent diffusion models from human brain activity,” in CVPR, 2023.

[14] P. S. Scotti, M. Tripathy, C. K. T. Villanueva, R. Kneeland, T. Chen, A. Narang, C. Santhirasegaran, J. Xu, T. Naselaris, K. A. Norman et al., “MindEye2: Shared-subject models enable fMRI-to-image with 1 hour of data,” in ICML, 2024.

[15] Y. Dai, Z. Yao, C. Song, Q. Zheng, W. Mai, K. Peng, S. Lu, W. Ouyang, J. Yang, and J. Wu, “MindAligner: Explicit brain functional alignment for cross-subject visual decoding from limited fMRI data,” arXiv preprint arXiv:2502.05034, 2025.

[16] T. Naselaris, K. N. Kay, S. Nishimoto, and J. L. Gallant, “Encoding and decoding in fMRI,” NeuroImage, 2011.

[17] H. J. Kim, B. K. Lux, E. Lee, E. S. Finn, and C.-W. Woo, “Brain decoding of spontaneous thought: Predictive modeling of self-relevance and valence using personal narratives,” Proceedings of the National Academy of Sciences, 2024.

[18] M. W. Mathis, A. P. Rotondo, E. F. Chang, A. S. Tolias, and A. Mathis, “Decoding the brain: From neural representations to mechanistic models,” Cell, 2024.

[19] G. Shen, T. Horikawa, K. Majima, and Y. Kamitani, “Deep image reconstruction from human brain activity,” PLoS Computational Biology,

[20] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, “Generative adversarial networks,” Communications of the ACM, 2020.

[21] J. Ho, A. Jain, and P. Abbeel, “Denoising diffusion probabilistic models,” in NeurIPS, 2020.

[22] J. Gao, Y. Fu, Y. Wang, X. Qian, J. Feng, and Y. Fu, “Mind-3D: Reconstruct high-quality 3D objects in human brain,” in ECCV, 2024.

[23] J. Gao, Y. Fu, Y. Fu, Y. Wang, X. Qian, and J. Feng, “Mind-3D++: Advancing fMRI-based 3D reconstruction with high-quality textured mesh generation and a comprehensive dataset,” IEEE TPAMI, 2025.

[24] G. Bao, Q. Zhang, Z. Gong, J. Zhou, W. Fan, K. Yi, U. Naseem, L. Hu, and D. Miao, “Wills aligner: Multi-subject collaborative brain visual decoding,” in AAAI, 2025.

[25] S. Lin, T. Sprague, and A. K. Singh, “Mind reader: Reconstructing complex images from brain activities,” in NeurIPS, 2022.

[26] X. Xu, Z. Wang, G. Zhang, K. Wang, and H. Shi, “Versatile diffusion: Text, images and variations all in one diffusion model,” in ICCV, 2023.

[27] R. Quan, W. Wang, Z. Tian, F. Ma, and Y. Yang, “Psychometry: An omnifit model for image reconstruction from human brain activity,” in CVPR, 2024.

[28] Z. Wang, Z. Zhao, L. Zhou, and P. Nachev, “UniBrain: A unified model for cross-subject brain decoding,” arXiv preprint arXiv:2412.19487,

[29] W. Xia, R. de Charette, C. Oztireli, and J.-H. Xue, “UMBRAE: Unified multimodal brain decoding,” in ECCV, 2024.

[30] G. Shen, D. Zhao, X. He, L. Feng, Y. Dong, J. Wang, Q. Zhang, and Y. Zeng, “Neuro-vision to language: Enhancing brain recording-based visual reconstruction and language interaction,” in NeurIPS, 2024.

[31] S. Li, J. Guo, J. Zhang, Y. Zhou, L. Cao, and Y. Shi, “Duala: Dual-level alignment of subjects and stimuli for cross-subject fMRI decoding,”

[32] F. Ozcelik and R. VanRullen, “Natural scene reconstruction from fMRI signals using generative latent diffusion,” Scientific Reports, 2023.

[33] Y. Liu, Y. Ma, G. Zhu, H. Jing, and N. Zheng, “See through their minds: Learning transferable brain decoding models from cross-subject fMRI,” in AAAI, 2025.

[34] E. S. Finn, D. Scheinost, D. M. Finn, X. Shen, X. Papademetris, and R. T. Constable, “Can brain state be manipulated to emphasize individual differences in functional connectivity?” NeuroImage, 2017.

[35] E. J. Allen, G. St-Yves, Y. Wu, J. L. Breedlove, J. S. Prince, L. T. Dowdle, M. Nau, B. Caron, F. Pestilli, I. Charest et al., “A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence,” Nature Neuroscience, 2022.

[36] S. Rastegarnia, M. St-Laurent, E. DuPre, B. Pinsard, and P. Bellec, “Brain decoding of the human connectome project tasks in a dense individual fMRI dataset,” NeuroImage, 2023.

[37] M. Ferrante, T. Boccato, F. Ozcelik, R. VanRullen, and N. Toschi, “Through their eyes: multi-subject brain decoding with simple alignment techniques,” Imaging Neuroscience, 2024.

[38] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer, 2009.

[39] B. Schölkopf and A. J. Smola, Learning with Kernels. MIT Press,

[40] S. Ben-David, J. Blitzer, K. Crammer, and F. Pereira, “Analysis of representations for domain adaptation,” in NeurIPS, 2007.

[41] Y. Mansour, M. Mohri, and A. Rostamizadeh, “Domain adaptation with multiple sources,” in NeurIPS, 2009.

[42] J. Guo, N. Wang, L. Qi, and Y. Shi, “Aloft: A lightweight MLP-like architecture with dynamic low-frequency transform for domain generalization,” in CVPR, 2023.

[43] H. J. Nussbaumer, “The fast Fourier transform,” in Fast Fourier Transform and Convolution Algorithms, 1981.

[44] D. Sundararajan, The Discrete Fourier Transform: Theory, Algorithms and Applications. World Scientific, 2001.

[45] S. K. Jena, “Discrete Fourier transform,” in Fourier, Laplace, and the Tangled Love Affair with Transforms: The Art of Signal Synthesis and Analysis, 2025.

[46] G. K. Aguirre, E. Zarahn, and M. D’Esposito, “Empirical analyses of BOLD fMRI statistics,” NeuroImage, 1997.

[47] M. W. Woolrich, C. F. Beckmann, T. E. Nichols, S. M. Smith, P. Valsasina, M. A. Rocca, and M. Filippi, “Statistical analysis of fMRI data,” in fMRI Techniques and Protocols, 2025.

[48] W. Yao, Z. Lyu, M. Mahmud, N. Zhong, B. Lei, and S. Wang, “Catd: Unified representation learning for EEG-to-fMRI cross-modal generation,” IEEE Transactions on Medical Imaging, 2025.

[49] H. Wu, Q. Li, C. Zhang, Z. He, and X. Ying, “Bridging the vision-brain gap with an uncertainty-aware blur prior,” in CVPR, 2025.

[50] P.-T. Jiang, C.-B. Zhang, Q. Hou, M.-M. Cheng, and Y. Wei, “LayerCAM: Exploring hierarchical class activation maps for localization,” IEEE Transactions on Image Processing, 2021.

[51] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick, “Microsoft COCO: Common objects in context,” in ECCV, 2014.

[52] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, 2004.

[53] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” in NeurIPS, 2012.

[54] C. Szegedy, V. Vanhoucke, S. Ioffe, J. Shlens, and Z. Wojna, “Rethinking the inception architecture for computer vision,” in CVPR, 2016.

[55] M. Tan and Q. Le, “EfficientNet: Rethinking model scaling for convolutional neural networks,” in ICML, 2019.

[56] M. Caron, I. Misra, J. Mairal, P. Goyal, P. Bojanowski, and A. Joulin, “Unsupervised learning of visual features by contrasting cluster assignments,” in NeurIPS, 2020.

[57] J. S. Gao, A. G. Huth, M. D. Lescroart, and J. L. Gallant, “Pycortex: an interactive surface visualizer for fMRI,” Frontiers in Neuroinformatics,