Coarse-to-fine Hierarchical Architecture with Sequential Mamba for Brain Reconstruction

Hoang-Son Vo; Minh-Huy Mai-Duc; Soo-Hyung Kim; Tien-Dung Mai; Van-Hung Bui

arxiv: 2606.04772 · v1 · pith:EK2MR32Onew · submitted 2026-06-03 · 💻 cs.CV · cs.AI

Coarse-to-fine Hierarchical Architecture with Sequential Mamba for Brain Reconstruction

Hoang-Son Vo , Van-Hung Bui , Minh-Huy Mai-Duc , Tien-Dung Mai , Soo-Hyung Kim This is my paper

Pith reviewed 2026-06-28 06:56 UTC · model grok-4.3

classification 💻 cs.CV cs.AI

keywords image-to-fMRI encodingMamba architecturebrain reconstructionvisual cortex specializationhierarchical predictioncausal ablationfMRI voxel prediction

0 comments

The pith

A dual-stream Mamba model in coarse-to-fine stages predicts fMRI from images and shows causal stream specialization matching visual cortex regions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CHASMBrain, a two-stage hierarchical framework that first predicts coarse ROI-level fMRI activations and then refines them to voxel level via a Mamba-VAE. It processes global semantic tokens in one Mamba stream and local spatial patches in another, motivated by the visual cortex's organization. On the Natural Scenes Dataset this yields a Pearson correlation of 0.429 and MSE of 0.261, beating ridge regression and DINOv2 baselines. Branch ablations demonstrate that the patch stream is locked to early retinotopic areas while the CLS stream supplies semantic context to higher-order regions, and the effect holds causally. The backbone also transfers across subjects with minimal adaptation.

Core claim

The central claim is that separating semantic and spatial processing into distinct Mamba streams within a coarse-to-fine hierarchy produces stronger image-to-fMRI predictions than prior methods while causally revealing an asymmetric division of labor: the patch stream specializes in early visual cortex and the CLS stream in higher-order areas.

What carries the argument

Dual-stream Mamba architecture that routes global CLS tokens through one sequential path and local patch tokens through another, embedded in a two-stage coarse-to-fine pipeline with Mamba-VAE refinement at the second stage.

If this is right

The architecture reaches Pearson correlation 0.429 and MSE 0.261 on NSD, exceeding ridge regression and DINOv2 linear probes.
Patch-stream ablation selectively impairs predictions in early retinotopic visual cortex.
CLS-stream ablation selectively impairs predictions in higher-order semantic areas.
The learned representation transfers to new subjects with only minimal per-subject fine-tuning.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same dual-stream separation could be tested on other sensory modalities to check whether analogous causal specialization appears.
If the pattern generalizes, the architecture offers a way to generate testable predictions about information flow between visual areas.
Cross-subject generalization suggests the streams capture invariants that might be used to align representations across individuals in brain-computer interface settings.

Load-bearing premise

The model design assumes that ablating one stream versus the other can be read as causal evidence of functional specialization rather than simple correlation.

What would settle it

An experiment in which ablating the patch stream does not produce a measurably larger drop in prediction accuracy specifically within retinotopic early visual cortex voxels would falsify the claimed causal specialization.

Figures

Figures reproduced from arXiv: 2606.04772 by Hoang-Son Vo, Minh-Huy Mai-Duc, Soo-Hyung Kim, Tien-Dung Mai, Van-Hung Bui.

**Figure 1.** Figure 1: Overview of the proposed hierarchical two-stage framework for image-to-fMRI reconstruction. Stage 1 (Coarse): A dual-stream Mamba architecture processes global (CLS token) and local (patch tokens) visual features separately, then fuses them via an adaptive gate to predict ROI-level activations. Stage 2 (Fine): A Mamba-VAE architecture takes the visual features and ROI predictions as input, encodes them thr… view at source ↗

**Figure 2.** Figure 2: Qualitative comparison of predicted fMRI activations and downstream image reconstructions. CHASMBrain accurately preserves spatial topographic patterns and enables the downstream decoder (MindEyeV2) to reconstruct visual semantics that closely align with the original stimuli. Dual-Stream Biological Alignment. To validate our biological motivation, we map the predictive correlations of the Local (patch) and… view at source ↗

**Figure 3.** Figure 3: Spatial analysis of dual-stream correlations. Left: Early visual areas (V1/V2) are primarily driven by the Local branch (blue), while higher-order semantic areas are driven by the Global branch (red). Right: 3D visualization showing distinct functional alignments. 3.4 Ablation Studies Architecture and Refinement Strategy [PITH_FULL_IMAGE:figures/full_fig_p011_3.png] view at source ↗

read the original abstract

Understanding the relationship between deep visual representations and the human visual system is a fundamental challenge in computational neuroscience. While modern vision models achieve strong performance in image recognition, their correspondence with the hierarchical organization of the human visual cortex remains an open question. In this study, we propose CHASMBrain, a novel hierarchical two-stage framework for image-to-fMRI encoding. Our architecture leverages a dual-stream Mamba design to explicitly separate and process global semantic tokens and local spatial patches, motivated by the functional organization of the visual cortex. A coarse-to-fine strategy is employed: Stage 1 predicts denoised ROI-level activations, while Stage 2 refines these coarse responses into full voxel-level predictions using a Mamba-VAE. Experiments on the Natural Scenes Dataset (NSD) demonstrate that our method achieves a Pearson correlation of 0.429 and an MSE of 0.261, outperforming all evaluated baselines including ridge regression and DINOv2 linear probes. Beyond predictive performance, causal branch-ablation experiments reveal an asymmetric specialization: the patch stream is specifically locked to early visual cortex (retinotopic regions), while the CLS stream contributes broader semantic context to higher-order areas -- a correspondence that holds causally, not merely correlationally. Cross-subject transfer experiments further show that the learned backbone generalizes across individuals with minimal per-subject adaptation, suggesting the model captures a shared, subject-agnostic visual representation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

CHASMBrain adds a dual-stream Mamba plus coarse-to-fine VAE stage for image-to-fMRI prediction and reports a 0.429 correlation on NSD with some ablation evidence for stream specialization.

read the letter

The main thing here is a concrete new architecture: dual Mamba streams that split global CLS tokens from local patches, followed by a Mamba-VAE that first predicts coarse ROI responses then refines to voxels. That combination is not in the baselines they list, and the cross-subject transfer result is a practical plus if it holds.

What stands out is the causal ablation part. They remove one stream at a time and show the patch stream maps to early retinotopic cortex while the CLS stream affects higher areas. If the ablation protocol is clean, that moves beyond simple correlation and gives a testable claim about functional specialization.

The numbers are the obvious soft spot. The abstract gives 0.429 correlation and 0.261 MSE but no error bars, no split details, and no description of how the ridge and DINOv2 baselines were trained or tuned. Without those, it is hard to know whether the gap is stable or sensitive to implementation choices. The Mamba-VAE itself is presented as novel, yet the paper does not show whether the VAE component adds measurable value over a plain Mamba decoder.

This is the kind of work that belongs in a methods-focused venue or a computational neuroscience journal. Readers who care about brain encoding models and are willing to re-implement the architecture will get the most out of it. The central performance claim and the specialization story are worth checking with the full methods and code.

I would send it to review. The architecture is explicit enough to test, the dataset is standard, and the ablation angle is worth verifying even if the absolute numbers need more scrutiny.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes CHASMBrain, a coarse-to-fine hierarchical two-stage framework for image-to-fMRI encoding that employs a dual-stream Mamba architecture to separately process global semantic (CLS) tokens and local spatial patches. Stage 1 predicts denoised ROI-level activations while Stage 2 refines these into voxel-level predictions via a Mamba-VAE. On the Natural Scenes Dataset (NSD), the method reports a Pearson correlation of 0.429 and MSE of 0.261, outperforming baselines including ridge regression and DINOv2 linear probes. Causal branch-ablation experiments are presented as evidence of asymmetric specialization (patch stream locked to early visual cortex, CLS stream to higher-order areas), with additional results on cross-subject transfer showing generalization with minimal adaptation.

Significance. If the performance and ablation results prove robust under standard validation practices, the work could contribute a new architecture for brain encoding that explicitly incorporates hierarchical and dual-stream processing motivated by visual cortex organization. The reported cross-subject generalization would be a practical strength, and the causal reading of ablations (if substantiated) could offer interpretable insights beyond correlational findings common in the field.

major comments (2)

[Abstract / Experiments] Abstract and Experiments section: The central performance claims (Pearson r = 0.429, MSE = 0.261, outperformance over ridge regression and DINOv2) are presented without data-split details, subject count, error bars, or statistical testing. These omissions are load-bearing because they prevent assessment of whether the reported gains are reliable or generalizable.
[Experiments] Experiments section (branch-ablation results): The claim that ablations demonstrate 'causal' rather than correlational specialization is load-bearing for the dual-stream motivation and functional-organization interpretation, yet the protocol for the ablations (how branches are disabled, how 'locked' is quantified, and controls for indirect effects) is not specified in sufficient detail to support the causal language.

minor comments (2)

[Abstract] The term 'Mamba-VAE' appears without definition or citation; clarify whether it is a novel component or an adaptation of prior Mamba or VAE work.
[Experiments] Baseline implementation details (e.g., whether ridge regression and DINOv2 probes use identical training data, preprocessing, and hyperparameter search) should be stated explicitly to allow direct comparison.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will revise the manuscript to improve clarity and completeness.

read point-by-point responses

Referee: [Abstract / Experiments] Abstract and Experiments section: The central performance claims (Pearson r = 0.429, MSE = 0.261, outperformance over ridge regression and DINOv2) are presented without data-split details, subject count, error bars, or statistical testing. These omissions are load-bearing because they prevent assessment of whether the reported gains are reliable or generalizable.

Authors: The referee correctly identifies that these specifics are absent from the abstract and main experiments section. In the revision we will add: NSD data splits (standard subject-wise train/test partitions across the 8 subjects), explicit subject count, error bars computed via 5-fold cross-validation, and statistical tests (paired Wilcoxon tests against baselines with p-values). These additions will appear in both the abstract and Experiments section. revision: yes
Referee: [Experiments] Experiments section (branch-ablation results): The claim that ablations demonstrate 'causal' rather than correlational specialization is load-bearing for the dual-stream motivation and functional-organization interpretation, yet the protocol for the ablations (how branches are disabled, how 'locked' is quantified, and controls for indirect effects) is not specified in sufficient detail to support the causal language.

Authors: We agree the ablation protocol requires more explicit description to justify the causal phrasing. The revision will expand the Experiments section with: (1) branch disabling via zero-masking of the patch or CLS stream outputs prior to fusion, (2) quantification of 'locked' specialization as the differential drop in voxel-wise Pearson r within early visual cortex (V1–V4) versus higher areas, and (3) control experiments using random feature ablation to isolate indirect effects. We will also moderate the language from 'causal' to 'intervention-based evidence of specialization' if the added controls warrant it. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The supplied paper text consists only of the abstract, which reports measured empirical results (Pearson 0.429, MSE 0.261 on NSD) and describes an architecture plus ablation experiments without any equations, derivations, fitted-parameter predictions, or self-citations. No load-bearing step reduces a claimed result to its own inputs by construction, and none of the six enumerated circularity patterns can be exhibited with a direct quote from the text. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

Review performed on abstract only; full model equations, training details, and hyperparameter choices are unavailable. The central claim rests on the unverified premise that the dual-stream separation mirrors visual cortex organization.

axioms (1)

domain assumption The functional organization of the visual cortex separates global semantic processing from local spatial patch processing in a manner that can be directly mapped to dual-stream token and patch streams.
Explicitly stated as motivation for the dual-stream Mamba design.

invented entities (1)

Mamba-VAE no independent evidence
purpose: Refine coarse ROI-level predictions into full voxel-level fMRI predictions in stage 2.
Introduced as the refinement component; no independent evidence supplied in abstract.

pith-pipeline@v0.9.1-grok · 5799 in / 1508 out tokens · 30234 ms · 2026-06-28T06:56:07.312677+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

19 extracted references · 8 canonical work pages

[1]

Adeli, H., Minni, S., Kriegeskorte, N.: Predicting brain activity using Transformers (Aug 2023).https://doi.org/10.1101/2023.08.02.551743

work page doi:10.1101/2023.08.02.551743 2023
[2]

Nature neuroscience25(1), 116– 126 (2022)

Allen, E.J., St-Yves, G., Wu, Y., Breedlove, J.L., Prince, J.S., Dowdle, L.T., Nau, M., Caron, B., Pestilli, F., Charest, I., et al.: A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence. Nature neuroscience25(1), 116– 126 (2022)

2022
[3]

Castello, M.V.d.O., Deniz, F., la Tour, T.D., Gallant, J.L.: Encoding models in functional magnetic resonance imaging: The Voxelwise Encoding Model framework
[4]

In: International Conference on Machine Learning (ICML) (2024)

Dao, T., Gu, A.: Transformers are SSMs: Generalized models and efficient al- gorithms through structured state space duality. In: International Conference on Machine Learning (ICML) (2024)

2024
[5]

NeuroIm- age152, 184–194 (May 2017).https://doi.org/10.1016/j.neuroimage.2016

Eickenberg, M., Gramfort, A., Varoquaux, G., Thirion, B.: Seeing it all: Convo- lutional network layers map the function of the human visual system. NeuroIm- age152, 184–194 (May 2017).https://doi.org/10.1016/j.neuroimage.2016. 10.001

work page doi:10.1016/j.neuroimage.2016 2017
[6]

In: Advances in Neural Information Processing Systems (2019)

Kubilius, J., Schrimpf, M., Kar, K., Rajalingham, R., Hong, H., Majaj, N.J., Issa, E.B., Prescott-Roy, J., Schmidt, K., Nayebi, A., Bear, D., Yamins, D.L.K., Di- Carlo, J.J.: Brain-Like Object Recognition with High-Performing Shallow Recur- rent ANNs. In: Advances in Neural Information Processing Systems (2019)

2019
[7]

NeuroImage298, 120772 (Sep 2024).https://doi.org/10.1016/j.neuroimage.2024.120772

Lin, R., Naselaris, T., Kay, K., Wehbe, L.: Stacked regressions and structured variance partitioning for interpretable brain maps. NeuroImage298, 120772 (Sep 2024).https://doi.org/10.1016/j.neuroimage.2024.120772

work page doi:10.1016/j.neuroimage.2024.120772 2024
[8]

Lu, H., Zhou, Q., Fei, N., Lu, Z., Ding, M., Wen, J., Du, C., Zhao, X., Sun, H., He, H., Wen, J.R.: Multimodal foundation models are better simulators of the human brain (Aug 2022).https://doi.org/10.48550/arXiv.2208.08263

work page doi:10.48550/arxiv.2208.08263 2022
[9]

In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

Mai, W., Wu, J., Zhu, Y., Yao, Z., Zhou, D., Luo, A., Zheng, Q., Ouyang, W., Song, C.: Synbrain: Enhancing visual-to-fmri synthesis via probabilistic represen- tation learning. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

2025
[10]

Oota, S.R., Chen, Z., Gupta, M., Bapi, R.S., Jobard, G., Alexandre, F., Hinaut, X.: Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey) (Dec 2024).https://doi.org/10.48550/arXiv.2307.10246

work page doi:10.48550/arxiv.2307.10246 2024
[11]

Transactions on Ma- chine Learning Research (2024),https://openreview.net/forum?id=a68SUt6zFt, featured Certification

Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., HAZIZA, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual feat...

2024
[12]

In: Meila, M., Zhang, T

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning. Proceed- ings of Machine Learning Res...

2021
[13]

Cerebral cortex28(9), 3095–3114 (2018) 16 Son et al

Schaefer, A., Kong, R., Gordon, E.M., Laumann, T.O., Zuo, X.N., Holmes, A.J., Eickhoff, S.B., Yeo, B.T.: Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri. Cerebral cortex28(9), 3095–3114 (2018) 16 Son et al

2018
[14]

Schrimpf, M., Kubilius, J., Hong, H., Majaj, N.J., Rajalingham, R., Issa, E.B., Kar, K., Bashivan, P., Prescott-Roy, J., Geiger, F., Schmidt, K., Yamins, D.L.K., DiCarlo, J.J.: Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like? (Sep 2018).https://doi.org/10.1101/407007

work page doi:10.1101/407007 2018
[15]

In: Proceedings of the 41st International Conference on Machine Learning

Scotti, P.S., Tripathy, M., Villanueva, C.K.T., Kneeland, R., Chen, T., Narang, A., Santhirasegaran, C., Xu, J., Naselaris, T., Norman, K.A., Abraham, T.M.: Mindeye2: shared-subject models enable fmri-to-image with 1 hour of data. In: Proceedings of the 41st International Conference on Machine Learning. ICML’24, JMLR.org (2024)

2024
[16]

NeuroImage180(Pt A), 188–202 (Oct 2018).https://doi.org/10.1016/j.neuroimage.2017.06.035

St-Yves, G., Naselaris, T.: The feature-weighted receptive field: An interpretable encoding model for complex feature spaces. NeuroImage180(Pt A), 188–202 (Oct 2018).https://doi.org/10.1016/j.neuroimage.2017.06.035

work page doi:10.1016/j.neuroimage.2017.06.035 2018
[17]

In: Advances in Neural Information Processing Systems

Wang, A., Tarr, M., Wehbe, L.: Neural Taskonomy: Inferring the Similarity of Task- Derived Representations from Brain Activity. In: Advances in Neural Information Processing Systems. vol. 32. Curran Associates, Inc. (2019).https://doi.org/10. 1101/708016

2019
[18]

Cerebral Cortex , volume =

Wen, H., Shi, J., Zhang, Y., Lu, K.H., Cao, J., Liu, Z.: Neural Encoding and Decod- ing with Deep Learning for Dynamic Natural Vision. Cerebral Cortex (New York, NY)28(12), 4136–4160 (Dec 2018).https://doi.org/10.1093/cercor/bhx268

work page doi:10.1093/cercor/bhx268 2018
[19]

In: Yue, Y., Garg, A., Peng, N., Sha, F., Yu, R

Zhang, Q., Gong, Z., Wu, Z., Miao, D.: Mindsimulator: Exploring brain concept localization via synthetic fmri. In: Yue, Y., Garg, A., Peng, N., Sha, F., Yu, R. (eds.) International Conference on Learning Representations. vol. 2025, pp. 76781–76803 (2025),https://proceedings.iclr.cc/paper_files/paper/2025/ file/bf121b033db3bac31c3193e8a0dcbf66-Paper-Conference.pdf

2025

[1] [1]

Adeli, H., Minni, S., Kriegeskorte, N.: Predicting brain activity using Transformers (Aug 2023).https://doi.org/10.1101/2023.08.02.551743

work page doi:10.1101/2023.08.02.551743 2023

[2] [2]

Nature neuroscience25(1), 116– 126 (2022)

Allen, E.J., St-Yves, G., Wu, Y., Breedlove, J.L., Prince, J.S., Dowdle, L.T., Nau, M., Caron, B., Pestilli, F., Charest, I., et al.: A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence. Nature neuroscience25(1), 116– 126 (2022)

2022

[3] [3]

Castello, M.V.d.O., Deniz, F., la Tour, T.D., Gallant, J.L.: Encoding models in functional magnetic resonance imaging: The Voxelwise Encoding Model framework

[4] [4]

In: International Conference on Machine Learning (ICML) (2024)

Dao, T., Gu, A.: Transformers are SSMs: Generalized models and efficient al- gorithms through structured state space duality. In: International Conference on Machine Learning (ICML) (2024)

2024

[5] [5]

NeuroIm- age152, 184–194 (May 2017).https://doi.org/10.1016/j.neuroimage.2016

Eickenberg, M., Gramfort, A., Varoquaux, G., Thirion, B.: Seeing it all: Convo- lutional network layers map the function of the human visual system. NeuroIm- age152, 184–194 (May 2017).https://doi.org/10.1016/j.neuroimage.2016. 10.001

work page doi:10.1016/j.neuroimage.2016 2017

[6] [6]

In: Advances in Neural Information Processing Systems (2019)

Kubilius, J., Schrimpf, M., Kar, K., Rajalingham, R., Hong, H., Majaj, N.J., Issa, E.B., Prescott-Roy, J., Schmidt, K., Nayebi, A., Bear, D., Yamins, D.L.K., Di- Carlo, J.J.: Brain-Like Object Recognition with High-Performing Shallow Recur- rent ANNs. In: Advances in Neural Information Processing Systems (2019)

2019

[7] [7]

NeuroImage298, 120772 (Sep 2024).https://doi.org/10.1016/j.neuroimage.2024.120772

Lin, R., Naselaris, T., Kay, K., Wehbe, L.: Stacked regressions and structured variance partitioning for interpretable brain maps. NeuroImage298, 120772 (Sep 2024).https://doi.org/10.1016/j.neuroimage.2024.120772

work page doi:10.1016/j.neuroimage.2024.120772 2024

[8] [8]

Lu, H., Zhou, Q., Fei, N., Lu, Z., Ding, M., Wen, J., Du, C., Zhao, X., Sun, H., He, H., Wen, J.R.: Multimodal foundation models are better simulators of the human brain (Aug 2022).https://doi.org/10.48550/arXiv.2208.08263

work page doi:10.48550/arxiv.2208.08263 2022

[9] [9]

In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

Mai, W., Wu, J., Zhu, Y., Yao, Z., Zhou, D., Luo, A., Zheng, Q., Ouyang, W., Song, C.: Synbrain: Enhancing visual-to-fmri synthesis via probabilistic represen- tation learning. In: The Thirty-ninth Annual Conference on Neural Information Processing Systems (2025)

2025

[10] [10]

Oota, S.R., Chen, Z., Gupta, M., Bapi, R.S., Jobard, G., Alexandre, F., Hinaut, X.: Deep Neural Networks and Brain Alignment: Brain Encoding and Decoding (Survey) (Dec 2024).https://doi.org/10.48550/arXiv.2307.10246

work page doi:10.48550/arxiv.2307.10246 2024

[11] [11]

Transactions on Ma- chine Learning Research (2024),https://openreview.net/forum?id=a68SUt6zFt, featured Certification

Oquab, M., Darcet, T., Moutakanni, T., Vo, H.V., Szafraniec, M., Khalidov, V., Fernandez, P., HAZIZA, D., Massa, F., El-Nouby, A., Assran, M., Ballas, N., Galuba, W., Howes, R., Huang, P.Y., Li, S.W., Misra, I., Rabbat, M., Sharma, V., Synnaeve, G., Xu, H., Jegou, H., Mairal, J., Labatut, P., Joulin, A., Bojanowski, P.: DINOv2: Learning robust visual feat...

2024

[12] [12]

In: Meila, M., Zhang, T

Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., Sutskever, I.: Learning transferable visual models from natural language supervision. In: Meila, M., Zhang, T. (eds.) Proceedings of the 38th International Conference on Machine Learning. Proceed- ings of Machine Learning Res...

2021

[13] [13]

Cerebral cortex28(9), 3095–3114 (2018) 16 Son et al

Schaefer, A., Kong, R., Gordon, E.M., Laumann, T.O., Zuo, X.N., Holmes, A.J., Eickhoff, S.B., Yeo, B.T.: Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri. Cerebral cortex28(9), 3095–3114 (2018) 16 Son et al

2018

[14] [14]

Schrimpf, M., Kubilius, J., Hong, H., Majaj, N.J., Rajalingham, R., Issa, E.B., Kar, K., Bashivan, P., Prescott-Roy, J., Geiger, F., Schmidt, K., Yamins, D.L.K., DiCarlo, J.J.: Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like? (Sep 2018).https://doi.org/10.1101/407007

work page doi:10.1101/407007 2018

[15] [15]

In: Proceedings of the 41st International Conference on Machine Learning

Scotti, P.S., Tripathy, M., Villanueva, C.K.T., Kneeland, R., Chen, T., Narang, A., Santhirasegaran, C., Xu, J., Naselaris, T., Norman, K.A., Abraham, T.M.: Mindeye2: shared-subject models enable fmri-to-image with 1 hour of data. In: Proceedings of the 41st International Conference on Machine Learning. ICML’24, JMLR.org (2024)

2024

[16] [16]

NeuroImage180(Pt A), 188–202 (Oct 2018).https://doi.org/10.1016/j.neuroimage.2017.06.035

St-Yves, G., Naselaris, T.: The feature-weighted receptive field: An interpretable encoding model for complex feature spaces. NeuroImage180(Pt A), 188–202 (Oct 2018).https://doi.org/10.1016/j.neuroimage.2017.06.035

work page doi:10.1016/j.neuroimage.2017.06.035 2018

[17] [17]

In: Advances in Neural Information Processing Systems

Wang, A., Tarr, M., Wehbe, L.: Neural Taskonomy: Inferring the Similarity of Task- Derived Representations from Brain Activity. In: Advances in Neural Information Processing Systems. vol. 32. Curran Associates, Inc. (2019).https://doi.org/10. 1101/708016

2019

[18] [18]

Cerebral Cortex , volume =

Wen, H., Shi, J., Zhang, Y., Lu, K.H., Cao, J., Liu, Z.: Neural Encoding and Decod- ing with Deep Learning for Dynamic Natural Vision. Cerebral Cortex (New York, NY)28(12), 4136–4160 (Dec 2018).https://doi.org/10.1093/cercor/bhx268

work page doi:10.1093/cercor/bhx268 2018

[19] [19]

In: Yue, Y., Garg, A., Peng, N., Sha, F., Yu, R

Zhang, Q., Gong, Z., Wu, Z., Miao, D.: Mindsimulator: Exploring brain concept localization via synthetic fmri. In: Yue, Y., Garg, A., Peng, N., Sha, F., Yu, R. (eds.) International Conference on Learning Representations. vol. 2025, pp. 76781–76803 (2025),https://proceedings.iclr.cc/paper_files/paper/2025/ file/bf121b033db3bac31c3193e8a0dcbf66-Paper-Conference.pdf

2025