Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion
Pith reviewed 2026-06-29 07:39 UTC · model grok-4.3
The pith
A single discrete diffusion model unifies seven brain-vision-language encoding and decoding tasks via a shared Brain Tokenizer.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mind-Omni unifies seven distinct encoding and decoding tasks through a discrete diffusion paradigm, enabled by a Brain Tokenizer that transforms heterogeneous continuous brain signals into standardized discrete tokens for direct token-level interactions between any modalities in a shared semantic space, and is further supported by a curated Brain Question Answering instruction-tuning dataset that unlocks advanced reasoning capabilities.
What carries the argument
The Brain Tokenizer, which converts heterogeneous continuous brain signals into standardized discrete tokens to enable cross-modal token-level interactions in a shared semantic space.
If this is right
- Achieves a new state-of-the-art among multi-task unified frameworks on the seven tasks.
- Delivers performance competitive with or superior to larger specialized single-task models.
- Demonstrates measurable multi-task synergy through joint training in the shared space.
- Supports advanced reasoning via the added Brain Question Answering dataset.
- Provides a pathway toward foundation models of neural activity.
Where Pith is reading between the lines
- The tokenized representation could simplify connecting brain data to existing large language and vision models without custom adapters.
- If the tokenizer generalizes to new subjects or recording modalities, the same unified model could support broader clinical BCI applications.
- The discrete diffusion backbone may allow efficient sampling for real-time generation of brain-derived outputs once trained.
Load-bearing premise
The Brain Tokenizer successfully converts heterogeneous continuous brain signals into standardized discrete tokens that preserve task-relevant information across all seven tasks without significant loss or distortion.
What would settle it
A direct comparison in which the same model is retrained using continuous brain representations instead of the discrete tokens, and this version consistently outperforms the tokenized version across the seven tasks.
Figures
read the original abstract
Modeling the interplay between external stimuli and internal neural representations is a pivotal research area for Brain-Computer Interfaces (BCIs). A major limitation of prior work is the prevailing paradigm of specialized, single-task models, which curtails versatility and neglects inter-task synergies. To address this, we propose Mind-Omni, the first versatile framework that unifies seven distinct encoding and decoding tasks through a discrete diffusion paradigm. At its core is a novel Brain Tokenizer that transforms heterogeneous, continuous brain signals into standardized, discrete tokens. This enables direct, token-level interactions for mutual understanding and generation between any two or more modalities within a shared semantic space. To unlock advanced reasoning capabilities, we further curate a specialized Brain Question Answering (BQA) instruction-tuning dataset. Our model not only establishes a new state-of-the-art among multi-task unified frameworks but also provides strong evidence for multi-task synergy. By demonstrating performance competitive with, and at times superior to, larger specialized models, our work offers a powerful new paradigm for neural modeling and paves the way for foundation models of neural activity. The code is publicly available at https://github.com/ReedOnePeck/Mind-Omni.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. Mind-Omni proposes the first unified framework for seven brain encoding and decoding tasks in brain-vision-language modeling, built on a discrete diffusion paradigm. Its core component is a Brain Tokenizer that converts heterogeneous continuous brain signals into standardized discrete tokens, enabling direct token-level interactions across modalities in a shared space. The authors also curate a Brain Question Answering (BQA) instruction-tuning dataset. The manuscript claims new state-of-the-art results among multi-task unified frameworks, provides evidence of multi-task synergy, and shows performance competitive with or superior to larger specialized models. Code is released publicly.
Significance. If the central empirical claims hold, the work would be significant for shifting BCI research from specialized single-task models toward versatile unified frameworks that exploit inter-task synergies, potentially enabling foundation models of neural activity. The public code release is a clear strength supporting reproducibility.
major comments (2)
- [§3.1] §3.1 (Brain Tokenizer): The multi-task synergy claim rests on the tokenizer converting continuous brain signals into discrete tokens that preserve task-relevant information across all seven tasks without substantial loss. No reconstruction metrics, cross-task information-retention ablations, or analysis of retained modality-specific variance are reported, leaving open the possibility that any observed gains are artifacts of joint training rather than the tokenizer's fidelity.
- [§5] §5 (Experiments and results): The SOTA and competitiveness claims versus specialized models are load-bearing for the paper's contribution, yet the section supplies no explicit baseline tables, parameter counts for comparators, statistical tests, or error bars. Without these, it is impossible to verify whether reported gains reflect genuine synergy or post-hoc modeling choices.
minor comments (2)
- [Abstract] Abstract: The seven tasks are referenced but never enumerated; adding a short list would improve immediate readability.
- [§3] Notation: The diffusion process and token embedding dimensions are introduced without a consolidated table of symbols, which would aid readers following the token-level interaction equations.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below and will make the indicated revisions to strengthen the manuscript.
read point-by-point responses
-
Referee: [§3.1] The multi-task synergy claim rests on the tokenizer converting continuous brain signals into discrete tokens that preserve task-relevant information across all seven tasks without substantial loss. No reconstruction metrics, cross-task information-retention ablations, or analysis of retained modality-specific variance are reported, leaving open the possibility that any observed gains are artifacts of joint training rather than the tokenizer's fidelity.
Authors: We agree that direct metrics on the Brain Tokenizer would provide stronger support for the multi-task synergy claim. While competitive end-to-end performance offers indirect evidence of information preservation, we will add reconstruction metrics, cross-task retention ablations, and modality-specific variance analysis in the revised manuscript to rule out joint-training artifacts. revision: yes
-
Referee: [§5] The SOTA and competitiveness claims versus specialized models are load-bearing for the paper's contribution, yet the section supplies no explicit baseline tables, parameter counts for comparators, statistical tests, or error bars. Without these, it is impossible to verify whether reported gains reflect genuine synergy or post-hoc modeling choices.
Authors: We acknowledge that the current results section lacks the requested rigor. In revision we will include explicit baseline tables with parameter counts, statistical significance tests, and error bars from repeated runs to enable clear verification of the reported gains and synergy effects. revision: yes
Circularity Check
No circularity; empirical performance claims do not reduce to definitional inputs
full rationale
The provided manuscript text (abstract plus context) advances Mind-Omni as a unified discrete-diffusion framework whose tokenizer converts brain signals to tokens enabling cross-task interactions, with synergy evidenced by competitive performance versus specialized models. No equations, parameter-fitting procedures, or self-citations are exhibited that would make any reported result equivalent to its inputs by construction. The tokenizer's information-preservation role is an empirical precondition rather than a self-referential definition, and the multi-task results are positioned as falsifiable benchmarks against external models. This satisfies the criteria for a self-contained derivation with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
J., St-Yves, G., Wu, Y ., Breedlove, J
Allen, E. J., St-Yves, G., Wu, Y ., Breedlove, J. L., Prince, J. S., Dowdle, L. T., Nau, M., Caron, B., Pestilli, F., Charest, I., et al. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence.Nature neuroscience, 25(1):116–126, 2022
2022
-
[2]
D., Ho, J., Tarlow, D., and Van Den Berg, R
Austin, J., Johnson, D. D., Ho, J., Tarlow, D., and Van Den Berg, R. Structured denoising diffusion models in discrete state-spaces.Advances in neural information processing systems, 34:17981–17993, 2021
2021
-
[3]
Meissonic: Revitalizing masked generative transformers for ef- ficient high-resolution text-to-image synthesis
Bai, J., Ye, T., Chow, W., Song, E., Chen, Q.-G., Li, X., Dong, Z., Zhu, L., and Yan, S. Meissonic: Revitalizing masked generative transformers for ef- ficient high-resolution text-to-image synthesis. In The Thirteenth International Conference on Learn- ing Representations, 2024
2024
-
[4]
One transformer fits all distributions in multi-modal diffusion at scale
Bao, F., Nie, S., Xue, K., Li, C., Pu, S., Wang, Y ., Yue, G., Cao, Y ., Su, H., and Zhu, J. One transformer fits all distributions in multi-modal diffusion at scale. InInternational Conference on Machine Learning, pp. 1692–1717. PMLR, 2023
2023
-
[5]
Mindsimulator: Exploring brain concept localization via synthetic fmri.ICLR, 2025
Bao, G., Zhang, Q., Gong, Z., Wu, Z., and Miao, D. Mindsimulator: Exploring brain concept localization via synthetic fmri.ICLR, 2025
2025
-
[6]
Wills aligner: Multi-subject collaborative brain visual decoding
Bao, G., Zhang, Q., Gong, Z., Zhou, J., Fan, W., Yi, K., Naseem, U., Hu, L., and Miao, D. Wills aligner: Multi-subject collaborative brain visual decoding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp. 14194–14202, 2025
2025
-
[7]
From voxels to pixels and back: Self- supervision in natural-image reconstruction from fMRI.Advances in Neural Information Processing Systems, 32, 2019
Beliy, R., Gaziv, G., Hoogi, A., Strappini, F., Golan, T., and Irani, M. From voxels to pixels and back: Self- supervision in natural-image reconstruction from fMRI.Advances in Neural Information Processing Systems, 32, 2019
2019
-
[8]
Chang, H., Zhang, H., Jiang, L., Liu, C., and Free- man, W. T. Maskgit: Masked generative image trans- former. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 11315–11325, 2022
2022
-
[9]
L., and Zhou, J
Chen, Z., Qing, J., Xiang, T., Yue, W. L., and Zhou, J. H. Seeing beyond the brain: Conditional diffu- sion model with sparse masked modeling for vision decoding. InProceedings of the IEEE/CVF Confer- ence on Computer Vision and Pattern Recognition, pp. 22710–22720, 2023. 9 Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via D...
2023
-
[10]
E., et al
Chiang, W.-L., Li, Z., Lin, Z., Sheng, Y ., Wu, Z., Zhang, H., Zheng, L., Zhuang, S., Zhuang, Y ., Gon- zalez, J. E., et al. Vicuna: An open-source chatbot impressing gpt-4 with 90%* chatgpt quality.See https://vicuna. lmsys. org (accessed 14 April 2023), 2(3):6, 2023
2023
-
[11]
J., Zoccolan, D., and Rust, N
DiCarlo, J. J., Zoccolan, D., and Rust, N. C. How does the brain solve visual object recognition?Neu- ron, 73(3):415–434, 2012
2012
-
[12]
E., Jiang, Y ., Shuman, M., and Kan- wisher, N
Downing, P. E., Jiang, Y ., Shuman, M., and Kan- wisher, N. A cortical area selective for visual pro- cessing of the human body.Science, 293(5539):2470– 2473, 2001
2001
-
[13]
Du, C., Fu, K., Li, J., and He, H. Decoding vi- sual neural representations by multimodal learning of brain-visual-linguistic features.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9): 10760–10777, 2023
2023
-
[14]
Human-like object concept representations emerge naturally in multimodal large language models.Nature Machine Intelligence, pp
Du, C., Fu, K., Wen, B., Sun, Y ., Peng, J., Wei, W., Gao, Y ., Wang, S., Zhang, C., Li, J., et al. Human-like object concept representations emerge naturally in multimodal large language models.Nature Machine Intelligence, pp. 1–16, 2025
2025
-
[15]
The Llama 3 Herd of Models
Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al- Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al. The Llama 3 Herd of Models. arXiv e-prints, pp. arXiv–2407, 2024
2024
-
[16]
and Kanwisher, N
Epstein, R. and Kanwisher, N. A cortical represen- tation of the local visual environment.Nature, 392 (6676):598–601, 1998
1998
-
[17]
Taming trans- formers for high-resolution image synthesis
Esser, P., Rombach, R., and Ommer, B. Taming trans- formers for high-resolution image synthesis. InPro- ceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 12873–12883, 2021
2021
-
[18]
Scaling rectified flow transform- ers for high-resolution image synthesis
Esser, P., Kulal, S., Blattmann, A., Entezari, R., M¨uller, J., Saini, H., Levi, Y ., Lorenz, D., Sauer, A., Boesel, F., et al. Scaling rectified flow transform- ers for high-resolution image synthesis. InForty-first international conference on machine learning, 2024
2024
-
[19]
Reconstructing percep- tive images from brain activity by shape-semantic GAN.Advances in Neural Information Processing Systems, 33:13038–13048, 2020
Fang, T., Qi, Y ., and Pan, G. Reconstructing percep- tive images from brain activity by shape-semantic GAN.Advances in Neural Information Processing Systems, 33:13038–13048, 2020
2020
-
[20]
Alleviating the semantic gap for generalized fMRI-to-image recon- struction.Advances in Neural Information Process- ing Systems, 36:15096–15107, 2023
Fang, T., Zheng, Q., and Pan, G. Alleviating the semantic gap for generalized fMRI-to-image recon- struction.Advances in Neural Information Process- ing Systems, 36:15096–15107, 2023
2023
-
[21]
Felleman, D. J. and Van Essen, D. C. Distributed hierarchical processing in the primate cerebral cortex. Cerebral cortex (New York, NY: 1991), 1(1):1–47, 1991
1991
-
[22]
Ferrante, M., Ozcelik, F., Boccato, T., VanRullen, R., and Toschi, N. Brain Captioning: Decoding human brain activity into images and text.arXiv preprint arXiv:2305.11560, 2023
-
[23]
C., Botteron, K., Almli, C
Fonov, V ., Evans, A. C., Botteron, K., Almli, C. R., McKinstry, R. C., Collins, D. L., Group, B. D. C., et al. Unbiased average age-appropriate atlases for pediatric studies.Neuroimage, 54(1):313–327, 2011
2011
-
[24]
Mod- ular encoding and decoding models derived from Bayesian canonical correlation analysis.Neural com- putation, 25(4):979–1005, 2013
Fujiwara, Y ., Miyawaki, Y ., and Kamitani, Y . Mod- ular encoding and decoding models derived from Bayesian canonical correlation analysis.Neural com- putation, 25(4):979–1005, 2013
2013
-
[25]
SEED-X: Multimodal Models with Unified Multi-granularity Comprehension and Generation
Ge, Y ., Zhao, S., Zhu, J., Ge, Y ., Yi, K., Song, L., Li, C., Ding, X., and Shan, Y . Seed-x: Multimodal models with unified multi-granularity comprehension and generation.arXiv preprint arXiv:2404.14396, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[26]
Mindtuner: Cross-subject visual decoding with visual fingerprint and semantic correction
Gong, Z., Zhang, Q., Bao, G., Zhu, L., Xu, R., Liu, K., Hu, L., and Miao, D. Mindtuner: Cross-subject visual decoding with visual fingerprint and semantic correction. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp. 14247–14255, 2025
2025
-
[27]
Onellm: One framework to align all modalities with language
Han, J., Gong, K., Zhang, Y ., Wang, J., Zhang, K., Lin, D., Qiao, Y ., Gao, P., and Yue, X. Onellm: One framework to align all modalities with language. arXiv preprint arXiv:2312.03700, 2023
-
[28]
Variational Autoencoder: An unsupervised model for encoding and decoding fMRI activity in visual cortex.NeuroImage, 198:125–136, 2019
Han, K., Wen, H., Shi, J., Lu, K.-H., Zhang, Y ., Fu, D., and Liu, Z. Variational Autoencoder: An unsupervised model for encoding and decoding fMRI activity in visual cortex.NeuroImage, 198:125–136, 2019
2019
-
[29]
V ., Guntupalli, J
Haxby, J. V ., Guntupalli, J. S., Nastase, S. A., and Fei- long, M. Hyperalignment: Modeling shared informa- tion encoded in idiosyncratic cortical topographies. elife, 9:e56601, 2020
2020
-
[30]
Masked autoencoders are scalable vision learners
He, K., Chen, X., Xie, S., Li, Y ., Doll´ar, P., and Gir- shick, R. Masked autoencoders are scalable vision learners. InProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, pp. 16000–16009, 2022
2022
-
[31]
Mind captioning: Evolving descriptive text of mental content from human brain activity
Horikawa, T. Mind captioning: Evolving descriptive text of mental content from human brain activity. Science Advances, 11(45):eadw1464, 2025. 10 Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion
2025
-
[32]
and Kamitani, Y
Horikawa, T. and Kamitani, Y . Generic decoding of seen and imagined objects using hierarchical visual features.Nature communications, 8(1):15037, 2017
2017
-
[33]
Hubel, D. H. and Wiesel, T. N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex.The Journal of physiology, 160(1):106, 1962
1962
-
[34]
G., De Heer, W
Huth, A. G., De Heer, W. A., Griffiths, T. L., Theunis- sen, F. E., and Gallant, J. L. Natural speech reveals the semantic maps that tile human cerebral cortex. Nature, 532(7600):453–458, 2016
2016
-
[35]
Neu- roLM: A universal multi-task foundation model for bridging the gap between language and EEG signals
Jiang, W.-B., Wang, Y ., Lu, B.-L., and Li, D. Neu- roLM: A universal multi-task foundation model for bridging the gap between language and EEG signals. ICLR, 2025
2025
-
[36]
Kanwisher, N., McDermott, J., and Chun, M. M. The fusiform face area: a module in human extrastriate cortex specialized for face perception.Journal of neuroscience, 17(11):4302–4311, 1997
1997
-
[37]
N., Naselaris, T., Prenger, R
Kay, K. N., Naselaris, T., Prenger, R. J., and Gallant, J. L. Identifying natural images from human brain activity.Nature, 452(7185):352–355, 2008
2008
-
[38]
J., Saleem, K
Kravitz, D. J., Saleem, K. S., Baker, C. I., and Mishkin, M. A new neural framework for visuospa- tial processing.Nature Reviews Neuroscience, 12(4): 217–230, 2011
2011
-
[39]
Kriegeskorte, N., Mur, M., and Bandettini, P. A. Representational similarity analysis-connecting the branches of systems neuroscience.Frontiers in sys- tems neuroscience, 2:249, 2008
2008
-
[40]
Labs, B. F. Flux, 2024
2024
-
[41]
Labs, B. F., Batifol, S., Blattmann, A., Boesel, F., Consul, S., Diagne, C., Dockhorn, T., English, J., English, Z., Esser, P., Kulal, S., Lacey, K., Levi, Y ., Li, C., Lorenz, D., M¨uller, J., Podell, D., Rom- bach, R., Saini, H., Sauer, A., and Smith, L. Flux.1 kontext: Flow matching for in-context image gen- eration and editing in latent space, 2025. U...
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[42]
Multimodal data fusion: an overview of methods, challenges, and prospects.Proceedings of the IEEE, 103(9):1449– 1477, 2015
Lahat, D., Adali, T., and Jutten, C. Multimodal data fusion: an overview of methods, challenges, and prospects.Proceedings of the IEEE, 103(9):1449– 1477, 2015
2015
-
[43]
Crafting papers on machine learning
Langley, P. Crafting papers on machine learning. In Langley, P. (ed.),Proceedings of the 17th Inter- national Conference on Machine Learning (ICML 2000), pp. 1207–1216, Stanford, CA, 2000. Morgan Kaufmann
2000
-
[44]
Li, D., Wei, C., Li, S., Zou, J., Qin, H., and Liu, Q. Visual decoding and reconstruction via EEG embeddings with guided diffusion.arXiv preprint arXiv:2403.07721, 2024
-
[45]
Lin, T.-Y ., Maire, M., Belongie, S., Hays, J., Per- ona, P., Ramanan, D., Doll´ar, P., and Zitnick, C. L. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Con- ference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pp. 740–755. Springer, 2014
2014
-
[46]
Liu, H., Li, C., Wu, Q., and Lee, Y . J. Visual in- struction tuning.Advances in neural information processing systems, 36:34892–34916, 2023
2023
-
[47]
Liu, H., Li, C., Li, Y ., and Lee, Y . J. Improved base- lines with visual instruction tuning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 26296–26306, 2024
2024
-
[48]
F., Cheng, K.-T., and Chen, M.-H
Liu, S.-Y ., Wang, C.-Y ., Yin, H., Molchanov, P., Wang, Y .-C. F., Cheng, K.-T., and Chen, M.-H. Dora: Weight-decomposed low-rank adaptation. InForty- first International Conference on Machine Learning, 2024
2024
-
[49]
Brain diffusion for visual exploration: Cortical dis- covery using large scale generative models.Advances in Neural Information Processing Systems, 36:75740– 75781, 2023
Luo, A., Henderson, M., Wehbe, L., and Tarr, M. Brain diffusion for visual exploration: Cortical dis- covery using large scale generative models.Advances in Neural Information Processing Systems, 36:75740– 75781, 2023
2023
-
[50]
M., Tarr, M
Luo, A., Henderson, M. M., Tarr, M. J., and We- hbe, L. Brainscuba: Fine-grained natural language captions of visual cortex selectivity. InThe Twelfth In- ternational Conference on Learning Representations, 2023
2023
-
[51]
F., Zheng, Q., Ouyang, W., and Song, C
Mai, W., Wu, J., Zhu, Y ., Yao, Z., Zhou, D., Luo, A. F., Zheng, Q., Ouyang, W., and Song, C. SynBrain: Enhancing Visual-to-fMRI Synthesis via Probabilis- tic Representation Learning.NeurIPS, 2025
2025
-
[52]
A four-dimensional probabilistic atlas of the human brain.Journal of the American Medical Informatics Association, 8(5):401–430, 2001
Mazziotta, J., Toga, A., Evans, A., Fox, P., Lancaster, J., Zilles, K., Woods, R., Paus, T., Simpson, G., Pike, B., et al. A four-dimensional probabilistic atlas of the human brain.Journal of the American Medical Informatics Association, 8(5):401–430, 2001
2001
-
[53]
C., Toga, A
Mazziotta, J. C., Toga, A. W., Evans, A., Fox, P., Lancaster, J., et al. A probabilistic atlas of the hu- man brain: theory and rationale for its development. Neuroimage, 2(2):89–101, 1995
1995
-
[54]
J., Kay, K
Naselaris, T., Prenger, R. J., Kay, K. N., Oliver, M., and Gallant, J. L. Bayesian reconstruction of natural images from human brain activity.Neuron, 63(6): 902–915, 2009. 11 Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion
2009
-
[55]
Large Language Diffusion Models
Nie, S., Zhu, F., You, Z., Zhang, X., Ou, J., Hu, J., Zhou, J., Lin, Y ., Wen, J.-R., and Li, C. Large language diffusion models.arXiv preprint arXiv:2502.09992, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[56]
Olshausen, B. A. and Field, D. J. Emergence of simple-cell receptive field properties by learning a sparse code for natural images.Nature, 381(6583): 607–609, 1996
1996
-
[57]
and VanRullen, R
Ozcelik, F. and VanRullen, R. Natural scene recon- struction from fMRI signals using generative latent diffusion.Scientific Reports, 13(1):15666, 2023
2023
-
[58]
Reconstruction of perceived images from fMRI patterns and semantic brain exploration using instance-conditioned GANs
Ozcelik, F., Choksi, B., Mozafari, M., Reddy, L., and VanRullen, R. Reconstruction of perceived images from fMRI patterns and semantic brain exploration using instance-conditioned GANs. In2022 Interna- tional Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE, 2022
2022
-
[59]
J., and Rogers, T
Patterson, K., Nestor, P. J., and Rogers, T. T. Where do you know what you know? the representation of semantic knowledge in the human brain.Nature reviews neuroscience, 8(12):976–987, 2007
2007
-
[60]
S., Charest, I., Kurzawski, J
Prince, J. S., Charest, I., Kurzawski, J. W., Pyles, J. A., Tarr, M. J., and Kay, K. N. Improving the ac- curacy of single-trial fMRI response estimates using GLMsingle.Elife, 11:e77599, 2022
2022
-
[61]
Psychometry: An omnifit model for image recon- struction from human brain activity
Quan, R., Wang, W., Tian, Z., Ma, F., and Yang, Y . Psychometry: An omnifit model for image recon- struction from human brain activity. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 233–243, 2024
2024
-
[62]
W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al. Learning transferable visual mod- els from natural language supervision. InInterna- tional conference on machine learning, pp. 8748–
-
[63]
High-resolution image synthesis with latent diffusion models
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., and Ommer, B. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pat- tern recognition, pp. 10684–10695, 2022
2022
-
[64]
E., McClelland, J
Rumelhart, D. E., McClelland, J. L., Group, P. R., et al.Parallel distributed processing, volume 1: Ex- plorations in the microstructure of cognition: Foun- dations. The MIT press, 1986
1986
-
[65]
J., Murty, N
Schrimpf, M., Kubilius, J., Lee, M. J., Murty, N. A. R., Ajemian, R., and DiCarlo, J. J. Integrative benchmarking to advance neurally mechanistic mod- els of human intelligence.Neuron, 108(3):413–423, 2020
2020
-
[66]
Reconstruct- ing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors.Advances in Neural Information Processing Systems, 36, 2024
Scotti, P., Banerjee, A., Goode, J., Shabalin, S., Nguyen, A., Dempster, A., Verlinde, N., Yundler, E., Weisberg, D., Norman, K., et al. Reconstruct- ing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors.Advances in Neural Information Processing Systems, 36, 2024
2024
-
[67]
S., Tripathy, M., Villanueva, C
Scotti, P. S., Tripathy, M., Villanueva, C. K. T., Knee- land, R., Chen, T., Narang, A., Santhirasegaran, C., Xu, J., Naselaris, T., Norman, K. A., et al. Mindeye2: Shared-subject models enable fMRI-to-image with 1 hour of data.International Conference on Machine Learning, 2024
2024
-
[68]
Neuro-vision to language: Enhancing brain recording-based visual reconstruc- tion and language interaction.Advances in Neural Information Processing Systems, 37:98083–98110, 2024
Shen, G., Zhao, D., He, X., Feng, L., Dong, Y ., Wang, J., Zhang, Q., and Zeng, Y . Neuro-vision to language: Enhancing brain recording-based visual reconstruc- tion and language interaction.Advances in Neural Information Processing Systems, 37:98083–98110, 2024
2024
-
[69]
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Shi, Q., Bai, J., Zhao, Z., Chai, W., Yu, K., Wu, J., Song, S., Tong, Y ., Li, X., Li, X., et al. Mud- dit: Liberating generation beyond text-to-image with a unified discrete diffusion model.arXiv preprint arXiv:2505.23606, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[70]
M., Jenkinson, M., Woolrich, M
Smith, S. M., Jenkinson, M., Woolrich, M. W., Beck- mann, C. F., Behrens, T. E., Johansen-Berg, H., Ban- nister, P. R., De Luca, M., Drobnjak, I., Flitney, D. E., et al. Advances in functional and structural mr image analysis and implementation as fsl.Neuroimage, 23: S208–S219, 2004
2004
-
[71]
Decoding natural images from EEG for object recognition.arXiv preprint arXiv:2308.13234, 2023
Song, Y ., Liu, B., Li, X., Shi, N., Wang, Y ., and Gao, X. Decoding natural images from EEG for object recognition.arXiv preprint arXiv:2308.13234, 2023
-
[72]
Revealing vision-language integration in the brain with multi- modal networks.ArXiv, pp
Subramaniam, V ., Conwell, C., Wang, C., Kreiman, G., Katz, B., Cases, I., and Barbu, A. Revealing vision-language integration in the brain with multi- modal networks.ArXiv, pp. arXiv–2406, 2024
2024
-
[73]
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation
Sun, P., Jiang, Y ., Chen, S., Zhang, S., Peng, B., Luo, P., and Yuan, Z. Autoregressive model beats diffusion: Llama for scalable image generation.arXiv preprint arXiv:2406.06525, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[74]
Gener- ative multimodal models are in-context learners
Sun, Q., Cui, Y ., Zhang, X., Zhang, F., Yu, Q., Wang, Y ., Rao, Y ., Liu, J., Huang, T., and Wang, X. Gener- ative multimodal models are in-context learners. In Proceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pp. 14398– 14409, 2024. 12 Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Di...
2024
-
[75]
Emu: Generative pretraining in multimodality.ICLR, 2024
Sun, Q., Yu, Q., Cui, Y ., Zhang, F., Zhang, X., Wang, Y ., Gao, H., Liu, J., Huang, T., and Wang, X. Emu: Generative pretraining in multimodality.ICLR, 2024
2024
-
[76]
M., Daunhawer, I., and V ogt, J
Sutter, T. M., Daunhawer, I., and V ogt, J. E. General- ized multimodal ELBO.ICLR, 2021
2021
-
[77]
and Nishimoto, S
Takagi, Y . and Nishimoto, S. High-resolution image reconstruction with latent diffusion models from hu- man brain activity. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recog- nition, pp. 14453–14463, 2023
2023
-
[78]
Inferotemporal cortex and object vision
Tanaka, K. Inferotemporal cortex and object vision. Annual review of neuroscience, 19(1):109–139, 1996
1996
-
[79]
Brain encoding models based on multimodal transformers can transfer across language and vision.Advances in Neural Information Processing Systems, 36:29654– 29666, 2023
Tang, J., Du, M., V o, V ., Lal, V ., and Huth, A. Brain encoding models based on multimodal transformers can transfer across language and vision.Advances in Neural Information Processing Systems, 36:29654– 29666, 2023
2023
-
[80]
Tang, J., LeBel, A., Jain, S., and Huth, A. G. Se- mantic reconstruction of continuous language from non-invasive brain recordings.Nature Neuroscience, 26(5):858–866, 2023
2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.