MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery
Pith reviewed 2026-05-20 13:42 UTC · model grok-4.3
The pith
A linear backbone with multi-modal text and image features lets a diffusion model reconstruct mental images from fMRI after training only on external visual stimuli.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MIRAGE trains on large-scale datasets of external visual stimuli to decode mental images from brain activity. It uses a linear backbone that combines multi-modal text features with both high- and low-level image features as conditioning input to a diffusion model. On the NSD-Imagery benchmark this yields state-of-the-art reconstructions according to feature-space metrics and human raters. Ablation experiments indicate that performance peaks when image features are kept low-dimensional and when text guidance is included alongside both high- and low-level visual features.
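The summary above does not say which feature-space metrics were used. One common choice in the fMRI reconstruction literature is two-way identification, sketched below in plain Python for illustration only. The function names and the toy feature vectors are assumptions made for this example and are not taken from the paper.

```python
import math

def pearson(a, b):
    """Pearson correlation between two equal-length feature vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def two_way_identification(recon_feats, target_feats):
    """Fraction of (reconstruction, distractor) pairs in which the
    reconstruction correlates more with its own target than with the
    distractor target. Chance level is 0.5."""
    n = len(recon_feats)
    wins, total = 0, 0
    for i in range(n):
        own = pearson(recon_feats[i], target_feats[i])
        for j in range(n):
            if i == j:
                continue
            other = pearson(recon_feats[i], target_feats[j])
            wins += own > other
            total += 1
    return wins / total

# Toy features (hypothetical numbers, not results from the paper).
targets = [[1.0, 0.0, 0.0, 2.0], [0.0, 1.0, 3.0, 0.0], [2.0, 2.0, 0.0, 1.0]]
recons = [[0.9, 0.1, 0.2, 1.8], [0.1, 1.2, 2.7, 0.1], [1.7, 2.2, 0.1, 0.8]]
score = two_way_identification(recons, targets)
```

In practice the feature vectors would come from a pretrained vision network applied to the reconstructed and target images; which network and layer to use is a separate choice that this sketch leaves open.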
What carries the argument
The MIRAGE linear backbone that fuses multi-modal text and image features to condition a diffusion model for fMRI-to-image translation.
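As a rough illustration of what a linear backbone with several feature streams could look like, the sketch below fits one ridge regression per stream on synthetic data. Ridge regression, the stream names, the dimensions, and the regularization strength are all assumptions made for this example. The paper's actual features and fitting procedure are not given in this summary, and the diffusion model that would consume the decoded features is omitted.

```python
import numpy as np

def fit_ridge(X, Y, alpha):
    """Closed-form ridge regression: W = (X^T X + alpha * I)^-1 X^T Y."""
    n_voxels = X.shape[1]
    gram = X.T @ X + alpha * np.eye(n_voxels)
    return np.linalg.solve(gram, X.T @ Y)

rng = np.random.default_rng(0)
n_trials, n_voxels = 200, 50  # toy sizes, far smaller than real fMRI data
stream_dims = {"text": 8, "image_high": 16, "image_low": 4}  # hypothetical

# Synthetic "seen-image" training data: voxel patterns and per-stream targets.
X_train = rng.standard_normal((n_trials, n_voxels))
true_maps = {k: rng.standard_normal((n_voxels, d)) for k, d in stream_dims.items()}
targets = {
    k: X_train @ W + 0.1 * rng.standard_normal((n_trials, W.shape[1]))
    for k, W in true_maps.items()
}

# One independent linear map per feature stream, fitted on vision data only.
weights = {k: fit_ridge(X_train, Y, alpha=1.0) for k, Y in targets.items()}

def decode(x_voxels):
    """Map voxel patterns to conditioning features for each stream."""
    return {k: x_voxels @ W for k, W in weights.items()}

# Zero-shot application to held-out activity (a stand-in for imagery trials).
X_imagery = rng.standard_normal((5, n_voxels))
conditioning = decode(X_imagery)
```

The decoded dictionary would then be passed as conditioning to a generative model. Keeping each stream as a separate linear map makes it straightforward to ablate a stream by dropping its key.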
If this is right
- Mental-image reconstruction reaches state-of-the-art levels without any direct training on internally generated imagery data.
- Low-dimensional image features plus text and both high- and low-level visual cues produce the most accurate mental-image outputs.
- Existing large-scale vision datasets become viable training resources for mental-image decoders once the architecture is chosen appropriately.
- The gap between seen-image and mental-image decoding performance can be closed by explicit multi-modal conditioning rather than by scaling model size alone.
Where Pith is reading between the lines
- If the transfer works, brain-computer interfaces could visualize a person's current visual thought without requiring them to look at matching external pictures.
- The same multi-modal conditioning strategy might extend to decoding other internal states such as auditory imagery or spatial navigation.
- Future tests could check whether the low-dimensional feature preference holds when the diffusion model is replaced by a different generative backbone.
- The result implies that mental imagery and external vision share a common low-dimensional representational subspace that can be read out with modest additional guidance.
Load-bearing premise
Brain activity patterns evoked by external visual stimuli are similar enough to those generated during mental imagery that a decoder trained on the former can be applied to the latter.
What would settle it
A controlled experiment in which participants generate mental images while scanned, a model is trained directly on those mental-image fMRI pairs, and that model produces reconstructions rated higher by humans or closer in feature space than MIRAGE outputs on the same test set.
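The comparison described above amounts to a paired test on per-image scores from two models evaluated on the same test set. A minimal sketch, assuming a sign-flip permutation test is an acceptable way to compare them; the function name and the scores are hypothetical and do not come from the paper:

```python
import random

def paired_permutation_test(scores_a, scores_b, n_perm=10000, seed=0):
    """Two-sided sign-flip permutation test on paired per-image scores.
    Returns the observed mean difference (a - b) and a p-value."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = sum(diffs) / len(diffs)
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_perm):
        # Under the null the two labels in each pair are exchangeable,
        # so each difference keeps or flips its sign with equal probability.
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs) / len(diffs)
        if abs(flipped) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (extreme + 1) / (n_perm + 1)

# Hypothetical per-image scores on the same imagery test set (not real data).
imagery_trained = [0.62, 0.58, 0.71, 0.66, 0.59, 0.64, 0.70, 0.61, 0.67, 0.63, 0.65, 0.69]
vision_trained = [0.55, 0.57, 0.60, 0.61, 0.52, 0.58, 0.62, 0.59, 0.60, 0.56, 0.61, 0.60]
mean_diff, p_value = paired_permutation_test(imagery_trained, vision_trained)
```

A positive mean difference with a small p-value would favor the imagery-trained model and count against the transfer claim; a difference near zero would be consistent with it.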
Original abstract
To be useful for downstream applications, vision decoding models that are trained to reconstruct seen images from human brain activity must be able to generalize to internally generated visual representations, i.e., mental images. In an analysis of the recently released NSD-Imagery dataset, we demonstrated that while some modern vision decoders can perform quite well on mental image reconstruction, some fail, and that state-of-the-art (SOTA) performance on seen image reconstruction is no guarantee of SOTA performance on mental image reconstruction. Motivated by these findings, we developed MIRAGE, a method explicitly designed to train on vision datasets and cross-decode mental images from brain activity. MIRAGE employs a linear backbone and multi-modal text and image features as input to a diffusion model. Feature metrics and human raters establish MIRAGE as SOTA for mental image reconstruction on the NSD-Imagery benchmark. With ablation analysis we show that mental image reconstruction works best when decoders use image features with relatively few dimensions and include guidance from text-based and both high- and low-level image-based features. Our work indicates that, given the right architecture, existing large-scale datasets using external stimuli are viable training data for decoding mental images, and warrants optimism about the future success and utility of mental image reconstruction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MIRAGE, a linear multi-modal architecture that trains on external vision datasets and uses text plus image features (high- and low-level) as input to a diffusion model for reconstructing mental images from fMRI. On the NSD-Imagery benchmark it reports state-of-the-art performance via feature metrics and human ratings, with ablations showing best results when image features are low-dimensional and guidance from text and both high- and low-level image features is included. The central conclusion is that large-scale seen-image datasets are viable training data for mental-image decoding.
Significance. If the reported generalization holds, the result would be significant for brain decoding: it provides an explicit architecture and training recipe that bridges the seen-image and mental-image regimes, gives concrete ablation evidence on which feature types matter, and offers a relatively simple linear backbone that may aid interpretability. The work also yields a falsifiable prediction: performance on cued mental imagery should transfer when the same linear mapping is applied to novel, non-cued internal content.
major comments (2)
- [Abstract and §4] Abstract and §4 (Evaluation): the claim that 'existing large-scale datasets using external stimuli are viable training data for decoding mental images' is load-bearing for the paper's main contribution, yet the NSD-Imagery mental-imagery trials are cued by previously viewed natural scenes. This leaves open the possibility that brain activity contains recall components aligned with the training distribution rather than arbitrary internally generated content; the reported ablations do not isolate this factor.
- [Results] Results section: the assertion of SOTA performance is supported only by the statement that 'feature metrics and human raters establish MIRAGE as SOTA'; no numerical values, baseline scores, error bars, or statistical tests appear in the abstract or summary text, preventing direct verification of the performance gap.
minor comments (2)
- [Methods] Methods: the exact procedure for obtaining and reducing the dimensionality of the image features (one of the free parameters listed in the axiom ledger) should be stated explicitly, including the source embedding model and any learned projection.
- [Figure 1 and §3] Figure captions and §3: the multi-modal input diagram should label each feature stream (text, high-level image, low-level image) and indicate whether the linear backbone is trained exclusively on vision data before zero-shot application to mental-imagery fMRI.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. These help us clarify the scope of our generalization claims and strengthen the presentation of quantitative results. We respond to each major comment below.
- Referee: [Abstract and §4] Abstract and §4 (Evaluation): the claim that 'existing large-scale datasets using external stimuli are viable training data for decoding mental images' is load-bearing for the paper's main contribution, yet the NSD-Imagery mental-imagery trials are cued by previously viewed natural scenes. This leaves open the possibility that brain activity contains recall components aligned with the training distribution rather than arbitrary internally generated content; the reported ablations do not isolate this factor.
  Authors: We agree that the NSD-Imagery mental-imagery trials are cued by previously viewed scenes and therefore may engage recall processes in addition to internally generated content. Our ablations examine feature-type contributions rather than isolating recall versus pure generation. Nevertheless, the central result remains that a linear multi-modal model trained exclusively on external vision data successfully decodes these mental images, supporting the viability of large-scale seen-image datasets for mental-image reconstruction on this benchmark. We will revise the abstract and add a paragraph in the Discussion to explicitly note the cued nature of the imagery, distinguish it from uncued internal content, and frame this as a limitation for future work.
  Revision: partial
- Referee: [Results] Results section: the assertion of SOTA performance is supported only by the statement that 'feature metrics and human raters establish MIRAGE as SOTA'; no numerical values, baseline scores, error bars, or statistical tests appear in the abstract or summary text, preventing direct verification of the performance gap.
  Authors: We agree that the abstract and high-level summary would be strengthened by including concrete numerical comparisons. The full Results section already reports detailed feature-metric values, baseline scores, human ratings, and statistical comparisons in tables and figures. We will revise the abstract to include representative quantitative results (e.g., key metric improvements and human preference rates) with pointers to the supporting tables, enabling immediate verification of the SOTA claim.
  Revision: yes
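A human preference rate of the kind mentioned in the response is usually reported with an interval rather than as a bare number. The sketch below shows one way to do that with a percentile bootstrap. The two-alternative forced-choice setup, the function name, and the rater outcomes are assumptions for illustration and are not data from the paper.

```python
import random

def bootstrap_ci(outcomes, n_boot=5000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for the mean of binary
    rater outcomes (1 = rater preferred the method, 0 = preferred baseline)."""
    rng = random.Random(seed)
    n = len(outcomes)
    # Resample the outcomes with replacement and record each resampled mean.
    means = sorted(sum(rng.choices(outcomes, k=n)) / n for _ in range(n_boot))
    lo_idx = int((1 - level) / 2 * n_boot)
    hi_idx = int((1 + level) / 2 * n_boot) - 1
    return sum(outcomes) / n, means[lo_idx], means[hi_idx]

# Hypothetical forced-choice outcomes (not data from the paper).
outcomes = [1] * 70 + [0] * 30
rate, ci_low, ci_high = bootstrap_ci(outcomes)
```

If the lower bound of the interval stays above the 0.5 chance level, the preference for the method is unlikely to be a sampling artifact under these assumptions.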
Circularity Check
No significant circularity; derivation is empirically grounded
The paper trains MIRAGE (a linear backbone plus multi-modal features feeding a diffusion model) on external-stimulus vision datasets and reports SOTA metrics on the held-out NSD-Imagery mental-imagery benchmark. The central claim, that such training data are viable for mental-image decoding, rests on cross-domain empirical performance rather than on any self-definitional mapping, fitted parameter renamed as a prediction, or load-bearing self-citation chain. No equations or sections in the provided text reduce the generalization result to a quantity defined by the model's own fitted values. The architecture choices and ablation results (low-dimensional image features plus text guidance and both high- and low-level image features) are presented as independent design decisions whose success is measured externally. This is the most common honest non-finding for a methods paper whose test set is distinct from its training distribution.
Axiom & Free-Parameter Ledger
free parameters (1)
- dimensionality of image features
axioms (1)
- domain assumption: Large-scale external-stimulus fMRI datasets can serve as effective training data for mental imagery decoding.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: MIRAGE employs a linear backbone and multi-modal text and image features as input to a diffusion model... ablation analysis we show that mental image reconstruction works best when decoders use image features with relatively few dimensions and include guidance from text-based and both high- and low-level image-based features.
- IndisputableMonolith/Foundation/DimensionForcing.lean · reality_from_one_distinction · tag: unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: We introduce MIRAGE... linear decoding backbones with low-dimensional multi-modal feature spaces
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Paul S. Scotti, Mihir Tripathy, Cesare Kadir Torrico Villanueva, Reese Kneeland, Tong Chen, Ashutosh Narang, Charan Santhirasegaran, Jonathan Xu, Thomas Naselaris, Kenneth A. Nor- man, and Tanishq Mathew Abraham. Mindeye2: shared-subject models enable fmri-to-image with 1 hour of data. InProceedings of the 41st International Conference on Machine Learning, 2024
work page 2024
-
[2]
Scotti, Ghislain St-Yves, Jesse Breedlove, Kendrick Kay, and Thomas Naselaris
Reese Kneeland, Paul S. Scotti, Ghislain St-Yves, Jesse Breedlove, Kendrick Kay, and Thomas Naselaris. Nsd-imagery: A benchmark dataset for extending fmri vision decoding methods to mental imagery. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 28852–28862, June 2025
work page 2025
-
[4]
Stephen M Kosslyn, William L Thompson, and Giorgio Ganis.The case for mental imagery. Oxford University Press, 2006. 12
work page 2006
-
[5]
Mark Stokes, Russell Thompson, Rhodri Cusack, and John Duncan. Top-Down Activation of Shape-Specific Population Codes in Visual Cortex during Mental Imagery.Journal of Neuroscience, 29(5):1565–1572, February 2009. ISSN 0270-6474, 1529-2401. doi: 10.1523/ JNEUROSCI.4657-08.2009. URL https://www.jneurosci.org/content/29/5/1565. Publisher: Society for Neuros...
work page 2009
-
[6]
Reading Imagined Letter Shapes from the Mind’s Eye Using Real-time 7 Tesla fMRI
Rainer Goebel, Rick van Hoof, Salil Bhat, Michael Lührs, and Mario Senden. Reading Imagined Letter Shapes from the Mind’s Eye Using Real-time 7 Tesla fMRI. In2022 10th International Winter Conference on Brain-Computer Interface (BCI), pages 1–3, February 2022. doi: 10.1109/BCI53720.2022.9735031. ISSN: 2572-7672
-
[7]
Thomas Naselaris, Cheryl A. Olman, Dustin E. Stansbury, Kamil Ugurbil, and Jack L. Gallant. A voxel-wise encoding model for early visual areas decodes mental images of remembered scenes.NeuroImage, 105:215–228, January 2015. ISSN 1053-8119. doi: 10.1016/j.neuroimage.2014.10.018. URL https://www.sciencedirect.com/science/ article/pii/S1053811914008428
-
[8]
Leila Reddy, Naotsugu Tsuchiya, and Thomas Serre. Reading the mind’s eye: Decoding category information during mental imagery.NeuroImage, 50(2):818–825, April 2010. ISSN 1053-8119. doi: 10.1016/j.neuroimage.2009.11.084. URL https://www.sciencedirect. com/science/article/pii/S1053811909012701
-
[9]
Disentangling visual imagery and perception of real-world objects.Neuroimage, 59(4):4064–4073, 2012
Sue-Hyun Lee, Dwight J Kravitz, and Chris I Baker. Disentangling visual imagery and perception of real-world objects.Neuroimage, 59(4):4064–4073, 2012
work page 2012
-
[10]
The human imagination: the cognitive neuroscience of visual mental imagery
Joel Pearson. The human imagination: the cognitive neuroscience of visual mental imagery. Nature reviews neuroscience, 20(10):624–634, 2019
work page 2019
-
[11]
Comparison of signal to noise in vision and imagery for qualitatively different kinds of stimuli
Tiasha Saha Roy, Jesse Breedlove, Ghislain St-Yves, Kendrick Kay, and Thomas Naselaris. Comparison of signal to noise in vision and imagery for qualitatively different kinds of stimuli. Journal of Vision, 23(9):5961, 2023. ISSN 1534-7362. doi: 10.1167/jov.23.9.5961. URL https://doi.org/10.1167/jov.23.9.5961
-
[12]
Tiasha Saha Roy, Jesse Breedlove, Ghislain St-Yves, Kendrick Kay, and Thomas Naselaris. Mental imagery: Weak vision or compressed vision? InConference on Cognitive Computational Neuroscience, 2023. doi: 10.32470/CCN.2023.1693-0. URL https://2023.ccneuro.org/ view_paper4eea.html?PaperNum=1693
-
[13]
Breedlove, Ghislain St-Yves, Cheryl A
Jesse L. Breedlove, Ghislain St-Yves, Cheryl A. Olman, and Thomas Naselaris. Generative feedback explains distinct brain activity codes for seen and mental images.Current Biology, 30 (12):2211–2224.e6, 2020. ISSN 0960-9822. doi: https://doi.org/10.1016/j.cub.2020.04.014. URLhttps://www.sciencedirect.com/science/article/pii/S0960982220304942
-
[14]
Serra E Favila, Brice A Kuhl, and Jonathan Winawer. Spatial perception and memory have distinct activation profiles in human visual cortex.BioRxiv, page 811331, 2019
work page 2019
-
[15]
Radoslaw M Cichy, Jakob Heinzle, and John-Dylan Haynes. Imagery and perception share cortical representations of content and location.Cerebral cortex, 22(2):372–380, 2012
work page 2012
-
[16]
Anke Marit Albers, Peter Kok, Ivan Toni, H Chris Dijkerman, and Floris P De Lange. Shared representations for working memory and mental imagery in early visual cortex.Current Biology, 23(15):1427–1431, 2013
work page 2013
-
[17]
Ghislain St-Yves, Jesse Breedlove, Kendrick Kay, and Thomas Naselaris. Do better models of fmri visual response better predict mental imagery responses? InConference on Cognitive Computational Neuroscience, 2023. doi: 10.32470/CCN.2023.1644-0. URL https://2023. ccneuro.org/view_paper37c6.html?PaperNum=1644
-
[18]
Bertrand Thirion, Edouard Duchesnay, Edward Hubbard, Jessica Dubois, Jean-Baptiste Poline, Denis Lebihan, and Stanislas Dehaene. Inverse retinotopy: Inferring the visual content of images from brain activation patterns.NeuroImage, 33(4):1104–1116, December 2006. ISSN 10538119. doi: 10.1016/j.neuroimage.2006.06.062. URL https://linkinghub.elsevier. com/ret...
-
[19]
Emmerling, Rick van Hoof, Martin A
Mario Senden, Thomas C. Emmerling, Rick van Hoof, Martin A. Frost, and Rainer Goebel. Reconstructing imagined letters from early visual cortex reveals tight topographic correspon- dence between visual mental imagery and perception.Brain Structure and Function, 224(3): 1167–1183, Jan 2019. doi: 10.1007/s00429-019-01828-6
-
[20]
Hongmi Lee and Brice A. Kuhl. Reconstructing perceived and retrieved faces from activity patterns in lateral parietal cortex.Journal of Neuroscience, 36(22):6069–6082, 2016. Publisher: Soc Neuroscience
work page 2016
-
[21]
Guohua Shen, Tomoyasu Horikawa, Kei Majima, and Yukiyasu Kamitani. Deep image recon- struction from human brain activity.PLOS Computational Biology, 15(1):e1006633, January
-
[22]
doi: 10.1371/journal.pcbi.1006633
ISSN 1553-7358. doi: 10.1371/journal.pcbi.1006633. URL https://dx.plos.org/ 10.1371/journal.pcbi.1006633
-
[23]
Naoko Koide-Majima, Shinji Nishimoto, and Kei Majima. Mental image reconstruction from human brain activity: Neural decoding of mental imagery via deep neural network-based bayesian estimation.Neural Networks, 170:349–363, 2024. ISSN 0893-6080. doi: https:// doi.org/10.1016/j.neunet.2023.11.024. URL https://www.sciencedirect.com/science/ article/pii/S0893...
-
[24]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Marina Meila and Tong Zhang, editors,Proceedings of the 38th International Confer- ence on Machin...
work page 2021
-
[25]
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models.CoRR, abs/2112.10752, 2021. URL https://arxiv.org/abs/2112.10752
work page internal anchor Pith review Pith/arXiv arXiv 2021
-
[26]
Allen, Ghislain St-Yves, Yihan Wu, Jesse L
Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dowdle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchin- son, Thomas Naselaris, and Kendrick Kay. A massive 7T fMRI dataset to bridge cog- nitive neuroscience and artificial intelligence.Nature Neuroscience, 25(1):116–126, Jan- uary 2022. ...
-
[27]
High-resolution image reconstruction with latent diffusion models from human brain activity
Yu Takagi and Shinji Nishimoto. High-resolution image reconstruction with latent diffusion models from human brain activity. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14453–14463, 2023
work page 2023
-
[28]
Yu Takagi and Shinji Nishimoto. Improving visual image reconstruction from human brain activity using latent diffusion models via multiple decoded inputs, 2023
work page 2023
-
[29]
Furkan Ozcelik and Rufin VanRullen. Natural scene reconstruction from fmri signals using gen- erative latent diffusion.Scientific Reports, 13, 2023. URL https://api.semanticscholar. org/CorpusID:260439960
work page 2023
-
[30]
Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors
Paul Steven Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Cohen Ethan, Aidan James Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth Norman, and Tanishq Mathew Abraham. Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors. InThirty-seventh Conference on Neural Information Pro...
work page 2023
-
[31]
Reconstructing seen images from human brain activity via guided stochastic search
Reese Kneeland, Jordyn Ojeda, Ghislain St-Yves, and Thomas Naselaris. Reconstructing seen images from human brain activity via guided stochastic search. InConference on Cognitive Computational Neuroscience, 2023. doi: 10.32470/CCN.2023.1672-0. URL https://2023. ccneuro.org/view_paper1337.html?PaperNum=1672
-
[32]
Reese Kneeland, Jordyn Ojeda, Ghislain St-Yves, and Thomas Naselaris. Second Sight: Using brain-optimized encoding models to align image distributions with human brain activity, June
-
[33]
URLhttp://arxiv.org/abs/2306.00927. arXiv:2306.00927 [cs, q-bio]. 14
-
[34]
Brain-optimized inference improves reconstructions of fMRI brain activity, December 2023
Reese Kneeland, Jordyn Ojeda, Ghislain St-Yves, and Thomas Naselaris. Brain-optimized inference improves reconstructions of fMRI brain activity, December 2023. URL http: //arxiv.org/abs/2312.07705. arXiv:2312.07705 [cs, q-bio]
-
[35]
Matteo Ferrante, Tommaso Boccato, Furkan Ozcelik, Rufin VanRullen, and Nicola Toschi. Through their eyes: multi-subject brain decoding with simple alignment techniques.Imaging Neuroscience, 2, 04 2024. doi: 10.1162/imag_a_00170
-
[36]
Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding
Zijiao Chen, Jiaxin Qing, Tiange Xiang, Wan Lin Yue, and Juan Helen Zhou. Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding. 2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 22710–22720, 2022. URLhttps://api.semanticscholar.org/CorpusID:253510456
work page 2023
-
[37]
Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities, December 2023
Jingyuan Sun, Mingxiao Li, Zijiao Chen, Yunhao Zhang, Shaonan Wang, and Marie-Francine Moens. Contrast, Attend and Diffuse to Decode High-Resolution Images from Brain Activities, December 2023. URLhttp://arxiv.org/abs/2305.17214. arXiv:2305.17214 [cs]
-
[38]
Weijian Mai and Zhijun Zhang. UniBrain: Unify Image Reconstruction and Captioning All in One Diffusion Model from Human Brain Activity, August 2023. URL http://arxiv.org/ abs/2308.07428. arXiv:2308.07428 [cs]
-
[39]
Very deep {vae}s generalize autoregressive models and can outperform them on images
Rewon Child. Very deep {vae}s generalize autoregressive models and can outperform them on images. InInternational Conference on Learning Representations, 2021. URL https: //openreview.net/forum?id=RLRXCV6DbEJ
work page 2021
-
[40]
SDXL: Improving latent diffusion models for high-resolution image synthesis
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. InThe Twelfth International Conference on Learning Representations, 2024. URLhttps://openreview.net/forum?id=di52zR8xgf
work page 2024
-
[41]
Xingqian Xu, Zhangyang Wang, Eric Zhang, Kai Wang, and Humphrey Shi. Versatile diffusion: Text, images and variations all in one diffusion model.2023 IEEE/CVF International Conference on Computer Vision (ICCV), pages 7720–7731, 2022. URLhttps://api.semanticscholar. org/CorpusID:253523371
work page 2023
-
[42]
Wang, Kendrick Kay, Thomas Naselaris, Michael J
Aria Y . Wang, Kendrick Kay, Thomas Naselaris, Michael J. Tarr, and Leila Wehbe. Incorporating natural language into vision models improves prediction and understanding of higher visual cortex, September 2022. URL https://www.biorxiv.org/content/10.1101/2022.09. 27.508760v1. Pages: 2022.09.27.508760 Section: New Results
-
[43]
Mindbridge: A cross-subject brain decoding framework
Shizun Wang, Songhua Liu, Zhenxiong Tan, and Xinchao Wang. Mindbridge: A cross-subject brain decoding framework. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11333–11342, 2024
work page 2024
-
[44]
Jingyang Huo, Yikai Wang, Xuelin Qian, Yun Wang, Chong Li, Jianfeng Feng, and Yanwei Fu. Neuropictor: Refining fmri-to-image reconstruction via multi-individual pretraining and multi-level modulation, 2024
work page 2024
-
[45]
Brainram: Cross- modality retrieval-augmented image reconstruction from human brain activity
Dian Xie, Peiang Zhao, Jiarui Zhang, Kangqi Wei, Xiaobao Ni, and Jiong Xia. Brainram: Cross- modality retrieval-augmented image reconstruction from human brain activity. InProceedings of the 32nd ACM International Conference on Multimedia, MM ’24, page 3994–4003, New York, NY , USA, 2024. Association for Computing Machinery. ISBN 9798400706868. doi: 10.11...
-
[46]
Zhou Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: from error visibility to structural similarity.IEEE Transactions on Image Processing, 13(4):600–612, April 2004. ISSN 1941-0042. doi: 10.1109/TIP.2003.819861. Conference Name: IEEE Transactions on Image Processing
-
[47]
Imagenet classification with deep convolutional neural networks
Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. In F. Pereira, C.J. Burges, L. Bottou, and K.Q. Weinberger, editors,Advances in Neural Information Processing Systems, volume 25. Curran Associates, Inc., 2012. URL https://proceedings.neurips.cc/paper_files/paper/2012/file/ c399862d3b9...
work page 2012
-
[48]
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. Rethinking the inception architecture for computer vision.CoRR, abs/1512.00567, 2015. URL http://arxiv.org/abs/1512.00567
work page internal anchor Pith review Pith/arXiv arXiv 2015
-
[49]
Mingxing Tan and Quoc V . Le. Efficientnet: Rethinking model scaling for convolutional neural networks. In Kamalika Chaudhuri and Ruslan Salakhutdinov, editors,Proceedings of the 36th International Conference on Machine Learning, ICML 2019, 9-15 June 2019, Long Beach, California, USA, volume 97 ofProceedings of Machine Learning Research, pages 6105–6114. ...
work page 2019
-
[50]
Mathilde Caron, Ishan Misra, Julien Mairal, Priya Goyal, Piotr Bojanowski, and Armand Joulin. Unsupervised learning of visual features by contrasting cluster assignments.CoRR, abs/2006.09882, 2020. URLhttps://arxiv.org/abs/2006.09882
-
[51]
A perceptually based comparison of image similarity metrics
Pawan Sinha and Richard Russell. A perceptually based comparison of image similarity metrics. Perception, 40(11):1269–1281, 2011. doi: 10.1068/p7063. URL https://doi.org/10. 1068/p7063. PMID: 22416586
-
[52]
Pick-a-pic: An open dataset of user preferences for text-to-image generation
Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation. InThirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview. net/forum?id=G5RwHpBUv0
work page 2023
-
[53]
Aoki, Kei Majima, Yusuke Muraki, and Yukiyasu Kamitani
Ken Shirakawa, Yoshihiro Nagano, Misato Tanaka, Shuntaro C. Aoki, Kei Majima, Yusuke Muraki, and Yukiyasu Kamitani. Spurious reconstruction from brain activity: The thin line between reconstruction, classification, and hallucination.Journal of Vision, 2024. URL https://api.semanticscholar.org/CorpusID:269791182
work page 2024
-
[54]
David Mayo, Christopher Wang, Asa Harbin, Abdulrahman Alabdulkareem, Albert Eaton Shaw, Boris Katz, and Andrei Barbu. Brainbits: How much of the brain are generative reconstruction methods using? InThe Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URLhttps://openreview.net/forum?id=KAAUvi4kpb
work page 2024
-
[55]
Mental imagery in emotion and emotional disorders
Emily A Holmes and Andrew Mathews. Mental imagery in emotion and emotional disorders. Clinical psychology review, 30(3):349–362, 2010
work page 2010
-
[56]
Joseph T. Giacino and Kathleen Kalmar. The vegetative and minimally conscious states: A comparison of clinical features and functional outcome.Journal of Head Trauma Rehabilitation, 12(4):36–51, 1997. doi: 10.1097/00001199-199708000-00005
-
[57]
Brian L Edlow, Camille Chatelle, Camille A. Spencer, Catherine J. Chu, Yelena G. Bodien, Kathryn L. O’Connor, Ronald E. Hirschberg, Leigh R. Hochberg, Joseph T. Giacino, Eric S. Rosenthal, and et al. Early detection of consciousness in patients with acute severe traumatic brain injury.Brain, 140(9):2399–2414, 2017. doi: 10.1093/brain/awx176
-
[58]
Turgeon, François Lauzier, Jean-François Simard, Damon C
Alexis F. Turgeon, François Lauzier, Jean-François Simard, Damon C. Scales, Karen E.A. Burns, Lynne Moore, David A. Zygun, Francis Bernard, Maureen O. Meade, Tran Cong Dung, and et al. Mortality associated with withdrawal of life-sustaining therapy for patients with severe traumatic brain injury: A canadian multicentre cohort study.Canadian Medical Associ...
-
[59]
Livia Livint, Popa, Diana Chira, S, tefan Strilciuc, and Dafin F. Mures, anu. Non-invasive systems application in traumatic brain injury rehabilitation.Brain Sciences, 13(11), 2023. ISSN 2076-
work page 2023
-
[60]
URL https://www.mdpi.com/2076-3425/13/11/ 1594
doi: 10.3390/brainsci13111594. URL https://www.mdpi.com/2076-3425/13/11/ 1594
-
[61]
Shiyu Luo, Qinwan Rabbani, and Nathan E. Crone. Brain-computer interface: Applications to speech decoding and synthesis to augment communication.Neurotherapeutics, 19(1):263–273, Jan 2022. doi: 10.1007/s13311-022-01190-2
-
[62]
Evan Canny, Mariska J. Vansteensel, Sandra M. van der Salm, Gernot R. Müller-Putz, and Julia Berezutskaya. Boosting brain–computer interfaces with functional electrical stimulation: Potential applications in people with locked-in syndrome.Journal of NeuroEngineering and Rehabilitation, 20(1), Nov 2023. doi: 10.1186/s12984-023-01272-y. 16
-
[63]
Emma C. Gordon and Anil K. Seth. Ethical considerations for the use of brain–computer interfaces for cognitive enhancement.PLOS Biology, 22(10):1–15, 10 2024. doi: 10.1371/ journal.pbio.3002899. URLhttps://doi.org/10.1371/journal.pbio.3002899
-
[64]
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In David Fleet, Tomas Pajdla, Bernt Schiele, and Tinne Tuytelaars, editors,Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing. ISBN 978-3-319-10602-1
work page 2014
-
[65]
Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning. In NeurIPS, 2023
work page 2023
-
[66]
Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved baselines with visual instruction tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 26296–26306, June 2024
-
[67]
Mehdi Cherti, Romain Beaumont, Ross Wightman, Mitchell Wortsman, Gabriel Ilharco, Cade Gordon, Christoph Schuhmann, Ludwig Schmidt, and Jenia Jitsev. Reproducible scaling laws for contrastive language-image learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2818–2829, 2023
-
[68]
Stuart Geman, Elie Bienenstock, and René Doursat. Neural networks and the bias/variance dilemma. Neural Computation, 4(1):1–58, Jan 1992. doi: 10.1162/neco.1992.4.1.1
-
[69]
Arthur E. Hoerl and Robert W. Kennard. Ridge regression: Biased estimation for nonorthogonal problems. Technometrics, 12(1):55–67, 1970. ISSN 00401706. URL http://www.jstor.org/stable/1267351
-
[70]
Aria Y. Wang, Kendrick Kay, Thomas Naselaris, Michael J. Tarr, and Leila Wehbe. Better models of human high-level visual cortex emerge from natural language supervision with a large and diverse dataset. Nature Machine Intelligence, 5(12):1415–1426, December 2023. ISSN 2522-5839. doi: 10.1038/s42256-023-00753-y.
-
[71]
Pablo Pernias, Dominic Rampas, Mats Leon Richter, Christopher Pal, and Marc Aubreville. Würstchen: An efficient architecture for large-scale text-to-image diffusion models. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=gU58d5QeGv
-
[72]
Jianfeng Wang, Zhengyuan Yang, Xiaowei Hu, Linjie Li, Kevin Lin, Zhe Gan, Zicheng Liu, Ce Liu, and Lijuan Wang. GIT: A generative image-to-text transformer for vision and language. Transactions on Machine Learning Research, 2022. ISSN 2835-8856. URL https://openreview.net/forum?id=b4tMhpN0JC
-
[73]
Chenlin Meng, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Image synthesis and editing with stochastic differential equations. CoRR, abs/2108.01073, 2021. URL https://arxiv.org/abs/2108.01073
-
[74]
Matteo Ferrante, Furkan Ozcelik, Tommaso Boccato, Rufin VanRullen, and Nicola Toschi. Brain Captioning: Decoding human brain activity into images and text, May 2023. URL http://arxiv.org/abs/2305.11560. arXiv:2305.11560 [cs]
-
[75]
Ghislain St-Yves, Emily J. Allen, Yihan Wu, Kendrick Kay, and Thomas Naselaris. Brain-optimized deep neural network models of human visual areas learn non-hierarchical representations. Nature Communications, 14(1):3329, 2023. ISSN 2041-1723. doi: 10.1038/s41467-023-38674-4. URL https://doi.org/10.1038/s41467-023-38674-4.
Supporting information S1 Tex...
Normalization of images: We disabled normalization of images when computing VGG19 features. During our initial trials, normalization led to unexpected color distortions in the reconstructed images. Removing normalization allowed the reconstructions to maintain their original color integrity, which is particularly crucial for visual comparisons in tasks req...
Feature decoding with Ridge Regression: Instead of the fastl2lir library, we employed the Ridge Regression implementation from the sklearn library. This change enhanced compatibility with the rest of our workflow and provided better support for managing memory-intensive computations. For VGG19 layers with a large feature space, feature decoding was perform...
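As a rough illustration of the kind of feature decoding described in the note above, the sketch below fits sklearn's Ridge to map voxel patterns to a feature matrix. It is a minimal sketch under stated assumptions, not the authors' pipeline: the data are synthetic, and the helper name fit_feature_decoder, the array sizes, and the alpha value are all hypothetical choices made for this example.

```python
import numpy as np
from sklearn.linear_model import Ridge


def fit_feature_decoder(voxels, features, alpha=1.0):
    """Fit one multi-output ridge model from voxel patterns to feature vectors.

    Hypothetical helper for illustration; alpha is an assumed value,
    not one reported in the source.
    """
    model = Ridge(alpha=alpha)
    model.fit(voxels, features)
    return model


# Synthetic stand-ins for fMRI responses and target image features.
rng = np.random.default_rng(0)
n_trials, n_voxels, n_features = 200, 50, 8
true_weights = rng.normal(size=(n_voxels, n_features))
voxels = rng.normal(size=(n_trials, n_voxels))
features = voxels @ true_weights + 0.1 * rng.normal(size=(n_trials, n_features))

# Train on the first 150 trials and predict features for the held-out 50.
decoder = fit_feature_decoder(voxels[:150], features[:150])
predicted = decoder.predict(voxels[150:])
print(predicted.shape)
```

Ridge accepts a two-dimensional target, so a single fit call covers every feature dimension at once. For very wide feature spaces the targets could also be fitted in column blocks to limit memory use, although whether the authors did so is not recoverable from the truncated text.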