Brain-Inspired Capture: Evidence-Driven Neuromimetic Perceptual Simulation for Visual Decoding
Pith reviewed 2026-05-10 05:45 UTC · model grok-4.3
The pith
BI-Cap aligns neural and visual modalities by emulating four stages of human visual processing plus an evidence-driven latent space to model uncertainty.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BI-Cap constructs a neuromimetic perceptual simulation paradigm comprising four biologically plausible dynamic and static transformations coupled with MI-guided dynamic blur regulation to emulate HVS processing, together with an evidence-driven latent space representation that explicitly models uncertainty to produce robust neural embeddings and thereby align neural and visual modalities.
What carries the argument
Neuromimetic pipeline of four biologically plausible transformations with MI-guided dynamic blur regulation and evidence-driven latent space for uncertainty modeling.
Load-bearing premise
The four biologically plausible transformations combined with the evidence-driven latent space will reliably align neural signals with visual data despite systematic and stochastic gaps.
What would settle it
Running the released code on the two public benchmarks and obtaining relative gains below 8 percent over the same state-of-the-art baselines would falsify the central performance claim.
Figures
read the original abstract
Visual decoding of neurophysiological signals is a critical challenge for brain-computer interfaces (BCIs) and computational neuroscience. However, current approaches are often constrained by the systematic and stochastic gaps between neural and visual modalities, largely neglecting the intrinsic computational mechanisms of the Human Visual System (HVS). To address this, we propose Brain-Inspired Capture (BI-Cap), a neuromimetic perceptual simulation paradigm that aligns these modalities by emulating HVS processing. Specifically, we construct a neuromimetic pipeline comprising four biologically plausible dynamic and static transformations, coupled with Mutual Information (MI)-guided dynamic blur regulation to simulate adaptive visual processing. Furthermore, to mitigate the inherent non-stationarity of neural activity, we introduce an evidence-driven latent space representation. This formulation explicitly models uncertainty, thereby ensuring robust neural embeddings. Extensive evaluations on zero-shot brain-to-image retrieval across two public benchmarks demonstrate that BI-Cap substantially outperforms state-of-the-art methods, achieving relative gains of 9.2\% and 8.0\%, respectively. We have released the source code on GitHub through the link https://github.com/flysnow1024/BI-Cap.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Brain-Inspired Capture (BI-Cap), a neuromimetic perceptual simulation paradigm for visual decoding of neurophysiological signals. It constructs a pipeline with four biologically plausible dynamic and static transformations, mutual information (MI)-guided dynamic blur regulation to emulate adaptive HVS processing, and an evidence-driven latent space that explicitly models uncertainty to address neural non-stationarity. The central empirical claim is that this approach yields relative gains of 9.2% and 8.0% over state-of-the-art methods on zero-shot brain-to-image retrieval across two public benchmarks, with source code released on GitHub.
Significance. If the performance gains can be rigorously attributed to the neuromimetic components and survive standard controls, the work would offer a biologically grounded alternative for aligning neural and visual modalities in BCI applications, potentially improving robustness to non-stationarity. The public code release is a clear strength for reproducibility and further scrutiny.
major comments (2)
- [Results section] Results section (and sections 3-4): the reported 9.2% and 8.0% relative gains on the two benchmarks are presented as aggregate metrics without any ablation tables or component-wise removal experiments (e.g., disabling MI-guided blur regulation or the uncertainty modeling in the evidence-driven latent space). This absence prevents attribution of the improvements to the claimed HVS-emulation elements rather than base encoder choices or hyper-parameters.
- [Experimental setup] Experimental setup (sections 3-4): no statistical significance tests on the performance deltas, no explicit description of data splits, cross-validation procedures, or controls against post-hoc selection/fitting are provided, leaving open the possibility that the gains reflect implementation details rather than the neuromimetic pipeline.
minor comments (2)
- [Section 3] The abstract and method description introduce an 'evidence-driven latent space' without a precise mathematical formulation or comparison to standard variational or uncertainty-aware embeddings; a short derivation or pseudocode would clarify its novelty.
- [Results] Figure and table captions should explicitly state the number of runs, random seeds, and whether error bars represent standard deviation or standard error.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which highlights important aspects for strengthening the attribution of results and the transparency of our experimental protocol. We address each major comment below and will incorporate revisions to enhance the manuscript's rigor.
read point-by-point responses
-
Referee: [Results section] Results section (and sections 3-4): the reported 9.2% and 8.0% relative gains on the two benchmarks are presented as aggregate metrics without any ablation tables or component-wise removal experiments (e.g., disabling MI-guided blur regulation or the uncertainty modeling in the evidence-driven latent space). This absence prevents attribution of the improvements to the claimed HVS-emulation elements rather than base encoder choices or hyper-parameters.
Authors: We agree that explicit ablation studies are necessary to rigorously attribute the observed gains to the neuromimetic components. The original submission focused on the integrated pipeline performance but did not include component-wise removal experiments. In the revised manuscript, we will add detailed ablation tables in the Results section that isolate the contributions of the MI-guided dynamic blur regulation and the evidence-driven latent space (including uncertainty modeling), along with comparisons to base encoder variants, to directly address this concern. revision: yes
-
Referee: [Experimental setup] Experimental setup (sections 3-4): no statistical significance tests on the performance deltas, no explicit description of data splits, cross-validation procedures, or controls against post-hoc selection/fitting are provided, leaving open the possibility that the gains reflect implementation details rather than the neuromimetic pipeline.
Authors: We acknowledge that the absence of statistical tests and detailed protocol descriptions limits the strength of the empirical claims. The manuscript did not report significance testing or fully specify data handling procedures. We will revise Sections 3 and 4 to include statistical significance tests (such as paired t-tests with reported p-values) on the performance deltas, provide explicit descriptions of data splits and cross-validation procedures, and document controls (e.g., pre-defined evaluation protocols) to mitigate risks of post-hoc selection or fitting. revision: yes
Circularity Check
No circularity: method components are independently specified and evaluated on external benchmarks
full rationale
The paper introduces BI-Cap as a novel neuromimetic pipeline consisting of four biologically plausible transformations, MI-guided dynamic blur regulation, and an evidence-driven latent space for uncertainty modeling. These elements are described as constructed from HVS principles and applied to align neural-visual modalities, with performance measured via zero-shot retrieval on two public benchmarks (reporting relative gains of 9.2% and 8.0%). No equations or claims reduce the reported gains to self-fitted parameters, self-citations, or definitional loops; the method is presented as a self-contained construction with released code for independent verification. The absence of ablations concerns evidential strength rather than circular derivation.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Four biologically plausible dynamic and static transformations emulate HVS processing to align neural and visual modalities
- domain assumption Mutual Information-guided dynamic blur regulation simulates adaptive visual processing
invented entities (1)
-
Evidence-driven latent space representation
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Where does eeg come from and what does it mean?
M. X. Cohen, “Where does eeg come from and what does it mean?” Trends Neurosci., vol. 40, no. 4, pp. 208–218, 2017
work page 2017
-
[2]
Eeg and meg: relevance to neuroscience
F. Lopes da Silva, “Eeg and meg: relevance to neuroscience.”Neuron, vol. 80, no. 5, pp. 1112–1128, 2013
work page 2013
-
[3]
What does fmri tell us about neuronal activity?
D. J. Heeger and D. Ress, “What does fmri tell us about neuronal activity?”Nature Rev. Neurosci., vol. 3, no. 2, pp. 142–151, 2002
work page 2002
-
[4]
A brain-media deep framework towards seeing imaginations inside brains,
J. Jiang, A. Fares, and S.-H. Zhong, “A brain-media deep framework towards seeing imaginations inside brains,”IEEE Trans. Multimedia, vol. 23, pp. 1454–1465, 2021
work page 2021
-
[5]
K. Liu, Y . Liu, L. Wei, C. Tang, Y . Zhan, Z. Chen, and Z. Chen, “Smile on the face, sadness in the eyes: Bridging the emotion gap with a multimodal dataset of eye and facial behaviors,”IEEE Trans. Multimedia, pp. 1–12, 2026
work page 2026
-
[6]
Bridging the semantic gap via functional brain imaging,
X. Hu, K. Li, J. Han, X. Hua, L. Guo, and T. Liu, “Bridging the semantic gap via functional brain imaging,”IEEE Trans. Multimedia, vol. 14, no. 2, pp. 314–325, 2012
work page 2012
-
[7]
H. Huang, L. Zhao, H. Dai, L. Zhang, X. Hu, D. Zhu, and T. Liu, “Bi-avan: A brain-inspired adversarial visual attention network for characterizing human visual attention from neural activity,”IEEE Trans. Multimedia, vol. 26, pp. 11 191–11 203, 2024
work page 2024
-
[8]
Bridging the vision-brain gap with an uncertainty-aware blur prior,
H. Wu, Q. Li, C. Zhang, Z. He, and X. Ying, “Bridging the vision-brain gap with an uncertainty-aware blur prior,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2025, pp. 2246–2257
work page 2025
-
[9]
Shrinking the teacher: An adaptive teaching paradigm for asymmetric eeg-vision alignment,
L. Wu, J. Li, Z. Ren, K. Zhang, and X. Gao, “Shrinking the teacher: An adaptive teaching paradigm for asymmetric eeg-vision alignment,” arXiv:2511.11422, 2025
-
[10]
W. Zhang, S. Wang, Y . Su, X. Li, C. Zhang, and S. Zhong, “Neurobridge: Bio-inspired self-supervised eeg-to-image decoding via cognitive priors and bidirectional semantic alignment,”arXiv:2511.06836, 2025
-
[11]
Cortical magnification factor predicts the photopic contrast sensitivity of peripheral vision,
J. Rovamo, V . Virsu, and R. N ¨as¨anen, “Cortical magnification factor predicts the photopic contrast sensitivity of peripheral vision,”Nature, vol. 271, no. 5640, pp. 54–56, 1978
work page 1978
-
[12]
A. A. Faisal, L. P. Selen, and D. M. Wolpert, “Noise in the nervous system,”Nature Rev. Neurosci., vol. 9, no. 4, pp. 292–303, 2008
work page 2008
-
[13]
Global and fine information coded by single neurons in the temporal visual cortex,
Y . Sugase, S. Yamane, S. Ueno, and K. Kawano, “Global and fine information coded by single neurons in the temporal visual cortex,” Nature, vol. 400, no. 6747, pp. 869–873, 1999
work page 1999
-
[14]
Distinct spatial frequency sensitivities for processing faces and emotional expressions,
P. Vuilleumier, J. L. Armony, J. Driver, and R. J. Dolan, “Distinct spatial frequency sensitivities for processing faces and emotional expressions,” Nature Neurosci., vol. 6, no. 6, pp. 624–631, 2003
work page 2003
-
[15]
The represen- tational dynamics of visual objects in rapid serial visual processing streams,
T. Grootswagers, A. K. Robinson, and T. A. Carlson, “The represen- tational dynamics of visual objects in rapid serial visual processing streams,”NeuroImage, vol. 188, pp. 668–679, 2019
work page 2019
-
[16]
The arrangement of the three cone classes in the living human eye,
A. Roorda and D. R. Williams, “The arrangement of the three cone classes in the living human eye,”Nature, vol. 397, no. 6719, pp. 520– 522, 1999
work page 1999
-
[17]
Perceptual filling in of artificially induced scotomas in human vision,
V . S. Ramachandran and R. L. Gregory, “Perceptual filling in of artificially induced scotomas in human vision,”Nature, vol. 350, no. 6320, pp. 699–702, 1991
work page 1991
-
[18]
A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence,
E. J. Allen, G. St-Yves, Y . Wu, J. L. Breedlove, J. S. Prince, L. T. Dowdle, M. Nau, B. Caron, F. Pestilli, I. Charestet al., “A massive 7t fmri dataset to bridge cognitive neuroscience and artificial intelligence,” Nature Neurosci., vol. 25, no. 1, pp. 116–126, 2022
work page 2022
-
[19]
Z. Chen, J. Qing, T. Xiang, W. L. Yue, and J. H. Zhou, “Seeing beyond the brain: Conditional diffusion model with sparse masked modeling for vision decoding,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 22 710–22 720
work page 2023
-
[20]
Deep learning human mind for automated visual classifica- tion,
C. Spampinato, S. Palazzo, I. Kavasidis, D. Giordano, N. Souly, and M. Shah, “Deep learning human mind for automated visual classifica- tion,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2017, pp. 6809–6817
work page 2017
-
[21]
Brain2image: Converting brain signals into images,
I. Kavasidis, S. Palazzo, C. Spampinato, D. Giordano, and M. Shah, “Brain2image: Converting brain signals into images,” inProc. ACM Int. Conf. Multimedia, 2017, pp. 1809–1817
work page 2017
-
[22]
Learning transferable visual models from natural language supervision,
A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervision,” inProc. Int. Conf. Mach. Learn. (ICML), 2021, pp. 8748–8763
work page 2021
-
[23]
Visual decoding and reconstruction via eeg embeddings with guided diffusion,
D. Li, C. Wei, S. Li, J. Zou, and Q. Liu, “Visual decoding and reconstruction via eeg embeddings with guided diffusion,” inProc. Conf. Neural Inf. Process. Syst. (NeurIPS), 2024, pp. 102 822–102 864
work page 2024
-
[24]
Decoding natural images from eeg for object recognition,
Y . Song, B. Liu, X. Li, N. Shi, Y . Wang, and X. Gao, “Decoding natural images from eeg for object recognition,” inProc. Int. Conf. Learn. Represent. (ICLR), 2024
work page 2024
-
[25]
Decoding visual neural representations by multimodal learning of brain-visual-linguistic features,
C. Du, K. Fu, J. Li, and H. He, “Decoding visual neural representations by multimodal learning of brain-visual-linguistic features,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 45, no. 9, pp. 10 760–10 777, 2023
work page 2023
-
[26]
Human-aligned image models improve visual decoding from the brain,
N. Rajabi, A. H. Ribeiro, M. Vasco, F. Taleb, M. Bj ¨orkman, and D. Kragic, “Human-aligned image models improve visual decoding from the brain,” inProc. Int. Conf. Mach. Learn. (ICML), 2025
work page 2025
-
[27]
The perils and pitfalls of block design for eeg classification experiments,
R. Li, J. S. Johansen, H. Ahmed, T. V . Ilyevsky, R. B. Wilbur, H. M. Bharadwaj, and J. M. Siskind, “The perils and pitfalls of block design for eeg classification experiments,”IEEE Trans. Pattern Anal. Mach. Intell., vol. 43, no. 1, pp. 316–333, 2020
work page 2020
-
[28]
Object classification from randomized eeg trials. in 2021 ieee,
H. Ahmed, R. B. Wilbur, H. M. Bharadwaj, and J. M. Siskind, “Object classification from randomized eeg trials. in 2021 ieee,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2021, pp. 3844–3853
work page 2021
-
[29]
Vieeg: Hierarchical visual neural representation for eeg brain decoding,
M. Liu, D. Guan, C. Zheng, C. Tian, J. Wen, and Q. Zhu, “Vieeg: Hierarchical neural coding with cross-modal progressive enhancement for eeg-based visual decoding,”arXiv:2505.12408, 2025
-
[30]
A generalization of bayesian inference,
A. P. Dempster, “A generalization of bayesian inference,”J. Roy. Statist. Soc. Ser. B (Methodol.), vol. 30, no. 2, pp. 205–232, 1968
work page 1968
-
[31]
Jsang,Subjective Logic: A formalism for reasoning under uncertainty
A. Jsang,Subjective Logic: A formalism for reasoning under uncertainty. Springer Pub., 2018
work page 2018
-
[32]
Trustworthy long-tailed classification,
B. Li, Z. Han, H. Li, H. Fu, and C. Zhang, “Trustworthy long-tailed classification,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2022, pp. 6970–6979
work page 2022
-
[33]
A comprehensive survey on evidential deep learning and its applications,
J. Gao, M. Chen, L. Xiang, and C. Xu, “A comprehensive survey on evidential deep learning and its applications,”IEEE Trans. Pattern Anal. Mach. Intell., 2025
work page 2025
-
[34]
Trustworthy multimodal regression with mixture of normal-inverse gamma distribu- tions,
H. Ma, Z. Han, C. Zhang, H. Fu, J. T. Zhou, and Q. Hu, “Trustworthy multimodal regression with mixture of normal-inverse gamma distribu- tions,”Proc. Neural Inf. Process. Syst. (NeurIPS), vol. 34, pp. 6881– 6893, 2021
work page 2021
-
[35]
J. Gao, M. Chen, and C. Xu, “Collecting cross-modal presence-absence evidence for weakly-supervised audio-visual event perception,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 18 827–18 836
work page 2023
-
[36]
Dcel: deep cross-modal evidential learning for text-based person retrieval,
S. Li, X. Xu, Y . Yang, F. Shen, Y . Mo, Y . Li, and H. T. Shen, “Dcel: deep cross-modal evidential learning for text-based person retrieval,” in Proc. ACM Int. Conf. Multimedia, 2023, pp. 6292–6300
work page 2023
-
[37]
Deep evidential learning with noisy correspondence for cross-modal retrieval,
Y . Qin, D. Peng, X. Peng, X. Wang, and P. Hu, “Deep evidential learning with noisy correspondence for cross-modal retrieval,” inProc. ACM Int. Conf. Multimedia, 2022, pp. 4948–4956
work page 2022
-
[38]
Prototype-based aleatoric uncertainty quantification for cross-modal retrieval,
H. Li, J. Song, L. Gao, X. Zhu, and H. Shen, “Prototype-based aleatoric uncertainty quantification for cross-modal retrieval,”Proc. Neural Inf. Process. Syst. (NeurIPS), vol. 36, pp. 24 564–24 585, 2023
work page 2023
-
[39]
Representation Learning with Contrastive Predictive Coding
A. v. d. Oord, Y . Li, and O. Vinyals, “Representation learning with contrastive predictive coding,”arXiv:1807.03748, 2018
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[40]
Neuroclip: Brain-inspired prompt tuning for eeg-to-image multimodal contrastive learning,
J. Wang, L. Zhang, H. Lin, Q. Liu, G. Huang, Z. Li, Z. Liang, and X. Wu, “Neuroclip: Brain-inspired prompt tuning for eeg-to-image multimodal contrastive learning,”arXiv:2511.09250, 2025
-
[41]
Reproducible scal- ing laws for contrastive language-image learning,
M. Cherti, R. Beaumont, R. Wightman, M. Wortsman, G. Ilharco, C. Gordon, C. Schuhmann, L. Schmidt, and J. Jitsev, “Reproducible scal- ing laws for contrastive language-image learning,” inProc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit. (CVPR), 2023, pp. 2818–2829
work page 2023
-
[42]
A large and rich eeg dataset for modeling human visual object recognition,
A. T. Gifford, K. Dwivedi, G. Roig, and R. M. Cichy, “A large and rich eeg dataset for modeling human visual object recognition,”J. Vis., vol. 23, no. 9, pp. 4579–4579, 2023
work page 2023
-
[43]
M. N. Hebart, O. Contier, L. Teichmann, A. H. Rockter, C. Y . Zheng, A. Kidder, A. Corriveau, M. Vaziri-Pashkam, and C. I. Baker, “Things- data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior,”Elife, vol. 12, p. e82580, 2023
work page 2023
-
[44]
Event-related brain potentials in the study of visual selective attention,
S. A. Hillyard and L. Anllo-Vento, “Event-related brain potentials in the study of visual selective attention,”Proc. Nat. Acad. Sci., vol. 95, no. 3, pp. 781–787, 1998
work page 1998
-
[45]
Oscillatory gamma activity in humans and its role in object representation,
C. Tallon-Baudry and O. Bertrand, “Oscillatory gamma activity in humans and its role in object representation,”Trends Cogn. Sci., vol. 3, no. 4, pp. 151–162, 1999
work page 1999
-
[46]
G. Michalareas, J. Vezoli, S. Van Pelt, J.-M. Schoffelen, H. Kennedy, and P. Fries, “Alpha-beta and gamma rhythms subserve feedback and feedforward influences among human visual cortical areas,”Neuron, vol. 89, no. 2, pp. 384–397, 2016. IEEE TRANSACTIONS ON MULTIMEDIA, 2026 11
work page 2016
-
[47]
Am/eeg-fmri fusion primer: resolving human brain responses in space and time,
R. M. Cichy and A. Oliva, “Am/eeg-fmri fusion primer: resolving human brain responses in space and time,”Neuron, vol. 107, no. 5, pp. 772–781, 2020
work page 2020
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.