Brain Captioning: Decoding human brain activity into images and text, May 2023
7 Pith papers cite this work. Polarity classification is still indexing.
years: 2026 (7)
verdicts: UNVERDICTED (7)
roles: background (1)
polarities: background (1)
citing papers explorer
- Mind-Omni: A Unified Multi-Task Framework for Brain-Vision-Language Modeling via Discrete Diffusion
  Mind-Omni unifies seven brain-vision-language tasks in one discrete-diffusion framework with a brain tokenizer and a new BQA dataset, claiming SOTA multi-task performance competitive with larger single-task models.
- Brain-IT-VQA: From Brain Signals to Answers
  Brain-IT-VQA decodes visual question answers from fMRI using a transformer to extract language tokens and introduces the NSD-VQA benchmark with 20 controlled questions per image across 20 categories.
- NeuroFlow: Toward Unified Visual Encoding and Decoding from Neural Activity
  NeuroFlow is the first unified flow model for bidirectional visual encoding and decoding from neural activity using NeuroVAE and cross-modal flow matching.
- MIRAGE: Robust multi-modal architectures translate fMRI-to-image models from vision to mental imagery
  MIRAGE achieves state-of-the-art mental image reconstruction from fMRI on the NSD-Imagery benchmark by using a linear backbone with multi-modal text and image features fed to a diffusion model.
- Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding
  A meta-optimized in-context learning approach enables training-free cross-subject semantic visual decoding from fMRI by inferring individual neural encoding patterns via hierarchical inference on a few examples.
- BrainJanus: A Unified Model for Understanding and Generation across Brain, Vision, and Language
  BrainJanus presents a unified autoregressive model with a brain tokenizer that maps between neural activity, vision, and language for encoding and decoding tasks.
- FPED: A Functional-Network Prior-Guided Mixture-of-Experts Framework for Interpretable Brain Decoding
  FPED is a functional-network prior-guided MoE framework for fMRI visual reconstruction that claims competitive performance at 0.68B parameters and biologically meaningful routing interpretability.