fMRI-LM: Towards a Universal Foundation Model for Language-Aligned fMRI Understanding
3 Pith papers cite this work.
abstract
Recent advances in multimodal large language models (LLMs) have enabled unified reasoning across images, audio, and video, but extending such capability to brain imaging remains largely unexplored. Bridging this gap is essential to link neural activity with semantic cognition and to develop cross-modal brain representations. To this end, we present fMRI-LM, a foundational model that bridges functional MRI (fMRI) and language through a three-stage framework. In Stage 1, we learn a neural tokenizer that maps fMRI into discrete tokens embedded in a language-consistent space. In Stage 2, a pretrained LLM is adapted to jointly model fMRI tokens and text, treating brain activity as a sequence that can be temporally predicted and linguistically described. To overcome the lack of natural fMRI-text pairs, we construct a large descriptive corpus that translates diverse imaging-based features into structured textual descriptors, capturing the low-level organization of fMRI signals. In Stage 3, we perform multi-task, multi-paradigm instruction tuning to endow fMRI-LM with high-level semantic understanding, supporting diverse downstream applications. Across various benchmarks, fMRI-LM achieves strong zero-shot and few-shot performance, and adapts efficiently with parameter-efficient tuning (LoRA), establishing a scalable pathway toward a language-aligned, universal model for structural and semantic understanding of fMRI.
years
2026: 3
verdicts
UNVERDICTED: 3
citing papers explorer
- Flow Matching with In-Context Priors for Out-of-Distribution Brain Dynamics
  A per-timestep conditioned diffusion transformer generates realistic fMRI dynamics for unseen cognitive tasks by injecting compositional language and optional spatial priors in-context.
- SABER: A Semantic-Aligned Brain Network Analysis Framework via Multi-scale Hypergraphs
  SABER integrates LLM semantics into brain networks via global self-attention and multi-scale hypergraphs with decision-level alignment, claiming SOTA performance, stability, and interpretability on ABIDE and ADHD-200.
- BrainWorld: A Structural-Prior-Conditioned Generative Model for Whole-Brain 4D fMRI Dynamics
  BrainWorld is a structural-prior-conditioned generative model that produces stable whole-brain 4D fMRI trajectories up to 400 frames, augments downstream tasks, and learns transferable multimodal representations across 22 datasets.