arxiv: 2406.12179 · v4 · pith:7BV4HKZPnew · submitted 2024-06-18 · 💻 cs.CV

The Wisdom of a Crowd of Brains: A Universal Brain Encoder

keywords: brain-voxel, brain, data, encoder, subjects, architecture, brains, cross-attention

Image-to-fMRI encoding is important for both neuroscience research and practical applications. However, such "Brain-Encoders" have typically been trained per subject and per fMRI dataset, and are thus restricted to very limited training data. In this paper we propose a Universal Brain-Encoder, which can be trained jointly on data from many different subjects, datasets, and machines. What makes this possible is our new voxel-centric Encoder architecture, which learns a unique "voxel-embedding" per brain-voxel. Our Encoder is trained to predict the response of each brain-voxel to every image by directly computing the cross-attention between the brain-voxel embedding and multi-level deep image features. This voxel-centric architecture allows the functional role of each brain-voxel to emerge naturally from the voxel-image cross-attention. We show the power of this approach to (i) combine data from multiple different subjects (a "Crowd of Brains") to improve each individual brain-encoding, (ii) enable quick and effective transfer learning across subjects, datasets, and machines (e.g., 3-Tesla, 7-Tesla) with few training examples, and (iii) use the learned voxel-embeddings as a powerful tool to explore brain functionality (e.g., what is encoded where in the brain).
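
The voxel-image cross-attention described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the single attention head, the use of the same token vectors as both keys and values, and the linear readout are all assumptions made for illustration. Each voxel embedding acts as a query over image feature tokens (assumed here to be already pooled from several network levels into one list), and the attended feature is mapped to a scalar response.

```python
import math
import random


def dot(a, b):
    """Dot product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_voxel_response(voxel_embedding, feature_tokens, readout):
    """Predict one voxel's scalar response to one image (hypothetical sketch).

    voxel_embedding: length-d list, used as the attention query.
    feature_tokens:  list of length-d vectors (assumed multi-level image
                     features), used as both keys and values.
    readout:         length-d list mapping the attended feature to a scalar
                     (an assumed linear readout).
    Returns (response, attention_weights).
    """
    d = len(voxel_embedding)
    # Scaled dot-product scores between the voxel query and every token.
    scores = [dot(voxel_embedding, tok) / math.sqrt(d) for tok in feature_tokens]
    weights = softmax(scores)
    # Attention-weighted average of the tokens.
    attended = [
        sum(w * tok[i] for w, tok in zip(weights, feature_tokens))
        for i in range(d)
    ]
    return dot(attended, readout), weights


def encode_image(voxel_embeddings, feature_tokens, readout):
    """Predict a response for every voxel from the same image features."""
    return [
        predict_voxel_response(e, feature_tokens, readout)[0]
        for e in voxel_embeddings
    ]


# Toy data: 5 voxels, 12 feature tokens, embedding size 8 (all arbitrary).
rng = random.Random(0)
d = 8
tokens = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(12)]
voxels = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(5)]
readout = [rng.gauss(0, 1) for _ in range(d)]
responses = encode_image(voxels, tokens, readout)
```

In this sketch the image features are shared by all voxels and only the embedding differs per voxel, which is the property that would let voxels from different subjects or scanners be trained in a single model. The attention weights returned by `predict_voxel_response` indicate which tokens a given voxel draws on.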

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Activation to Causality: Discovery of Causal Visual Representations in the Human Brain

    cs.CV 2026-05 unverdicted novelty 7.0

    BrainCause recovers known visual localizations and finds new candidate representations by validating causal specificity via counterfactual stimuli and encoding models, showing activation alone produces many false positives.

  2. A foundation model of vision, audition, and language for in-silico neuroscience

    q-bio.NC 2026-05 unverdicted novelty 7.0

    TRIBE v2 is a multimodal AI model that predicts human brain activity more accurately than linear encoding models and recovers established neuroscientific findings through in-silico testing.

  3. Meta-learning In-Context Enables Training-Free Cross Subject Brain Decoding

    cs.LG 2026-04 unverdicted novelty 6.0

    A meta-optimized in-context learning approach enables training-free cross-subject semantic visual decoding from fMRI by inferring individual neural encoding patterns via hierarchical inference on a few examples.

  4. ViBE: Visual-to-M/EEG Brain Encoding via Spatio-Temporal VAE and Distribution-Aligned Projection

    cs.CV 2026-04 unverdicted novelty 4.0

    ViBE generates M/EEG signals from visual stimuli by reconstructing neural responses with a TSC-VAE and aligning CLIP image features to its latent space via Q-Former, MSE, and sliced Wasserstein losses.