Foundation models yield less human-interpretable features than supervised vision transformers, with interpretability tied to activation locality and coarse semantic alignment rather than task performance.
THINGS-data, a multimodal collection of large-scale datasets for investigating object representations in human brain and behavior
9 Pith papers cite this work.
representative citing papers
NeuralBench is a new benchmarking framework for neuroAI models on EEG data that finds foundation models only marginally outperform task-specific ones while many tasks like cognitive decoding stay highly challenging.
citing papers explorer
- MindAlign: Bridging EEG, Vision, and Language for Zero-Shot Visual Decoding
A tri-modal contrastive learning method for EEG-based zero-shot visual decoding reports 54.1% top-1 accuracy on the Things-EEG2 200-way benchmark, outperforming prior baselines of 32.4%.
- Sparse Autoencoders Map Brain-LLM Alignment onto Cortical Semantic Topography
Sparse autoencoders applied to GPT-2 and Llama models recover semantic features accounting for 94% of peak brain encoding performance and map onto distinct cortical semantic regions across three languages.
- Universal scale-free representations in human visual cortex
fMRI responses to natural scenes in human visual cortex exhibit a consistent scale-free structure with power-law decaying variance across four orders of magnitude of dimensions, shared across individuals via hyperalignment.
- Boosting Brain-to-Image Decoding with TRIBE v2 Data Augmentation
Augmenting limited fMRI datasets with synthetic responses from TRIBE v2 improves brain-to-image decoding accuracy and can yield above-chance performance using only synthetic data.
- Mapping Whisper Representations to Human ECoG Responses with Interpretable Time-Resolved Neural Encoding
The paper introduces a time-resolved neural encoder combining Whisper embeddings with recurrent temporal modeling and soft attention to predict ECoG responses, finding strongest alignment in intermediate layers and anatomically coherent phoneme organization in electrodes.
- Overcoming Output Dimension Collapse: When Sparsity Enables Zero-shot Brain-to-Image Reconstruction at Small Data Scales
Mathematical analysis shows sparse linear regression mitigates output dimension collapse in brain-to-image reconstruction at small data scales by exploiting sparsity in the brain-to-feature mapping.
- Shared representations in brains and models reveal a two-route cortical organization during scene perception
RSA on 7T fMRI during natural scene viewing identifies ventromedial and lateral occipitotemporal representational routes for scene context versus animate content, with differential alignment to vision and language models.