EmoMind: Decoding Affective Captions from Human Brain fMRI

Bilal A. Mohammed; Lin Gu; Ruogo Fang

arXiv: 2605.16739 · v1 · pith:I6U63OQ4new · submitted 2026-05-16 · 💻 cs.LG · cs.AI · cs.CL · q-bio.NC

Pith reviewed 2026-05-19 20:19 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL · q-bio.NC

keywords fMRI decoding · affective captions · brain-to-text · emotion vector · classifier-free guidance · subject-specificity · continuous affect

The pith

EmoMind decodes continuous 34-dimensional affect from fMRI to rewrite neutral scene descriptions into subject-specific affective captions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces EmoMind as the first end-to-end pipeline that pulls a neutral scene description from brain-decoded visual features in fMRI and then rewrites it using a continuous 34-dimensional emotion vector decoded from the same signals. This continuous vector is meant to keep the detailed differences in how each person experiences emotion, unlike coarse category labels that flatten those differences. The rewriter is trained with classifier-free guidance against an identity-preserving null branch so the output can slide smoothly between staying faithful to the scene and adding emotional tone. Across two separate fMRI datasets the method beats GPT-4 prompted with the top five brain-decoded emotion labels on every axis of a three-part test that checks personal match, structural properties, and causal control, with the biggest edge on measures that need individual affective patterns.

Core claim

EmoMind retrieves a semantically grounded neutral scene description from brain-decoded visual features, then rewrites it with a continuous 34-dimensional emotion vector decoded from the same fMRI recording. The rewriter is trained with classifier-free guidance against an identity-preserving null branch, which allows controllable interpolation between semantic fidelity and affective expressivity. The paper reports that the resulting captions outperform label-prompted GPT-4 on subject-specificity, structural geometry, and causal control across two independent emotion fMRI datasets.

What carries the argument

The continuous 34-dimensional emotion vector decoded from fMRI, used inside a rewriter trained with classifier-free guidance against an identity-preserving null branch to modify neutral scene descriptions.
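The guidance rule quoted in the Figure 2 caption, H_cfg = H_null + (1 + w)(H_cond − H_null), can be illustrated in a few lines of Python. This is a minimal sketch, not the authors' implementation: `cfg_combine` is a hypothetical helper, and the hidden states are stand-in lists of floats rather than real decoder activations.

```python
def cfg_combine(h_null, h_cond, w):
    """Element-wise H_cfg = H_null + (1 + w) * (H_cond - H_null).

    h_null: hidden state from the null (identity-preserving) branch
    h_cond: hidden state from the emotion-conditioned branch
    w:      guidance weight; w = 0 returns h_cond unchanged
    """
    return [n + (1.0 + w) * (c - n) for n, c in zip(h_null, h_cond)]


# Toy two-dimensional states, chosen only for illustration.
h_null = [0.0, 1.0]
h_cond = [1.0, 3.0]
print(cfg_combine(h_null, h_cond, 0.0))  # [1.0, 3.0]
print(cfg_combine(h_null, h_cond, 2.0))  # [3.0, 7.0]
```

With w = 0 the rule returns the conditional state, as the caption states; larger w extrapolates further away from the null branch, which is the "amplified affect" end of the trade-off.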

If this is right

Continuous affect decoded from fMRI functions as a usable control signal for generating captions that reflect individual emotional responses rather than averaged categories.
The three-axis validation framework measures subject-specific affective structure, structural geometry, and causal control in brain-to-text systems.
A synthetic-brain substitution test checks whether the pipeline remains stable when the measurement apparatus changes.
The largest performance gains appear on metrics that require person-specific affective structure instead of population-level aggregation.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same continuous-vector approach could be tested for generating other personalized outputs such as image edits or dialogue responses that match a user's current brain state.
The interpolation mechanism might be adapted to other brain-decoding tasks where a user wants adjustable levels of semantic versus stylistic control.
The two-dataset result suggests the method could be applied to study how affective organisation differs across individuals in larger populations.

Load-bearing premise

That a 34-dimensional continuous vector extracted from fMRI accurately captures rich inter-subject affective variability and that classifier-free guidance produces smooth, artifact-free control over the balance of content and emotion.

What would settle it

A direct comparison in which EmoMind-generated captions show no higher correlation with individual subjects' fMRI patterns than captions produced from population-averaged emotion labels would falsify the claimed advantage in person-specific structure.
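One way to make this comparison concrete is an own-subject versus other-subject correlation gap. The sketch below is an assumption about how such a check could be scored, not the paper's metric: `specificity_gap` is a hypothetical helper, and it presumes that each generated caption has already been mapped to an affect profile by some external scorer that is not shown here.

```python
from statistics import mean


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant sequences."""
    ma, mb = mean(a), mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5


def specificity_gap(caption_profiles, subject_vectors):
    """Mean own-subject correlation minus mean other-subject correlation."""
    n = len(subject_vectors)
    own = [pearson(caption_profiles[s], subject_vectors[s]) for s in range(n)]
    other = [pearson(caption_profiles[s], subject_vectors[t])
             for s in range(n) for t in range(n) if t != s]
    return mean(own) - mean(other)


# Toy 3-D vectors for two subjects (illustrative numbers, not data from the paper).
subjects = [[1, 2, 3], [3, 2, 1]]
personalised = [[1, 2, 3], [3, 2, 1]]   # captions track each subject
shared = [[1, 3, 2], [1, 3, 2]]         # one population-level profile for everyone
print(specificity_gap(personalised, subjects))  # 2.0
print(specificity_gap(shared, subjects))        # 0.0
```

A gap near zero for the brain-conditioned captions, as in the shared-profile case, would be the falsifying outcome described above.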

Figures

Figures reproduced from arXiv: 2605.16739 by Bilal A. Mohammed, Lin Gu, Ruogo Fang.

**Figure 1.** (a) Existing approaches decode the scene without affect [1–3] or condition LLMs on shared categorical labels [4–6]. (b) Ours. Per-subject fMRI is decoded along two paths: semantic retrieval for the neutral caption $x$ and a per-subject ridge for the continuous 34-D Cowen & Keltner vector $e_s$. The axis-token rewriter (Section 3.2) maps $(x, e_s)$ to a subject-specific affective caption.

**Figure 2.** Axis-token rewriter. (A) Training mixes a reconstruction target ($L_{\mathrm{recon}}$, probability $\rho$) and an emotion-specific target ($L_{\mathrm{emo}}$) via a Bernoulli switch on $e \in \mathbb{R}^{34}$, which scales the axis matrix $A \in \mathbb{R}^{34 \times 768}$. (B) Inference applies classifier-free guidance, $H_{\mathrm{cfg}} = H_{\mathrm{null}} + (1 + w)(H_{\mathrm{cond}} - H_{\mathrm{null}})$, trading content fidelity (smaller $w$; $w = 0$ recovers $H_{\mathrm{cond}}$) for amplified affect (larger $w$).

**Figure 3.** Qualitative captions across four affect regimes (Entrancement, Nostalgia, Romance, Fear/Horror). Each cell shows one MC test clip with its dominant CK34 emotions, three frames, and three captions. Neutral is Stage-1 retrieval (no affect). GPT-4 is GPT-4o prompted with the clip’s GT top-5 C&K labels. Ours is the brain-conditioned rewriter for one subject. The Fear cell illustrates the gap. GPT-4 describes what t…

**Figure 4.** Parameter sensitivity. (a) SWAP causal effect and target-conditional alignment $r_{\mathrm{target}}$ as functions of reconstruction-supervision rate $\rho$ (with $w = 0$ fixed). (b) Same metrics as functions of CFG guidance weight $w$ (with $\rho = 0.4$ fixed). Both metrics remain positive across the full range tested. $\rho = 0.4$, $w = 2$ marks the operating point used throughout the paper.
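The training mix described in the Figure 2 caption (a Bernoulli switch with probability ρ between a reconstruction target and an emotion-specific target, with the emotion vector scaling an axis matrix) can be sketched as follows. Several details are assumptions made for illustration: the null branch is modelled here as an all-zero emotion vector, and `axis_tokens` and `sample_training_example` are hypothetical names that do not come from the paper.

```python
import random


def axis_tokens(e, A):
    """Scale each row of the axis matrix A by its emotion weight e[k].

    In the caption A is 34 x 768; a tiny matrix is used below for readability.
    """
    return [[e_k * a for a in row] for e_k, row in zip(e, A)]


def sample_training_example(neutral, affective, e, rho, rng):
    """Bernoulli switch over the conditioning vector.

    With probability rho: null conditioning (assumed all zeros) and the
    neutral caption as target, so the rewriter learns to copy its input.
    Otherwise: the real emotion vector and the affective caption as target.
    """
    if rng.random() < rho:
        return [0.0] * len(e), neutral, "recon"
    return list(e), affective, "emo"


rng = random.Random(0)  # seeded for repeatability
e = [2.0, 0.0]
A = [[1.0, 2.0], [3.0, 4.0]]
cond, target, branch = sample_training_example("a dog runs", "a joyful dog bounds", e, 0.4, rng)
print(branch, axis_tokens(cond, A))
```

Under the zero-vector assumption the null branch produces all-zero axis tokens, so the same network serves as both the unconditional and the conditional model that the guidance rule combines at inference.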

Original abstract

Decoding visual experience from brain activity has advanced substantially, but current brain-to-text systems largely recover semantic content while discarding affect. Additionally, language models can generate emotional text when prompted with categorical labels, but such labels collapse rich inter-subject variability into coarse discrete bins. We present EmoMind, the first end-to-end pipeline for decoding affective captions directly from fMRI signals. EmoMind first retrieves a semantically grounded neutral scene description from brain-decoded visual features, then rewrites it using a continuous 34-dimensional emotion vector decoded from the same fMRI recording. To control the balance between content preservation and affective expression, we train the rewriter with classifier-free guidance against an identity-preserving null branch, enabling smooth interpolation between semantic fidelity and affective expressivity. We evaluate affective caption generation with a three-axis validation framework spanning subject-specificity, structural geometry, and causal control. We further augment this framework with a synthetic-brain substitution test that probes robustness to the measurement apparatus, and we benchmark each axis against GPT-4 prompted with brain-decoded top-5 emotion labels as a strong discrete baseline. Across two independent emotion fMRI datasets, EmoMind significantly outperforms label-prompted GPT-4 on all three axes, with the largest gains on metrics that require person-specific affective structure rather than population-level emotion aggregation. These results establish continuous brain-decoded affect as a viable control signal for individualized affective caption generation and open new directions for studying individual affective brain organisation.
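The abstract's point that categorical labels "collapse rich inter-subject variability into coarse discrete bins" can be shown with a toy example. The sketch below is illustrative only: `top_k_labels` is a hypothetical helper, the `axis_i` names are placeholders rather than the actual Cowen & Keltner category names, and the numbers are invented.

```python
def top_k_labels(e, labels, k=5):
    """Return the k labels with the largest values in the emotion vector e."""
    order = sorted(range(len(e)), key=lambda i: e[i], reverse=True)
    return [labels[i] for i in order[:k]]


labels = [f"axis_{i}" for i in range(34)]  # placeholder names

# Two made-up subjects: same ranking of the strongest axes, different intensities.
subject_a = [0.0] * 34
subject_b = [0.0] * 34
for idx, (a, b) in zip([3, 7, 1, 20, 11],
                       [(0.9, 0.5), (0.8, 0.4), (0.7, 0.3), (0.6, 0.2), (0.5, 0.1)]):
    subject_a[idx], subject_b[idx] = a, b

print(top_k_labels(subject_a, labels))
print(top_k_labels(subject_a, labels) == top_k_labels(subject_b, labels))  # True
print(subject_a == subject_b)                                              # False
```

Both subjects produce the same five labels, so a label-prompted baseline would receive identical input for them, while the continuous vectors remain distinguishable.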

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces EmoMind, an end-to-end pipeline for decoding affective captions directly from fMRI signals. It first retrieves a semantically grounded neutral scene description from brain-decoded visual features, then rewrites the description using a continuous 34-dimensional emotion vector decoded from the same fMRI recording. Classifier-free guidance is applied against an identity-preserving null branch to interpolate between semantic fidelity and affective expressivity. Evaluation uses a three-axis framework (subject-specificity, structural geometry, causal control) plus a synthetic-brain substitution test, with benchmarks against GPT-4 prompted by brain-decoded top-5 emotion labels. The paper claims significant outperformance over this baseline across two independent emotion fMRI datasets, with largest gains on metrics requiring person-specific affective structure.

Significance. If the quantitative results and controls hold, this would constitute a meaningful advance in brain-to-text decoding by incorporating continuous, subject-specific affective signals rather than discrete categorical labels. It directly addresses the limitation that current systems discard affect and could enable more individualized affective caption generation while providing a new framework for studying inter-subject variability in affective brain organization.

major comments (2)

[Abstract and §4] Abstract and §4 (Evaluation framework): The central claim of significant outperformance on all three axes lacks any reported quantitative metrics, error bars, statistical tests, or data exclusion criteria in the provided text. Without these, the reader cannot assess effect sizes or verify that gains are driven by person-specific structure rather than population-level aggregation.
[Methods] Methods (emotion vector decoding): The 34-dimensional emotion vector is presented as decoded from fMRI, yet no derivation, fitting procedure, or validation against ground-truth affective variability is shown. This is load-bearing for the claim that it captures rich inter-subject differences and enables the reported gains over discrete labels.
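For the statistical tests requested in the first major comment, one common choice for per-subject paired comparisons is an exact sign-flip permutation test. The sketch below is a generic illustration under stated assumptions, not a procedure taken from the paper: `paired_sign_flip_p` is a hypothetical helper, and the differences are toy numbers rather than reported results.

```python
from itertools import product


def paired_sign_flip_p(diffs):
    """Exact one-sided sign-flip test on paired differences.

    diffs[i] = metric(method A, subject i) - metric(method B, subject i).
    Under the null hypothesis each difference is equally likely to have
    either sign, so all 2**n sign assignments are enumerated and the
    p-value is the fraction whose mean is at least the observed mean.
    """
    n = len(diffs)
    observed = sum(diffs) / n
    count = 0
    total = 0
    for signs in product((1, -1), repeat=n):
        total += 1
        flipped_mean = sum(s * d for s, d in zip(signs, diffs)) / n
        if flipped_mean >= observed - 1e-12:  # tolerance for float rounding
            count += 1
    return count / total


# Toy per-subject differences (illustrative only).
print(paired_sign_flip_p([0.1, 0.2, 0.3, 0.4]))  # 0.0625
```

With only four subjects the smallest attainable p-value is 1/16, which is one reason reporting the number of subjects alongside any p-value matters; enumeration is exact but grows as 2**n, so larger cohorts would need random sampling of sign assignments instead.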

minor comments (2)

[Methods] Clarify the precise implementation of classifier-free guidance, including the identity-preserving null branch and how interpolation avoids artifacts.
[Evaluation] Add explicit description of the synthetic-brain substitution test procedure and the specific robustness properties it measures.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their detailed and constructive review of our manuscript. Their comments highlight important areas for clarification and improvement in the presentation of our results and methods. Below, we provide point-by-point responses to the major comments and outline the revisions we plan to make.

Referee: [Abstract and §4] Abstract and §4 (Evaluation framework): The central claim of significant outperformance on all three axes lacks any reported quantitative metrics, error bars, statistical tests, or data exclusion criteria in the provided text. Without these, the reader cannot assess effect sizes or verify that gains are driven by person-specific structure rather than population-level aggregation.

Authors: We acknowledge that the abstract and §4 currently emphasize the outperformance claims without accompanying quantitative details such as specific metric values, error bars, or statistical tests. This omission makes it difficult for readers to fully evaluate the effect sizes and the contribution of person-specific structure. In the revised version, we will update §4 to include full reporting of all evaluation metrics with means ± standard deviations, 95% confidence intervals or error bars in visualizations, p-values from appropriate statistical tests (e.g., paired t-tests or permutation tests), and clear criteria for data exclusion (such as motion thresholds or signal quality checks). We will also add a new table summarizing the comparative results against the GPT-4 baseline to directly address concerns about population-level vs. subject-specific gains.
revision: yes

Referee: [Methods] Methods (emotion vector decoding): The 34-dimensional emotion vector is presented as decoded from fMRI, yet no derivation, fitting procedure, or validation against ground-truth affective variability is shown. This is load-bearing for the claim that it captures rich inter-subject differences and enables the reported gains over discrete labels.

Authors: The referee is correct that the current Methods section does not provide sufficient detail on how the 34-dimensional emotion vector is derived from fMRI signals. To address this, we will expand the Methods with a new subsection titled 'Emotion Vector Decoding' that describes: (1) the source of the 34-D vector (the Cowen & Keltner emotion dimensions named in the Figure 1 caption), (2) the decoding model and training procedure (the per-subject ridge regression named in the same caption, trained on subject-specific fMRI data), (3) the cross-validation scheme used for fitting, and (4) validation results showing correlation coefficients or prediction accuracy against ground-truth affective variability from independent ratings. This will demonstrate how the continuous vector preserves inter-subject differences that discrete labels cannot capture.
revision: yes
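The Figure 1 caption names a per-subject ridge as the decoder for the continuous emotion vector. A minimal closed-form version is sketched below in pure Python so it runs without third-party libraries. It is an illustration under assumptions, not the paper's fitting code: `ridge_fit` and `ridge_predict` are hypothetical names, the penalty `alpha` is an arbitrary value, and the two-voxel data are toy numbers.

```python
def ridge_fit(X, Y, alpha):
    """Solve W = (X^T X + alpha * I)^(-1) X^T Y by Gauss-Jordan elimination.

    X: n_samples x n_voxels list of lists (fMRI features for one subject)
    Y: n_samples x n_outputs list of lists (emotion ratings; 34 outputs in the paper)
    alpha: ridge penalty, assumed > 0 so the system is always solvable
    """
    n, d, k = len(X), len(X[0]), len(Y[0])
    # Augmented system [X^T X + alpha I | X^T Y].
    aug = []
    for i in range(d):
        row = [sum(X[r][i] * X[r][j] for r in range(n)) + (alpha if i == j else 0.0)
               for j in range(d)]
        row += [sum(X[r][i] * Y[r][c] for r in range(n)) for c in range(k)]
        aug.append(row)
    for col in range(d):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, d), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(d):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[d:] for row in aug]  # n_voxels x n_outputs weights


def ridge_predict(X, W):
    """Apply the fitted weights to new feature rows."""
    return [[sum(x[i] * W[i][c] for i in range(len(W))) for c in range(len(W[0]))]
            for x in X]


# Toy subject: 2 voxels, 1 emotion axis.
W = ridge_fit([[1, 0], [0, 1]], [[2], [4]], alpha=1.0)
print(W)                           # [[1.0], [2.0]]
print(ridge_predict([[1, 1]], W))  # [[3.0]]
```

Fitting one such model per subject, with `alpha` chosen by cross-validation on held-out clips, is the kind of detail the proposed 'Emotion Vector Decoding' subsection would need to state.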

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

The abstract and available text describe an end-to-end pipeline that decodes a 34-dimensional emotion vector from fMRI to rewrite neutral scene descriptions, with classifier-free guidance for control. No equations, fitting procedures, or self-citations are shown that would reduce any claimed prediction, uniqueness, or result to its own inputs by construction. The three-axis validation framework and GPT-4 baseline are presented as external benchmarks, and the central claims rely on decoded signals rather than self-definitional or fitted-input reductions. The derivation is therefore self-contained against the described external evaluation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Only abstract available; ledger is therefore minimal and provisional.

free parameters (1)

34-dimensional emotion vector
Dimension and specific representation of affect are introduced without derivation shown; likely derived or fitted from fMRI data.

axioms (1)

domain assumption: fMRI signals contain sufficient information to decode a continuous 34D affective representation that generalizes across subjects for caption rewriting
Central to the pipeline; invoked when the rewriter uses the decoded vector to control affective expression.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 1 internal anchor

[1] Tomoyasu Horikawa. Mind captioning: Evolving descriptive text of mental content from human brain activity. Science Advances, 2024.
[2] Jerry Tang, Amanda LeBel, Shailee Jain, and Alexander G Huth. Semantic reconstruction of continuous language from non-invasive brain recordings. Nature Neuroscience, 26(5):858–866, 2023.
[3] Paul S Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Aidan Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A Norman, and Tanishq Mathew Abraham. Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors. Advances in Neural Information Processing Systems (NeurIPS), 36, 2024.
[4] Nitish Shirish Keskar, Bryan McCann, Lav R Varshney, Caiming Xiong, and Richard Socher. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858, 2019.
[5] Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. Plug and play language models: A simple approach to controlled text generation. In International Conference on Learning Representations (ICLR), 2020.
[6] Guillaume Sanchez, Alexander Spangher, Honglu Fan, Elad Levi, Pawan Sasanka Ammanamanchi, and Stella Biderman. Stay on topic with classifier-free guidance. arXiv preprint arXiv:2306.17806, 2023.
[7] Carmen Cammarota et al. fMRI-to-image reconstruction with personalized visual-language alignment. Advances in Neural Information Processing Systems (NeurIPS), 2025.
[8] Wentao Lu, Dong Nie, Pengcheng Xue, Zheng Cui, Piji Li, Daoqiang Zhang, and Xuyun Wen. Brain-inspired fMRI-to-text decoding via incremental and wrap-up language modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2025.
[9] Nuwa Xi, Sendong Zhao, Haochun Wang, Chi Liu, Bing Qin, and Ting Liu. UniCoRN: Unified cognitive signal reconstruction bridging cognitive signals and human language. arXiv preprint arXiv:2307.05355, 2023.
[10] Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, and Huiguang He. Open-vocabulary auditory neural decoding using fMRI-prompted LLM. arXiv preprint arXiv:2405.07840, 2024.
[11] Changde Du, Kaicheng Fu, Jinpeng Li, and Huiguang He. Decoding visual neural representations by multimodal learning of brain-visual-linguistic features. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10760–10777, 2023.
[12] Rui Li, Yiting Wang, Wei-Long Zheng, and Bao-Liang Lu. A multi-view spectral-spatial-temporal masked autoencoder for decoding emotions with self-supervised learning. In Proceedings of the 30th ACM International Conference on Multimedia, pages 6–14, 2022.
[13] Sikun Lin, Thomas Sprague, and Ambuj K Singh. Mind reader: Reconstructing complex images from brain activities. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, pages 29624–29636, 2022.
[14] Yiqun Duan, Jinzhao Zhou, Zhen Wang, Yu-Kai Wang, and Chin-Teng Lin. DeWave: Discrete EEG waves encoding for brain dynamics to text translation. arXiv preprint arXiv:2309.14030, 2023.
[15] Zheng Cui, Dong Nie, Pengcheng Xue, Xia Wu, Daoqiang Zhang, and Xuyun Wen. BrainX: A universal brain decoding framework with feature disentanglement and neuro-geometric representation learning. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pages 478–487, 2025.
[16] Philip A Kragel and Kevin S LaBar. Multivariate neural biomarkers of emotional states are categorically distinct. Social Cognitive and Affective Neuroscience, 10(11):1437–1448, 2015.
[17] Heini Saarimäki, Athanasios Gotsopoulos, Iiro P Jääskeläinen, Jouko Lampinen, Patrik Vuilleumier, Riitta Hari, Mikko Sams, and Lauri Nummenmaa. Discrete neural signatures of basic emotions. Cerebral Cortex, 26(6):2563–2573, 2016.
[18] Alan S Cowen and Dacher Keltner. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proceedings of the National Academy of Sciences (PNAS), 114(38):E7900–E7909, 2017.
[19] Tomoyasu Horikawa, Alan S Cowen, Dacher Keltner, and Yukiyasu Kamitani. The neural representation of visually evoked emotion is high-dimensional, categorical, and distributed across transmodal brain regions. iScience, 23(5):101060, 2020. doi: 10.1016/j.isci.2020.101060
[20] Giada Lettieri, Giacomo Handjaras, Emiliano Ricciardi, Andrea Leo, Paolo Papale, Monica Betta, Pietro Pietrini, and Luca Cecchetti. Emotionotopy in the human right temporo-parietal cortex. Nature Communications, 10(1):5568, 2019.
[21] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2020.
[22] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020.
[23] Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Ye Jia, Fei Ren, and Rif A Saurous. Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis. In International Conference on Machine Learning (ICML), 2018.
[24] Jongwan Kim, Svetlana V Shinkareva, and Douglas H Wedell. Representations of modality-general valence for videos and music derived from fMRI data. NeuroImage, 148:42–54, 2017.
[25] Alexander Mathews, Lexing Xie, and Xuming He. SentiCap: Generating image descriptions with sentiments. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016.
[26] Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, and Li Deng. StyleNet: Generating attractive visual captions with styles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
[27] Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, and Nazneen Fatema Rajani. GeDi: Generative discriminator guided sequence generation. In Findings of the Association for Computational Linguistics (EMNLP), 2021.
[28] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. NeurIPS Workshop on Deep Generative Models and Downstream Applications, 2021.
[29] Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. Style transfer from non-parallel text by cross-alignment. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
[30] Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, and Alan W Black. Style transfer through back-translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018.
[31] Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, and Xing Xie. Large language models understand and can be enhanced by emotional stimuli. arXiv preprint arXiv:2307.11760, 2023.
[32] Marco Gozzi and Francesca Fallucchi. Emotional framing in prompts modulates large language model performance. Big Data and Cognitive Computing, 10(4):102, 2025.
[33] Nikolaus Kriegeskorte, Marieke Mur, and Peter A Bandettini. Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2:4, 2008.
[34] Stéphane d’Ascoli, Jérémy Rapin, Yohann Benchetrit, Hubert Banville, and Jean-Rémi King. TRIBE: TRImodal brain encoder for whole-brain fMRI response prediction. arXiv preprint arXiv:2507.22229, 2025.
[35] Elenor Morgenroth, Stefano Moia, Laura Vilaclara, Raphael Fournier, Michal Muszynski, Maria Ploumitsakou, Marina Almató-Bellavista, Patrik Vuilleumier, and Dimitri Van De Ville. Emo-FilM: A multimodal dataset for affective neuroscience using naturalistic stimuli. Scientific Data, 12:684, 2025.

[1] [1]

Mind captioning: Evolving descriptive text of mental content from human brain activity.Science Advances, 2024

Tomoyasu Horikawa. Mind captioning: Evolving descriptive text of mental content from human brain activity.Science Advances, 2024

work page 2024

[2] [2]

Semantic reconstruction of continuous language from non-invasive brain recordings.Nature Neuroscience, 26(5):858–866, 2023

Jerry Tang, Amanda LeBel, Shailee Jain, and Alexander G Huth. Semantic reconstruction of continuous language from non-invasive brain recordings.Nature Neuroscience, 26(5):858–866, 2023

work page 2023

[3] [3]

Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors.Advances in Neural Information Processing Systems (NeurIPS), 36, 2024

Paul S Scotti, Atmadeep Banerjee, Jimmie Goode, Stepan Shabalin, Alex Nguyen, Aidan Dempster, Nathalie Verlinde, Elad Yundler, David Weisberg, Kenneth A Norman, and Tan- ishq Mathew Abraham. Reconstructing the mind’s eye: fMRI-to-image with contrastive learning and diffusion priors.Advances in Neural Information Processing Systems (NeurIPS), 36, 2024

work page 2024

[4] [4]

CTRL: A Conditional Transformer Language Model for Controllable Generation

Nitish Shirish Keskar, Bryan McCann, Lav R Varshney, Caiming Xiong, and Richard Socher. CTRL: A conditional transformer language model for controllable generation.arXiv preprint arXiv:1909.05858, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1909

[5] [5]

Plug and play language models: A simple approach to controlled text generation

Sumanth Dathathri, Andrea Madotto, Janice Lan, Jane Hung, Eric Frank, Piero Molino, Jason Yosinski, and Rosanne Liu. Plug and play language models: A simple approach to controlled text generation. InInternational Conference on Learning Representations (ICLR), 2020

work page 2020

[6] [6]

Stay on topic with classifier-free guidance.arXiv preprint arXiv:2306.17806, 2023

Guillaume Sanchez, Alexander Spangher, Honglu Fan, Elad Levi, Pawan Sasanka Ammana- manchi, and Stella Biderman. Stay on topic with classifier-free guidance.arXiv preprint arXiv:2306.17806, 2023

work page arXiv 2023

[7] [7]

fMRI-to-image reconstruction with personalized visual-language alignment.Advances in Neural Information Processing Systems (NeurIPS), 2025

Carmen Cammarota et al. fMRI-to-image reconstruction with personalized visual-language alignment.Advances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025

[8] [8]

Brain-inspired fMRI-to-text decoding via incremental and wrap-up language modeling

Wentao Lu, Dong Nie, Pengcheng Xue, Zheng Cui, Piji Li, Daoqiang Zhang, and Xuyun Wen. Brain-inspired fMRI-to-text decoding via incremental and wrap-up language modeling. In Advances in Neural Information Processing Systems (NeurIPS), 2025

work page 2025

[9] [9]

UniCoRN: Unified cognitive signal reconstruction bridging cognitive signals and human language.arXiv preprint arXiv:2307.05355, 2023

Nuwa Xi, Sendong Zhao, Haochun Wang, Chi Liu, Bing Qin, and Ting Liu. UniCoRN: Unified cognitive signal reconstruction bridging cognitive signals and human language.arXiv preprint arXiv:2307.05355, 2023

work page arXiv 2023

[10] [10]

Open-vocabulary auditory neural decoding using fMRI-prompted LLM.arXiv preprint arXiv:2405.07840, 2024

Xiaoyu Chen, Changde Du, Che Liu, Yizhe Wang, and Huiguang He. Open-vocabulary auditory neural decoding using fMRI-prompted LLM.arXiv preprint arXiv:2405.07840, 2024

work page arXiv 2024

[11] [11]

Changde Du, Kaicheng Fu, Jinpeng Li, and Huiguang He. Decoding visual neural representa- tions by multimodal learning of brain-visual-linguistic features.IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(9):10760–10777, 2023

work page 2023

[12] [12]

A multi-view spectral-spatial- temporal masked autoencoder for decoding emotions with self-supervised learning

Rui Li, Yiting Wang, Wei-Long Zheng, and Bao-Liang Lu. A multi-view spectral-spatial- temporal masked autoencoder for decoding emotions with self-supervised learning. InProceed- ings of the 30th ACM International Conference on Multimedia, pages 6–14, 2022

work page 2022

[13] Sikun Lin, Thomas Sprague, and Ambuj K Singh. Mind reader: Reconstructing complex images from brain activities. In Advances in Neural Information Processing Systems (NeurIPS), volume 35, pages 29624–29636, 2022

[14] Yiqun Duan, Jinzhao Zhou, Zhen Wang, Yu-Kai Wang, and Chin-Teng Lin. DeWave: Discrete EEG waves encoding for brain dynamics to text translation. arXiv preprint arXiv:2309.14030, 2023

[15] Zheng Cui, Dong Nie, Pengcheng Xue, Xia Wu, Daoqiang Zhang, and Xuyun Wen. BrainX: A universal brain decoding framework with feature disentanglement and neuro-geometric representation learning. In Proceedings of the 34th ACM International Conference on Information and Knowledge Management, pages 478–487, 2025

[16] Philip A Kragel and Kevin S LaBar. Multivariate neural biomarkers of emotional states are categorically distinct. Social Cognitive and Affective Neuroscience, 10(11):1437–1448, 2015

[17] Heini Saarimäki, Athanasios Gotsopoulos, Iiro P Jääskeläinen, Jouko Lampinen, Patrik Vuilleumier, Riitta Hari, Mikko Sams, and Lauri Nummenmaa. Discrete neural signatures of basic emotions. Cerebral Cortex, 26(6):2563–2573, 2016

[18] Alan S Cowen and Dacher Keltner. Self-report captures 27 distinct categories of emotion bridged by continuous gradients. Proceedings of the National Academy of Sciences (PNAS), 114(38):E7900–E7909, 2017

[19] Tomoyasu Horikawa, Alan S Cowen, Dacher Keltner, and Yukiyasu Kamitani. The neural representation of visually evoked emotion is high-dimensional, categorical, and distributed across transmodal brain regions. iScience, 23(5):101060, 2020. doi: 10.1016/j.isci.2020.101060

[20] Giada Lettieri, Giacomo Handjaras, Emiliano Ricciardi, Andrea Leo, Paolo Papale, Monica Betta, Pietro Pietrini, and Luca Cecchetti. Emotionotopy in the human right temporo-parietal cortex. Nature Communications, 10(1):5568, 2019

[21] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems (NeurIPS), 2020

[22] Mike Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdelrahman Mohamed, Omer Levy, Veselin Stoyanov, and Luke Zettlemoyer. BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL), 2020

[23] Yuxuan Wang, Daisy Stanton, Yu Zhang, RJ Skerry-Ryan, Eric Battenberg, Joel Shor, Ying Xiao, Ye Jia, Fei Ren, and Rif A Saurous. Style tokens: Unsupervised style modeling, control and transfer in end-to-end speech synthesis. In International Conference on Machine Learning (ICML), 2018

[24] Jongwan Kim, Svetlana V Shinkareva, and Douglas H Wedell. Representations of modality-general valence for videos and music derived from fMRI data. NeuroImage, 148:42–54, 2017

[25] Alexander Mathews, Lexing Xie, and Xuming He. SentiCap: Generating image descriptions with sentiments. In Proceedings of the AAAI Conference on Artificial Intelligence, 2016

[26] Chuang Gan, Zhe Gan, Xiaodong He, Jianfeng Gao, and Li Deng. StyleNet: Generating attractive visual captions with styles. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017

[27] Ben Krause, Akhilesh Deepak Gotmare, Bryan McCann, Nitish Shirish Keskar, Shafiq Joty, Richard Socher, and Nazneen Fatema Rajani. GeDi: Generative discriminator guided sequence generation. In Findings of the Association for Computational Linguistics (EMNLP), 2021

[28] Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. NeurIPS Workshop on Deep Generative Models and Downstream Applications, 2021

[29] Tianxiao Shen, Tao Lei, Regina Barzilay, and Tommi Jaakkola. Style transfer from non-parallel text by cross-alignment. In Advances in Neural Information Processing Systems (NeurIPS), 2017

[30] Shrimai Prabhumoye, Yulia Tsvetkov, Ruslan Salakhutdinov, and Alan W Black. Style transfer through back-translation. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (ACL), 2018

[31] Cheng Li, Jindong Wang, Yixuan Zhang, Kaijie Zhu, Wenxin Hou, Jianxun Lian, Fang Luo, Qiang Yang, and Xing Xie. Large language models understand and can be enhanced by emotional stimuli. arXiv preprint arXiv:2307.11760, 2023

[32] Marco Gozzi and Francesca Fallucchi. Emotional framing in prompts modulates large language model performance. Big Data and Cognitive Computing, 10(4):102, 2025

[33] Nikolaus Kriegeskorte, Marieke Mur, and Peter A Bandettini. Representational similarity analysis – connecting the branches of systems neuroscience. Frontiers in Systems Neuroscience, 2:4, 2008

[34] Stéphane d’Ascoli, Jérémy Rapin, Yohann Benchetrit, Hubert Banville, and Jean-Rémi King. TRIBE: TRImodal brain encoder for whole-brain fMRI response prediction. arXiv preprint arXiv:2507.22229, 2025

[35] Elenor Morgenroth, Stefano Moia, Laura Vilaclara, Raphael Fournier, Michal Muszynski, Maria Ploumitsakou, Marina Almató-Bellavista, Patrik Vuilleumier, and Dimitri Van De Ville. Emo-FilM: A multimodal dataset for affective neuroscience using naturalistic stimuli. Scientific Data, 12:684, 2025

A Appendix

A.1 Retrieval implementation details

We pre-compu...