
arxiv: 2607.02317 · v1 · pith:3UXV3QZ7new · submitted 2026-07-02 · 💻 cs.CV

NEvo: Neural-Guided Evolutionary Video Synthesis for Dynamic Visual Selectivity

Pith reviewed 2026-07-03 15:27 UTC · model grok-4.3

classification 💻 cs.CV
keywords video synthesis · neural encoding models · evolutionary optimization · visual selectivity · brain regions · dynamic stimuli · fMRI localizers

The pith

Evolutionary search over video prompts finds stimuli that hyper-activate target brain regions more than handcrafted localizers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a framework that uses evolutionary algorithms to search a space of video prompts for stimuli that maximize predicted activity in chosen brain regions. It relies on a model that forecasts voxel responses to dynamic video inputs to steer the search. The resulting videos activate the target areas more strongly than traditional localizer videos and recover established functional selectivities in ventral, dorsal, and lateral visual pathways. They also expose consistent differences in how these pathways respond to temporal features in the stimuli. This supplies a computational route to generating new probes for studying dynamic visual processing.

Core claim

By performing evolutionary search over a structured prompt space guided by a dynamic encoding model that predicts voxel-level responses to video inputs, the framework generates stimuli that maximize predicted activity for a target ROI, consistently surpassing handcrafted localizer videos while recovering known selectivities across ventral, dorsal, and lateral pathways and revealing systematic differences in sensitivity to temporal dynamics.

What carries the argument

Evolutionary search over structured prompt space guided by dynamic encoding model predictions to maximize target ROI activity
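
A minimal sketch of this kind of search loop may help make the mechanism concrete. Everything named here is an assumption for illustration: the gene vocabulary `GENE_SPACE`, the elitist selection scheme, and above all `predicted_roi_activation`, which is a toy keyword scorer standing in for the real pipeline of rendering a video from the prompt and scoring it with the dynamic encoding model. It is not the paper's implementation.

```python
import random

# Hypothetical gene vocabulary; the paper's actual attribute set is not listed here.
GENE_SPACE = {
    "subject": ["a face", "two people", "a hand", "a landscape"],
    "motion": ["static", "slow pan", "fast movement", "interacting"],
    "style": ["photo", "cartoon", "claymation"],
}


def decode(genome):
    """Turn a genome (attribute -> value) into a text prompt."""
    return f"{genome['subject']}, {genome['motion']}, {genome['style']}"


def predicted_roi_activation(prompt):
    """Toy stand-in for 'generate video, then query the encoding model'.

    Counts hand-picked keywords; purely illustrative, not a brain model.
    """
    keywords = ("two people", "interacting", "fast movement")
    return sum(k in prompt for k in keywords)


def random_genome(rng):
    return {k: rng.choice(v) for k, v in GENE_SPACE.items()}


def crossover(a, b, rng):
    """Uniform crossover: each gene is inherited from one parent at random."""
    return {k: rng.choice([a[k], b[k]]) for k in GENE_SPACE}


def mutate(genome, rng, rate=0.2):
    """Resample each gene from its vocabulary with probability `rate`."""
    return {
        k: (rng.choice(GENE_SPACE[k]) if rng.random() < rate else v)
        for k, v in genome.items()
    }


def fitness(genome):
    return predicted_roi_activation(decode(genome))


def evolve(generations=20, pop_size=12, n_elite=4, seed=0):
    """Elitist genetic search; returns the best genome and its score."""
    rng = random.Random(seed)
    population = [random_genome(rng) for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        elite = ranked[:n_elite]  # selection: keep the top scorers unchanged
        children = [
            mutate(crossover(rng.choice(elite), rng.choice(elite), rng), rng)
            for _ in range(pop_size - n_elite)
        ]
        population = elite + children
    best = max(population, key=fitness)
    return best, fitness(best)


if __name__ == "__main__":
    best, score = evolve()
    print(decode(best), score)
```

Because the elite genomes are carried over unchanged, the best score in this sketch can never decrease from one generation to the next. In a real setting each fitness call would involve video generation and a model forward pass, so the number of evaluations, not the bookkeeping above, would dominate the cost.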

If this is right

  • Synthesized videos recover known selectivities across ventral, dorsal, and lateral pathways.
  • Systematic differences appear in sensitivity to temporal dynamics.
  • Searchlight analysis shows progression toward complex social-dynamic features along the lateral stream.
  • The framework supplies new predictions for in vivo experiments on dynamic visual selectivity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same search procedure could be used to generate candidate stimuli for testing specific hypotheses about brain function before running actual scans.
  • Applying the method to abstract non-naturalistic videos may isolate the minimal features sufficient to drive selectivity.
  • Extending the framework beyond visual cortex could identify optimized stimuli for other sensory or cognitive regions.

Load-bearing premise

The dynamic encoding model must accurately predict voxel responses to the arbitrary videos produced during search.

What would settle it

Run fMRI on human subjects viewing both the synthesized videos and the handcrafted localizers and compare measured activation in the target ROI; equal or lower activation for the synthesized videos would falsify the claim.
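
One way such a comparison could be analyzed is a paired, one-sided test on per-subject differences. The sketch below uses an exact sign-flip permutation test; the choice of test and the function name `paired_sign_flip_test` are assumptions made for illustration, since the page does not specify a statistical procedure. Inputs are assumed to be one mean ROI response per subject for each condition.

```python
import itertools


def paired_sign_flip_test(synth, localizer):
    """Exact one-sided sign-flip test for H1: synth > localizer.

    synth, localizer: per-subject ROI responses, same subject order.
    Returns (mean difference, p-value). Enumerates all 2**n sign
    assignments, so it is only practical for small subject counts.
    """
    if len(synth) != len(localizer) or not synth:
        raise ValueError("need two non-empty sequences of equal length")
    diffs = [s - l for s, l in zip(synth, localizer)]
    n = len(diffs)
    observed = sum(diffs) / n
    at_least_as_large = 0
    total = 0
    for signs in itertools.product((1, -1), repeat=n):
        total += 1
        permuted = sum(d * s for d, s in zip(diffs, signs)) / n
        # Small tolerance guards against floating-point ties.
        if permuted >= observed - 1e-12:
            at_least_as_large += 1
    return observed, at_least_as_large / total
```

Under this scheme, a mean difference at or below zero, or a large p-value, would be the outcome that counts against the claim. With many subjects, random sampling of sign assignments would replace the full enumeration.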

Figures

Figures reproduced from arXiv: 2607.02317 by Amir Zamir, Leyla Isik, Martin Schrimpf, Ming Zhou, Sogand Salehi, Yingtian Tang.

Figure 1. A. Video synthesis framework. We propose NEvo, a framework that iteratively optimizes video-generation prompts to maximize predicted activation in a target brain region. B. Overview of results. By combining dynamic brain encoding with video synthesis, NEvo generates highly activating stimuli, highlights visual dynamics, and enables probing along the lateral visual pathway.
Figure 2. A. Evolutionary prompting. Visual attributes are encoded as interpretable genes, decoded into text prompts, and rendered into stimuli by the generation model. Predicted ROI activations from the brain model then guide selection, crossover, and mutation across generations. B. Two-stage search. We identify the best-scoring image through image prompt search (Tab. S1), then use it to initialize image-to-video s…
Figure 3. A. Example synthetic videos from NEvo. Two-second videos optimized to maximize predicted activation in ventral and dorsal/lateral regions. B. Two-stage search over iterations. Optimization trajectories across genetic-search evaluations for each ROI, spanning both image and video search phases. Faded lines denote individual seeds; the solid line denotes the mean. C. Comparison of ROI activations. Predicted …
Figure 4. A. Two-stage search over time. Optimization trajectories across search evaluations and wall-clock time, aggregated across ROIs. B. Dynamic stimuli improve activation. For each ROI, we compare predicted activation between the synthetic video and its first frame presented as a static video. C. Ablation of two-stage search. The two-stage strategy outperforms single-stage search baselines in two representative…
Figure 5. A. Searchlight synthesis. Top-synthesis stimuli are shown alongside word clouds from Gemini-annotated descriptions for example searchlight patches along the lateral stream trajectory (V1 to aSTS). Patch indices are marked on the cortical surface. B. Visual property–activation correlations. Left: Representative correlations between annotated visual-property scores and predicted activations for two example cas…
Figure 6. Controlled synthesis from a non-naturalistic anchor. Starting from a fixed first frame of two stacked plasticine discs, we run video-stage NEvo to maximize predicted pSTS or MT activation. For each ROI, examples show the highest and lowest activation boosts across seeds, measured relative to the static first-frame baseline. pSTS optimization introduces face-like features and coordinated interactions, where…

Original abstract

The human brain processes dynamic visual input through hierarchically organized, functionally specialized regions. While recent in silico brain encoding models can synthesize optimal stimuli to probe selectivity in different brain regions, prior work has been largely limited to static images, leaving dynamic visual processing underexplored. We introduce a novel neural-guided video synthesis framework that generates stimuli optimized for target brain regions across visual cortex. Our method performs evolutionary search over a structured prompt space, guided by a dynamic encoding model that predicts voxel-level responses to video inputs. By maximizing predicted activity for a target ROI, the framework efficiently discovers hyper-activating dynamic stimuli that consistently surpass handcrafted localizer videos. The synthesized videos recover known selectivities across ventral, dorsal, and lateral pathways, and further reveal systematic differences in sensitivity to temporal dynamics. A searchlight analysis provides new insight into the progression toward increasingly complex social-dynamic features along the lateral stream, further supported by probing with synthesized abstract, non-naturalistic stimuli. Taken together, our framework enables in silico exploration of dynamic visual selectivity, with new predictions for in vivo experiments.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper introduces NEvo, a neural-guided evolutionary video synthesis framework that performs evolutionary search over a structured prompt space, guided by a dynamic encoding model predicting voxel-level responses to videos. By maximizing predicted activity in target ROIs, it claims to discover hyper-activating dynamic stimuli that surpass handcrafted localizer videos, recover known selectivities across ventral/dorsal/lateral pathways, reveal differences in temporal dynamics sensitivity, and provide searchlight insights into progression toward complex social-dynamic features along the lateral stream, supported by abstract non-naturalistic stimuli probes.

Significance. If the encoding model generalizes accurately, the framework offers a scalable in silico method for generating novel dynamic stimuli to probe functional selectivity in visual cortex, extending prior static-image work and generating testable predictions for in vivo experiments.

major comments (2)
  1. [Abstract] The central claim that evolved videos 'consistently surpass handcrafted localizer videos' and recover known selectivities requires the dynamic encoding model to accurately predict responses to the novel, complex temporal dynamics and abstract features it generates; no held-out correlation, noise-ceiling, or OOD validation metrics are reported for these stimuli.
  2. [Abstract] The searchlight analysis claiming 'new insight into the progression toward increasingly complex social-dynamic features along the lateral stream' is presented without quantitative details on the analysis procedure, statistical thresholds, or how the synthesized abstract stimuli specifically support this progression.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on the validation of the dynamic encoding model and the searchlight analysis. We address each point below and will incorporate clarifications and additional analyses in the revised manuscript.

Point-by-point responses
  1. Referee: [Abstract] The central claim that evolved videos 'consistently surpass handcrafted localizer videos' and recover known selectivities requires the dynamic encoding model to accurately predict responses to the novel, complex temporal dynamics and abstract features it generates; no held-out correlation, noise-ceiling, or OOD validation metrics are reported for these stimuli.

    Authors: We acknowledge the importance of OOD validation for the generated stimuli. The dynamic encoding model was trained and cross-validated on large-scale video datasets with standard metrics, but we agree that explicit evaluation on stimuli with complex temporal dynamics would strengthen the claims. In the revision, we will add held-out correlation, noise-ceiling, and OOD performance metrics for the encoding model on a held-out video set that includes abstract and dynamic features similar to NEvo outputs. revision: yes

  2. Referee: [Abstract] The searchlight analysis claiming 'new insight into the progression toward increasingly complex social-dynamic features along the lateral stream' is presented without quantitative details on the analysis procedure, statistical thresholds, or how the synthesized abstract stimuli specifically support this progression.

    Authors: We will revise the methods and results sections to provide full quantitative details on the searchlight procedure (including voxel selection, radius, and cross-validation), statistical thresholds (e.g., cluster-corrected p-values), and specific metrics (such as feature complexity scores or activation gradients) demonstrating how the abstract non-naturalistic stimuli support the observed progression along the lateral stream. Additional supplementary figures will illustrate these quantitative results. revision: yes
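
For readers unfamiliar with the metrics named in the first response, the following sketch shows one common way to compute a held-out correlation and normalize it by a noise ceiling. The convention used here (split-half reliability, Spearman-Brown correction, then a square root) is an assumption chosen for illustration; other conventions exist, and neither the paper nor the rebuttal states which one would be used. Function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)


def noise_ceiling(split_a, split_b):
    """Upper bound on attainable correlation for one voxel or ROI.

    split_a, split_b: responses to the same held-out stimuli, averaged
    over two disjoint halves of the repeats. Assumed convention:
    Spearman-Brown-corrected split-half reliability, then square root.
    """
    r_half = pearson(split_a, split_b)
    r_full = 2 * r_half / (1 + r_half)
    return math.sqrt(max(r_full, 0.0))


def normalized_score(predicted, split_a, split_b):
    """Held-out prediction correlation divided by the noise ceiling.

    Undefined (division by zero) when the ceiling is 0, i.e. when the
    measured responses are not reliable at all.
    """
    measured = [(a + b) / 2 for a, b in zip(split_a, split_b)]
    return pearson(predicted, measured) / noise_ceiling(split_a, split_b)
```

A normalized score near 1 would mean the model predicts held-out responses about as well as the measurement noise allows. Computing it separately on naturalistic and on synthesized or abstract stimuli is one way the OOD gap raised by the referee could be quantified.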

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external encoding model

Full rationale

The paper introduces a neural-guided evolutionary search framework that uses a pre-existing dynamic encoding model to predict voxel responses and optimize video stimuli. No equations, parameter fitting steps, or derivations appear in the abstract or described method. The encoding model is treated as an independent input whose accuracy is an external assumption, not derived or fitted within the paper itself. The central claim of discovering hyper-activating stimuli therefore does not reduce to any self-referential construction or self-citation chain. This is the common case of a method whose validity hinges on an external benchmark rather than internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; full manuscript required to audit modeling assumptions such as the structure of the prompt space or the encoding model's training regime.


Reference graph

Works this paper leans on

97 extracted references · 52 canonical work pages · 6 internal anchors

  1. [1]

    Cognitive neuroscience of human social behaviour.Nature reviews neuroscience, 4(3):165–178, 2003

    Ralph Adolphs. Cognitive neuroscience of human social behaviour.Nature reviews neuroscience, 4(3):165–178, 2003

  2. [2]

    From language to cognition: How llms outgrow the human language network

    Badr AlKhamissi, Greta Tuckute, Yingtian Tang, Taha Osama A Binhuraib, Antoine Bosselut, and Martin Schrimpf. From language to cognition: How llms outgrow the human language network. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing, pages 24332–24350, 2025

  3. [3]

    Social perception from visual cues: role of the sts region.Trends in cognitive sciences, 4(7):267–278, 2000

    Truett Allison, Aina Puce, and Gregory McCarthy. Social perception from visual cues: role of the sts region.Trends in cognitive sciences, 4(7):267–278, 2000

  4. [4]

    V-JEPA 2: Self-Supervised Video Models Enable Understanding, Prediction and Planning

    Mido Assran, Adrien Bardes, David Fan, Quentin Garrido, Russell Howes, Matthew Muckley, Ammar Rizvi, Claire Roberts, Koustuv Sinha, Artem Zholus, et al. V-jepa 2: Self-supervised video models enable understanding, prediction and planning.arXiv preprint arXiv:2506.09985, 2025

  5. [5]

    The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning

    Shahab Bakhtiari, Patrick Mineault, Timothy Lillicrap, Christopher Pack, and Blake Richards. The functional specialization of visual cortex emerges from training parallel pathways with self-supervised predictive learning. InAdvances in Neural Information Processing Systems, volume 34, pages 25164–25178. Curran Associates, Inc., 2021. URL https://papers.ni...

  6. [6]

    Mindsimulator: Exploring brain concept localization via synthetic fmri.arXiv preprint arXiv:2503.02351, 2025

    Guangyin Bao, Qi Zhang, Zixuan Gong, Zhuojia Wu, and Duoqian Miao. Mindsimulator: Exploring brain concept localization via synthetic fmri.arXiv preprint arXiv:2503.02351, 2025

  7. [7]

    A map of object space in primate inferotemporal cortex.Nature, 583(7814):103–108, 2020

    Pinglei Bao, Liang She, Mason McGill, and Doris Y Tsao. A map of object space in primate inferotemporal cortex.Nature, 583(7814):103–108, 2020

  8. [8]

    Pouya Bashivan, Kohitij Kar, and James J. DiCarlo. Neural population control via deep image synthesis.Science, May 2019. doi: 10.1126/science.aav9436. URL https://www. science.org/doi/10.1126/science.aav9436. Publisher: American Association for the Advancement of Science

  9. [9]

    Vansteensel, Erik J

    Julia Berezutskaya, Mariska J. Vansteensel, Erik J. Aarnoutse, Zachary V . Freudenburg, Giovanni Piantoni, Mariana P. Branco, and Nick F. Ramsey. Open multimodal iEEG- fMRI dataset from naturalistic stimulation with a short audiovisual film.Scientific Data, 9:91, March 2022. ISSN 2052-4463. doi: 10.1038/s41597-022-01173-0. URL https: //www.ncbi.nlm.nih.go...

  10. [10]

    Born and David C

    Richard T. Born and David C. Bradley. STRUCTURE AND FUNCTION OF VISUAL AREA MT.Annual Review of Neuroscience, 28(1):157–189, July 2005. ISSN 0147-006X, 1545-4126. doi: 10.1146/annurev.neuro.26.041002.131052. URL https://www.annualreviews.org/ doi/10.1146/annurev.neuro.26.041002.131052

  11. [11]

    Cneuromod data collection complete: 200h of individual fmri across diverse naturalistic and controlled tasks to build neuroai models

    Julie A Boyle, Basile Pinsard, and Lune Pierre Bellec. Cneuromod data collection complete: 200h of individual fmri across diverse naturalistic and controlled tasks to build neuroai models

  12. [12]

    Brains and algorithms partially converge in nat- ural language processing.Communications Biology, 5(1):1–10, February 2022

    Charlotte Caucheteux and Jean-Rémi King. Brains and algorithms partially converge in nat- ural language processing.Communications Biology, 5(1):1–10, February 2022. ISSN 2399-

  13. [13]

    URL https://www.nature.com/articles/ s42003-022-03036-1

    doi: 10.1038/s42003-022-03036-1. URL https://www.nature.com/articles/ s42003-022-03036-1. Number: 1 Publisher: Nature Publishing Group

  14. [14]

    Brainactiv: Identifying visuo-semantic properties driving cortical selectivity using diffusion-based image manipulation.bioRxiv, pages 2024–10, 2024

    Diego García Cerdas, Christina Sartzetaki, Magnus Petersen, Gemma Roig, Pascal Mettes, and Iris Groen. Brainactiv: Identifying visuo-semantic properties driving cortical selectivity using diffusion-based image manipulation.bioRxiv, pages 2024–10, 2024

  15. [15]

    Minkyu Choi, Kuan Han, Xiaokai Wang, Yizhen Zhang, and Zhongming Liu. A dual-stream neural network explains the functional segregation of dorsal and ventral visual pathways in human brains.Advances in Neural Information Processing Systems, 36:50408–50428, 2023. 11

  16. [16]

    Claeys, Delwin T

    Kristl G. Claeys, Delwin T. Lindsey, Erik De Schutter, and Guy A. Orban. A higher order motion region in human inferior parietal lobule: evidence from fMRI.Neuron, 40(3):631–642, October 2003. ISSN 0896-6273. doi: 10.1016/s0896-6273(03)00590-7

  17. [17]

    d'Ascoli, J

    Stéphane d’Ascoli, Jérémy Rapin, Yohann Benchetrit, Hubert Banville, and Jean-Rémi King. Tribe: Trimodal brain encoder for whole-brain fmri response prediction.arXiv preprint arXiv:2507.22229, 2025

  18. [18]

    Functional organization of social perception and cognition in the superior temporal sulcus.Cerebral cortex, 25(11): 4596–4609, 2015

    Ben Deen, Kami Koldewyn, Nancy Kanwisher, and Rebecca Saxe. Functional organization of social perception and cognition in the superior temporal sulcus.Cerebral cortex, 25(11): 4596–4609, 2015

  19. [20]

    The parahippocampal place area: recognition, navigation, or encoding?Neuron, 23(1):115–125, 1999

    Russell Epstein, Alison Harris, Damian Stanley, and Nancy Kanwisher. The parahippocampal place area: recognition, navigation, or encoding?Neuron, 23(1):115–125, 1999

  20. [21]

    A single computational objective drives specialization of streams in visual cortex.bioRxiv, pages 2023–12, 2023

    Dawn Finzi, Eshed Margalit, Kendrick Kay, Daniel LK Yamins, and Kalanit Grill-Spector. A single computational objective drives specialization of streams in visual cortex.bioRxiv, pages 2023–12, 2023

  21. [22]

    FreeSurfer

    Bruce Fischl. FreeSurfer. 62(2):774–781. ISSN 10538119. doi: 10.1016/j.neuroimage.2012.01

  22. [23]

    URLhttps://linkinghub.elsevier.com/retrieve/pii/S1053811912000389

  23. [24]

    The brain basis of language processing: from structure to function

    Angela D Friederici. The brain basis of language processing: from structure to function. Physiological reviews, 2011

  24. [25]

    Gibson.The Ecological Approach to Visual Perception: Classic Edition

    James J. Gibson.The Ecological Approach to Visual Perception: Classic Edition. Psychology Press, November 2014. ISBN 978-1-317-57938-0. Google-Books-ID: 8BSLBQAAQBAJ

  25. [26]

    Catalyzing in silico neuroscience with a toolkit of accurate encoding models of the brain

    Alessandro T Gifford, Domenic Bersch, Daniel Janini, Gemma Roig, and Radoslaw M Cichy. Catalyzing in silico neuroscience with a toolkit of accurate encoding models of the brain

  26. [27]

    The role of human ventral visual cortex in motion perception.Brain, 136(9): 2784–2798, 2013

    Sharon Gilaie-Dotan, Ayse P Saygin, Lauren J Lorenzi, Ryan Egan, Geraint Rees, and Marlene Behrmann. The role of human ventral visual cortex in motion perception.Brain, 136(9): 2784–2798, 2013

  27. [28]

    Glasser, Timothy S

    Matthew F. Glasser, Timothy S. Coalson, Emma C. Robinson, Carl D. Hacker, John Harwell, Essa Yacoub, Kamil Ugurbil, Jesper Andersson, Christian F. Beckmann, Mark Jenkinson, Stephen M. Smith, and David C. Van Essen. A multi-modal parcellation of human cerebral cortex.Nature, 536(7615):171–178, August 2016. ISSN 1476-4687. doi: 10.1038/nature18933. URL http...

  28. [29]

    Scaling laws for task-optimized models of the primate visual ventral stream.arXiv preprint arXiv:2411.05712, 2024

    Abdulkadir Gokce and Martin Schrimpf. Scaling laws for task-optimized models of the primate visual ventral stream.arXiv preprint arXiv:2411.05712, 2024

  29. [30]

    Ariel Goldstein, Zaid Zada, Eliav Buchnik, Mariano Schain, Amy Price, Bobbi Aubrey, Samuel A. Nastase, Amir Feder, Dotan Emanuel, Alon Cohen, Aren Jansen, Harshvard- han Gazula, Gina Choe, Aditi Rao, Catherine Kim, Colton Casto, Lora Fanda, Werner Doyle, Daniel Friedman, Patricia Dugan, Lucia Melloni, Roi Reichart, Sasha Devore, Adeen Flinker, Liat Hasenf...

  30. [31]

    M. A. Goodale and A. D. Milner. Separate visual pathways for perception and action.Trends in Neurosciences, 15(1):20–25, January 1992. ISSN 0166-2236. doi: 10.1016/0166-2236(92) 90344-8. 12

  31. [32]

    Google. Gemini. https://gemini.google.com/, 2026. Large language model; accessed 2026-05-06

  32. [33]

    The human visual cortex.Annu

    Kalanit Grill-Spector and Rafael Malach. The human visual cortex.Annu. Rev. Neurosci., 27 (1):649–677, 2004

  33. [34]

    Brain areas active during visual perception of biological motion.Neuron, 35(6):1167–1175, 2002

    Emily D Grossman and Randolph Blake. Brain areas active during visual perception of biological motion.Neuron, 35(6):1167–1175, 2002

  34. [35]

    Neurogen: activation optimized image synthesis for discovery neuroscience.NeuroImage, 247:118812, 2022

    Zijin Gu, Keith Wakefield Jamison, Meenakshi Khosla, Emily J Allen, Yihan Wu, Ghislain St-Yves, Thomas Naselaris, Kendrick Kay, Mert R Sabuncu, and Amy Kuceyeski. Neurogen: activation optimized image synthesis for discovery neuroscience.NeuroImage, 247:118812, 2022

  35. [36]

    LTX-Video: Realtime Video Latent Diffusion

    Yoav HaCohen, Nisan Chiprut, Benny Brazowski, Daniel Shalem, Dudu Moshe, Eitan Richard- son, Eran Levin, Guy Shiran, Nir Zabari, Ori Gordon, Poriya Panet, Sapir Weissbuch, Victor Kulikov, Yaki Bitterman, Zeev Melumian, and Ofir Bibi. Ltx-video: Realtime video latent diffusion, 2024. URLhttps://arxiv.org/abs/2501.00103

  36. [37]

    Ltx-2: Efficient joint audio-visual foundation model, 2026

    Yoav HaCohen, Benny Brazowski, Nisan Chiprut, Yaki Bitterman, Andrew Kvochko, Avishai Berkowitz, Daniel Shalem, Daphna Lifschitz, Dudu Moshe, Eitan Porat, Eitan Richardson, Guy Shiran, Itay Chachy, Jonathan Chetboun, Michael Finkelson, Michael Kupchick, Nir Zabari, Nitzan Guetta, Noa Kotler, Ofir Bibi, Ori Gordon, Poriya Panet, Roi Benita, Shahar Armon, V...

  37. [38]

    Uri Hasson, Janice Chen, and Christopher J. Honey. Hierarchical process memory: memory as an integral component of information processing.Trends in cognitive sciences, 19(6):304–313, June 2015. ISSN 1364-6613. doi: 10.1016/j.tics.2015.04.006. URL https://www.ncbi.nlm. nih.gov/pmc/articles/PMC4457571/

  38. [39]

    Naturalistic stimuli reveal a dominant role for agentic action in visual representation.Neuroimage, 216:116561, 2020

    James V Haxby, M Ida Gobbini, and Samuel A Nastase. Naturalistic stimuli reveal a dominant role for agentic action in visual representation.Neuroimage, 216:116561, 2020

  39. [40]

    In silico mapping of visual categorical selectivity across the whole brain.arXiv preprint arXiv:2510.21142, 2025

    Ethan Hwang, Hossein Adeli, Wenxuan Guo, Andrew Luo, and Nikolaus Kriegeskorte. In silico mapping of visual categorical selectivity across the whole brain.arXiv preprint arXiv:2510.21142, 2025

  40. [41]

    Perceiving social interactions in the posterior superior temporal sulcus.Proceedings of the National Academy of Sciences, 114(43):E9145–E9152, October 2017

    Leyla Isik, Kami Koldewyn, David Beeler, and Nancy Kanwisher. Perceiving social interactions in the posterior superior temporal sulcus.Proceedings of the National Academy of Sciences, 114(43):E9145–E9152, October 2017. doi: 10.1073/pnas.1714471114

  41. [42]

    Popivanov, Rufin V ogels, Wim Vanduffel, and Guy A

    Jan Jastorff, Ivo D. Popivanov, Rufin V ogels, Wim Vanduffel, and Guy A. Orban. Integration of shape and motion cues in biological motion processing in the monkey STS.NeuroImage, 60(2):911–921, April 2012. ISSN 1053-8119. doi: 10.1016/j.neuroimage.2011.12.087. URL https://www.sciencedirect.com/science/article/pii/S1053811912000286

  42. [43]

    The fusiform face area: a module in human extrastriate cortex specialized for face perception.Journal of neuroscience, 17(11): 4302–4311, 1997

    Nancy Kanwisher, Josh McDermott, and Marvin M Chun. The fusiform face area: a module in human extrastriate cortex specialized for face perception.Journal of neuroscience, 17(11): 4302–4311, 1997

  43. [44]

    Umit Keles, Julien Dubois, Kevin J. M. Le, J. Michael Tyszka, David A. Kahn, Chrystal M. Reed, Jeffrey M. Chung, Adam N. Mamelak, Ralph Adolphs, and Ueli Rutishauser. Multimodal single- neuron, intracranial EEG, and fMRI brain responses during movie watching in human patients. Scientific Data, 11(1):214, February 2024. ISSN 2052-4463. doi: 10.1038/s41597-...

  44. [45]

    Kell, Daniel L.K

    Alexander J.E. Kell, Daniel L.K. Yamins, Erica N. Shook, Sam V . Norman-Haignere, and Josh H. McDermott. A Task-Optimized Neural Network Replicates Human Auditory Behavior, Predicts Brain Responses, and Reveals a Cortical Processing Hierarchy.Neuron, 98(3):630– 644.e16, May 2018. ISSN 08966273. doi: 10.1016/j.neuron.2018.03.044. URL https: //linkinghub.el...

  45. [46]

    Deep supervised, but not unsuper- vised, models may explain IT cortical representation.PLoS computational biology, 10(11): e1003915, November 2014

    Seyed-Mahdi Khaligh-Razavi and Nikolaus Kriegeskorte. Deep supervised, but not unsuper- vised, models may explain IT cortical representation.PLoS computational biology, 10(11): e1003915, November 2014. ISSN 1553-7358. doi: 10.1371/journal.pcbi.1003915

  46. [47]

    Kietzmann, Courtney J

    Tim C. Kietzmann, Courtney J. Spoerer, Lynn K. A. Sörensen, Radoslaw M. Cichy, Olaf Hauk, and Nikolaus Kriegeskorte. Recurrence is required to capture the representational dynamics of the human visual system.Proceedings of the National Academy of Sciences, 116(43):21854– 21863, October 2019. doi: 10.1073/pnas.1905544116. URL https://www.pnas.org/doi/ abs/...

  47. [48]

    HunyuanVideo: A Systematic Framework For Large Video Generative Models

    Weijie Kong, Qi Tian, Zijian Zhang, Rox Min, Zuozhuo Dai, Jin Zhou, Jiangfeng Xiong, Xin Li, Bo Wu, Jianwei Zhang, et al. Hunyuanvideo: A systematic framework for large video generative models.arXiv preprint arXiv:2412.03603, 2024

  48. [49]

    Koo and Mae Y

    Terry K. Koo and Mae Y . Li. A Guideline of Selecting and Reporting Intraclass Cor- relation Coefficients for Reliability Research. 15(2):155–163. ISSN 1556-3707. doi: 10.1016/j.jcm.2016.02.012. URL https://www.sciencedirect.com/science/article/ pii/S1556370716000158

  49. [50]

    Information-based functional brain mapping.Proceedings of the National Academy of Sciences, 103(10):3863–3868, March

    Nikolaus Kriegeskorte, Rainer Goebel, and Peter Bandettini. Information-based functional brain mapping.Proceedings of the National Academy of Sciences, 103(10):3863–3868, March

  50. [51]

    doi: 10.1073/pnas.0600244103

  51. [52]

    Moving and static faces, bod- ies, objects and scenes are differentially represented across the three visual pathways

    Emel Küçük, Matthew Foxwell, Daniel Kaiser, and David Pitcher. Moving and static faces, bod- ies, objects and scenes are differentially represented across the three visual pathways. preprint, Neuroscience, December 2022. URL http://biorxiv.org/lookup/doi/10.1101/2022. 11.30.518408

  52. [53]

    Naturalistic fmri mapping reveals superior temporal sulcus as the hub for the distributed brain network for social perception.Frontiers in human neuroscience, 6:233, 2012

    Juha M Lahnakoski, Enrico Glerean, Juha Salmi, Iiro P Jääskeläinen, Mikko Sams, Riitta Hari, and Lauri Nummenmaa. Naturalistic fmri mapping reveals superior temporal sulcus as the hub for the distributed brain network for social perception.Frontiers in human neuroscience, 6:233, 2012

  53. [54]

    Apurva Ratan Murty, Kendrick Kay, Aude Oliva, and Radoslaw Cichy

    Benjamin Lahner, Kshitij Dwivedi, Polina Iamshchinina, Monika Graumann, Alex Lascelles, Gemma Roig, Alessandro Thomas Gifford, Bowen Pan, SouYoung Jin, N. Apurva Ratan Murty, Kendrick Kay, Aude Oliva, and Radoslaw Cichy. Modeling short visual events through the BOLD moments video fMRI dataset and metadata.Nature Communications, 15(1):6241, July

  54. [55]

    doi: 10.1038/s41467-024-50310-3

    ISSN 2041-1723. doi: 10.1038/s41467-024-50310-3. URL https://www.nature. com/articles/s41467-024-50310-3. Publisher: Nature Publishing Group

  55. [56]

    Anumanchipalli, Abdelrahman Mohamed, Peili Chen, Laurel H

    Yuanning Li, Gopala K. Anumanchipalli, Abdelrahman Mohamed, Peili Chen, Laurel H. Carney, Junfeng Lu, Jinsong Wu, and Edward F. Chang. Dissecting neural computations in the human auditory pathway using deep neural networks for speech.Nature Neuroscience, 26(12):2213– 2225, December 2023. ISSN 1097-6256, 1546-1726. doi: 10.1038/s41593-023-01468-4. URL http...

  56. [57]

    Brain diffusion for visual exploration: Cortical discovery using large scale generative models.Advances in Neural Information Processing Systems, 36:75740–75781, 2023

    Andrew Luo, Maggie Henderson, Leila Wehbe, and Michael Tarr. Brain diffusion for visual exploration: Cortical discovery using large scale generative models.Advances in Neural Information Processing Systems, 36:75740–75781, 2023

  57. [58]

    Forming inferences about some intraclass correlation coefficients.Psychological methods, 1(1):30, 1996

    Kenneth O McGraw and Seok P Wong. Forming inferences about some intraclass correlation coefficients.Psychological methods, 1(1):30, 1996

  58. [59]

    Seeing social interactions.Trends in Cognitive Sciences, 27(12):1165–1179, December 2023

    Emalie McMahon and Leyla Isik. Seeing social interactions.Trends in Cognitive Sciences, 27(12):1165–1179, December 2023. ISSN 1364-6613, 1879-307X. doi: 10.1016/j.tics. 2023.09.001. URL https://www.cell.com/trends/cognitive-sciences/abstract/ S1364-6613(23)00248-6. Publisher: Elsevier

  59. [60]

    Bonner, and Leyla Isik

    Emalie McMahon, Michael F. Bonner, and Leyla Isik. Hierarchical organization of social action features along the lateral visual pathway.Current Biology, 33(23):5035–5047.e8, December 2023. ISSN 0960-9822. doi: 10.1016/j.cub.2023.10.015. URL https://www. sciencedirect.com/science/article/pii/S0960982223013738. 14

  60. [61]

    David Milner and Mel Goodale. The visual brain in action, volume 27. OUP Oxford, 2006

  61. [62]

    Mortimer Mishkin, Leslie G. Ungerleider, and Kathleen A. Macko. Object vision and spatial vision: two cortical pathways. Trends in Neurosciences, 6:414–417, January 1983. ISSN 0166-2236. doi: 10.1016/0166-2236(83)90190-X. URL https://www.sciencedirect.com/science/article/pii/016622368390190X

  62. [63]

    Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, et al. Moments in time dataset: one million videos for event understanding. IEEE transactions on pattern analysis and machine intelligence, 42(2):502–508, 2019

  63. [64]

    Mathew Monfort, Alex Andonian, Bolei Zhou, Kandan Ramakrishnan, Sarah Adel Bargal, Tom Yan, Lisa Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, and Aude Oliva. Moments in Time Dataset: one million videos for event understanding, February 2019. URL http://arxiv.org/abs/1801.03150. arXiv:1801.03150 [cs]

  64. [65]

    Aran Nayebi, Daniel Bear, Jonas Kubilius, Kohitij Kar, Surya Ganguli, David Sussillo, James J DiCarlo, and Daniel L Yamins. Task-Driven Convolutional Recurrent Models of the Visual System. In Advances in Neural Information Processing Systems, volume 31. Curran Associates, Inc., 2018. URL https://proceedings.neurips.cc/paper/2018/hash/6be93f7a96fed60c477...

  65. [66]

    Paula Olszewski-Kubilius, Rena F. Subotnik, Lauren Cassani Davis, and Frank C. Worrell. Benchmarking Psychosocial Skills Important for Talent Development. New Directions for Child and Adolescent Development, 2019(168):161–176, 2019. ISSN 1534-8687. doi: 10.1002/cad.20318. URL https://onlinelibrary.wiley.com/doi/abs/10.1002/cad.20318. _eprint: https://onli...

  66. [67]

    Guy A. Orban, Alessia Sepe, and Luca Bonini. Parietal maps of visual signals for bodily action planning. Brain Structure and Function, 226(9):2967–2988, December 2021. ISSN 1863-2661. doi: 10.1007/s00429-021-02378-6. URL https://doi.org/10.1007/s00429-021-02378-6

  67. [68]

    Pawel Pierzchlewicz, Konstantin Willeke, Arne Nix, Pavithra Elumalai, Kelli Restivo, Tori Shinn, Cate Nealley, Gabrielle Rodriguez, Saumil Patel, Katrin Franke, et al. Energy guided diffusion for generating neurally exciting images. Advances in Neural Information Processing Systems, 36:32574–32601, 2023

  68. [69]

    David Pitcher and Leslie G. Ungerleider. Evidence for a Third Visual Pathway Specialized for Social Perception. Trends in Cognitive Sciences, 25(2):100–110, February 2021. ISSN 1364-6613, 1879-307X. doi: 10.1016/j.tics.2020.11.006. URL https://www.cell.com/trends/cognitive-sciences/abstract/S1364-6613(20)30278-3. Publisher: Elsevier

  69. [70]

    David Pitcher, Daniel D. Dilks, Rebecca R. Saxe, Christina Triantafyllou, and Nancy Kanwisher. Differential selectivity for dynamic versus static information in face-selective cortical regions. NeuroImage, 56(4):2356–2363, June 2011. ISSN 1053-8119. doi: 10.1016/j.neuroimage.2011.03.067

  70. [71]

    Carlos R. Ponce, Will Xiao, Peter F. Schade, Till S. Hartmann, Gabriel Kreiman, and Margaret S. Livingstone. Evolving Images for Visual Neurons Using a Deep Generative Network Reveals Coding Principles and Neuronal Preferences. Cell, 177(4):999–1009.e10, May 2019. ISSN 0092-8674. doi: 10.1016/j.cell.2019.04.005. URL https://www.sciencedirect.com/science/a...

  71. [72]

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021

  72. [73]

    N Apurva Ratan Murty, Pouya Bashivan, Alex Abate, James J DiCarlo, and Nancy Kanwisher. Computational models of category-selective brain regions enable high-throughput tests of selectivity. Nature communications, 12(1):5540, 2021

  73. [74]

    Sophia Robert, Leslie G. Ungerleider, and Maryam Vaziri-Pashkam. Disentangling Object Category Representations Driven by Dynamic and Static Visual Input. The Journal of Neuroscience, 43(4):621–634, January 2023. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.0371-22.2022. URL https://www.jneurosci.org/lookup/doi/10.1523/JNEUROSCI.0371-22.2022

  74. [75]

    Christina Sartzetaki, Gemma Roig, Cees G. M. Snoek, and Iris I. A. Groen. One Hundred Neural Networks and Brains Watching Videos: Lessons from Alignment, December 2024. URL https://www.biorxiv.org/content/10.1101/2024.12.05.626975v1. Pages: 2024.12.05.626975 Section: New Results

  75. [76]

    Christina Sartzetaki, Gemma Roig, Cees GM Snoek, and Iris Groen. One hundred neural networks and brains watching videos: Lessons from alignment. In The Thirteenth International Conference on Learning Representations, 2025

  76. [77]

    Christina Sartzetaki, Anne W Zonneveld, Pablo Oyarzo, Alessandro T Gifford, Radoslaw M Cichy, Pascal Mettes, and Iris IA Groen. The human brain as a dynamic mixture of expert models in video understanding. bioRxiv, pages 2025–10, 2025

  77. [78]

    Axel Sauer, Dominik Lorenz, Andreas Blattmann, and Robin Rombach. Adversarial diffusion distillation, 2023. URL https://arxiv.org/abs/2311.17042

  78. [79]

    Clara Sava-Segal, Chandler Richards, Megan Leung, and Emily S Finn. Individual differences in neural event segmentation of continuous experiences. Cerebral Cortex, 33(13):8164–8178, July 2023. ISSN 1047-3211. doi: 10.1093/cercor/bhad106. URL https://doi.org/10.1093/cercor/bhad106

  79. [80]

    Martin Schrimpf, Jonas Kubilius, Ha Hong, Najib J. Majaj, Rishi Rajalingham, Elias B. Issa, Kohitij Kar, Pouya Bashivan, Jonathan Prescott-Roy, Franziska Geiger, Kailyn Schmidt, Daniel L. K. Yamins, and James J. DiCarlo. Brain-Score: Which Artificial Neural Network for Object Recognition is most Brain-Like? preprint, Neuroscience, September 2018. URL http...

  80. [81]

    Martin Schrimpf, Jonas Kubilius, Michael J. Lee, N. Apurva Ratan Murty, Robert Ajemian, and James J. DiCarlo. Integrative Benchmarking to Advance Neurally Mechanistic Models of Human Intelligence. Neuron, 108(3):413–423, November 2020. ISSN 0896-6273. doi: 10.1016/j.neuron.2020.07.040. URL https://www.cell.com/neuron/abstract/S0896-6273(20)30605-X. Publi...

Showing first 80 references.