Mechanistically Interpretable Neural Encoding Reveals Fine-Grained Functional Selectivity in Human Visual Cortex
Pith reviewed 2026-05-20 19:31 UTC · model grok-4.3
The pith
Mechanistic interpretability applied to neural encoding models identifies the specific visual features driving responses in individual human brain voxels.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Mechanistically Interpretable Neural Encoding (MINE) opens black-box encoding models by applying attribution techniques to language-aligned image representations, thereby localizing the visual features inside natural images that drive each voxel's response. These per-image feature descriptions suffice to synthesize new images that produce voxel activations statistically indistinguishable from the originals, outperforming random or low-attribution controls. Counterfactual insertion or deletion of the identified features shifts voxel activation in the expected direction; the same edits guided by the aggregated per-voxel activation profiles produce still larger shifts, confirming that the per-v
What carries the argument
Mechanistically Interpretable Neural Encoding (MINE), which extracts and semantically describes image features via mechanistic interpretability applied to language-aligned neural network representations in order to predict and causally test voxel-level responses.
If this is right
- Counterfactual editing guided by per-voxel activation profiles yields stronger and more reliable shifts in measured brain activity than edits based on single-image features.
- The same framework recovers the expected categorical preferences of well-studied regions while exposing distinct selectivity patterns unique to each voxel inside those regions.
- Images synthesized solely from the localized features elicit voxel responses that match the originals more accurately than control images.
- Generalizing per-image attributions into per-voxel profiles produces faithful summaries of each voxel's functional selectivity.
Where Pith is reading between the lines
- Language-aligned models may capture semantic structure that aligns more closely with human visual cortex than purely vision-trained networks.
- The causal-editing validation step could be adapted to test feature hypotheses in other brain regions or sensory modalities.
- Fine-grained voxel-level profiles might eventually support more precise brain-computer interface decoding or stimulation.
- The method supplies a concrete route for generating and testing new, testable hypotheses about visual coding that go beyond existing category-level descriptions.
Load-bearing premise
The attributions extracted from the artificial network correctly mark the image features that actually cause the observed human voxel responses rather than reflecting model-specific artifacts.
What would settle it
Counterfactual edits that insert or remove the predicted features fail to shift voxel activation in the direction the model forecasts, or images generated from those features produce responses no closer to the originals than images generated from random or low-attribution controls.
Figures
read the original abstract
A central goal in understanding human vision is to uncover the visual features that drive neuronal activity. A growing body of work has used artificial neural networks as encoding models to predict cortical responses to natural images, revealing the visual content that activates category-selective regions. However, existing approaches are largely correlational and treat the encoder as a black box, leaving open which image features drive each voxel's response. We introduce Mechanistically Interpretable Neural Encoding (MINE), a framework that opens this black box by applying mechanistic-interpretability tools to localize the features within natural images that drive millimeter-scale (voxel-level) activity. MINE predicts each voxel's response using language-aligned image representations, and produces semantically interpretable descriptions of the features critical for the voxel's activation. We further generalize these per-image features into per-voxel functional profiles. To validate the per-image descriptions, we show they are sufficient to generate images that elicit voxel responses matching the responses to the original images, more accurately than images generated from random or low-attribution controls. Moreover, counterfactually inserting or removing the predicted features from images shifts activation in the expected direction, providing causal evidence. Counterfactual editing guided by the per-voxel activation profiles produces even stronger activation shifts, indicating that the profiles faithfully capture each voxel's selectivity. Finally, we apply MINE to well-studied category-selective brain regions, showing it recovers their known categorical preferences while revealing fine-grained unique voxel structure within each region. Overall, our results establish mechanistic interpretability as a path to discover and causally validate fine-grained hypotheses about neural function.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Mechanistically Interpretable Neural Encoding (MINE), a framework that applies mechanistic-interpretability tools to language-aligned image representations from neural networks to predict voxel-level responses in human visual cortex and extract semantically interpretable feature descriptions. It validates these attributions by showing that images generated from the descriptions elicit matching voxel responses (superior to random or low-attribution controls) and that counterfactual insertion or removal of the predicted features shifts activations in the expected direction. Per-voxel functional profiles derived from these attributions produce even stronger shifts, and the method is applied to category-selective regions to recover known preferences while revealing fine-grained voxel structure.
Significance. If the causal claims survive rigorous controls for editing confounds, the work would meaningfully advance the integration of mechanistic interpretability with systems neuroscience by moving encoding models from correlational to causally testable accounts of fine-grained functional selectivity. The combination of per-image attributions, generative validation, and profile-guided counterfactuals provides a concrete path for hypothesis generation and testing that could generalize beyond visual cortex.
major comments (2)
- [Validation / Counterfactual experiments] The central causal interpretation rests on the counterfactual editing results (described in the abstract and validation sections). The manuscript does not report quantitative controls demonstrating that edits alter only the target attributed features while leaving low- and mid-level statistics (contrast, texture, spatial frequency, object co-occurrence) unchanged; without such metrics or matched non-semantic edit controls, directionally correct voxel shifts could arise from un-attributed confounds rather than the MINE attributions themselves.
- [Results on per-voxel profiles] The per-voxel activation profile results inherit the same vulnerability: stronger shifts are reported, yet the editing procedure used to test them is not shown to isolate the profile-derived features. A load-bearing test would be to compare against edits that preserve the profile but scramble the specific attribution map, or to quantify residual changes in non-profile features.
minor comments (2)
- [Abstract] The abstract refers to 'language-aligned image representations' without naming the specific model, alignment objective, or layer used; this notation should be clarified in the main text with a reference to the exact architecture.
- [Abstract and Results] Quantitative effect sizes (e.g., mean activation change in standard deviations, statistical tests against controls) are not summarized for the generation or counterfactual experiments; adding these would improve readability.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which identify key opportunities to strengthen the causal interpretations in our validation experiments. We respond to each major comment below and will incorporate the suggested controls into the revised manuscript.
read point-by-point responses
-
Referee: [Validation / Counterfactual experiments] The central causal interpretation rests on the counterfactual editing results (described in the abstract and validation sections). The manuscript does not report quantitative controls demonstrating that edits alter only the target attributed features while leaving low- and mid-level statistics (contrast, texture, spatial frequency, object co-occurrence) unchanged; without such metrics or matched non-semantic edit controls, directionally correct voxel shifts could arise from un-attributed confounds rather than the MINE attributions themselves.
Authors: We agree that additional quantitative controls are necessary to rule out low- and mid-level confounds. Although our existing random and low-attribution controls provide initial evidence of specificity, they do not directly quantify preservation of low-level statistics. In the revised manuscript we will add explicit metrics comparing contrast, texture, spatial frequency, and object co-occurrence between original and edited images. We will also introduce matched non-semantic edit controls (e.g., low-level perturbations that preserve semantic content) to demonstrate that activation shifts are driven by the attributed features rather than incidental image statistics. revision: yes
-
Referee: [Results on per-voxel profiles] The per-voxel activation profile results inherit the same vulnerability: stronger shifts are reported, yet the editing procedure used to test them is not shown to isolate the profile-derived features. A load-bearing test would be to compare against edits that preserve the profile but scramble the specific attribution map, or to quantify residual changes in non-profile features.
Authors: This is a fair and important point for validating the per-voxel profiles. We will add the requested controls in the revision: (1) edits that preserve the overall profile statistics but scramble the specific per-image attribution maps, and (2) quantification of residual changes in non-profile features. These analyses will directly test whether the stronger activation shifts arise from the profile-guided features rather than from unaccounted confounds. revision: yes
Circularity Check
No circularity: validations depend on independent external brain measurements
full rationale
The paper introduces MINE to predict voxel responses from language-aligned representations and derives per-image feature descriptions plus per-voxel profiles. Validation proceeds by generating images from those descriptions and measuring actual fMRI voxel responses, plus performing counterfactual feature insertion/removal on images and observing directional shifts in measured human brain activations. These steps rely on external empirical data collected independently of the model's internal attributions or any fitted parameters within the present work. No equations or claims reduce a prediction to a fitted input by construction, no self-citation chain bears the central load, and no ansatz or uniqueness result is imported from prior author work to force the outcome. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
D. Marr and T. Poggio. From Understanding Computation to Understanding Neural Circuitry. May 1976
work page 1976
-
[2]
D. Marr. Visual information processing: The structure and creation of visual representations. Philosophical Transactions of the Royal Society of London. B, Biological Sciences, 290(1038): 199–218, July 1980. ISSN 0080-4622. doi: 10.1098/rstb.1980.0091
-
[3]
Nancy Kanwisher, Josh McDermott, and Marvin M Chun. The fusiform face area: A module in human extrastriate cortex specialized for face perception.Journal of neuroscience, 17(11): 4302–4311, 1997
work page 1997
-
[4]
A cortical representation of the local visual environment
Russell Epstein and Nancy Kanwisher. A cortical representation of the local visual environment. Nature, 392(6676):598–601, 1998. ISSN 1476-4687. doi: 10.1038/33402
-
[5]
Downing, Yuhong Jiang, Miles Shuman, and Nancy Kanwisher
Paul E. Downing, Yuhong Jiang, Miles Shuman, and Nancy Kanwisher. A Cortical Area Selective for Visual Processing of the Human Body.Science, 293(5539):2470–2473, September
-
[6]
URL https://www.science.org/doi/10.1126/ science.1063414
ISSN 0036-8075, 1095-9203. doi: 10.1126/science.1063414
-
[7]
LaVCa: LLM-assisted Visual Cortex Captioning.arXiv preprint arXiv:2502.13606, 2025
Takuya Matsuyama, Shinji Nishimoto, and Yu Takagi. LaVCa: LLM-assisted Visual Cortex Captioning.arXiv preprint arXiv:2502.13606, 2025
-
[8]
In Silico Mapping of Visual Categorical Selectivity Across the Whole Brain, October 2025
Ethan Hwang, Hossein Adeli, Wenxuan Guo, Andrew Luo, and Nikolaus Kriegeskorte. In Silico Mapping of Visual Categorical Selectivity Across the Whole Brain, October 2025
work page 2025
-
[9]
Navve Wasserman, Matias Cosarinsky, Yuval Golbari, Aude Oliva, Antonio Torralba, Tamar Rott Shaham, and Michal Irani. BrainExplore: Large-Scale Discovery of Interpretable Visual Representations in the Human Brain, December 2025
work page 2025
-
[10]
Andrew F Luo, Margaret M Henderson, Michael J Tarr, and Leila Wehbe. Brainscuba: Fine- grained natural language captions of visual cortex selectivity.arXiv preprint arXiv:2310.04420, 2023
-
[11]
CLIP-MSM: A Multi-Semantic Mapping Brain Representation for Human High-Level Visual Cortex
Guoyuan Yang, Mufan Xue, Ziming Mao, Haofang Zheng, Jia Xu, Dabin Sheng, Ruotian Sun, Ruoqi Yang, and Xuesong Li. CLIP-MSM: A Multi-Semantic Mapping Brain Representation for Human High-Level Visual Cortex. InProceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 9184–9192, 2025
work page 2025
-
[12]
Andrew Luo, Maggie Henderson, Leila Wehbe, and Michael Tarr. Brain diffusion for visual exploration: Cortical discovery using large scale generative models.Advances in Neural Information Processing Systems, 36:75740–75781, 2023. 10
work page 2023
-
[13]
Tan Gao, Mufan Xue, Haofang Zheng, Shuo Lv, Jia Xu, Dabin Sheng, Ziming Mao, Xinyu Wu, Andrew Luo, and Guoyuan Yang. BrainLMM: A Label-Free Framework for Mapping Multi- Semantic Representation in the Human Visual Cortex.Proceedings of the AAAI Conference on Artificial Intelligence, 40(6):4176–4184, March 2026. ISSN 2374-3468. doi: 10.1609/aaai. v40i6.42413
-
[14]
Luo, Jacob Yeung, Rushikesh Zawar, Shaurya Dewan, Margaret M
Andrew F. Luo, Jacob Yeung, Rushikesh Zawar, Shaurya Dewan, Margaret M. Henderson, Leila Wehbe, and Michael J. Tarr. Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers, June 2025
work page 2025
-
[15]
Transformer brain encoders explain human high-level visual responses, May 2025
Hossein Adeli, Minni Sun, and Nikolaus Kriegeskorte. Transformer brain encoders explain human high-level visual responses, May 2025
work page 2025
-
[16]
Towards Interpretable Visual Decoding with Attention to Brain Representa- tions, September 2025
Pinyuan Feng, Hossein Adeli, Wenxuan Guo, Fan Cheng, Ethan Hwang, and Nikolaus Kriegeskorte. Towards Interpretable Visual Decoding with Attention to Brain Representa- tions, September 2025
work page 2025
-
[17]
An Interpretability Illusion for BERT, April 2021
Tolga Bolukbasi, Adam Pearce, Ann Yuan, Andy Coenen, Emily Reif, Fernanda Viégas, and Martin Wattenberg. An Interpretability Illusion for BERT, April 2021
work page 2021
-
[18]
Causal Analysis for Robust Interpretability of Neural Networks
Ola Ahmad, Nicolas Béreux, Loïc Baret, Vahid Hashemi, and Freddy Lecue. Causal Analysis for Robust Interpretability of Neural Networks. InProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 4685–4694, 2024
work page 2024
-
[19]
Lee Sharkey, Bilal Chughtai, Joshua Batson, Jack Lindsey, Jeff Wu, Lucius Bushnaq, Nicholas Goldowsky-Dill, Stefan Heimersheim, Alejandro Ortega, Joseph Bloom, Stella Biderman, Adria Garriga-Alonso, Arthur Conmy, Neel Nanda, Jessica Rumbelow, Martin Wattenberg, Nandi Schoots, Joseph Miller, Eric J. Michaud, Stephen Casper, Max Tegmark, William Saunders, D...
work page 2025
-
[20]
Adam Davies and Ashkan Khakzar. The Cognitive Revolution in Interpretability: From Explaining Behavior to Interpreting Representations and Algorithms, August 2024
work page 2024
-
[21]
Interpreting GPT: The logit lens — LessWrong
nostalgebraist. Interpreting GPT: The logit lens — LessWrong. August 2020
work page 2020
-
[22]
Muquan Yu, Mu Nan, Hossein Adeli, Jacob S. Prince, John A. Pyles, Leila Wehbe, Margaret M. Henderson, Michael J. Tarr, and Andrew F. Luo. Meta-Learning an In-Context Transformer Model of Human Higher Visual Cortex, November 2025
work page 2025
-
[23]
Learning transferable visual models from natural language supervision
Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational Conference on Machine Learning, pages 8748–8763. PmLR, 2021
work page 2021
-
[24]
Wang, Kendrick Kay, Thomas Naselaris, Michael J
Aria Y . Wang, Kendrick Kay, Thomas Naselaris, Michael J. Tarr, and Leila Wehbe. Better models of human high-level visual cortex emerge from natural language supervision with a large and diverse dataset.Nature Machine Intelligence 2023 5:12, 5(12):1415–1426, November
work page 2023
-
[25]
Wang, Kendrick Kay, Thomas Naselaris, Michael J
ISSN 2522-5839. doi: 10.1038/s42256-023-00753-y
-
[26]
arXiv preprint arXiv:2204.10965 , year=
Tuomas Oikarinen and Tsui-Wei Weng. Clip-dissect: Automatic description of neuron repre- sentations in deep vision networks.arXiv preprint arXiv:2204.10965, 2022
-
[27]
Ron Mokady, Amir Hertz, and Amit H. Bermano. ClipCap: CLIP Prefix for Image Captioning. November 2021
work page 2021
-
[28]
Emogen: Emotional image content generation with text-to-image diffusion models,
Jack Urbanek, Florian Bordes, Pietro Astolfi, Mary Williamson, Vasu Sharma, and Adriana Romero-Soriano. A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions. In2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 26690–26699, Seattle, W A, USA, June 2024. IEEE. ISBN 979-8-3503-5300-6. doi...
-
[29]
Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual Instruction Tuning. Advances in Neural Information Processing Systems, 36, April 2023. ISSN 10495258. 11
work page 2023
-
[30]
Improved Baselines with Visual Instruction Tuning, 2024
Haotian Liu, Chunyuan Li, Yuheng Li, and Yong Jae Lee. Improved Baselines with Visual Instruction Tuning, 2024
work page 2024
-
[31]
Towards Interpreting Visual Information Processing in Vision-Language Models
Clement Neo, Luke Ong, Philip Torr, Mor Geva, David Krueger, and Fazl Barez. Towards Interpreting Visual Information Processing in Vision-Language Models. October 2024
work page 2024
-
[32]
Allen, Ghislain St-Yves, Yihan Wu, Jesse L
Emily J. Allen, Ghislain St-Yves, Yihan Wu, Jesse L. Breedlove, Jacob S. Prince, Logan T. Dow- dle, Matthias Nau, Brad Caron, Franco Pestilli, Ian Charest, J. Benjamin Hutchinson, Thomas Naselaris, and Kendrick Kay. A massive 7T fMRI dataset to bridge cognitive neuroscience and artificial intelligence.Nature Neuroscience, 25(1):116–126, January 2022. ISSN...
-
[33]
Guy Gaziv, Roman Beliy, Niv Granot, Assaf Hoogi, Francesca Strappini, Tal Golan, and Michal Irani. Self-supervised Natural Image Reconstruction and Large-scale Semantic Classification from Brain Activity.NeuroImage, 254:119121, July 2022. ISSN 1053-8119. doi: 10.1016/J. NEUROIMAGE.2022.119121
work page doi:10.1016/j 2022
-
[34]
Guy Gaziv and Michal Irani. More than meets the eye: Self-supervised depth reconstruction from brain activity.arXiv preprint arXiv:2106.05113, 2021
-
[35]
Sarthak Jain and Byron C. Wallace. Attention is not Explanation, May 2019
work page 2019
-
[36]
Kevin Meng, David Bau, Alex Andonian, and Yonatan Belinkov. Locating and Editing Factual Associations in GPT.Advances in Neural Information Processing Systems, 35, 2022. ISSN 10495258
work page 2022
-
[37]
Mor Geva, Avi Caciularu, Kevin Ro Wang, and Yoav Goldberg. Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the V ocabulary Space.Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, pages 30–45, March 2022. doi: 10.18653/v1/2022.emnlp-main.3
-
[38]
Analyzing Transformers in Embedding Space, 2023
Guy Dar, Mor Geva, Ankit Gupta, and Jonathan Berant. Analyzing Transformers in Embedding Space, 2023
work page 2023
-
[39]
Mor Geva, Jasmijn Bastings, Katja Filippova, and Amir Globerson. Dissecting Recall of Factual Associations in Auto-Regressive Language Models.EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings, pages 12216–12235, April 2023. doi: 10.18653/v1/2023.emnlp-main.751
-
[40]
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention Is All You Need.Advances in Neural Information Processing Systems, 2017-December:5999–6009, June 2017. ISSN 10495258. doi: 10.48550/ arxiv.1706.03762
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[41]
Yuval Ran-Milo. Attention Sinks Are Provably Necessary in Softmax Transformers: Evidence from Trigger-Conditional Tasks, March 2026
work page 2026
-
[42]
Why do LLMs attend to the first token?, August 2025
Federico Barbero, Álvaro Arroyo, Xiangming Gu, Christos Perivolaropoulos, Michael Bronstein, Petar Veliˇckovi´c, and Razvan Pascanu. Why do LLMs attend to the first token?, August 2025
work page 2025
-
[43]
Tianyu Guo, Druv Pai, Yu Bai, Jiantao Jiao, Michael I. Jordan, and Song Mei. Active-Dormant Attention Heads: Mechanistically Demystifying Extreme-Token Phenomena in LLMs, Novem- ber 2024
work page 2024
-
[44]
The Linear Representation Hypothesis and the Geometry of Large Language Models, July 2024
Kiho Park, Yo Joong Choe, and Victor Veitch. The Linear Representation Hypothesis and the Geometry of Large Language Models, July 2024
work page 2024
-
[45]
Enhancing Auto- mated Interpretability with Output-Centric Feature Descriptions, May 2025
Yoav Gur-Arieh, Roy Mayan, Chen Agassy, Atticus Geiger, and Mor Geva. Enhancing Auto- mated Interpretability with Output-Centric Feature Descriptions, May 2025
work page 2025
-
[46]
Microsoft coco: Common objects in context
Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C Lawrence Zitnick. Microsoft coco: Common objects in context. InComputer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part v 13, pages 740–755. Springer, 2014. 12
work page 2014
-
[47]
Liang Wang, Ryan E.B. Mruczek, Michael J. Arcaro, and Sabine Kastner. Probabilistic Maps of Visual Topography in Human Cortex.Cerebral Cortex, 25(10):3911–3931, October 2015. ISSN 1047-3211. doi: 10.1093/CERCOR/BHU277
-
[48]
The Wisdom of a Crowd of Brains: A Universal Brain Encoder, March 2025
Roman Beliy, Navve Wasserman, Amit Zalcher, and Michal Irani. The Wisdom of a Crowd of Brains: A Universal Brain Encoder, March 2025
work page 2025
-
[49]
Axiomatic Attribution for Deep Networks
Mukund Sundararajan, Ankur Taly, and Qiqi Yan. Axiomatic Attribution for Deep Networks. InProceedings of the 34th International Conference on Machine Learning, pages 3319–3328. PMLR, July 2017
work page 2017
-
[50]
Anthropic. Introducing Claude Haiku 4.5. https://www.anthropic.com/news/claude-haiku-4-5, October 2025
work page 2025
-
[51]
Hsu, Richard Antonello, Shailee Jain, Alexander G
Chandan Singh, Aliyah R. Hsu, Richard Antonello, Shailee Jain, Alexander G. Huth, Bin Yu, and Jianfeng Gao. Explaining black box text modules in natural language with language models, November 2023
work page 2023
- [52]
-
[53]
FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space, June 2025
Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, Kyle Lacey, Yam Levi, Cheng Li, Dominik Lorenz, Jonas Müller, Dustin Podell, Robin Rombach, Harry Saini, Axel Sauer, and Luke Smith. FLUX.1 Kontext: Flow Matching for In-Context Image ...
work page 2025
-
[54]
Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms, July 2024
Michael Hanna, Sandro Pezzelle, and Yonatan Belinkov. Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms, July 2024
work page 2024
-
[55]
S. Priebe and F. Röhricht. Specific body image pathology in acute schizophrenia.Psychiatry Research, 101(3), 2001. ISSN 01651781. doi: 10.1016/S0165-1781(01)00214-1
-
[56]
URL https://www.nature.com/articles/s41467-024-53147-y
Colin Conwell, Jacob S. Prince, Kendrick N. Kay, George A. Alvarez, and Talia Konkle. A large-scale examination of inductive biases shaping high-level visual representation in brains and machines.Nature communications, 15(1):1–18, October 2024. ISSN 2041-1723. doi: 10.1038/s41467-024-53147-y
-
[57]
Weiner, and Kalanit Grill-Spector
Anthony Stigliani, Kevin S. Weiner, and Kalanit Grill-Spector. Temporal Processing Capacity in High-Level Visual Cortex Is Domain Specific.Journal of Neuroscience, 35(36):12412–12424, September 2015. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.4822-14.2015
-
[58]
Deep residual learning for image recognition,
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, volume 2016-December, 2016. doi: 10.1109/CVPR.2016.90
-
[59]
Root mean square layer normalization.Advances in Neural Information Processing Systems, 32, 2019
Biao Zhang and Rico Sennrich. Root mean square layer normalization.Advances in Neural Information Processing Systems, 32, 2019
work page 2019
-
[60]
On Layer Normalization in the Transformer Architecture, June 2020
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, and Tie-Yan Liu. On Layer Normalization in the Transformer Architecture, June 2020
work page 2020
-
[61]
Ilya Loshchilov and Frank Hutter. Decoupled Weight Decay Regularization.7th International Conference on Learning Representations, ICLR 2019, November 2017
work page 2019
-
[62]
Super-convergence: Very fast training of neural networks using large learning rates
Leslie N Smith and Nicholay Topin. Super-convergence: Very fast training of neural networks using large learning rates. InArtificial Intelligence and Machine Learning for Multi-Domain Operations Applications, volume 11006, pages 369–386. SPIE, 2019
work page 2019
-
[63]
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, Alban Desmaison, Andreas Köpf, Edward Yang, Zach DeVito, Martin Raison, Alykhan Tejani, Sasank Chilamkurthy, Benoit Steiner, Lu Fang, Junjie Bai, and Soumith Chintala. PyTorch: An Imperative Style, High-Performa...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1912.01703 2019
-
[64]
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V V o, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, Patrick ...
work page 2023
-
[65]
Vision- Language Models Align with Human Neural Representations in Concept Processing, January 2026
Anna Bavaresco, Marianne de Heer Kloots, Sandro Pezzelle, and Raquel Fernández. Vision- Language Models Align with Human Neural Representations in Concept Processing, January 2026
work page 2026
-
[66]
Adva Shoham, Rotem Broday-Dvir, Rafael Malach, and Galit Yovel. High-level visual cortex representations are organized along visual rather than abstract principles, April 2025
work page 2025
-
[67]
Team OLMo, Pete Walsh, Luca Soldaini, Dirk Groeneveld, Kyle Lo, Shane Arora, Akshita Bha- gia, Yuling Gu, Shengyi Huang, Matt Jordan, Nathan Lambert, Dustin Schwenk, Oyvind Tafjord, Taira Anderson, David Atkinson, Faeze Brahman, Christopher Clark, Pradeep Dasigi, Nouha Dziri, Allyson Ettinger, Michal Guerquin, David Heineman, Hamish Ivison, Pang Wei Koh, ...
work page 2025
-
[68]
HuggingFace's Transformers: State-of-the-art Natural Language Processing
Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, et al. Huggingface’s transform- ers: State-of-the-art natural language processing.arXiv preprint arXiv:1910.03771, 2019
work page internal anchor Pith review Pith/arXiv arXiv 1910
-
[69]
Diffusers: State-of-the-art diffusion models, 2022
Patrick von Platen, Suraj Patil, Anton Lozhkov, Pedro Cuenca, Nathan Lambert, Kashif Rasul, Mishig Davaadorj, Dhruv Nair, Sayak Paul, William Berman, Yiyi Xu, Steven Liu, and Thomas Wolf. Diffusers: State-of-the-art diffusion models, 2022
work page 2022
-
[70]
Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive nlp,
Omar Khattab, Keshav Santhanam, Xiang Lisa Li, David Hall, Percy Liang, Christopher Potts, and Matei Zaharia. Demonstrate-search-predict: Composing retrieval and language models for knowledge-intensive NLP.arXiv preprint arXiv:2212.14024, 2022
-
[71]
Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts
Omar Khattab, Arnav Singhvi, Paridhi Maheshwari, Zhiyuan Zhang, Keshav Santhanam, Sri Vardhamanan, Saiful Haq, Ashutosh Sharma, Thomas T. Joshi, Hanna Moazam, Heather Miller, Matei Zaharia, and Christopher Potts. DSPy: Compiling declarative language model calls into self-improving pipelines. The Twelfth International Conference on Learning Representations, 2024
work page 2024
-
[72]
J. D. Hunter. Matplotlib: A 2D graphics environment.Computing in Science & Engineering, 9 (3):90–95, 2007. doi: 10.1109/MCSE.2007.55
-
[73]
Plotly Technologies Inc. Collaborative data science. https://plot.ly, 2015
work page 2015
-
[74]
ImageNet Large Scale Visual Recognition Challenge,
Olga Russakovsky, Jia Deng, Hao Su, Jonathan Krause, Sanjeev Satheesh, Sean Ma, Zhiheng Huang, Andrej Karpathy, Aditya Khosla, Michael Bernstein, Alexander C Berg, Li Fei-Fei, O Russakovsky, J Deng, H Su, J Krause, S Satheesh, S Ma, Z Huang, A Karpathy, A Khosla, M Bernstein, A C Berg, and L Fei-Fei. ImageNet Large Scale Visual Recognition Challenge. Inte...
-
[75]
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods, January 2024
Fred Zhang and Neel Nanda. Towards Best Practices of Activation Patching in Language Models: Metrics and Methods, January 2024
work page 2024
-
[76]
Learning multiple layers of features from tiny images
Alex Krizhevsky. Learning multiple layers of features from tiny images. Technical report, University of Toronto, 2009
work page 2009
- [77]
-
[78]
Nikos K. Logothetis. The Underpinnings of the BOLD Functional Magnetic Resonance Imaging Signal.Journal of Neuroscience, 23(10):3963–3971, May 2003. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.23-10-03963.2003. 14
-
[79]
Seong-Gi Kim and Seiji Ogawa. Biophysical and Physiological Origins of Blood Oxygenation Level-Dependent fMRI Signals.Journal of Cerebral Blood Flow & Metabolism, 32(7):1188– 1206, July 2012. ISSN 0271-678X. doi: 10.1038/jcbfm.2012.23
-
[80]
Nikos K. Logothetis and Brian A. Wandell. Interpreting the BOLD Signal.Annual Review of Physiology, 66(V olume 66, 2004):735–769, February 2004. ISSN 0066-4278, 1545-1585. doi: 10.1146/annurev.physiol.66.082602.092845
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.