
arxiv: 2605.20610 · v1 · pith:DXGMSHC2new · submitted 2026-05-20 · 💻 cs.CV · cs.AI

Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

Pith reviewed 2026-05-21 05:52 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords mixture of experts · expert tuning · vision models · animate-inanimate distinction · model interpretability · representational similarity · contrastive learning · expert specialisation

The pith

An animate-inanimate distinction dominates expert partitioning in vision mixture-of-experts models from gating through readout and remains stable across independent trainings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains sparsely-gated convolutional mixture-of-experts models on natural images with a contrastive objective and then probes what each expert actually encodes. It moves from routing statistics to direct measurements of per-expert category separability and tuning via most-exciting inputs. Semantic dimensions drawn from human behavioural data are used to interpret those tunings at a finer grain than categories alone. The central result is that an animate-inanimate split structures the allocation of expertise from the earliest gating decisions onward and reappears reliably when models are retrained from scratch. Routing gives the appearance of sparse categorical preferences, yet expert-level analyses show broader, continuous tuning to visual and semantic features that cross category boundaries.
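To make the distinction between routing and expert-level analysis concrete, the following is a minimal sketch of sparse top-k gating and of the category-by-expert routing table that routing-only analyses rely on. It is an illustration, not the authors' implementation: the function names `top_k_gate` and `routing_table`, the value of `k`, and the array sizes are all assumptions, since this page does not state the paper's gating details.

```python
import numpy as np

def top_k_gate(logits, k=1):
    """Keep the k largest gating logits per input and softmax over them."""
    logits = np.asarray(logits, dtype=float)
    # Column indices of the k largest logits in each row.
    top_idx = np.argpartition(logits, -k, axis=1)[:, -k:]
    masked = np.full_like(logits, -np.inf)
    np.put_along_axis(
        masked, top_idx, np.take_along_axis(logits, top_idx, axis=1), axis=1
    )
    # Softmax over the surviving logits; masked entries become exactly 0.
    shifted = masked - masked.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    return weights / weights.sum(axis=1, keepdims=True)

def routing_table(weights, labels, n_categories):
    """Fraction of each category's images whose top-weighted expert is e."""
    top_expert = weights.argmax(axis=1)
    table = np.zeros((n_categories, weights.shape[1]))
    np.add.at(table, (labels, top_expert), 1.0)
    return table / np.maximum(table.sum(axis=1, keepdims=True), 1.0)

# Synthetic example: 8 images, 4 experts, 4 categories (hypothetical sizes).
rng = np.random.default_rng(0)
gate_logits = rng.normal(size=(8, 4))
labels = np.array([0, 0, 1, 1, 2, 2, 3, 3])
gate_weights = top_k_gate(gate_logits, k=2)
table = routing_table(gate_weights, labels, n_categories=4)
```

A table like this records only which expert wins for each category. It says nothing about what the winning expert encodes, which is the gap the expert-level analyses are meant to fill.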

Core claim

Expert specialisation in vision mixture-of-experts models is dominated by an animate-inanimate distinction that appears from gating through to expert readout and proves stable across independently trained models. Although routing statistics indicate relatively sparse, categorical preferences, the experts themselves exhibit tuning to continuous visual and semantic dimensions that extend beyond category boundaries. Experts achieve similar levels of category separability despite maintaining distinct feature tuning, showing the explanatory gain from moving past category-level descriptions.

What carries the argument

The animate-inanimate distinction that organises expert partitioning, tracked by combining gating analysis with per-expert category separability, most-exciting-input tuning, semantic-dimension interpretation, and cross-model representational similarity.
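One of the tools listed above, most-exciting-input tuning, can be sketched as ranking images by the magnitude of each expert's readout response. This is a hedged illustration that assumes readout activations are available as an array of shape (images, experts, features). The function name `most_exciting_inputs` and the array layout are hypothetical.

```python
import numpy as np

def most_exciting_inputs(readout, n_top=3):
    """Rank images by each expert's readout activation norm.

    readout: array of shape (n_images, n_experts, n_features).
    Returns (indices, norms), where indices has shape (n_experts, n_top).
    """
    norms = np.linalg.norm(readout, axis=2)   # (n_images, n_experts)
    order = np.argsort(-norms, axis=0)        # descending norm per expert
    return order[:n_top].T, norms

# Synthetic example: expert 0 responds more to later images, expert 1 to earlier ones.
readout = np.zeros((5, 2, 3))
readout[:, 0, 0] = np.arange(5)
readout[:, 1, 0] = np.arange(5)[::-1]
top_images, norms = most_exciting_inputs(readout, n_top=2)
print(top_images.tolist())  # [[4, 3], [0, 1]]
```

Inspecting the images behind these indices, rather than only their category labels, is what lets tuning be described in terms of visual and semantic features.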

If this is right

  • Expert specialisation involves continuous feature tuning that crosses category lines rather than rigid category assignment.
  • The animate-inanimate structure emerges reliably from the training process on natural images regardless of initialisation.
  • Comparable category separability can be achieved through distinct underlying tunings across different experts.
  • Analyses limited to routing statistics miss the graded visual and semantic dimensions that experts actually use.
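The third point, that experts can match in category separability while differing in tuning, can be illustrated with a toy measure. The sketch below assumes a simple separability score (centroid distance divided by mean within-category spread). The paper's actual metric is not given on this page, so `pairwise_separability` is a hypothetical stand-in.

```python
import numpy as np

def pairwise_separability(features, labels):
    """Centroid distance for each category pair, scaled by within-category spread."""
    categories = np.unique(labels)
    centroids = np.stack([features[labels == c].mean(axis=0) for c in categories])
    spread = np.array([
        np.linalg.norm(features[labels == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(categories)
    ])
    dist = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    pooled = (spread[:, None] + spread[None, :]) / 2.0
    return dist / pooled

# Synthetic example: two "experts" whose features differ by an orthogonal rotation.
rng = np.random.default_rng(0)
labels = np.repeat(np.arange(3), 10)
means = rng.normal(scale=3.0, size=(3, 8))
expert_a = means[labels] + rng.normal(size=(30, 8))
rotation = np.linalg.qr(rng.normal(size=(8, 8)))[0]
expert_b = expert_a @ rotation

sep_a = pairwise_separability(expert_a, labels)
sep_b = pairwise_separability(expert_b, labels)
```

Because a rotation preserves Euclidean distances, `sep_a` and `sep_b` agree even though the individual feature dimensions of the two experts respond to different things. Category-level scores alone cannot tell such experts apart.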

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Tools for measuring fine-grained tuning could be applied to other modular vision architectures to check whether similar continuous dimensions appear.
  • If the animate-inanimate axis proves general, it may indicate an organisational bias that vision models acquire from natural-image statistics.
  • Disrupting this distinction during training and measuring effects on downstream tasks would test its functional importance.

Load-bearing premise

That semantic dimensions derived from human behavioural judgements on object similarities supply a valid basis for interpreting the tuning of individual model experts.

What would settle it

Repeated training runs that show no consistent animate-inanimate separation in gating weights, expert activations, or most-exciting inputs would falsify the claim that this distinction dominates and stabilises expert partitioning.

Figures

Figures reproduced from arXiv: 2605.20610 by Gene Tangtartharakul, Katherine R. Storrs.

Figure 1: Overview of the model architecture. The input …
Figure 2: Expert specialisation through the lens of gating on the held-out STL10 test set (8k images; …
Figure 3: Images eliciting the highest response magnitudes at the readout layer for each expert. The …
Figure 4: Scatterplot showing the relationship between gating logits and readout activation norms …
Figure 5: Category-separability heatmaps for an example four-expert model. The heatmaps depict …
Figure 6: Consistency of expert specialisations across 10 independently trained model instantiations …
Original abstract

Mixture-of-Experts (MoE) models are often interpreted by analysing which categories are routed to which experts. However, routing alone does not reveal what each expert actually encodes. We train sparsely-gated convolutional MoE models with a contrastive objective on natural images and characterise expert specialisation using tools from visual neuroscience. Extending from gating-level to expert-level analyses, we measure per-expert category separability, and per-expert tuning using the most exciting inputs. Extending from category-level to feature-level explanations, we interpret tuning via semantic dimensions derived from a dataset of human behavioural judgements (THINGS). Finally, we use tuning and representational similarity analysis to assess the stability of expertise-allocation across independent initialisations. We find that an animate-inanimate distinction dominates expert partitioning, apparent from gating through to expert readout, and is stable across independently trained models. Although routing statistics suggest relatively sparse, categorical preferences, expert analyses reveal broader tuning to continuous visual and semantic dimensions that extend beyond category boundaries. Experts exhibit similar category-separability to one another, despite distinct feature tuning, demonstrating the explanatory benefits of moving beyond category-level analyses. Together, these results show that expert specialisation in vision MoEs extends well beyond category routing and is better understood by probing fine-grained expert-level tuning and representational structure.
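As a rough illustration of the representational similarity analysis mentioned in the abstract, the sketch below builds a representational dissimilarity matrix (RDM) for each of three sets of activations and compares the RDMs by rank correlation. The distance measure (1 minus Pearson correlation), the tie-free Spearman helper, and all data are assumptions made for this example, not details taken from the paper.

```python
import numpy as np

def rdm(activations):
    """Dissimilarity (1 - Pearson r) between every pair of image response patterns."""
    return 1.0 - np.corrcoef(activations)

def upper_triangle(matrix):
    """Off-diagonal upper-triangle entries as a flat vector."""
    i, j = np.triu_indices(matrix.shape[0], k=1)
    return matrix[i, j]

def ranks(values):
    """Ranks of a vector; assumes there are no ties."""
    return np.argsort(np.argsort(values))

def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of ranks."""
    return np.corrcoef(ranks(x), ranks(y))[0, 1]

def rsa_score(acts_1, acts_2):
    return spearman(upper_triangle(rdm(acts_1)), upper_triangle(rdm(acts_2)))

# Synthetic example: 12 images, 16 features per expert.
rng = np.random.default_rng(0)
expert_a = rng.normal(size=(12, 16))
expert_b = 2.0 * expert_a + 0.05 * rng.normal(size=(12, 16))  # near copy of A
expert_c = rng.normal(size=(12, 16))                          # unrelated

score_ab = rsa_score(expert_a, expert_b)
score_ac = rsa_score(expert_a, expert_c)
```

Here `score_ab` should be close to 1 and `score_ac` much lower. Comparing experts across independently trained models in this way does not require matching individual units, only a shared set of images.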

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript trains sparsely-gated convolutional Mixture-of-Experts models with a contrastive objective on natural images and characterises expert specialisation using visual neuroscience tools. Analyses extend from gating statistics to per-expert category separability and tuning measured via most-exciting inputs, interpreted through semantic dimensions extracted from the THINGS dataset of human behavioural judgements. Stability of expertise allocation is assessed via tuning and representational similarity analysis across independent initialisations. The central claims are that an animate-inanimate distinction dominates expert partitioning from gating through readout and remains stable across models, while routing appears sparse and categorical but expert tuning is broader and extends to continuous visual and semantic dimensions beyond category boundaries.

Significance. If the results hold, the work provides a valuable bridge between MoE interpretability and visual neuroscience methods, showing that category routing alone is insufficient and that expert-level tuning analyses reveal richer structure. The stability finding across initialisations and the demonstration that experts can share category separability while differing in feature tuning are useful for both theory and practical MoE design. The approach of using most-exciting inputs and THINGS dimensions is a strength when properly validated.

major comments (2)
  1. The interpretation that expert tuning extends to continuous visual and semantic dimensions beyond category boundaries rests on semantic dimensions derived from the THINGS human behavioural judgements dataset. The manuscript should supply direct evidence (e.g., comparison of THINGS axes to model-derived embeddings or ablation of the interpretation) that these dimensions align with the features actually encoded by the contrastively trained convolutional experts rather than imposing an external human similarity ontology. Without such validation the central move from routing statistics to expert-level tuning claims is weakened.
  2. §5 (stability analysis): The claim that the animate-inanimate distinction is stable across independently trained models is load-bearing for the robustness conclusion. The text should report the exact number of independent runs, the specific representational similarity metric employed, variance in routing preferences, and any statistical tests. The current description leaves these details underspecified, making it difficult to assess the strength of the stability result.
minor comments (2)
  1. Abstract: Adding at least one quantitative anchor (e.g., mean category separability across experts or average correlation with THINGS dimensions) would give readers an immediate sense of effect size and support the qualitative claims of dominance and broader tuning.
  2. Methods section: Provide the precise number of experts, gating temperature or sparsity schedule, and contrastive loss hyperparameters to facilitate exact reproduction of the reported routing and tuning behaviours.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify and strengthen the manuscript. We address each major point below and will revise accordingly.

  1. Referee: The interpretation that expert tuning extends to continuous visual and semantic dimensions beyond category boundaries rests on semantic dimensions derived from the THINGS human behavioural judgements dataset. The manuscript should supply direct evidence (e.g., comparison of THINGS axes to model-derived embeddings or ablation of the interpretation) that these dimensions align with the features actually encoded by the contrastively trained convolutional experts rather than imposing an external human similarity ontology. Without such validation the central move from routing statistics to expert-level tuning claims is weakened.

    Authors: We agree that direct validation would strengthen the link between THINGS dimensions and model features. In the revised manuscript we will add a comparison of the THINGS semantic axes against the leading principal components of per-expert activation vectors computed on the same image set, together with a quantitative alignment metric (e.g., canonical correlation). We will also include a brief ablation that substitutes model-derived dimensions for the THINGS axes and re-evaluates the reported tuning patterns. These additions will demonstrate that the continuous dimensions reflect structure present in the contrastively trained experts rather than an external ontology alone. revision: yes

  2. Referee: §5 (stability analysis): The claim that the animate-inanimate distinction is stable across independently trained models is load-bearing for the robustness conclusion. The text should report the exact number of independent runs, the specific representational similarity metric employed, variance in routing preferences, and any statistical tests. The current description leaves these details underspecified, making it difficult to assess the strength of the stability result.

    Authors: We thank the referee for highlighting the missing methodological details. The stability results were obtained from 10 independent training runs (the model instantiations summarised in Figure 6) that differed only in random seed. Representational similarity was measured with cosine similarity between expert tuning vectors (most-exciting-input embeddings). In the revision we will explicitly state the number of runs, report the observed variance in routing preferences across runs, name the similarity metric, and include statistical support (permutation tests on the animate-inanimate separability scores) to quantify the reliability of the stability finding. revision: yes
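The permutation test proposed in the second response could look roughly like the sketch below. It tests whether a per-image score differs between animate and inanimate images by shuffling the animacy labels. The function name `permutation_test`, the choice of score, and the data are hypothetical. Nothing here reproduces the authors' analysis.

```python
import numpy as np

def permutation_test(scores, is_animate, n_permutations=2000, seed=0):
    """Two-sided permutation test on the animate-minus-inanimate mean difference."""
    rng = np.random.default_rng(seed)
    scores = np.asarray(scores, dtype=float)
    is_animate = np.asarray(is_animate, dtype=bool)
    observed = scores[is_animate].mean() - scores[~is_animate].mean()
    null = np.empty(n_permutations)
    for i in range(n_permutations):
        shuffled = rng.permutation(is_animate)  # break the label-score link
        null[i] = scores[shuffled].mean() - scores[~shuffled].mean()
    # The add-one correction keeps the p-value strictly positive.
    p_value = (1 + np.sum(np.abs(null) >= abs(observed))) / (1 + n_permutations)
    return observed, p_value

# Synthetic example: one expert's gating weight on 10 animate and 10 inanimate images.
scores = np.concatenate([np.linspace(0.6, 0.9, 10), np.linspace(0.1, 0.4, 10)])
is_animate = np.array([True] * 10 + [False] * 10)
observed, p_value = permutation_test(scores, is_animate)
```

Running the same test separately for each training run, and reporting how many runs reach significance, would be one way to quantify the stability claim without assuming a particular score distribution.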

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper's core claims about animate-inanimate dominance in expert partitioning and broader tuning to continuous dimensions are derived from empirical analyses: routing statistics, per-expert category separability, most-exciting inputs, and representational similarity, all interpreted using external tools from visual neuroscience and the independent THINGS human behavioural dataset. No steps reduce by construction to self-defined quantities, fitted parameters renamed as predictions, or load-bearing self-citations. The stability assessment across independent initialisations and the move from category-level to feature-level explanations rely on standard methods applied to model outputs rather than internal redefinitions. The derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard assumptions from machine learning and visual neuroscience without introducing new free parameters or invented entities; the central claims depend on the applicability of existing tools to MoE activations.

axioms (1)
  • Domain assumption: tools from visual neuroscience, including category separability and most-exciting-input analysis, can be meaningfully applied to interpret activations in artificial neural network experts.
    Invoked when extending gating-level analyses to expert-level tuning and representational similarity.




Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 5 internal anchors

  1. [1]

    Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.Journal of Machine Learning Research, 23(120):1–39, 2022

    William Fedus, Barret Zoph, and Noam Shazeer. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity.Journal of Machine Learning Research, 23(120):1–39, 2022. ISSN 1533-7928. URL http://jmlr.org/papers/v23/21-0998. html

  2. [2]

    Adaptive Mixtures of Local Experts

    Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive Mixtures of Local Experts.Neural Computation, 3(1):79–87, 1991. ISSN 1530-888X. doi: 10.1162/neco.1991.3.1.79

  3. [3]

    Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer

    Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, January 2017. URLhttp://arxiv.org/abs/1701.06538. arXiv:1701.06538 [cs]

  4. [4]

    Jordan and Robert A

    Michael I. Jordan and Robert A. Jacobs. Hierarchical Mixtures of Experts and the EM Algorithm. Neural Computation, 6(2):181–214, March 1994. ISSN 0899-7667. doi: 10.1162/neco.1994.6. 2.181. URLhttps://ieeexplore.ieee.org/abstract/document/6796382

  5. [5]

    A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications, January 2026

    Siyuan Mu and Sen Lin. A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications, January 2026. URL http://arxiv.org/abs/2503.07137. arXiv:2503.07137 [cs]

  6. [6]

    ViMoE: An Empirical Study of Designing Vision Mixture-of- Experts, November 2024

    Xumeng Han, Longhui Wei, Zhiyang Dou, Zipeng Wang, Chenhui Qiang, Xin He, Yingfei Sun, Zhenjun Han, and Qi Tian. ViMoE: An Empirical Study of Designing Vision Mixture-of- Experts, November 2024. URL http://arxiv.org/abs/2410.15732. arXiv:2410.15732 [cs]

  7. [7]

    Scaling Vision with Sparse Mixture of Experts, June 2021

    Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, An- dré Susano Pinto, Daniel Keysers, and Neil Houlsby. Scaling Vision with Sparse Mixture of Experts, June 2021. URLhttp://arxiv.org/abs/2106.05974. arXiv:2106.05974 [cs]

  8. [8]

    Scaling Vision-Language Models with Sparse Mixture of Experts, March 2023

    Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, and Yuxiong He. Scaling Vision-Language Models with Sparse Mixture of Experts, March 2023. URLhttp://arxiv. org/abs/2303.07226. arXiv:2303.07226 [cs]

  9. [9]

    Wensheng Gan, Zhenyao Ning, Zhenlian Qi, and Philip S. Yu. Mixture of Experts (MoE): A Big Data Perspective, January 2025. URL http://arxiv.org/abs/2501.16352. arXiv:2501.16352 [cs]

  10. [10]

    Mixture of Experts Made Intrinsically Interpretable

    Xingyi Yang, Constantin Venhoff, and Ashkan Khakzar. Mixture of Experts Made Intrinsically Interpretable. May 2025

  11. [11]

    Marius Zöllner

    Svetlana Pavlitska, Christian Hubschneider, Lukas Struppek, and J. Marius Zöllner. Sparsely- gated Mixture-of-Expert Layers for CNN Interpretability. In2023 International Joint Con- ference on Neural Networks (IJCNN), pages 1–10, June 2023. doi: 10.1109/IJCNN54540. 2023.10191904. URL https://ieeexplore.ieee.org/document/10191904. ISSN: 2161- 4407

  12. [12]

    Multi- modal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts, June 2022

    Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, and Neil Houlsby. Multi- modal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts, June 2022. URLhttp://arxiv.org/abs/2206.02770. arXiv:2206.02770 [cs]

  13. [13]

    MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models, January 2026

    Dianyi Wang, Siyuan Wang, Zejun Li, Yikun Wang, Yitong Li, Duyu Tang, Xiaoyu Shen, Xuanjing Huang, and Zhongyu Wei. MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models, January 2026. URL http://arxiv.org/abs/2508.09779. arXiv:2508.09779 [cs]

  14. [14]

    Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities, April 2025

    Raman Dutt, Harleen Hanspal, Guoxuan Xia, Petru-Daniel Tudosiu, Alexander Black, Yongxin Yang, Steven McDonagh, and Sarah Parisot. Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities, April 2025. URL http://arxiv.org/abs/2503. 22517. arXiv:2503.22517 [cs]. 11

  15. [15]

    Mixture of Experts in Image Classification: What’s the Sweet Spot?, October 2025

    Mathurin Videau, Alessandro Leite, Marc Schoenauer, and Olivier Teytaud. Mixture of Experts in Image Classification: What’s the Sweet Spot?, October 2025. URL http://arxiv.org/ abs/2411.18322. arXiv:2411.18322 [cs]

  16. [16]

    Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model, April

    Chaoxiang Cai, Longrong Yang, Minghe Weng, Xuewei Li, Zequn Qin, and Xi Li. Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model, April

  17. [17]

    arXiv:2507.01351 [cs]

    URLhttp://arxiv.org/abs/2507.01351. arXiv:2507.01351 [cs]

  18. [18]

    MoE Lens – An Expert Is All You Need, March 2026

    Marmik Chaudhari, Idhant Gulati, Nishkal Hundia, Pranav Karra, and Shivam Raval. MoE Lens – An Expert Is All You Need, March 2026. URLhttp://arxiv.org/abs/2603.05806. arXiv:2603.05806 [cs]

  19. [19]

    A Closer Look into Mixture-of- Experts in Large Language Models, June 2025

    Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, and Jie Fu. A Closer Look into Mixture-of- Experts in Large Language Models, June 2025. URL http://arxiv.org/abs/2406.18219. arXiv:2406.18219 [cs]

  20. [20]

    Probing Semantic Routing in Large Mixture-of-Expert Models

    Matthew Lyle Olson, Neale Ratzlaff, Musashi Hinck, Man Luo, Sungduk Yu, Chendi Xue, and Vasudev Lal. Probing Semantic Routing in Large Mixture-of-Expert Models. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors,Findings of the Association for Computational Linguistics: EMNLP 2025, pages 18263–18278, Suzhou, China, ...

  21. [21]

    URL https://aclanthology.org/2025

    doi: 10.18653/v1/2025.findings-emnlp.991. URL https://aclanthology.org/2025. findings-emnlp.991/

  22. [22]

    Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms, September 2025

    Jiahao Ying, Mingbao Lin, Qianru Sun, and Yixin Cao. Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms, September 2025. URL http:// arxiv.org/abs/2509.23933. arXiv:2509.23933 [cs]

  23. [23]

    Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts, September 2025

    Strahinja Nikolic, Ilker Oguz, and Demetri Psaltis. Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts, September 2025. URL http://arxiv. org/abs/2509.10025. arXiv:2509.10025 [cs]

  24. [24]

    Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models, December 2024

    Elie Antoine, Frédéric Béchet, and Philippe Langlais. Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models, December 2024. URL http://arxiv.org/abs/2412.16971. arXiv:2412.16971 [cs]

  25. [25]

    ST-MoE: Designing Stable and Transferable Sparse Expert Models, May

    Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. ST-MoE: Designing Stable and Transferable Sparse Expert Models, May

  26. [26]

    ST-MoE: Designing Stable and Transferable Sparse Expert Models

    URLhttp://arxiv.org/abs/2202.08906. arXiv:2202.08906 [cs]

  27. [27]

    OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models, March 2024

    Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, and Yang You. OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models, March 2024. URLhttp://arxiv.org/abs/2402.01739. arXiv:2402.01739 [cs]

  28. [28]

    Marius Zöllner

    Svetlana Pavlitska, Haixi Fan, Konstantin Ditschuneit, and J. Marius Zöllner. Design and Behavior of Sparse Mixture-of-Experts Layers in CNN-based Semantic Segmentation, April

  29. [29]
  30. [30]

    D. J. McKeefry and S. Zeki. The position and topography of the human colour centre as revealed by functional magnetic resonance imaging.Brain: A Journal of Neurology, 120 ( Pt 12):2229–2242, December 1997. ISSN 0006-8950. doi: 10.1093/brain/120.12.2229

  31. [31]

    Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex.Proceedings of the National Academy of Sciences, 92(18):8135–8139, August 1995

    R Malach, J B Reppas, R R Benson, K K Kwong, H Jiang, W A Kennedy, P J Ledden, T J Brady, B R Rosen, and R B Tootell. Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex.Proceedings of the National Academy of Sciences, 92(18):8135–8139, August 1995. doi: 10.1073/pnas.92.18.8135. URL https://www.pnas. org/doi...

  32. [32]

    Cue-Invariant Activation in Object-Related Areas of the Human Occipital Lobe.Neuron, 21 (1):191–202, July 1998

    Kalanit Grill-Spector, Tamar Kushnir, Shimon Edelman, Yacov Itzchak, and Rafael Malach. Cue-Invariant Activation in Object-Related Areas of the Human Occipital Lobe.Neuron, 21 (1):191–202, July 1998. ISSN 0896-6273. doi: 10.1016/S0896-6273(00)80526-7. URL https://www.sciencedirect.com/science/article/pii/S0896627300805267. 12

  33. [33]

    Nancy Kanwisher and Galit Yovel. The fusiform face area: a cortical region specialized for the perception of faces.Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1476):2109–2128, December 2006. ISSN 0962-8436. doi: 10.1098/rstb.2006.1934. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC1857737/

  34. [34]

    Nancy Kanwisher, Josh McDermott, and Marvin M. Chun. The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception.Journal of Neuroscience, 17(11): 4302–4311, June 1997. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.17-11-04302

  35. [35]

    URLhttps://www.jneurosci.org/content/17/11/4302

  36. [36]

    A cortical representation of the local visual environment

    Russell Epstein and Nancy Kanwisher. A cortical representation of the local visual environment. Nature, 392(6676):598–601, April 1998. ISSN 1476-4687. doi: 10.1038/33402. URL https: //www.nature.com/articles/33402

  37. [37]

    Downing, Yuhong Jiang, Miles Shuman, and Nancy Kanwisher

    Paul E. Downing, Yuhong Jiang, Miles Shuman, and Nancy Kanwisher. A Cortical Area Selective for Visual Processing of the Human Body.Science, 293(5539):2470–2473, September

  38. [38]

    URL https://www.science.org/doi/10.1126/ science.1063414

    doi: 10.1126/science.1063414. URL https://www.science.org/doi/10.1126/ science.1063414

  39. [39]

    Origins of the specialization for letters and numbers in ventral occipitotemporal cortex

    Thomas Hannagan, Amir Amedi, Laurent Cohen, Ghislaine Dehaene-Lambertz, and Stanislas Dehaene. Origins of the specialization for letters and numbers in ventral occipitotemporal cortex. Trends in Cognitive Sciences, 19(7):374–382, July 2015. ISSN 1364-6613, 1879-307X. doi: 10. 1016/j.tics.2015.05.006. URL https://www.cell.com/trends/cognitive-sciences/ abs...

  40. [40]

    Aliette Lochy, Corentin Jacques, Louis Maillard, Sophie Colnat-Coulbois, Bruno Rossion, and Jacques Jonas. Selective visual representation of letters and words in the left ventral occipito-temporal cortex with intracerebral recordings.Proceedings of the National Academy of Sciences, 115(32):E7595–E7604, August 2018. doi: 10.1073/pnas.1718987115. URL https...

  41. [41]

    Baker, and Martin N

    Oliver Contier, Chris I. Baker, and Martin N. Hebart. Distributed representations of behaviour- derived object dimensions in the human visual system.Nature Human Behaviour, 8(11): 2179–2193, November 2024. ISSN 2397-3374. doi: 10.1038/s41562-024-01980-y. URL https://www.nature.com/articles/s41562-024-01980-y

  42. [42]

    van Dyck, Martin N

    Leonard E. van Dyck, Martin N. Hebart, and Katharina Dobs. Multidimensional feature tuning in category-selective areas of human visual cortex, June 2025. URL https://www.biorxiv. org/content/10.1101/2025.06.17.659578v2. Pages: 2025.06.17.659578 Section: New Results

  43. [43]

    Visual feature processing in a large stroke cohort: evidence against modular organization.Brain, 148(4):1144–1154, April 2025

    Selma Lugtmeijer, Aleksandra M Sobolewska, Edward H F De Haan, and H Steven Scholte. Visual feature processing in a large stroke cohort: evidence against modular organization.Brain, 148(4):1144–1154, April 2025. ISSN 0006-8950, 1460-2156. doi: 10.1093/brain/awaf009. URLhttps://academic.oup.com/brain/article/148/4/1144/7952043

  44. [44]

    Brendan Ritchie, Susan G

    J. Brendan Ritchie, Susan G. Wardle, Maryam Vaziri-Pashkam, Dwight J. Kravitz, and Chris I. Baker. Rethinking category-selectivity in human visual cortex.Cognitive Neuroscience, 17(2):49–76, April 2026. ISSN 1758-8928. doi: 10.1080/17588928. 2025.2543890. URL https://doi.org/10.1080/17588928.2025.2543890. _eprint: https://doi.org/10.1080/17588928.2025.2543890

  45. [45]

    Hebart, Adam H

    Martin N. Hebart, Adam H. Dickter, Alexis Kidder, Wan Y . Kwok, Anna Corriveau, Caitlin Van Wicklin, and Chris I. Baker. THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images.PLOS ONE, 14(10):e0223792, October 2019. ISSN 1932-

  46. [46]

    URL https://journals.plos.org/plosone/ article?id=10.1371/journal.pone.0223792

    doi: 10.1371/journal.pone.0223792. URL https://journals.plos.org/plosone/ article?id=10.1371/journal.pone.0223792

  47. [47]

    Hebart, Charles Y

    Martin N. Hebart, Charles Y . Zheng, Francisco Pereira, and Chris I. Baker. Revealing the multidimensional mental representations of natural objects underlying human similarity judgements.Nature Human Behaviour, 4(11):1173–1185, November 2020. ISSN 2397-

  48. [48]

    URL https://www.nature.com/articles/ s41562-020-00951-3

    doi: 10.1038/s41562-020-00951-3. URL https://www.nature.com/articles/ s41562-020-00951-3. 13

  49. [49]

    Residual Mixture of Experts, October 2022

    Lemeng Wu, Mengchen Liu, Yinpeng Chen, Dongdong Chen, Xiyang Dai, and Lu Yuan. Residual Mixture of Experts, October 2022. URL http://arxiv.org/abs/2204.09636. arXiv:2204.09636 [cs]

  50. [50]

    HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection

    Vadim Vashkelis and Natalia Trukhina. HI-MoE: Hierarchical Instance-Conditioned Mixture- of-Experts for Object Detection, April 2026. URL http://arxiv.org/abs/2604.04908. arXiv:2604.04908 [cs]

  51. [51]

    E. D. Adrian and D. W. Bronk. The discharge of impulses in motor nerve fibres.The Journal of Physiology, 66(1):81–101, 1928. ISSN 1469-7793. doi: 10.1113/jphysiol.1928.sp002509. URL https://onlinelibrary.wiley.com/doi/abs/10.1113/jphysiol.1928.sp002509. _eprint: https://physoc.onlinelibrary.wiley.com/doi/pdf/10.1113/jphysiol.1928.sp002509

  52. [52]

    H. K. Hartline. The response of single optic nerve fibers of the vertebrate eye to illumination of the retina.American Journal of Physiology-Legacy Content, 121(2):400–415, January

  53. [53]

    doi: 10.1152/ajplegacy.1938.121.2.400

    ISSN 0002-9513. doi: 10.1152/ajplegacy.1938.121.2.400. URL https://journals. physiology.org/doi/abs/10.1152/ajplegacy.1938.121.2.400

  54. [54]

    Walker, Fabian H

    Edgar Y . Walker, Fabian H. Sinz, Erick Cobos, Taliah Muhammad, Emmanouil Froudarakis, Paul G. Fahey, Alexander S. Ecker, Jacob Reimer, Xaq Pitkow, and Andreas S. Tolias. Inception loops discover what excites neurons most using deep predictive models.Nature Neuroscience, 22(12):2060–2065, December 2019. ISSN 1546-1726. doi: 10.1038/s41593-019-0517-x. URL ...

  55. [55]

    Neural tuning and representational geom- etry.Nature Reviews Neuroscience, 22(11):703–718, November 2021

    Nikolaus Kriegeskorte and Xue-Xin Wei. Neural tuning and representational geom- etry.Nature Reviews Neuroscience, 22(11):703–718, November 2021. ISSN 1471-

  56. [56]

    URL https://www.nature.com/articles/ s41583-021-00502-3

    doi: 10.1038/s41583-021-00502-3. URL https://www.nature.com/articles/ s41583-021-00502-3

  57. [57]

    Nikolaus Kriegeskorte and Rogier A. Kievit. Representational geometry: integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8):401–412, August 2013. ISSN 1364-6613. doi: 10.1016/j.tics.2013.06.007. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC3730178/

  58. [58]

    S. E. Petersen, P. T. Fox, M. I. Posner, M. Mintun, and M. E. Raichle. Positron emission tomographic studies of the cortical anatomy of single-word processing. Nature, 331(6157):585–589, February 1988. ISSN 1476-4687. doi: 10.1038/331585a0. URL https://www.nature.com/articles/331585a0

  59. [59]

    Stanislas Dehaene and Laurent Cohen. The unique role of the visual word form area in reading. Trends in Cognitive Sciences, 15(6):254–262, June 2011. ISSN 1364-6613. doi: 10.1016/j.tics.2011.04.003. URL https://www.sciencedirect.com/science/article/pii/S1364661311000738

  60. [60]

    Grace E. Rice, David M. Watson, Tom Hartley, and Timothy J. Andrews. Low-Level Image Properties of Visual Objects Predict Patterns of Neural Response across Category-Selective Regions of the Ventral Visual Pathway. Journal of Neuroscience, 34(26):8837–8844, June 2014. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.5265-13.2014. URL https://www.jneurosci.org/content/34/26/8837

  62. [62]

    Uri Hasson, Ifat Levy, Marlene Behrmann, Talma Hendler, and Rafael Malach. Eccentricity Bias as an Organizing Principle for Human High-Order Object Areas. Neuron, 34(3):479–490, April 2002. ISSN 0896-6273. doi: 10.1016/S0896-6273(02)00662-1. URL https://www.sciencedirect.com/science/article/pii/S0896627302006621

  63. [63]

    Michael J. Arcaro, Stephanie A. McMains, Benjamin D. Singer, and Sabine Kastner. Retinotopic Organization of Human Ventral Visual Cortex. Journal of Neuroscience, 29(34):10638–10652, August 2009. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.2807-09.2009. URL https://www.jneurosci.org/content/29/34/10638

  64. [64]

    Bria Long, Chen-Ping Yu, and Talia Konkle. Mid-level visual features underlie the high-level categorical organization of the ventral stream. Proceedings of the National Academy of Sciences, 115(38):E9015–E9024, September 2018. doi: 10.1073/pnas.1719616115. URL https://www.pnas.org/doi/full/10.1073/pnas.1719616115.

  65. [65]

    Sushrut Thorat, Daria Proklova, and Marius V Peelen. The nature of the animacy organization in human ventral temporal cortex. eLife, 8:e47142, September 2019. ISSN 2050-084X. doi: 10.7554/eLife.47142. URL https://doi.org/10.7554/eLife.47142

  66. [66]

    Talia Konkle and Alfonso Caramazza. Tripartite Organization of the Ventral Stream by Animacy and Object Size. Journal of Neuroscience, 33(25):10235–10242, June 2013. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.0983-13.2013. URL https://www.jneurosci.org/content/33/25/10235

  67. [67]

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition, December 2015. URL http://arxiv.org/abs/1512.03385. arXiv:1512.03385 [cs]

  68. [68]

    Yihao Xue, Eric Gan, Jiayi Ni, Siddharth Joshi, and Baharan Mirzasoleiman. Investigating the Benefits of Projection Head for Representation Learning, March 2024. URL http://arxiv.org/abs/2403.11391. arXiv:2403.11391 [cs]

  69. [69]

    Jacob S. Prince, George A. Alvarez, and Talia Konkle. Contrastive learning explains the emergence and function of visual category-selective regions. Science Advances, 10(39):eadl1776, September 2024. doi: 10.1126/sciadv.adl1776. URL https://www.science.org/doi/10.1126/sciadv.adl1776

  70. [70]

    Adam Coates, Honglak Lee, and Andrew Y Ng. An Analysis of Single-Layer Networks in Unsupervised Feature Learning. 2011

  71. [71]

    Xiao-Xiong Lin, Andreas Nieder, and Simon N. Jacob. The neuronal implementation of representational geometry in primate prefrontal cortex. Science Advances, 9(50):eadh8685, December 2023. doi: 10.1126/sciadv.adh8685. URL https://www.science.org/doi/10.1126/sciadv.adh8685

  72. [72]

    Anna Blumenthal, Bobby Stojanoski, Chris B. Martin, Rhodri Cusack, and Stefan Köhler. Animacy and real-world size shape object representations in the human medial temporal lobes. Human Brain Mapping, 39(9):3779–3792, June 2018. ISSN 1065-9471. doi: 10.1002/hbm.24212. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC6866524/

  73. [73]

    Johannes Mehrer, Courtney J. Spoerer, Emer C. Jones, Nikolaus Kriegeskorte, and Tim C. Kietzmann. An ecologically motivated image dataset for deep learning yields better models of human vision. Proceedings of the National Academy of Sciences, 118(8):e2011417118, February 2021. doi: 10.1073/pnas.2011417118. URL https://www.pnas.org/doi/10.1073/pnas.2011417...