Beyond Routing: Characterising Expert Tuning and Representation in Vision Mixture-of-Experts

Gene Tangtartharakul; Katherine R. Storrs

arxiv: 2605.20610 · v1 · pith:DXGMSHC2new · submitted 2026-05-20 · 💻 cs.CV · cs.AI



Pith reviewed 2026-05-21 05:52 UTC · model grok-4.3


keywords: mixture of experts · expert tuning · vision models · animate-inanimate distinction · model interpretability · representational similarity · contrastive learning · expert specialisation


The pith

An animate-inanimate distinction dominates expert partitioning in vision mixture-of-experts models from gating through readout and remains stable across independent trainings.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper trains sparsely-gated convolutional mixture-of-experts models on natural images with a contrastive objective and then probes what each expert actually encodes. It moves from routing statistics to direct measurements of per-expert category separability and tuning via most-exciting inputs. Semantic dimensions drawn from human behavioural data are used to interpret those tunings at a finer grain than categories alone. The central result is that an animate-inanimate split structures the allocation of expertise from the earliest gating decisions onward and reappears reliably when models are retrained from scratch. Routing gives the appearance of sparse categorical preferences, yet expert-level analyses show broader, continuous tuning to visual and semantic features that cross category boundaries.
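The routing statistics mentioned above can be illustrated with a short sketch. It assumes a Shazeer-style top-k gate (softmax over the k largest gating logits, with the gating noise omitted) and uses made-up logits and labels. The summary does not specify the authors' gate, so the function names `top_k_gate` and `routing_table` and every number below are hypothetical. A table whose rows are close to one-hot is what "sparse, categorical" routing looks like, which is the picture the paper argues is incomplete.

```python
import math
from collections import Counter, defaultdict


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_experts(logits, k=1):
    """Indices of the k experts with the largest gating logits."""
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def top_k_gate(logits, k=1):
    """Assumed Shazeer-style gate (noise omitted): softmax over the k largest
    logits, zero weight for every other expert. Not the authors' implementation."""
    chosen = top_k_experts(logits, k)
    weights = [0.0] * len(logits)
    for i, w in zip(chosen, softmax([logits[i] for i in chosen])):
        weights[i] = w
    return weights


def routing_table(all_logits, labels, k=1):
    """For each category label, the fraction of its images routed to each expert."""
    n_experts = len(all_logits[0])
    counts = defaultdict(lambda: [0] * n_experts)
    totals = Counter(labels)
    for logits, label in zip(all_logits, labels):
        for e in top_k_experts(logits, k):
            counts[label][e] += 1
    return {label: [c / totals[label] for c in row] for label, row in counts.items()}


# Toy gating logits for four images and two experts (invented values).
logits = [[2.0, 0.1], [1.5, -0.3], [-0.2, 1.8], [0.0, 2.2]]
labels = ["dog", "dog", "truck", "truck"]
print(routing_table(logits, labels))  # {'dog': [1.0, 0.0], 'truck': [0.0, 1.0]}
```

The table only records where images are sent. It says nothing about what each expert computes once an image arrives, which is why the paper goes on to expert-level analyses.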

Core claim

Expert specialisation in vision mixture-of-experts models is dominated by an animate-inanimate distinction that appears from gating through to expert readout and proves stable across independently trained models. Although routing statistics indicate relatively sparse, categorical preferences, the experts themselves exhibit tuning to continuous visual and semantic dimensions that extend beyond category boundaries. Experts achieve similar levels of category separability despite maintaining distinct feature tuning, showing the explanatory gain from moving past category-level descriptions.

What carries the argument

The animate-inanimate distinction that organises expert partitioning, tracked by combining gating analysis with per-expert category separability, most-exciting-input tuning, semantic-dimension interpretation, and cross-model representational similarity.
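Per-expert category separability (the heatmaps in Figure 5) needs a concrete measure, and the summary does not say which one the authors used. As a stand-in, the sketch below scores each pair of categories by leave-one-out nearest-centroid accuracy on one expert's readout vectors. The measure, the function names, and the toy vectors are illustrative assumptions. It requires Python 3.8+ for `math.dist`.

```python
import itertools
import math


def centroid(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def loo_nearest_centroid_accuracy(a, b):
    """Leave-one-out nearest-centroid accuracy for two categories' feature vectors.

    Each held-out vector is assigned to whichever centroid is closer: its own
    category's centroid recomputed without it, or the other category's centroid.
    Ties count as errors. Each category needs at least two vectors.
    """
    correct = 0
    for own, other in ((a, b), (b, a)):
        other_centroid = centroid(other)
        for i, x in enumerate(own):
            own_centroid = centroid(own[:i] + own[i + 1:])
            if math.dist(x, own_centroid) < math.dist(x, other_centroid):
                correct += 1
    return correct / (len(a) + len(b))


def separability_matrix(features_by_category):
    """Separability score for every pair of categories in one expert's readout."""
    cats = sorted(features_by_category)
    return {
        (c1, c2): loo_nearest_centroid_accuracy(features_by_category[c1], features_by_category[c2])
        for c1, c2 in itertools.combinations(cats, 2)
    }


# Hypothetical 2-D readouts of one expert for three categories.
expert_readout = {
    "cat": [[1.0, 0.0], [1.2, 0.1], [0.9, -0.1]],
    "dog": [[1.1, 0.2], [0.8, 0.0], [1.0, 0.1]],
    "truck": [[-1.0, 0.0], [-1.1, 0.2], [-0.9, -0.1]],
}
sep = separability_matrix(expert_readout)
# cat vs truck separates perfectly (1.0); cat vs dog overlaps and scores lower.
```

Computing this matrix for each expert and comparing the results is one way to probe the core claim: two experts can yield similar separability matrices while responding to different features.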

If this is right

Expert specialisation involves continuous feature tuning that crosses category lines rather than rigid category assignment.
The animate-inanimate structure emerges reliably from the training process on natural images regardless of initialisation.
Comparable category separability can be achieved through distinct underlying tunings across different experts.
Analyses limited to routing statistics miss the graded visual and semantic dimensions that experts actually use.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Tools for measuring fine-grained tuning could be applied to other modular vision architectures to check whether similar continuous dimensions appear.
If the animate-inanimate axis proves general, it may indicate an organisational bias that vision models acquire from natural-image statistics.
Disrupting this distinction during training and measuring effects on downstream tasks would test its functional importance.

Load-bearing premise

That semantic dimensions derived from human behavioural judgements on object similarities supply a valid basis for interpreting the tuning of individual model experts.
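One simple way to read an expert's tuning through semantic dimensions is to correlate its per-image activation with per-image scores on each dimension. The sketch assumes such scores are already available for every image. The summary does not say how the authors obtained them for their image set, and the two dimension names below are invented. A high correlation shows association only. It does not establish that the human-derived axis is the feature the expert encodes, and that gap is exactly what this premise leaves open.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (each must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def tuning_profile(expert_activation, dimension_scores):
    """Correlate one expert's per-image activation with each semantic dimension.

    dimension_scores maps a dimension name to per-image scores (same image order).
    Returns (name, r) pairs sorted by |r|, strongest association first.
    """
    profile = [(name, pearson(expert_activation, scores)) for name, scores in dimension_scores.items()]
    return sorted(profile, key=lambda item: abs(item[1]), reverse=True)


activation = [0.9, 0.8, 0.2, 0.1]  # one expert's response to four images (toy values)
dimension_scores = {               # hypothetical dimension names and scores
    "animal-related": [1.0, 0.9, 0.1, 0.0],
    "metallic": [0.1, 0.3, 0.8, 0.2],
}
for name, r in tuning_profile(activation, dimension_scores):
    print(f"{name}: r = {r:+.2f}")
```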

What would settle it

Repeated training runs that show no consistent animate-inanimate separation in gating weights, expert activations, or most-exciting inputs would falsify the claim that this distinction dominates and stabilises expert partitioning.
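The falsification test above could be operationalised roughly as follows. Expert numbering is arbitrary across independent runs, so the check is permutation-invariant: a run counts as showing the split if some expert leans animate and another leans inanimate. The animacy index, the 0.5 threshold, and the toy gate weights are assumptions made for illustration. The animate set is STL10's six animal classes (bird, cat, deer, dog, horse, monkey).

```python
# STL10 classes treated as animate; airplane, car, ship and truck count as inanimate.
ANIMATE = {"bird", "cat", "deer", "dog", "horse", "monkey"}


def animacy_index(gate_weights, labels, expert):
    """Hypothetical index (A - I) / (A + I), where A and I are the expert's mean
    gate weight on animate and inanimate images (+1 animate-only, -1 inanimate-only)."""
    anim = [w[expert] for w, y in zip(gate_weights, labels) if y in ANIMATE]
    inan = [w[expert] for w, y in zip(gate_weights, labels) if y not in ANIMATE]
    a, i = sum(anim) / len(anim), sum(inan) / len(inan)
    return (a - i) / (a + i) if a + i > 0 else 0.0


def run_shows_split(gate_weights, labels, threshold=0.5):
    """True if one expert leans animate and another leans inanimate.

    Uses max/min over experts, so the result does not depend on expert numbering.
    The threshold is an assumption, not a value from the paper."""
    n_experts = len(gate_weights[0])
    indices = [animacy_index(gate_weights, labels, e) for e in range(n_experts)]
    return max(indices) >= threshold and min(indices) <= -threshold


def split_consistency(runs, threshold=0.5):
    """Fraction of runs, each a (gate_weights, labels) pair, that show the split."""
    return sum(run_shows_split(w, y, threshold) for w, y in runs) / len(runs)


labels = ["cat", "dog", "car", "truck"]
run_a = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]]
run_b = [[0.1, 0.9], [0.2, 0.8], [0.9, 0.1], [0.8, 0.2]]  # same split, experts swapped
run_c = [[0.5, 0.5]] * 4                                    # no animacy preference
runs = [(run_a, labels), (run_b, labels), (run_c, labels)]
print(split_consistency(runs))  # 0.6666666666666666
```

Under this operationalisation, a consistency fraction near zero across many seeds would count against the claim, while a fraction near one would support it.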

Figures

Figures reproduced from arXiv: 2605.20610 by Gene Tangtartharakul, Katherine R. Storrs.

**Figure 2.** Expert specialisation through the lens of gating on the held-out STL10 test set (8k images; …

**Figure 3.** Images eliciting the highest response magnitudes at the readout layer for each expert. The …

**Figure 4.** Scatterplot showing the relationship between gating logits and readout activation norms …

**Figure 5.** Category-separability heatmaps for an example four-expert model. The heatmaps depict …

**Figure 6.** Consistency of expert specialisations across 10 independently trained model instantiations …
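Figure 3 selects, for each expert, the dataset images with the highest response magnitude at the readout layer, and Figure 4 relates gating logits to readout activation norms. Below is a minimal sketch of that selection. It assumes "response magnitude" means the L2 norm of the expert's readout vector (suggested by Figure 4, but not confirmed by the captions), and the image ids and readouts are hypothetical.

```python
import heapq
import math


def l2_norm(vector):
    return math.sqrt(sum(v * v for v in vector))


def most_exciting_inputs(readouts, image_ids, n=3):
    """readouts[i][e] is expert e's readout vector for image i (hypothetical layout).

    Returns {expert: [image ids]}: for each expert, the n images with the largest
    L2 readout norm, strongest first. Assumes 'response magnitude' = L2 norm.
    """
    n_experts = len(readouts[0])
    result = {}
    for e in range(n_experts):
        top = heapq.nlargest(n, zip(readouts, image_ids), key=lambda pair, e=e: l2_norm(pair[0][e]))
        result[e] = [image_id for _, image_id in top]
    return result


image_ids = ["img0", "img1", "img2", "img3"]
readouts = [
    [[3.0, 4.0], [0.1, 0.0]],  # norms: expert 0 -> 5.0, expert 1 -> 0.1
    [[1.0, 0.0], [2.0, 0.0]],  # 1.0, 2.0
    [[0.0, 2.0], [6.0, 8.0]],  # 2.0, 10.0
    [[0.5, 0.5], [1.0, 1.0]],  # ~0.71, ~1.41
]
print(most_exciting_inputs(readouts, image_ids, n=2))
# {0: ['img0', 'img2'], 1: ['img2', 'img1']}
```

This is a dataset-based selection: it ranks existing images rather than synthesising new ones by optimisation. That matches the captions' wording, "images eliciting the highest response magnitudes".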

Original abstract

Mixture-of-Experts (MoE) models are often interpreted by analysing which categories are routed to which experts. However, routing alone does not reveal what each expert actually encodes. We train sparsely-gated convolutional MoE models with a contrastive objective on natural images and characterise expert specialisation using tools from visual neuroscience. Extending from gating-level to expert-level analyses, we measure per-expert category separability, and per-expert tuning using the most exciting inputs. Extending from category-level to feature-level explanations, we interpret tuning via semantic dimensions derived from a dataset of human behavioural judgements (THINGS). Finally, we use tuning and representational similarity analysis to assess the stability of expertise-allocation across independent initialisations. We find that an animate-inanimate distinction dominates expert partitioning, apparent from gating through to expert readout, and is stable across independently trained models. Although routing statistics suggest relatively sparse, categorical preferences, expert analyses reveal broader tuning to continuous visual and semantic dimensions that extend beyond category boundaries. Experts exhibit similar category-separability to one another, despite distinct feature tuning, demonstrating the explanatory benefits of moving beyond category-level analyses. Together, these results show that expert specialisation in vision MoEs extends well beyond category routing and is better understood by probing fine-grained expert-level tuning and representational structure.
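The abstract's cross-model comparison relies on representational similarity analysis (RSA). One common recipe is sketched here; it is an assumption, not the authors' exact pipeline. It builds a representational dissimilarity matrix (1 minus Pearson r between every pair of images) for each model on the same images, then compares the two matrices with a Spearman rank correlation. The toy "models" are invented: `model_b` is `model_a` with its units permuted and rescaled, so the RSA score is 1.

```python
import itertools
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (each must vary)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def rdm(features):
    """Upper triangle of the representational dissimilarity matrix:
    1 - Pearson r between the feature vectors of every pair of images."""
    pairs = itertools.combinations(range(len(features)), 2)
    return [1.0 - pearson(features[i], features[j]) for i, j in pairs]


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def rsa(features_a, features_b):
    """Spearman correlation between two models' RDMs on the same images."""
    return pearson(ranks(rdm(features_a)), ranks(rdm(features_b)))


model_a = [[1.0, 2.0, 3.0], [3.0, 1.0, 2.0], [2.0, 3.0, 1.0], [1.0, 3.0, 2.0]]
# Same images, with units reordered and rescaled: the geometry is unchanged.
model_b = [[2 * r[2], 2 * r[0], 2 * r[1]] for r in model_a]
print(rsa(model_a, model_b))  # 1.0
```

Because RSA compares image-by-image geometry rather than individual units, it can compare independently trained models whose experts and channels are numbered arbitrarily.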

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Routing statistics miss the full picture here: an animate-inanimate split dominates expert partitioning in these vision MoEs and holds across runs, while expert tuning extends to continuous visual and semantic dimensions.


The main thing to know is that routing statistics understate how vision MoE experts actually work. The authors find a dominant animate-inanimate split that runs from the gates to the expert readouts and stays stable across independent training runs. Expert-level analyses show tuning to continuous dimensions rather than purely categorical preferences.

What is new is the use of most-exciting inputs to probe individual experts, and the interpretation of those inputs with semantic dimensions from the THINGS human-judgement dataset. This lets the authors move from category-level routing to feature-level descriptions. They also show that experts have comparable category separability despite different tunings, and they check consistency across initialisations. Combining neuroscience tools with MoE analysis in this way is a reasonable extension, and it usefully shows why deeper probes matter.

The soft spot is the THINGS mapping. Human behavioural dimensions may not accurately capture the model's learned features, especially since the models are trained with a contrastive objective on images. If the axes are imposed rather than discovered from the model, the claim about broader continuous tuning could rest on shaky ground. The abstract also gives no quantitative results or error estimates, so it is difficult to judge how strong the effects really are.

This is for people studying interpretability in sparse architectures or trying to connect ML representations to perceptual dimensions; readers focused on vision MoEs or neuroscience-inspired analysis will get the most out of it. The paper has enough of a core idea to warrant serious refereeing. I would send it for peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript trains sparsely-gated convolutional Mixture-of-Experts models with a contrastive objective on natural images and characterises expert specialisation using visual neuroscience tools. Analyses extend from gating statistics to per-expert category separability and tuning measured via most-exciting inputs, interpreted through semantic dimensions extracted from the THINGS dataset of human behavioural judgements. Stability of expertise allocation is assessed via tuning and representational similarity analysis across independent initialisations. The central claims are that an animate-inanimate distinction dominates expert partitioning from gating through readout and remains stable across models, while routing appears sparse and categorical but expert tuning is broader and extends to continuous visual and semantic dimensions beyond category boundaries.

Significance. If the results hold, the work provides a valuable bridge between MoE interpretability and visual neuroscience methods, showing that category routing alone is insufficient and that expert-level tuning analyses reveal richer structure. The stability finding across initialisations and the demonstration that experts can share category separability while differing in feature tuning are useful for both theory and practical MoE design. The approach of using most-exciting inputs and THINGS dimensions is a strength when properly validated.

major comments (2)

The interpretation that expert tuning extends to continuous visual and semantic dimensions beyond category boundaries rests on semantic dimensions derived from the THINGS human behavioural judgements dataset. The manuscript should supply direct evidence (e.g., comparison of THINGS axes to model-derived embeddings or ablation of the interpretation) that these dimensions align with the features actually encoded by the contrastively trained convolutional experts rather than imposing an external human similarity ontology. Without such validation the central move from routing statistics to expert-level tuning claims is weakened.
§5 (stability analysis): The claim that the animate-inanimate distinction is stable across independently trained models is load-bearing for the robustness conclusion. The text should report the exact number of independent runs, the specific representational similarity metric employed, variance in routing preferences, and any statistical tests. The current description leaves these details underspecified, making it difficult to assess the strength of the stability result.

minor comments (2)

Abstract: Adding at least one quantitative anchor (e.g., mean category separability across experts or average correlation with THINGS dimensions) would give readers an immediate sense of effect size and support the qualitative claims of dominance and broader tuning.
Methods section: Provide the precise number of experts, gating temperature or sparsity schedule, and contrastive loss hyperparameters to facilitate exact reproduction of the reported routing and tuning behaviours.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the opportunity to clarify and strengthen the manuscript. We address each major point below and will revise accordingly.


Referee: The interpretation that expert tuning extends to continuous visual and semantic dimensions beyond category boundaries rests on semantic dimensions derived from the THINGS human behavioural judgements dataset. The manuscript should supply direct evidence (e.g., comparison of THINGS axes to model-derived embeddings or ablation of the interpretation) that these dimensions align with the features actually encoded by the contrastively trained convolutional experts rather than imposing an external human similarity ontology. Without such validation the central move from routing statistics to expert-level tuning claims is weakened.

Authors: We agree that direct validation would strengthen the link between THINGS dimensions and model features. In the revised manuscript we will add a comparison of the THINGS semantic axes against the leading principal components of per-expert activation vectors computed on the same image set, together with a quantitative alignment metric (e.g., canonical correlation). We will also include a brief ablation that substitutes model-derived dimensions for the THINGS axes and re-evaluates the reported tuning patterns. These additions will demonstrate that the continuous dimensions reflect structure present in the contrastively trained experts rather than an external ontology alone. revision: yes
Referee: §5 (stability analysis): The claim that the animate-inanimate distinction is stable across independently trained models is load-bearing for the robustness conclusion. The text should report the exact number of independent runs, the specific representational similarity metric employed, variance in routing preferences, and any statistical tests. The current description leaves these details underspecified, making it difficult to assess the strength of the stability result.

Authors: We thank the referee for highlighting the missing methodological details. The stability results were obtained from ten independent training runs that differed only in random seed (the ten model instantiations shown in Figure 6). Representational similarity was measured with cosine similarity between expert tuning vectors (most-exciting-input embeddings). In the revision we will explicitly state the number of runs, report the observed variance in routing preferences across runs, name the similarity metric, and include statistical support (permutation tests on the animate-inanimate separability scores) to quantify the reliability of the stability finding. revision: yes
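The two tools named in this simulated response can be sketched briefly. The first is cosine similarity between two tuning vectors. The second is a one-sided label-permutation test on the animate-minus-inanimate difference in an expert's mean response. The test statistic, the permutation count, the fixed seed, and the toy scores are assumptions for illustration, not details reported by the authors.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_difference(scores, is_animate):
    """Mean score on animate images minus mean score on inanimate images."""
    anim = [s for s, a in zip(scores, is_animate) if a]
    inan = [s for s, a in zip(scores, is_animate) if not a]
    return sum(anim) / len(anim) - sum(inan) / len(inan)


def permutation_p_value(scores, is_animate, n_permutations=10_000, seed=0):
    """One-sided permutation test: how often does shuffling the animate/inanimate
    labels give a mean difference at least as large as the observed one?"""
    rng = random.Random(seed)
    observed = mean_difference(scores, is_animate)
    labels = list(is_animate)
    exceed = 0
    for _ in range(n_permutations):
        rng.shuffle(labels)
        if mean_difference(scores, labels) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling itself, so p is never exactly 0.
    return (exceed + 1) / (n_permutations + 1)


scores = [0.9, 0.8, 0.85, 0.95, 0.1, 0.2, 0.15, 0.05]  # toy expert responses
is_animate = [True] * 4 + [False] * 4
print(f"p = {permutation_p_value(scores, is_animate):.4f}")
```

With only eight images there are 70 distinct labellings, so even a perfect split cannot reach p much below 1/70. The number of images per test matters as much as the number of runs.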

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper's core claims about animate-inanimate dominance in expert partitioning and broader tuning to continuous dimensions are derived from empirical analyses: routing statistics, per-expert category separability, most-exciting inputs, and representational similarity, all interpreted using external tools from visual neuroscience and the independent THINGS human behavioural dataset. No steps reduce by construction to self-defined quantities, fitted parameters renamed as predictions, or load-bearing self-citations. The stability assessment across independent initialisations and the move from category-level to feature-level explanations rely on standard methods applied to model outputs rather than internal redefinitions. The argument is therefore checked against external benchmarks rather than resting on its own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard assumptions from machine learning and visual neuroscience without introducing new free parameters or invented entities; the central claims depend on the applicability of existing tools to MoE activations.

axioms (1)

Domain assumption: tools from visual neuroscience, including category separability and most-exciting-input analysis, can be meaningfully applied to interpret activations in artificial neural network experts.
Invoked when extending gating-level analyses to expert-level tuning and representational similarity.

pith-pipeline@v0.9.0 · 5763 in / 1263 out tokens · 48398 ms · 2026-05-21T05:52:26.850435+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Relation between the paper passage and the cited Recognition theorem.

We find that an animate-inanimate distinction dominates expert partitioning, apparent from gating through to expert readout, and is stable across independently trained models.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

we interpret tuning via semantic dimensions derived from a dataset of human behavioural judgements (THINGS)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 5 internal anchors

[1] William Fedus, Barret Zoph, and Noam Shazeer. Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. Journal of Machine Learning Research, 23(120):1–39, 2022. ISSN 1533-7928. URL http://jmlr.org/papers/v23/21-0998.html
[2] Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. Adaptive Mixtures of Local Experts. Neural Computation, 3(1):79–87, 1991. ISSN 1530-888X. doi: 10.1162/neco.1991.3.1.79
[3] Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer, January 2017. URL http://arxiv.org/abs/1701.06538. arXiv:1701.06538 [cs]
[4] Michael I. Jordan and Robert A. Jacobs. Hierarchical Mixtures of Experts and the EM Algorithm. Neural Computation, 6(2):181–214, March 1994. ISSN 0899-7667. doi: 10.1162/neco.1994.6.2.181. URL https://ieeexplore.ieee.org/abstract/document/6796382
[5] Siyuan Mu and Sen Lin. A Comprehensive Survey of Mixture-of-Experts: Algorithms, Theory, and Applications, January 2026. URL http://arxiv.org/abs/2503.07137. arXiv:2503.07137 [cs]
[6] Xumeng Han, Longhui Wei, Zhiyang Dou, Zipeng Wang, Chenhui Qiang, Xin He, Yingfei Sun, Zhenjun Han, and Qi Tian. ViMoE: An Empirical Study of Designing Vision Mixture-of-Experts, November 2024. URL http://arxiv.org/abs/2410.15732. arXiv:2410.15732 [cs]
[7] Carlos Riquelme, Joan Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, and Neil Houlsby. Scaling Vision with Sparse Mixture of Experts, June 2021. URL http://arxiv.org/abs/2106.05974. arXiv:2106.05974 [cs]
[8] Sheng Shen, Zhewei Yao, Chunyuan Li, Trevor Darrell, Kurt Keutzer, and Yuxiong He. Scaling Vision-Language Models with Sparse Mixture of Experts, March 2023. URL http://arxiv.org/abs/2303.07226. arXiv:2303.07226 [cs]
[9] Wensheng Gan, Zhenyao Ning, Zhenlian Qi, and Philip S. Yu. Mixture of Experts (MoE): A Big Data Perspective, January 2025. URL http://arxiv.org/abs/2501.16352. arXiv:2501.16352 [cs]
[10] Xingyi Yang, Constantin Venhoff, and Ashkan Khakzar. Mixture of Experts Made Intrinsically Interpretable. May 2025
[11] Svetlana Pavlitska, Christian Hubschneider, Lukas Struppek, and J. Marius Zöllner. Sparsely-gated Mixture-of-Expert Layers for CNN Interpretability. In 2023 International Joint Conference on Neural Networks (IJCNN), pages 1–10, June 2023. doi: 10.1109/IJCNN54540.2023.10191904. URL https://ieeexplore.ieee.org/document/10191904. ISSN: 2161-4407
[12] Basil Mustafa, Carlos Riquelme, Joan Puigcerver, Rodolphe Jenatton, and Neil Houlsby. Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts, June 2022. URL http://arxiv.org/abs/2206.02770. arXiv:2206.02770 [cs]
[13] Dianyi Wang, Siyuan Wang, Zejun Li, Yikun Wang, Yitong Li, Duyu Tang, Xiaoyu Shen, Xuanjing Huang, and Zhongyu Wei. MoIIE: Mixture of Intra- and Inter-Modality Experts for Large Vision Language Models, January 2026. URL http://arxiv.org/abs/2508.09779. arXiv:2508.09779 [cs]
[14] Raman Dutt, Harleen Hanspal, Guoxuan Xia, Petru-Daniel Tudosiu, Alexander Black, Yongxin Yang, Steven McDonagh, and Sarah Parisot. Exploiting Mixture-of-Experts Redundancy Unlocks Multimodal Generative Abilities, April 2025. URL http://arxiv.org/abs/2503.22517. arXiv:2503.22517 [cs]
[15] Mathurin Videau, Alessandro Leite, Marc Schoenauer, and Olivier Teytaud. Mixture of Experts in Image Classification: What’s the Sweet Spot?, October 2025. URL http://arxiv.org/abs/2411.18322. arXiv:2411.18322 [cs]
[16] Chaoxiang Cai, Longrong Yang, Minghe Weng, Xuewei Li, Zequn Qin, and Xi Li. Long-Tailed Distribution-Aware Router For Mixture-of-Experts in Large Vision-Language Model, April. URL http://arxiv.org/abs/2507.01351. arXiv:2507.01351 [cs]
[18] Marmik Chaudhari, Idhant Gulati, Nishkal Hundia, Pranav Karra, and Shivam Raval. MoE Lens – An Expert Is All You Need, March 2026. URL http://arxiv.org/abs/2603.05806. arXiv:2603.05806 [cs]
[19] Ka Man Lo, Zeyu Huang, Zihan Qiu, Zili Wang, and Jie Fu. A Closer Look into Mixture-of-Experts in Large Language Models, June 2025. URL http://arxiv.org/abs/2406.18219. arXiv:2406.18219 [cs]
[20] Matthew Lyle Olson, Neale Ratzlaff, Musashi Hinck, Man Luo, Sungduk Yu, Chendi Xue, and Vasudev Lal. Probing Semantic Routing in Large Mixture-of-Expert Models. In Christos Christodoulopoulos, Tanmoy Chakraborty, Carolyn Rose, and Violet Peng, editors, Findings of the Association for Computational Linguistics: EMNLP 2025, pages 18263–18278, Suzhou, China, ... doi: 10.18653/v1/2025.findings-emnlp.991. URL https://aclanthology.org/2025.findings-emnlp.991/
[22] Jiahao Ying, Mingbao Lin, Qianru Sun, and Yixin Cao. Beyond Benchmarks: Understanding Mixture-of-Experts Models through Internal Mechanisms, September 2025. URL http://arxiv.org/abs/2509.23933. arXiv:2509.23933 [cs]
[23] Strahinja Nikolic, Ilker Oguz, and Demetri Psaltis. Exploring Expert Specialization through Unsupervised Training in Sparse Mixture of Experts, September 2025. URL http://arxiv.org/abs/2509.10025. arXiv:2509.10025 [cs]
[24] Elie Antoine, Frédéric Béchet, and Philippe Langlais. Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models, December 2024. URL http://arxiv.org/abs/2412.16971. arXiv:2412.16971 [cs]
[25] Barret Zoph, Irwan Bello, Sameer Kumar, Nan Du, Yanping Huang, Jeff Dean, Noam Shazeer, and William Fedus. ST-MoE: Designing Stable and Transferable Sparse Expert Models, May. URL http://arxiv.org/abs/2202.08906. arXiv:2202.08906 [cs]
[27] Fuzhao Xue, Zian Zheng, Yao Fu, Jinjie Ni, Zangwei Zheng, Wangchunshu Zhou, and Yang You. OpenMoE: An Early Effort on Open Mixture-of-Experts Language Models, March 2024. URL http://arxiv.org/abs/2402.01739. arXiv:2402.01739 [cs]
[28] Svetlana Pavlitska, Haixi Fan, Konstantin Ditschuneit, and J. Marius Zöllner. Design and Behavior of Sparse Mixture-of-Experts Layers in CNN-based Semantic Segmentation, April. URL http://arxiv.org/abs/2604.13761. arXiv:2604.13761 [cs]
[30] D. J. McKeefry and S. Zeki. The position and topography of the human colour centre as revealed by functional magnetic resonance imaging. Brain: A Journal of Neurology, 120(Pt 12):2229–2242, December 1997. ISSN 0006-8950. doi: 10.1093/brain/120.12.2229
[31] R Malach, J B Reppas, R R Benson, K K Kwong, H Jiang, W A Kennedy, P J Ledden, T J Brady, B R Rosen, and R B Tootell. Object-related activity revealed by functional magnetic resonance imaging in human occipital cortex. Proceedings of the National Academy of Sciences, 92(18):8135–8139, August 1995. doi: 10.1073/pnas.92.18.8135. URL https://www.pnas.org/doi...
[32] Kalanit Grill-Spector, Tamar Kushnir, Shimon Edelman, Yacov Itzchak, and Rafael Malach. Cue-Invariant Activation in Object-Related Areas of the Human Occipital Lobe. Neuron, 21(1):191–202, July 1998. ISSN 0896-6273. doi: 10.1016/S0896-6273(00)80526-7. URL https://www.sciencedirect.com/science/article/pii/S0896627300805267
[33] Nancy Kanwisher and Galit Yovel. The fusiform face area: a cortical region specialized for the perception of faces. Philosophical Transactions of the Royal Society B: Biological Sciences, 361(1476):2109–2128, December 2006. ISSN 0962-8436. doi: 10.1098/rstb.2006.1934. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC1857737/
[34] Nancy Kanwisher, Josh McDermott, and Marvin M. Chun. The Fusiform Face Area: A Module in Human Extrastriate Cortex Specialized for Face Perception. Journal of Neuroscience, 17(11):4302–4311, June 1997. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.17-11-04302. URL https://www.jneurosci.org/content/17/11/4302
[36] Russell Epstein and Nancy Kanwisher. A cortical representation of the local visual environment. Nature, 392(6676):598–601, April 1998. ISSN 1476-4687. doi: 10.1038/33402. URL https://www.nature.com/articles/33402
[37] Paul E. Downing, Yuhong Jiang, Miles Shuman, and Nancy Kanwisher. A Cortical Area Selective for Visual Processing of the Human Body. Science, 293(5539):2470–2473, September. doi: 10.1126/science.1063414. URL https://www.science.org/doi/10.1126/science.1063414
[39] Thomas Hannagan, Amir Amedi, Laurent Cohen, Ghislaine Dehaene-Lambertz, and Stanislas Dehaene. Origins of the specialization for letters and numbers in ventral occipitotemporal cortex. Trends in Cognitive Sciences, 19(7):374–382, July 2015. ISSN 1364-6613, 1879-307X. doi: 10.1016/j.tics.2015.05.006. URL https://www.cell.com/trends/cognitive-sciences/abs...
[40] Aliette Lochy, Corentin Jacques, Louis Maillard, Sophie Colnat-Coulbois, Bruno Rossion, and Jacques Jonas. Selective visual representation of letters and words in the left ventral occipito-temporal cortex with intracerebral recordings. Proceedings of the National Academy of Sciences, 115(32):E7595–E7604, August 2018. doi: 10.1073/pnas.1718987115. URL https...
[41] Oliver Contier, Chris I. Baker, and Martin N. Hebart. Distributed representations of behaviour-derived object dimensions in the human visual system. Nature Human Behaviour, 8(11):2179–2193, November 2024. ISSN 2397-3374. doi: 10.1038/s41562-024-01980-y. URL https://www.nature.com/articles/s41562-024-01980-y
[42] Leonard E. van Dyck, Martin N. Hebart, and Katharina Dobs. Multidimensional feature tuning in category-selective areas of human visual cortex, June 2025. URL https://www.biorxiv.org/content/10.1101/2025.06.17.659578v2. Pages: 2025.06.17.659578 Section: New Results
[43] Selma Lugtmeijer, Aleksandra M Sobolewska, Edward H F De Haan, and H Steven Scholte. Visual feature processing in a large stroke cohort: evidence against modular organization. Brain, 148(4):1144–1154, April 2025. ISSN 0006-8950, 1460-2156. doi: 10.1093/brain/awaf009. URL https://academic.oup.com/brain/article/148/4/1144/7952043
[44]

Brendan Ritchie, Susan G

J. Brendan Ritchie, Susan G. Wardle, Maryam Vaziri-Pashkam, Dwight J. Kravitz, and Chris I. Baker. Rethinking category-selectivity in human visual cortex.Cognitive Neuroscience, 17(2):49–76, April 2026. ISSN 1758-8928. doi: 10.1080/17588928. 2025.2543890. URL https://doi.org/10.1080/17588928.2025.2543890. _eprint: https://doi.org/10.1080/17588928.2025.2543890

[45]

Martin N. Hebart, Adam H. Dickter, Alexis Kidder, Wan Y. Kwok, Anna Corriveau, Caitlin Van Wicklin, and Chris I. Baker. THINGS: A database of 1,854 object concepts and more than 26,000 naturalistic object images. PLOS ONE, 14(10):e0223792, October 2019. doi: 10.1371/journal.pone.0223792. URL https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0223792

[47]

Martin N. Hebart, Charles Y. Zheng, Francisco Pereira, and Chris I. Baker. Revealing the multidimensional mental representations of natural objects underlying human similarity judgements. Nature Human Behaviour, 4(11):1173–1185, November 2020. ISSN 2397-3374. doi: 10.1038/s41562-020-00951-3. URL https://www.nature.com/articles/s41562-020-00951-3

[49]

Lemeng Wu, Mengchen Liu, Yinpeng Chen, Dongdong Chen, Xiyang Dai, and Lu Yuan. Residual Mixture of Experts, October 2022. URL http://arxiv.org/abs/2204.09636. arXiv:2204.09636 [cs]

[50]

Vadim Vashkelis and Natalia Trukhina. HI-MoE: Hierarchical Instance-Conditioned Mixture-of-Experts for Object Detection, April 2026. URL http://arxiv.org/abs/2604.04908. arXiv:2604.04908 [cs]

[51]

E. D. Adrian and D. W. Bronk. The discharge of impulses in motor nerve fibres. The Journal of Physiology, 66(1):81–101, 1928. ISSN 1469-7793. doi: 10.1113/jphysiol.1928.sp002509. URL https://onlinelibrary.wiley.com/doi/abs/10.1113/jphysiol.1928.sp002509

[52]

H. K. Hartline. The response of single optic nerve fibers of the vertebrate eye to illumination of the retina. American Journal of Physiology-Legacy Content, 121(2):400–415, January 1938. ISSN 0002-9513. doi: 10.1152/ajplegacy.1938.121.2.400. URL https://journals.physiology.org/doi/abs/10.1152/ajplegacy.1938.121.2.400

[54]

Edgar Y. Walker, Fabian H. Sinz, Erick Cobos, Taliah Muhammad, Emmanouil Froudarakis, Paul G. Fahey, Alexander S. Ecker, Jacob Reimer, Xaq Pitkow, and Andreas S. Tolias. Inception loops discover what excites neurons most using deep predictive models. Nature Neuroscience, 22(12):2060–2065, December 2019. ISSN 1546-1726. doi: 10.1038/s41593-019-0517-x. URL ...

[55]

Nikolaus Kriegeskorte and Xue-Xin Wei. Neural tuning and representational geometry. Nature Reviews Neuroscience, 22(11):703–718, November 2021. doi: 10.1038/s41583-021-00502-3. URL https://www.nature.com/articles/s41583-021-00502-3

[57]

Nikolaus Kriegeskorte and Rogier A. Kievit. Representational geometry: integrating cognition, computation, and the brain. Trends in Cognitive Sciences, 17(8):401–412, August 2013. ISSN 1364-6613. doi: 10.1016/j.tics.2013.06.007. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC3730178/

[58]

S. E. Petersen, P. T. Fox, M. I. Posner, M. Mintun, and M. E. Raichle. Positron emission tomographic studies of the cortical anatomy of single-word processing. Nature, 331(6157):585–589, February 1988. ISSN 1476-4687. doi: 10.1038/331585a0. URL https://www.nature.com/articles/331585a0

[59]

Stanislas Dehaene and Laurent Cohen. The unique role of the visual word form area in reading. Trends in Cognitive Sciences, 15(6):254–262, June 2011. ISSN 1364-6613. doi: 10.1016/j.tics.2011.04.003. URL https://www.sciencedirect.com/science/article/pii/S1364661311000738

[60]

Grace E. Rice, David M. Watson, Tom Hartley, and Timothy J. Andrews. Low-Level Image Properties of Visual Objects Predict Patterns of Neural Response across Category-Selective Regions of the Ventral Visual Pathway. Journal of Neuroscience, 34(26):8837–8844, June 2014. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.5265-13.2014. URL https://www.jneurosci.org/content/34/26/8837

[62]

Uri Hasson, Ifat Levy, Marlene Behrmann, Talma Hendler, and Rafael Malach. Eccentricity Bias as an Organizing Principle for Human High-Order Object Areas. Neuron, 34(3):479–490, April 2002. ISSN 0896-6273. doi: 10.1016/S0896-6273(02)00662-1. URL https://www.sciencedirect.com/science/article/pii/S0896627302006621

[63]

Michael J. Arcaro, Stephanie A. McMains, Benjamin D. Singer, and Sabine Kastner. Retinotopic Organization of Human Ventral Visual Cortex. Journal of Neuroscience, 29(34):10638–10652, August 2009. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.2807-09.2009. URL https://www.jneurosci.org/content/29/34/10638

[64]

Bria Long, Chen-Ping Yu, and Talia Konkle. Mid-level visual features underlie the high-level categorical organization of the ventral stream. Proceedings of the National Academy of Sciences, 115(38):E9015–E9024, September 2018. doi: 10.1073/pnas.1719616115. URL https://www.pnas.org/doi/full/10.1073/pnas.1719616115

[65]

Sushrut Thorat, Daria Proklova, and Marius V Peelen. The nature of the animacy organization in human ventral temporal cortex. eLife, 8:e47142, September 2019. ISSN 2050-084X. doi: 10.7554/eLife.47142. URL https://doi.org/10.7554/eLife.47142

[66]

Talia Konkle and Alfonso Caramazza. Tripartite Organization of the Ventral Stream by Animacy and Object Size. Journal of Neuroscience, 33(25):10235–10242, June 2013. ISSN 0270-6474, 1529-2401. doi: 10.1523/JNEUROSCI.0983-13.2013. URL https://www.jneurosci.org/content/33/25/10235

[67]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep Residual Learning for Image Recognition, December 2015. URL http://arxiv.org/abs/1512.03385. arXiv:1512.03385 [cs]

[68]

Yihao Xue, Eric Gan, Jiayi Ni, Siddharth Joshi, and Baharan Mirzasoleiman. Investigating the Benefits of Projection Head for Representation Learning, March 2024. URL http://arxiv.org/abs/2403.11391. arXiv:2403.11391 [cs]

[69]

Jacob S. Prince, George A. Alvarez, and Talia Konkle. Contrastive learning explains the emergence and function of visual category-selective regions. Science Advances, 10(39):eadl1776, September 2024. doi: 10.1126/sciadv.adl1776. URL https://www.science.org/doi/10.1126/sciadv.adl1776

[70]

Adam Coates, Honglak Lee, and Andrew Y Ng. An Analysis of Single-Layer Networks in Unsupervised Feature Learning. 2011

[71]

Xiao-Xiong Lin, Andreas Nieder, and Simon N. Jacob. The neuronal implementation of representational geometry in primate prefrontal cortex. Science Advances, 9(50):eadh8685, December 2023. doi: 10.1126/sciadv.adh8685. URL https://www.science.org/doi/10.1126/sciadv.adh8685

[72]

Anna Blumenthal, Bobby Stojanoski, Chris B. Martin, Rhodri Cusack, and Stefan Köhler. Animacy and real-world size shape object representations in the human medial temporal lobes. Human Brain Mapping, 39(9):3779–3792, June 2018. ISSN 1065-9471. doi: 10.1002/hbm.24212. URL https://pmc.ncbi.nlm.nih.gov/articles/PMC6866524/

[73]

Johannes Mehrer, Courtney J. Spoerer, Emer C. Jones, Nikolaus Kriegeskorte, and Tim C. Kietzmann. An ecologically motivated image dataset for deep learning yields better models of human vision. Proceedings of the National Academy of Sciences, 118(8):e2011417118, February 2021. doi: 10.1073/pnas.2011417118. URL https://www.pnas.org/doi/10.1073/pnas.2011417...
