Polysemantic Experts, Monosemantic Paths: Routing as Control in MoEs

Bo Yuan; Charles Ye; Lee Sharkey

arXiv: 2604.17837 · v1 · submitted 2026-04-20 · cs.AI · cs.CL · cs.LG

Pith reviewed 2026-05-10 04:38 UTC · model grok-4.3

classification: cs.AI · cs.CL · cs.LG

keywords: mixture of experts, interpretability, routing, polysemantic, monosemantic, trajectories, control signal, decomposition

The pith

Mixture-of-Experts models keep individual experts polysemantic but turn their routing paths into monosemantic trajectories by isolating an abstract control signal from surface content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a parameter-free decomposition that splits each layer's hidden state into a control signal driving the router and an orthogonal content channel the router cannot access. Surface features such as language, token identity, and position remain in the content channel while the control signal encodes an abstract function that rotates layer to layer. Low-bandwidth routing then forces compositional specialization across layers. Individual experts stay polysemantic, yet the full paths tokens follow cluster by semantic function across languages and surface forms. The decomposition shows that clusters in the control subspace are more monosemantic than those in the full representation, so the trajectory becomes the natural unit of interpretability.
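
To make "low-bandwidth" concrete, the sketch below bounds how much one routing decision can convey if only the discrete choice of experts is counted and gate weights are ignored. That counting rule is an assumption of this note, not a definition from the paper, and the configuration (64 experts, top-1, 48 layers) is hypothetical.

```python
import math


def routing_bits(num_experts: int, top_k: int) -> float:
    """Upper bound, in bits, on one routing decision.

    Assumption: the decision is the unordered set of top_k experts,
    so there are C(num_experts, top_k) possible outcomes.
    """
    return math.log2(math.comb(num_experts, top_k))


def path_bits(num_experts: int, top_k: int, num_layers: int) -> float:
    """Upper bound on the bits carried by a full path across layers."""
    return num_layers * routing_bits(num_experts, top_k)


# Hypothetical configuration, chosen only for round numbers.
bits_per_layer = routing_bits(64, 1)  # 6 bits per decision
total = path_bits(64, 1, 48)          # 288 bits across the whole path
print(bits_per_layer, total)
```

Under this counting, a single layer can distinguish only a few dozen cases, while the full path can distinguish vastly more, which is one way to read the claim that specialization has to be composed across layers.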

Core claim

We introduce a parameter-free decomposition for Mixture-of-Experts models that splits each layer's hidden state into a control signal that causally drives routing and an orthogonal content channel invisible to the router. Across six MoE architectures, we find that models preserve surface-level features in the content channel while the control signal encodes an abstract function that rotates from layer to layer. Because each routing decision is low-bandwidth, this hand-off forces compositional specialization across layers. While individual experts remain polysemantic, expert paths become monosemantic, clustering tokens by semantic function across languages and surface forms. The same token (e.g., ":") follows distinct trajectories depending on whether it serves as a type annotation, an introductory colon, or a time separator.

What carries the argument

The parameter-free orthogonal decomposition of each layer's hidden state into a routing-driving control signal and an invisible content channel.
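
A minimal sketch of such a split, assuming a bias-free linear router whose logits are `h @ W_router.T`. The Figure 2 caption says the decomposition uses an SVD of the routing matrix; the function name, the rank tolerance, and the random toy data are assumptions made here for illustration and are not taken from the paper.

```python
import numpy as np


def split_control_content(h, W_router, rtol=1e-10):
    """Split hidden states into router-visible and router-blind parts.

    h:        (n_tokens, d) hidden states.
    W_router: (n_experts, d) router weights (assumed linear, no bias).
    """
    # Right singular vectors with non-zero singular values span the
    # row space of W_router, i.e. every direction the router can read.
    _, s, vt = np.linalg.svd(W_router, full_matrices=False)
    rank = int(np.sum(s > rtol * s.max()))
    basis = vt[:rank]                # (rank, d), orthonormal rows
    h_vis = (h @ basis.T) @ basis    # projection onto the row space
    h_blind = h - h_vis              # orthogonal remainder
    return h_vis, h_blind


# Toy data standing in for a real layer (hypothetical sizes).
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 32))
h = rng.normal(size=(5, 32))
h_vis, h_blind = split_control_content(h, W)

# The router cannot see the blind part, so the logits are unchanged
# when it is discarded.
print(np.allclose(h @ W.T, h_vis @ W.T))
```

No parameters are fitted: the split is fully determined by the router weights, which is consistent with the "parameter-free" description. A router with a bias term or input normalization would need a modified treatment.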

If this is right

Tokens such as ":" follow distinct trajectories depending on whether they act as type annotations, introductory colons, or time separators.
Control-subspace clusters are substantially more monosemantic than clusters in the full representation.
Low-bandwidth routing produces compositional specialization across layers.
Surface-level features are preserved in the content channel while abstract functions rotate in the control signal.
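
The contrast between polysemantic experts and monosemantic paths can be illustrated by grouping tokens on their full top-1 path versus on a single layer's expert. The sketch below uses made-up expert IDs and hand-written function labels; none of the values are data from the paper, and `group_by_path` and `labels_per_group` are hypothetical helper names.

```python
from collections import defaultdict


def group_by_path(records, layers=None):
    """Group (token, label) pairs by expert path.

    records: iterable of (token_text, function_label, path), where path
             lists the top-1 expert ID at each MoE layer.
    layers:  optional layer indices; if given, only those layers form the key.
    """
    groups = defaultdict(list)
    for text, label, path in records:
        key = tuple(path) if layers is None else tuple(path[i] for i in layers)
        groups[key].append((text, label))
    return dict(groups)


def labels_per_group(groups):
    """Map each path key to the set of function labels it contains."""
    return {key: {label for _, label in members} for key, members in groups.items()}


# Hypothetical toy records: three layers, invented expert IDs.
records = [
    (":", "type_annotation", [3, 7, 1]),
    (":", "type_annotation", [3, 7, 1]),
    (":", "time_separator", [3, 2, 5]),
    (":", "introductory_colon", [4, 2, 0]),
    ("overseas", "abroad", [6, 6, 2]),
    ("海外", "abroad", [6, 6, 2]),
]

full_paths = labels_per_group(group_by_path(records))
first_layer = labels_per_group(group_by_path(records, layers=[0]))
print(full_paths)
print(first_layer)
```

In this toy setup expert 3 at the first layer serves two different functions, while every full path carries exactly one label, which mirrors the shape of the claim without testing it.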

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Interpretability methods for MoEs should track full trajectories rather than inspect experts in isolation.
The split could be adapted to identify analogous control signals in dense models or other architectures.
Targeted edits to the control signal might allow steering of semantic behavior without changing the content channel.

Load-bearing premise

The orthogonal split isolates a causally driving control signal from content invisible to the router.

What would settle it

Running the decomposition on a new MoE model or token set and finding that paths fail to form clusters by semantic function or that the control signal does not causally determine routing decisions.
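
One way the causal half of this test could be run is a swap intervention: build a hybrid state from the control component of token A and the content component of token B, then check whether it routes like A. The sketch below is a hypothetical procedure, not the paper's ablation protocol; it assumes a bias-free linear router with top-1 selection and uses random toy data.

```python
import numpy as np


def row_space_projector(W, rtol=1e-10):
    """Orthogonal projector onto the row space of router weights W."""
    _, s, vt = np.linalg.svd(W, full_matrices=False)
    basis = vt[: int(np.sum(s > rtol * s.max()))]
    return basis.T @ basis  # (d, d), symmetric


def top1(h, W):
    """Top-1 expert per token under a linear router (assumption)."""
    return np.argmax(h @ W.T, axis=-1)


def swap_test(h_a, h_b, W):
    """Fraction of hybrids (control of A, content of B) that route like A."""
    P = row_space_projector(W)
    hybrid = h_a @ P + (h_b - h_b @ P)
    return float(np.mean(top1(hybrid, W) == top1(h_a, W)))


# Toy stand-ins for two batches of hidden states at one layer.
rng = np.random.default_rng(1)
W = rng.normal(size=(8, 32))
h_a = rng.normal(size=(100, 32))
h_b = rng.normal(size=(100, 32))
score = swap_test(h_a, h_b, W)
print(score)
```

With an exactly linear router the score is 1.0 by construction. On a real model, a score clearly below 1.0 would indicate either that the router departs from the linear assumption (bias, normalization, noise) or that the control signal does not fully determine routing.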

Figures

Figures reproduced from arXiv: 2604.17837 by Bo Yuan, Charles Ye, Lee Sharkey.

**Figure 1.** One token, three programs. We plot 500 top-1 routing paths of the token ":" through 48 MoE layers of Qwen3-30B-A3B, separated by contextual function: Python type annotation (left), introductory colon (center), time separator (right). Despite identical token ID, the three uses produce distinct expert trajectories. Expert IDs are reordered at each layer using a shared Sugiyama-style layout to minimize edge cro…

**Figure 2.** Control and content in an MoE layer. We decompose the residual stream $h_l$ into orthogonal components via SVD of the routing matrix. The router-visible component ($h_l^{\mathrm{vis}}$, top) is the control signal: it alone determines expert selection. The router-blind component ($h_l^{\mathrm{blind}}$, bottom) is the content: invisible to routing, but processed by the selected expert alongside $h_l^{\mathrm{vis}}$.

**Figure 3.** Routing features are ephemeral. Probes predict the top-1 expert at the current layer $E(l)$ or next layer $E(l+1)$ from each channel. The control signal $h_l^{\mathrm{vis}}$ predicts $E(l)$ near-perfectly (dark orange, ∼99%) but carries almost no information about $E(l+1)$ (light orange, ∼35%). The content channel $h_l^{\mathrm{blind}}$ shows the reverse: weak for the current layer (dark blue, ∼67%), it is the strongest predictor of the…

**Figure 4.** Expert paths are monosemantic. Each group shows tokens (highlighted) sharing an identical mid-layer expert path in gpt-oss-20b. Context is grayed; highlighted tokens are the routed unit. Paths cluster by function across languages and surface forms: "overseas", "海外", and "境外" share no lexical overlap but follow the same computation trace. The control subspace is the source. Our decomposition predicts this: if p…

**Figure 5.** Per-dimension hidden-state magnitude $M^{h}_{l,d}$ (x-axis) versus router-weight magnitude $M^{R}_{l,d}$ (y-axis), both on log scales. One dot = one hidden dim $d$ in layer $l$. Colors represent layers. Strong positive correlations are observed, indicating that router weights are largest on dimensions where the representation already has the highest magnitude. We show that routing is a surprisingly low-dimensional process. …

**Figure 6.** Probe accuracy versus fraction of highest-magnitude hidden dimensions used. Results …

**Figure 7.** Average cross-layer cosine similarities $C_l^{\mathrm{vis}}$ and $C_l^{\mathrm{blind}}$ (shaded: 95% bootstrap CIs).

**Figure 8.** Language probe. Normalized MI% between channel representations and language label. $h_l^{\mathrm{blind}}$ (blue) encodes language consistently; $h_l^{\mathrm{vis}}$ (orange) carries substantially less, especially at middle layers.

**Figure 9.** Token ID probe results. Normalized MI% between channel representations and token ID (filtered for top 100 token IDs).

**Figure 10.** Sequence position probe results. Normalized MI% between channel representations and token position ID. [Plot: one panel per model (OLMoE, Granite-4.0-Tiny, DeepSeek-V2-Lite, Qwen3-30B-A3B, gpt-oss-20b, GLM-4.5-Air); x-axis: layer index $l$; y-axis: MI %; series: $h^{\mathrm{vis}}(l)$ → sequence_position and $h^{\mathrm{blind}}(l)$ → sequence_position.]

Original abstract

An LLM's residual stream is both state and instruction: it encodes the current context and determines the next transformation. We introduce a parameter-free decomposition for Mixture-of-Experts models that splits each layer's hidden state into a control signal that causally drives routing and an orthogonal content channel invisible to the router. Across six MoE architectures, we find that models preserve surface-level features (language, token identity, position) in the content channel, while the control signal encodes an abstract function that rotates from layer to layer. Because each routing decision is low-bandwidth, this hand-off forces compositional specialization across layers. While individual experts remain polysemantic, expert paths become monosemantic, clustering tokens by semantic function across languages and surface forms. The same token (e.g., ":") follows distinct trajectories depending on whether it serves as a type annotation, an introductory colon, or a time separator. Our decomposition identifies the source of this structure: clusters in the control subspace are substantially more monosemantic than those in the full representation. As a result, the natural unit of interpretability in MoEs is not the expert but the trajectory.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's parameter-free split of MoE states into control and content channels is a clean new tool, but the claim that paths are the natural interpretability unit rests on thin evidence from six models and illustrative tokens.

The main thing to know is that this work offers a simple, parameter-free way to decompose each layer's hidden state in Mixture-of-Experts models into a control signal that actually drives routing and an orthogonal content channel the router ignores. Across the six architectures they test, the control part seems to carry abstract functional information that changes layer by layer, while content keeps surface features like token identity and language. This leads to their observation that individual experts stay polysemantic but the full paths through the model become more monosemantic, clustering tokens by role rather than form. The colon example across different syntactic uses is a clear illustration of that point.

Referee Report

2 major / 2 minor

Summary. The paper introduces a parameter-free orthogonal decomposition of the residual stream in Mixture-of-Experts (MoE) models that isolates a control signal (which drives routing) from an orthogonal content channel (invisible to the router). Across six MoE architectures, it reports that surface features (language, token identity, position) are preserved in the content channel while the control signal encodes abstract functions that rotate layer to layer, inducing compositional specialization. Individual experts remain polysemantic, but expert paths (trajectories) become monosemantic, clustering tokens by semantic function (e.g., distinct trajectories for the token ':' in different syntactic roles). The manuscript concludes that the natural unit of interpretability in MoEs is the trajectory rather than the expert.

Significance. If the decomposition and monosemantic-path observations hold under broader testing, the work offers a concrete, parameter-free tool for mechanistic interpretability of MoEs by reframing routing as a low-bandwidth control mechanism. The cross-architecture consistency and the explicit separation of control versus content are notable strengths that could guide future analyses of routing dynamics and compositional behavior in large models.

major comments (2)

[Abstract and §3 (Decomposition)] The claim that control-subspace clusters are 'substantially more monosemantic' than clusters in the full or content representations requires a quantitative monosemanticity metric, statistical tests, and controls for token selection. None are supplied in the abstract, and the procedural details are insufficient to verify whether the orthogonal split truly isolates a causally driving signal from downstream content effects.
[§4 (Empirical Results) and §5 (Discussion)] The conclusion that trajectories are the natural interpretability unit rests on observations from only six architectures and illustrative tokens (e.g., ':'). Without a defined generalization test or a broader quantitative evaluation across additional models and a larger token set, the monosemantic-path clustering may be sensitive to architecture-specific router properties or post-hoc selection.

minor comments (2)

[§2] Notation for the control/content split (e.g., the exact orthogonality condition and projection) should be stated with an explicit equation early in §2 to aid reproducibility.
[Figures] Figure captions for trajectory visualizations should include the exact token contexts and layer indices shown, rather than relying on the main text.
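
As one possible form for the equation requested in the first minor comment, the split can be written as follows. This assumes a bias-free linear router with weight matrix $W_l \in \mathbb{R}^{E \times d}$; the symbols $W_l$, $U_l$, $\Sigma_l$, $V_{l,r}$, $P_l$, $E$, $d$, and $r$ are introduced here for illustration and are not taken from the paper, while $h_l$, $h_l^{\mathrm{vis}}$, and $h_l^{\mathrm{blind}}$ follow the Figure 2 caption.

```latex
\begin{align}
  W_l &= U_l \Sigma_l V_l^\top,
  &
  P_l &= V_{l,r} V_{l,r}^\top
  \quad \text{($V_{l,r}$: the $r = \operatorname{rank}(W_l)$ leading right singular vectors)}, \\
  h_l^{\mathrm{vis}} &= P_l h_l,
  &
  h_l^{\mathrm{blind}} &= (I - P_l)\, h_l, \\
  \langle h_l^{\mathrm{vis}}, h_l^{\mathrm{blind}} \rangle &= 0,
  &
  W_l h_l^{\mathrm{blind}} &= 0
  \;\Longrightarrow\;
  W_l h_l = W_l h_l^{\mathrm{vis}} .
\end{align}
```

The last line is the orthogonality condition together with its consequence: under the stated assumption, the router logits depend on $h_l$ only through $h_l^{\mathrm{vis}}$.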

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address the major comments point by point below, proposing specific revisions to strengthen the quantitative aspects of our claims while maintaining the core contributions.

Referee: [Abstract and §3 (Decomposition)] The claim that control-subspace clusters are 'substantially more monosemantic' than clusters in the full or content representations requires a quantitative monosemanticity metric, statistical tests, and controls for token selection. None are supplied in the abstract, and the procedural details are insufficient to verify whether the orthogonal split truly isolates a causally driving signal from downstream content effects.

Authors: We agree that the presentation in the abstract and §3 relies on qualitative evidence from clustering visualizations and specific token examples to support the monosemanticity of control-subspace clusters. The decomposition is parameter-free and orthogonal by construction, and we show through ablation experiments that routing decisions depend only on the control component. To quantify the claim more rigorously, we will revise §3 to include a monosemanticity metric defined as the average purity of semantic function clusters (using annotated token roles), along with statistical tests comparing control, content, and full representations. We will also detail the token selection procedure and controls to confirm the causal isolation of the routing signal.

revision: yes

Referee: [§4 (Empirical Results) and §5 (Discussion)] The conclusion that trajectories are the natural interpretability unit rests on observations from only six architectures and illustrative tokens (e.g., ':'). Without a defined generalization test or a broader quantitative evaluation across additional models and a larger token set, the monosemantic-path clustering may be sensitive to architecture-specific router properties or post-hoc selection.

Authors: The results are based on six MoE architectures chosen for their diversity in scale and design, and the token examples are selected to illustrate the key phenomenon of function-specific trajectories. We recognize that this is illustrative rather than exhaustive. In revision, we will add a quantitative evaluation of path monosemanticity across a broader sample of tokens (e.g., 100 common tokens) using a clustering coherence score, and include a discussion of potential architecture sensitivities. While we cannot exhaustively test all possible MoEs, the consistency across the tested models supports the claim, and we will frame the conclusion more cautiously as a hypothesis for future work.

revision: partial
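
The purity metric mentioned in the first response could take roughly the following shape. This is a sketch under assumptions: the rebuttal says only "average purity of semantic function clusters", so the size-weighted averaging, the function name, and the toy labels and cluster assignments below are all choices made here, not results or definitions from the paper.

```python
from collections import Counter, defaultdict


def average_purity(cluster_ids, labels):
    """Size-weighted cluster purity.

    Returns the fraction of items whose annotated label equals the
    majority label of the cluster they were assigned to.
    """
    if len(cluster_ids) != len(labels) or not labels:
        raise ValueError("need equally sized, non-empty inputs")
    by_cluster = defaultdict(list)
    for cluster, label in zip(cluster_ids, labels):
        by_cluster[cluster].append(label)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in by_cluster.values()
    )
    return majority_total / len(labels)


# Hypothetical annotations and cluster assignments for six ":" tokens.
labels = ["annotation", "annotation", "time", "intro", "time", "intro"]
control_clusters = [0, 0, 1, 2, 1, 2]  # invented: clusters in the control subspace
full_clusters = [0, 0, 0, 1, 1, 1]     # invented: clusters in the full representation

purity_control = average_purity(control_clusters, labels)
purity_full = average_purity(full_clusters, labels)
print(purity_control, purity_full)
```

Purity rewards over-segmentation (one cluster per item scores 1.0), so a comparison between representations would need matched cluster counts or a complementary measure.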

Circularity Check

0 steps flagged

No significant circularity: empirical decomposition is self-contained

The paper introduces a parameter-free orthogonal decomposition of hidden states into control and content channels, then reports empirical observations of monosemantic path clustering across six architectures. No equations, fitted parameters, or derivations are presented that reduce to their own inputs by construction. The central claim that trajectories (not experts) are the natural interpretability unit follows directly from the observed clustering results rather than from any self-definition, self-citation chain, or renamed prior result. The analysis contains no load-bearing self-citations or ansatzes smuggled via prior work by the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; it describes a parameter-free orthogonal decomposition but provides no explicit free parameters, axioms, or invented entities beyond standard MoE concepts.

pith-pipeline@v0.9.0 · 5498 in / 1228 out tokens · 55029 ms · 2026-05-10T04:38:18.630503+00:00 · methodology

[1] [1]

Scaling Learning Algorithms Towards

Bengio, Yoshua and LeCun, Yann , booktitle =. Scaling Learning Algorithms Towards

work page

[2] [2]

and Osindero, Simon and Teh, Yee Whye , journal =

Hinton, Geoffrey E. and Osindero, Simon and Teh, Yee Whye , journal =. A Fast Learning Algorithm for Deep Belief Nets , volume =

work page

[3] [3]

2016 , publisher=

Deep learning , author=. 2016 , publisher=

work page 2016

[4] [4]

Jiang, Albert Q and Sablayrolles, Alexandre and Roux, Antoine and Mensch, Arthur and Savary, Blanche and Bamford, Chris and Chaplot, Devendra Singh and Casas, Diego de las and Hanna, Emma Bou and Bressand, Florian and others , journal=

work page

[5] [5]

Xue, Fuzhao and Zheng, Zian and Fu, Yao and Ni, Jinjie and Zheng, Zangwei and Zhou, Wangchunshu and You, Yang , journal=

work page

[6] [6]

Dai, Damai and Deng, Chengqi and Zhao, Chenggang and Xu, RX and Gao, Huazuo and Chen, Deli and Li, Jiashi and Zeng, Wangding and Yu, Xingkai and Wu, Yu and others , journal=

work page

[7] [7]

2025 , eprint=

Steering MoE LLMs via Expert (De)Activation , author=. 2025 , eprint=

work page 2025

[8] [8]

2025 , eprint=

SEUF: Is Unlearning One Expert Enough for Mixture-of-Experts LLMs? , author=. 2025 , eprint=

work page 2025

[9] [9]

2025 , eprint=

SAFEx: Analyzing Vulnerabilities of MoE-Based LLMs via Stable Safety-critical Expert Identification , author=. 2025 , eprint=

work page 2025

[10] [10]

Komatsuzaki, Aran and Puigcerver, Joan and Lee-Thorp, James and Ruiz, Carlos Riquelme and Mustafa, Basil and Ainslie, Joshua and Tay, Yi and Dehghani, Mostafa and Houlsby, Neil , journal=

work page

[11] [11]

Fan, Dongyang and Messmer, Bettina and Jaggi, Martin , journal=

work page

[12] [12]

OpenAI blog , volume=

Language models are unsupervised multitask learners , author=. OpenAI blog , volume=

work page

[13] [13]

Muennighoff, Niklas and Soldaini, Luca and Groeneveld, Dirk and Lo, Kyle and Morrison, Jacob and Min, Sewon and Shi, Weijia and Walsh, Pete and Tafjord, Oyvind and Lambert, Nathan and others , journal=

work page

[14] [14]

Park, Jungwoo and Ahn, Young Jin and Kim, Kee-Eung and Kang, Jaewoo , journal=

work page

[15] [15]

Oldfield, James and Georgopoulos, Markos and Chrysos, Grigorios and Tzelepis, Christos and Panagakis, Yannis and Nicolaou, Mihalis and Deng, Jiankang and Patras, Ioannis , journal=

work page

[16] [16]

International Conference on Learning Representations , year=

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , author=. International Conference on Learning Representations , year=

work page

[17] [17]

Fedus, William and Zoph, Barret and Shazeer, Noam , journal=

work page

[18] [18]

2023 , howpublished =

Language models can explain neurons in language models , author=. 2023 , howpublished =

work page 2023

[19] [19]

2023 , eprint=

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer , author=. 2023 , eprint=

work page 2023

[20] [20]

Distill , year =

Olah, Chris and Cammarata, Nick and Schubert, Ludwig and Goh, Gabriel and Petrov, Michael and Carter, Shan , title =. Distill , year =

work page

[21] [21]

2025 , eprint=

Open Problems in Mechanistic Interpretability , author=. 2025 , eprint=

work page 2025

[22] [22]

2025 , eprint=

DeepSeek-V3 Technical Report , author=. 2025 , eprint=

work page 2025

[23] [23]

2025 , eprint=

Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities , author=. 2025 , eprint=

work page 2025

[24] [24]

2025 , eprint=

gpt-oss-120b & gpt-oss-20b Model Card , author=. 2025 , eprint=

work page 2025

[25] [25]

2023 , eprint=

Sparse Autoencoders Find Hinghly Interpretable Features in Language Models , author=. 2023 , eprint=

work page 2023

[26] [26]

and Hume, Tristan and Carter, Shan and Henighan, Tom and Olah, Christopher , journal =

Bricken, Trenton and Templeton, Adly and Batson, Joshua and Chen, Brian and Jermyn, Adam and Conerly, Tom and Turner, Nick and Anil, Cem and Denison, Carson and Askell, Amanda and Lasenby, Robert and Wu, Yifan and Kravec, Shauna and Schiefer, Nicholas and Maxwell, Tim and Joseph, Nicholas and Hatfield-Dodds, Zac and Tamkin, Alex and Nguyen, Karina and McL...

work page 2023

[27] [27]

2021 , note =

Elhage, Nelson and Nanda, Neel and Olsson, Catherine and Henighan, Tom and Joseph, Nicholas and Mann, Ben and Askell, Amanda and Bai, Yuntao and Chen, Anna and Conerly, Tom and DasSarma, Nova and Drain, Dawn and Ganguli, Deep and Hatfield-Dodds, Zac and Hernandez, Danny and Jones, Andy and Kernion, Jackson and Lovitt, Liane and Ndousse, Kamal and Amodei, ...

work page 2021

[28] [28]

2025 , eprint=

AxBench: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders , author=. 2025 , eprint=

work page 2025

[29] [29]

2024 , eprint=

Applying sparse autoencoders to unlearn knowledge in language models , author=. 2024 , eprint=

work page 2024

[30] [30]

2025 , eprint=

Are Sparse Autoencoders Useful? A Case Study in Sparse Probing , author=. 2025 , eprint=

work page 2025

[31] [31]

2025 , eprint=

Mixture of Experts Made Intrinsically Interpretable , author=. 2025 , eprint=

work page 2025

[32] [32]

2025 , eprint=

Probing Semantic Routing in Large Mixture-of-Expert Models , author=. 2025 , eprint=

work page 2025

[33] [33]

2025 , eprint=

Decoding Knowledge Attribution in Mixture-of-Experts: A Framework of Basic-Refinement Collaboration and Efficiency Analysis , author=. 2025 , eprint=

work page 2025

[34] [34]

2025 , eprint=

A Closer Look into Mixture-of-Experts in Large Language Models , author=. 2025 , eprint=

work page 2025

[35] MoE Lens - An Expert Is All You Need. Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference.

[36] Understanding and Leveraging the Expert Specialization of Context Faithfulness in Mixture-of-Experts LLMs. 2025.

[37] Ziyue Li and Tianyi Zhou. Your Mixture-of-Experts. 2025.

[38] Your Mixture-of-Experts LLM Is Secretly an Embedding Model For Free. 2024.

[39] The Remarkable Robustness of LLMs: Stages of Inference? 2024.

[40] Transformer Layers as Painters. 2025.

[41] April 2025.

[42] On the Spatial Structure of Mixture-of-Experts in Transformers. 2025.

[43] Part-Of-Speech Sensitivity of Routers in Mixture of Experts Models. 2024.

[44] The Llama 4 herd: The beginning of a new era of natively multimodal. April 2025.

[45] Semantic Specialization in MoE Appears with Scale: A Study of DeepSeek R1 Expert Specialization. 2025.

[46] Two Experts Are All You Need for Steering Thinking: Reinforcing Cognitive Effort in MoE Reasoning Models Without Additional Training. 2025.

[47] ST-MoE: Designing Stable and Transferable Sparse Expert Models. 2022.

[48] Tight Clusters Make Specialized Experts. The Thirteenth International Conference on Learning Representations.

[49] An Expanded Massive Multilingual Dataset for High-Performance Language Technologies (HPLT). 2025.

[50] Mixture of Cache-Conditional Experts for Efficient Mobile Device Inference. 2025.

[51] Multilingual Routing in Mixture-of-Experts. 2025.

[52] Artificial Analysis: AI Model & API Providers Analysis.

[53] Tenney, Ian and Das, Dipanjan and Pavlick, Ellie. BERT Rediscovers the Classical NLP Pipeline. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019. doi:10.18653/v1/P19-1452

[54] Sugiyama, Kozo and Tagawa, Shojiro and Toda, Mitsuhiko. Methods for Visual Understanding of Hierarchical System Structures.

[55] Sugiyama, Kozo and Tagawa, Shojiro and Toda, Mitsuhiko. 2007.

[56] Qwen3 Technical Report. 2025.

[57] gpt-oss-120b and gpt-oss-20b Model Card. 2025.

[58] GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models. 2025.

[59] OLMoE: Open Mixture-of-Experts Language Models. 2025.

[60] Granite Code Models: A Family of Open Foundation Models for Code Intelligence. 2025.

[61] DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. 2024.