
arxiv: 2606.22798 · v1 · pith:IRWAUJQOnew · submitted 2026-06-22 · 💻 cs.CL

Does the Same Token Mean the Same State? MoE Routing as Signal for Reasoning Control

Pith reviewed 2026-06-26 08:57 UTC · model grok-4.3

classification 💻 cs.CL
keywords mixture of experts · routing states · reasoning control · answer selection · test-time decoding · boundary anchors · delimiter anchors · weighted jaccard

The pith

The same token in MoE models activates different experts based on context, so routing states at anchors can select correct reasoning paths without reading the answers.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In sparse MoE language models, fixing the token id at an anchor does not fix the router state: the experts selected there still encode task context, trajectory history, and reasoning-effort mode. This residual structure lets routing neighborhoods at boundary and delimiter anchors align with final-answer basins. RAD operationalizes this by selecting the rollout whose anchor-window routing is the densest center in Weighted-Jaccard K-NN space. It performs on par with majority voting where string voting is well-posed, and it still applies to tasks where answer strings cannot be voted on.

Core claim

Holding the emitted token id fixed at repeated anchors, the experts that produce it still separate task context, trajectory history, and reasoning-effort mode. Near boundary anchors and delimiter anchors, routing neighborhoods already align with final-answer basins at a marker-only readout, strongest when read at the answer opening.
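
A toy illustration of why a fixed token id need not fix the routing: in a standard top-k MoE layer the router scores a hidden state, and that hidden state depends on context as well as on the token. The sketch below is not the paper's setup. The router weights, the dimensions, and the additive "context" vectors (a stand-in for what attention would mix in) are all hypothetical.

```python
import numpy as np

def top_k_route(hidden, router_weights, k=2):
    """Return (expert_ids, gate_weights) for one hidden state under softmax top-k routing."""
    logits = router_weights @ hidden            # one score per expert
    expert_ids = np.argsort(logits)[::-1][:k]   # ids of the k highest-scoring experts
    top = logits[expert_ids]
    gates = np.exp(top - top.max())             # softmax over the selected experts only
    gates /= gates.sum()
    return expert_ids, gates

rng = np.random.default_rng(0)
d_model, n_experts = 16, 8                      # hypothetical sizes
router_weights = rng.normal(size=(n_experts, d_model))  # hypothetical router

token_embedding = rng.normal(size=d_model)      # the same token id in both cases
context_a = rng.normal(size=d_model)            # assumed stand-in for context A
context_b = rng.normal(size=d_model)            # assumed stand-in for context B

route_a = top_k_route(token_embedding + context_a, router_weights)
route_b = top_k_route(token_embedding + context_b, router_weights)
print("experts under context A:", route_a[0])
print("experts under context B:", route_b[0])
```

Because the router input differs between the two calls, the selected experts and gate weights are free to differ even though the token embedding is identical.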

What carries the argument

RAD (Routing Agreement Decoding): locate a fixed anchor, represent each rollout by anchor-window MoE routing states, return the densest Weighted-Jaccard K-NN route-basin center.
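
The selection step can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: each rollout is assumed to be already reduced to a non-negative feature vector of anchor-window routing weights, "density" is taken to be the mean similarity to the K nearest neighbors, and the function names and the value of K are illustrative.

```python
import numpy as np

def weighted_jaccard(x, y):
    """Weighted Jaccard similarity of two non-negative vectors."""
    denom = np.maximum(x, y).sum()
    return float(np.minimum(x, y).sum() / denom) if denom > 0 else 1.0

def select_densest(features, k=3):
    """Return (index, densities): the rollout with the highest mean
    similarity to its k nearest neighbors, plus all density scores."""
    n = len(features)
    if n == 1:
        return 0, np.zeros(1)
    sim = np.array([[weighted_jaccard(features[i], features[j])
                     for j in range(n)] for i in range(n)])
    np.fill_diagonal(sim, -np.inf)        # a rollout is not its own neighbor
    k = min(k, n - 1)
    knn = np.sort(sim, axis=1)[:, -k:]    # k largest similarities per row
    density = knn.mean(axis=1)
    return int(np.argmax(density)), density

# Hypothetical routing features for five rollouts over four experts:
# three rollouts share a routing pattern, two are isolated.
features = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.4, 0.6, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 1.0],
])
best, density = select_densest(features, k=2)
print(best)  # 0: the rollout at the center of the three-member group
```

Nothing in this selector reads an answer string. It also cannot tell a dense correct group from a dense wrong one.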

If this is right

  • RAD performs on par with majority voting (73.9 vs 73.6) across 10 MoE configs and 6 datasets without using answer strings.
  • It provides a direct pass@1 selector for code generation where exact-string voting is ill-defined.
  • Re-anchoring the routing-density principle to the agentic boundary improves best-of-16 patch selection on SWE-bench Verified over random.
  • RAD is not a verifier and can still select a dense wrong basin.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Routing states might allow internal monitoring of reasoning effort without external tools.
  • The approach could extend to selecting among multiple agent trajectories or plans in multi-step tasks.
  • Testing RAD on non-MoE models or other routing mechanisms would check if the signal is specific to sparse experts.

Load-bearing premise

MoE routing states observed at a fixed anchor window are stable and task-discriminative enough to identify the correct answer basin without any access to the generated token sequence or external verification.

What would settle it

Finding a set of rollouts where the routing-based selector picks the wrong basin more frequently than string majority voting across the tested datasets would show the alignment does not hold.
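
One way to run such a check is a paired comparison over problems, counting only the cases where the two selectors disagree on correctness. The sketch below uses an exact two-sided sign test. The function name and the per-problem outcomes are hypothetical, and the paper may use a different paired test.

```python
from math import comb

def paired_discordance(rad_correct, majority_correct):
    """Count problems where exactly one selector is correct and
    return (rad_only, majority_only, two-sided exact sign-test p-value)."""
    if len(rad_correct) != len(majority_correct):
        raise ValueError("both selectors must be scored on the same problems")
    rad_only = sum(r and not m for r, m in zip(rad_correct, majority_correct))
    maj_only = sum(m and not r for r, m in zip(rad_correct, majority_correct))
    n = rad_only + maj_only
    if n == 0:
        return rad_only, maj_only, 1.0
    # Probability of a split at least this lopsided under a fair coin.
    tail = sum(comb(n, i) for i in range(min(rad_only, maj_only) + 1)) / 2 ** n
    return rad_only, maj_only, min(1.0, 2 * tail)

# Hypothetical per-problem outcomes (True = selected rollout was correct).
rad = [True, True, False, True, False, True]
maj = [True, False, False, True, True, True]
print(paired_discordance(rad, maj))  # (1, 1, 1.0)
```

A consistently larger majority-only count with a small p-value across datasets would be the pattern that undermines the alignment claim.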

Original abstract

In sparse Mixture-of-Experts language models, does the same token id imply the same router state and the same experts producing it? Holding the emitted token id fixed at repeated anchors, we find it does not: the experts that produce it still separate task context, trajectory history, and reasoning-effort mode. This residual structure supports test-time control: near boundary anchors (the final-response transition) and delimiter anchors (which open the answer, e.g. \boxed{ or code fences), routing neighborhoods already align with final-answer basins at a marker-only readout and strongest when the routing is read at the answer opening. We operationalize this as RAD (Routing Agreement Decoding), an answer-string-free multi-rollout selector: it locates a fixed anchor, represents each rollout by its anchor-window MoE routing states, and returns the densest Weighted-Jaccard K-NN route-basin center, without parsing, normalizing, executing, or voting over answer strings. Across 10 sparse-MoE configurations (gpt-oss, Qwen3-MoE) and 6 datasets spanning math, GPQA, and code, RAD is on par with Majority where string voting is well-posed, with small positive paired deltas (RAD 73.9 / RAD+DC 74.2 vs. Majority 73.6). Like majority voting, RAD is not a verifier: a dense wrong basin can still win. Its value is the interface: the same selector gives direct pass@1 on code, where exact-string voting is ill-defined, and the same routing-density principle, re-anchored to the agentic boundary, improves best-of-16 patch selection on SWE-bench Verified over random, where patches have no answer string to vote on.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that in sparse MoE language models the same token ID does not imply the same router state: routing at repeated boundary and delimiter anchors separates task context, trajectory history, and reasoning-effort mode. It introduces RAD (Routing Agreement Decoding), a string-free multi-rollout selector that represents each rollout by its anchor-window routing states and returns the densest Weighted-Jaccard K-NN route-basin center; across 10 MoE configurations and 6 datasets RAD reports 73.9 (RAD+DC 74.2) versus Majority 73.6 and extends the same principle to code and SWE-bench patch selection.

Significance. If the routing states at fixed anchors prove stable and sufficiently discriminative, RAD supplies a practical interface for test-time control that does not require answer-string parsing or voting, which is directly useful for code generation and agentic settings where exact-string majority is ill-defined. The multi-configuration, multi-dataset evaluation is a concrete strength.

major comments (2)
  1. [Abstract] The aggregate claim of small positive deltas (RAD 73.9 / RAD+DC 74.2 vs. Majority 73.6) across 10 configurations and 6 datasets comes with no variance, statistical tests, data-split details, or confirmation that anchor selection was pre-specified rather than post-hoc. This information is load-bearing for the assertion that routing neighborhoods already align with correct final-answer basins at a marker-only readout.
  2. [RAD definition and experimental section] The selector is defined directly from the observed routing vectors and their Weighted-Jaccard density. The manuscript does not report independent metrics of cluster stability across rollouts, nor the correlation between routing basins and correctness independent of the final answer string. The central assumption, that fixed-anchor states are task-discriminative without token access, is therefore left unverified.
minor comments (1)
  1. [Method] Notation for the Weighted-Jaccard K-NN distance and the precise window size around boundary/delimiter anchors should be stated explicitly in the method section for reproducibility.
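
For reference, the standard weighted Jaccard similarity between two non-negative vectors, and the distance derived from it, are given below. The symbols r_a and r_b (routing feature vectors of two rollouts) and the index e (one routing feature, such as an expert at a given layer and window position) are illustrative. The abstract does not state whether the paper uses exactly this form.

```latex
J_W(r_a, r_b) = \frac{\sum_{e} \min(r_{a,e},\, r_{b,e})}{\sum_{e} \max(r_{a,e},\, r_{b,e})},
\qquad
d_W(r_a, r_b) = 1 - J_W(r_a, r_b)
```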

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive feedback. We address the two major comments point by point below, indicating where revisions will be made.

Point-by-point responses
  1. Referee: [Abstract] The aggregate claim of small positive deltas (RAD 73.9 / RAD+DC 74.2 vs. Majority 73.6) across 10 configurations and 6 datasets comes with no variance, statistical tests, data-split details, or confirmation that anchor selection was pre-specified rather than post-hoc. This information is load-bearing for the assertion that routing neighborhoods already align with correct final-answer basins at a marker-only readout.

    Authors: The experimental section of the manuscript already reports per-configuration and per-dataset breakdowns together with the exact data splits and model configurations used. Anchor selection (boundary and delimiter tokens) was fixed in advance on the basis of earlier pilot observations of MoE routing behavior and was not tuned on the final test sets. Nevertheless, the abstract itself presents only aggregate figures. In the revision we will (i) add a parenthetical note on variance and the paired statistical tests that were performed, (ii) explicitly state that anchor positions were pre-specified, and (iii) reference the supplementary tables that contain the full per-run statistics.

    revision: partial

  2. Referee: [RAD definition and experimental section] The selector is defined directly from the observed routing vectors and their Weighted-Jaccard density. The manuscript does not report independent metrics of cluster stability across rollouts, nor the correlation between routing basins and correctness independent of the final answer string. The central assumption, that fixed-anchor states are task-discriminative without token access, is therefore left unverified.

    Authors: The primary evidence offered is that RAD, which uses only routing states at fixed anchors, matches or slightly exceeds string-based majority voting across ten model configurations and six tasks. This performance parity supplies indirect support for the claim that routing neighborhoods align with answer correctness. We agree, however, that direct, answer-string-independent diagnostics would strengthen the argument. In the revised experimental section we will therefore add (a) intra- and inter-cluster similarity statistics on the routing vectors themselves and (b) a correlation analysis between basin density and correctness computed after removing any reference to the generated answer strings.

    revision: yes
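
The intra- and inter-cluster statistic promised in (a) could look roughly like the sketch below. It assumes each rollout already has a non-negative routing feature vector and a basin label from some grouping. The function names, the labels, and the toy vectors are hypothetical, and the revised manuscript may define the statistic differently.

```python
import numpy as np

def weighted_jaccard(x, y):
    """Weighted Jaccard similarity of two non-negative vectors."""
    denom = np.maximum(x, y).sum()
    return float(np.minimum(x, y).sum() / denom) if denom > 0 else 1.0

def intra_inter_similarity(features, basin_labels):
    """Return (mean within-basin similarity, mean across-basin similarity)
    over all unordered rollout pairs; NaN when a group has no pairs."""
    intra, inter = [], []
    n = len(features)
    for i in range(n):
        for j in range(i + 1, n):
            s = weighted_jaccard(features[i], features[j])
            (intra if basin_labels[i] == basin_labels[j] else inter).append(s)

    def mean(values):
        return float(np.mean(values)) if values else float("nan")

    return mean(intra), mean(inter)

# Hypothetical example: two perfectly separated basins.
features = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
labels = [0, 0, 1, 1]
print(intra_inter_similarity(features, labels))  # (1.0, 0.0)
```

A large gap between the two means would indicate that the routing vectors themselves are clustered, independently of how well the selector scores.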

Circularity Check

0 steps flagged

No significant circularity; RAD is an empirical definition from observed routing vectors

Full rationale

The paper reports an empirical observation that routing states at fixed anchors separate task context/trajectory/mode, then directly defines RAD as the densest Weighted-Jaccard K-NN center in that routing space. No equations, fitted parameters, or self-citations reduce the selector to its own inputs by construction. The method is presented as an operationalization of the observed alignment, with performance compared to majority voting on external datasets. This matches the default expectation of a self-contained empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the method rests on the empirical observation that routing states separate context.

pith-pipeline@v0.9.1-grok · 5891 in / 1145 out tokens · 19832 ms · 2026-06-26T08:57:17.707972+00:00 · methodology

