IR3DE: A Linear Router for Large Language Models

Eros Fan\`i; O\u{g}uzhan Ersoy

arxiv: 2606.06098 · v1 · pith:DHLTXHJNnew · submitted 2026-06-04 · 💻 cs.CL · cs.LG

IR3DE: A Linear Router for Large Language Models

Eros Fan\`i , O\u{g}uzhan Ersoy This is my paper

Pith reviewed 2026-06-28 01:16 UTC · model grok-4.3

classification 💻 cs.CL cs.LG

keywords LLM routingdomain expertsridge regressioninference optimizationdynamic expert managementcausal language modeling

0 comments

The pith

IR3DE shows that a linear ridge regression router can select domain-expert LLMs effectively and support dynamic expert sets without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents IR3DE, a router that applies ridge regression to prompt features to decide which domain-expert LLM to use for inference. This linear method performs on par with other routing approaches in causal language modeling tasks and exceeds them in a reasoning setting, reaching a normalized performance of 98.4 percent. Its main practical advantage is that adding or removing experts does not require retraining the router, enabling flexible management of multiple specialized models. Readers would care because the growing number of available LLMs makes efficient and adaptable routing essential for balancing performance and resource use.

Core claim

IR3DE is a Ridge Regression-based Router for Domain Experts that provides cheap and fast routing decisions for each prompt. Despite being a linear router, IR3DE achieves performance comparable to the other baselines in both CLM settings, and surpassing them in the reasoning setting, with a normalized performance of 98.4%. Moreover, IR3DE enables the addition or removal of new domain experts without requiring the router to be retrained from scratch, allowing a dynamic set of LLMs to be served with minimal disruption to the router itself.

What carries the argument

Ridge regression mapping from prompt features to domain experts

If this is right

Routing is cheap and fast for each prompt
Performance is comparable or better than baselines
New domain experts can be added or removed without retraining

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This routing method could be extended to include cost or latency as additional factors in selection.
The approach might apply to routing among models in other domains like computer vision.
Its linearity allows for easier analysis of which prompt characteristics drive expert selection.

Load-bearing premise

That ridge regression applied to prompt features can reliably identify the appropriate domain expert for a given input.

What would settle it

Measuring the router's accuracy on prompts from domains not seen during training or after introducing new experts without retraining the router.

read the original abstract

Foundational Large Language Models (LLMs) demonstrate proficiency on a wide range of general tasks, and achieve remarkable results on various specialized tasks via domain-expert LLMs. With the ever-growing list of available LLMs, inference routers are being proposed to select the most appropriate LLM for each prompt. However, existing routing methods either optimize cost across weak-to-strong generalist LLMs or require substantial training to support domain-expertise routing. In this paper, we propose IR3DE, a Ridge Regression-based Router for Domain Experts that provides cheap and fast routing decisions for each prompt. We evaluate IR3DE in two Causal Language Modeling (CLM) settings where the tasks are next-token prediction for all domains, and one reasoning setting where each domain has its own distinct reasoning task. Despite being a linear router, IR3DE achieves performance comparable to the other baselines in both CLM settings, and surpassing them in the reasoning setting, with a normalized performance of 98.4%. Moreover, IR3DE enables the addition or removal of new domain experts without requiring the router to be retrained from scratch, allowing a dynamic set of LLMs to be served with minimal disruption to the router itself. Our code is available at: github.com/gensyn-ai/IR3DE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

IR3DE is a lightweight ridge regression router that handles dynamic expert addition without full retraining, but the abstract leaves the input features and update rule unspecified.

read the letter

The core contribution is a ridge regression router for domain-expert LLMs that runs fast and supports adding or removing experts without retraining the whole model from scratch. That dynamic property is the practical angle worth noting, especially for serving setups where the set of available models changes over time.

The paper shows the router matching or beating baselines in two causal language modeling settings and reaching 98.4% normalized performance in a reasoning setting. Code is released, which is helpful for checking the implementation. The method stays linear and avoids heavy training, which keeps overhead low.

The main gaps are in the details. The prompt features fed to the regression are never defined, so it is unclear whether the linearity is real or whether complexity sits in an unstated feature extractor. The exact mechanism for dynamic updates—whether per-expert regressors, closed-form coefficient adjustments, or something else—is also missing from the abstract. The three evaluation settings are narrow; without broader prompt distributions or statistical tests, it is hard to judge how well the mapping generalizes. No error bars or baseline descriptions appear either.

This is aimed at engineers building multi-model serving systems who need low-cost routing with some flexibility. A reader already working on routing or mixture-of-experts inference would get the most out of it. The work is coherent on its own terms and shows clear engineering thinking, so it deserves a serious referee even if the claims need more supporting evidence in revision.

Referee Report

2 major / 1 minor

Summary. The paper proposes IR3DE, a ridge regression-based linear router for selecting domain-expert LLMs given a prompt. It evaluates the router in two causal language modeling (CLM) settings focused on next-token prediction and one reasoning setting with domain-specific tasks, claiming performance comparable to baselines in the CLM cases and superior (98.4% normalized) in reasoning, while also supporting addition or removal of experts without full retraining of the router.

Significance. If the performance and dynamic-expert claims hold after the missing implementation details are supplied, the work would demonstrate that a simple linear model can achieve effective routing across generalist and specialist LLMs with low training cost and support for dynamic expert sets. The public code release is a concrete strength that aids reproducibility.

major comments (2)

[Abstract] Abstract: the performance claims (comparable in CLM, 98.4% normalized in reasoning) are reported without any description of the prompt features supplied to ridge regression, the regression target (one-hot expert ID, loss proxy, etc.), training data composition, baseline implementations, or statistical tests/error bars. This information is load-bearing for assessing whether the linearity is genuine or whether complexity has been moved into an unspecified feature extractor.
[Abstract] Abstract: the claim that experts can be added or removed 'without requiring the router to be retrained from scratch' is presented without an explicit construction (per-expert regressors, closed-form coefficient update, or incremental ridge-regression formula). This mechanism is central to the stated advantage over existing routers.

minor comments (1)

[Abstract] Abstract: the term 'normalized performance' is used without a definition or reference to how the 98.4% figure is computed relative to the baselines.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address each major comment point by point below. Where the comments identify opportunities for improved clarity in the abstract, we have revised the manuscript accordingly.

read point-by-point responses

Referee: [Abstract] Abstract: the performance claims (comparable in CLM, 98.4% normalized in reasoning) are reported without any description of the prompt features supplied to ridge regression, the regression target (one-hot expert ID, loss proxy, etc.), training data composition, baseline implementations, or statistical tests/error bars. This information is load-bearing for assessing whether the linearity is genuine or whether complexity has been moved into an unspecified feature extractor.

Authors: We agree that the abstract's brevity omitted key high-level details. The full manuscript (Section 3) specifies that prompt features are sentence embeddings from a frozen encoder, the regression target is the one-hot encoding of the expert minimizing next-token perplexity on the prompt, training data consists of balanced domain-specific corpora, and baselines are implemented as described in their original papers. Results in Section 5 include standard deviations over three random seeds. To address the concern directly in the abstract, we have added a single sentence summarizing the feature representation and target in the revised version. revision: yes
Referee: [Abstract] Abstract: the claim that experts can be added or removed 'without requiring the router to be retrained from scratch' is presented without an explicit construction (per-expert regressors, closed-form coefficient update, or incremental ridge-regression formula). This mechanism is central to the stated advantage over existing routers.

Authors: Section 3.3 provides the explicit construction: because the router solves a closed-form ridge regression, adding or removing an expert updates the design matrix X and target vector y by appending or deleting the corresponding columns/rows and recomputes the solution (X^T X + λI)^{-1} X^T y via rank-one updates or a small refactorization, avoiding full retraining. We have inserted a brief parenthetical description of this closed-form update into the abstract in the revision. revision: yes

Circularity Check

0 steps flagged

No significant circularity; standard trained linear router evaluated on held-out data.

full rationale

The paper presents IR3DE as ridge regression mapping prompt features to domain experts, with performance measured on separate CLM and reasoning evaluation settings. No equations or claims reduce a reported result to its own training inputs by construction, no self-citation chains support load-bearing premises, and no uniqueness theorems or ansatzes are invoked. The derivation is a conventional supervised model whose outputs are tested externally rather than being tautological with the fit.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The approach rests on the modeling choice that a linear ridge regression suffices for routing and on standard assumptions of linear models; no invented entities or ad-hoc constants are described in the abstract.

free parameters (1)

Ridge regularization strength
Hyperparameter controlling regularization in ridge regression; value and selection method not specified in abstract.

axioms (1)

domain assumption Ridge regression on prompt features can approximate the mapping from input to best domain expert.
Core assumption enabling the use of a linear model instead of a more expressive router.

pith-pipeline@v0.9.1-grok · 5758 in / 1251 out tokens · 47741 ms · 2026-06-28T01:16:44.347153+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

23 extracted references · 19 canonical work pages · 10 internal anchors

[1]

Towards model agnostic federated learning using knowledge distillation.arXiv preprint arXiv:2110.15210,

Andrei Afonin and Sai Praneeth Karimireddy. Towards model agnostic federated learning using knowledge distillation.arXiv preprint arXiv:2110.15210,

work page arXiv
[2]

Qwen Technical Report

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng X...

work page internal anchor Pith review Pith/arXiv arXiv
[3]

Skill-Based Mixture-of-Experts: Adaptive Routing for Heterogeneous Reasoning via Inferred Skills

Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, and Mohit Bansal. Symbolic mixture-of- experts: Adaptive skill-based routing for heterogeneous reasoning.CoRR, abs/2503.05641,

work page internal anchor Pith review Pith/arXiv arXiv
[4]

Frugalgpt: How to use large language models while reducing cost and improving performance.Trans

Lingjiao Chen, Matei Zaharia, and James Zou. Frugalgpt: How to use large language models while reducing cost and improving performance.Trans. Mach. Learn. Res., 2024,

2024
[5]

Evaluating Large Language Models Trained on Code

Mark Chen. Evaluating large language models trained on code.arXiv preprint arXiv:2107.03374,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Training Verifiers to Solve Math Word Problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168,

work page internal anchor Pith review Pith/arXiv arXiv
[7]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186,

2019
[8]

HDEE: Heterogeneous domain expert ensemble

Oguzhan Ersoy, Jari Kolehmainen, and Gabriel Passamani Andrade. HDEE: Heterogeneous domain expert ensemble. InICLR 2025 Workshop on Modularity for Collaborative, Decentralized, and Continual Deep Learning,

2025
[9]

Eros Fanì and Oğuzhan Ersoy

URLhttps://openreview.net/forum?id=5ukL6nPcYe. Eros Fanì and Oğuzhan Ersoy. Training-free dynamic upcycling of expert language models.CoRR, abs/2603.29765,

work page arXiv
[10]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783,

work page internal anchor Pith review Pith/arXiv arXiv
[11]

Prasanna

Nikunj Gupta, Bill Guo, Rajgopal Kannan, and Viktor K. Prasanna. Hierrouter: Coordinated routing of specialized large language models via reinforcement learning.CoRR, abs/2511.09873,

work page arXiv
[12]

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

Pengcheng He, Jianfeng Gao, and Weizhu Chen. Debertav3: Improving deberta using electra-style pre-training with gradient-disentangled embedding sharing.arXiv preprint arXiv:2111.09543,

work page internal anchor Pith review Pith/arXiv arXiv
[13]

Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, and Shriyash Kaustubh Upadhyay

URLhttps://arxiv.org/abs/2207.00220. Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, and Shriyash Kaustubh Upadhyay. Routerbench: A benchmark for multi-llm routing system.arXiv preprint arXiv:2403.12031,

work page arXiv
[14]

arXiv preprint arXiv:2502.08773 , year=

Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Zifeng Wang, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, and Sanjiv Kumar. Universal model routing for efficient LLM inference.CoRR, abs/2502.08773,

work page arXiv
[15]

Okapi: Instruction-tuned large language models in multiple languages with reinforcement learning from human feedback

Viet Lai, Chien Nguyen, Nghia Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan Rossi, and Thien Nguyen. Okapi: Instruction-tuned large language models in multiple languages with reinforcement learning from human feedback. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 318–327,

2023
[16]

Task-Aware LLM Routing with Multi-Level Task-Profile-Guided Data Synthesis for Cold-Start Scenarios

Hui Liu, Bin Zou, Kecheng Chen, Jie Liu, Wenya Wang, and Haoliang Li. Task-aware llm routing with multi-level task-profile-guided data synthesis for cold-start scenarios.arXiv preprint arXiv:2604.09377,

work page internal anchor Pith review Pith/arXiv arXiv
[17]

M2d2: A massively multi-domain language modeling dataset.arXiv preprint arXiv:2210.07370,

Machel Reid, Victor Zhong, Suchin Gururangan, and Luke Zettlemoyer. M2d2: A massively multi-domain language modeling dataset.arXiv preprint arXiv:2210.07370,

work page arXiv
[18]

arXiv preprint arXiv:2506.01048 , year=

ODC-By,https://github.com/allenai/pes2o. Wei Song, Zhenya Huang, Cheng Cheng, Weibo Gao, Bihan Xu, Guanhao Zhao, Fei Wang, and Runze Wu. Irt-router: Effective and interpretable multi-llm routing via item response theory.CoRR, abs/2506.01048,

work page arXiv
[19]

arXiv preprint arXiv:2408.12320 , year=

Dimitris Stripelis, Zijian Hu, Jipeng Zhang, Zhaozhuo Xu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Salman Avestimehr, and Chaoyang He. Polyrouter: A multi-llm querying system.CoRR, abs/2408.12320,

work page arXiv
[20]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971,

work page internal anchor Pith review Pith/arXiv arXiv
[21]

R2-Router: A New Paradigm for LLM Routing with Reasoning

Jiaqi Xue, Qian Lou, Jiarong Xing, and Heng Huang. R2-router: A new paradigm for LLM routing with reasoning. CoRR, abs/2602.02823,

work page internal anchor Pith review Pith/arXiv arXiv
[22]

Disrouter: Distributed self-routing for LLM selections.CoRR, abs/2510.19208,

10 IR3DE: A Linear Router for Large Language Models Hang Zheng, Hongshen Xu, Yongkai Lin, Shuai Fan, Lu Chen, and Kai Yu. Disrouter: Distributed self-routing for LLM selections.CoRR, abs/2510.19208,

work page arXiv
[23]

Instruction-Following Evaluation for Large Language Models

Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. Instruction-following evaluation for large language models.arXiv preprint arXiv:2311.07911,

work page internal anchor Pith review Pith/arXiv arXiv

[1] [1]

Towards model agnostic federated learning using knowledge distillation.arXiv preprint arXiv:2110.15210,

Andrei Afonin and Sai Praneeth Karimireddy. Towards model agnostic federated learning using knowledge distillation.arXiv preprint arXiv:2110.15210,

work page arXiv

[2] [2]

Qwen Technical Report

Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, Binyuan Hui, Luo Ji, Mei Li, Junyang Lin, Runji Lin, Dayiheng Liu, Gao Liu, Chengqiang Lu, Keming Lu, Jianxin Ma, Rui Men, Xingzhang Ren, Xuancheng Ren, Chuanqi Tan, Sinan Tan, Jianhong Tu, Peng Wang, Shijie Wang, Wei Wang, Shengguang Wu, Benfeng X...

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

Skill-Based Mixture-of-Experts: Adaptive Routing for Heterogeneous Reasoning via Inferred Skills

Justin Chih-Yao Chen, Sukwon Yun, Elias Stengel-Eskin, Tianlong Chen, and Mohit Bansal. Symbolic mixture-of- experts: Adaptive skill-based routing for heterogeneous reasoning.CoRR, abs/2503.05641,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

Frugalgpt: How to use large language models while reducing cost and improving performance.Trans

Lingjiao Chen, Matei Zaharia, and James Zou. Frugalgpt: How to use large language models while reducing cost and improving performance.Trans. Mach. Learn. Res., 2024,

2024

[5] [5]

Evaluating Large Language Models Trained on Code

Mark Chen. Evaluating large language models trained on code.arXiv preprint arXiv:2107.03374,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

Training Verifiers to Solve Math Word Problems

Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, et al. Training verifiers to solve math word problems.arXiv preprint arXiv:2110.14168,

work page internal anchor Pith review Pith/arXiv arXiv

[7] [7]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171–4186,

2019

[8] [8]

HDEE: Heterogeneous domain expert ensemble

Oguzhan Ersoy, Jari Kolehmainen, and Gabriel Passamani Andrade. HDEE: Heterogeneous domain expert ensemble. InICLR 2025 Workshop on Modularity for Collaborative, Decentralized, and Continual Deep Learning,

2025

[9] [9]

Eros Fanì and Oğuzhan Ersoy

URLhttps://openreview.net/forum?id=5ukL6nPcYe. Eros Fanì and Oğuzhan Ersoy. Training-free dynamic upcycling of expert language models.CoRR, abs/2603.29765,

work page arXiv

[10] [10]

The Llama 3 Herd of Models

Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783,

work page internal anchor Pith review Pith/arXiv arXiv

[11] [11]

Prasanna

Nikunj Gupta, Bill Guo, Rajgopal Kannan, and Viktor K. Prasanna. Hierrouter: Coordinated routing of specialized large language models via reinforcement learning.CoRR, abs/2511.09873,

work page arXiv

[12] [12]

DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing

Pengcheng He, Jianfeng Gao, and Weizhu Chen. Debertav3: Improving deberta using electra-style pre-training with gradient-disentangled embedding sharing.arXiv preprint arXiv:2111.09543,

work page internal anchor Pith review Pith/arXiv arXiv

[13] [13]

Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, and Shriyash Kaustubh Upadhyay

URLhttps://arxiv.org/abs/2207.00220. Qitian Jason Hu, Jacob Bieker, Xiuyu Li, Nan Jiang, Benjamin Keigwin, Gaurav Ranganath, Kurt Keutzer, and Shriyash Kaustubh Upadhyay. Routerbench: A benchmark for multi-llm routing system.arXiv preprint arXiv:2403.12031,

work page arXiv

[14] [14]

arXiv preprint arXiv:2502.08773 , year=

Wittawat Jitkrittum, Harikrishna Narasimhan, Ankit Singh Rawat, Jeevesh Juneja, Zifeng Wang, Chen-Yu Lee, Pradeep Shenoy, Rina Panigrahy, Aditya Krishna Menon, and Sanjiv Kumar. Universal model routing for efficient LLM inference.CoRR, abs/2502.08773,

work page arXiv

[15] [15]

Okapi: Instruction-tuned large language models in multiple languages with reinforcement learning from human feedback

Viet Lai, Chien Nguyen, Nghia Ngo, Thuat Nguyen, Franck Dernoncourt, Ryan Rossi, and Thien Nguyen. Okapi: Instruction-tuned large language models in multiple languages with reinforcement learning from human feedback. InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 318–327,

2023

[16] [16]

Task-Aware LLM Routing with Multi-Level Task-Profile-Guided Data Synthesis for Cold-Start Scenarios

Hui Liu, Bin Zou, Kecheng Chen, Jie Liu, Wenya Wang, and Haoliang Li. Task-aware llm routing with multi-level task-profile-guided data synthesis for cold-start scenarios.arXiv preprint arXiv:2604.09377,

work page internal anchor Pith review Pith/arXiv arXiv

[17] [17]

M2d2: A massively multi-domain language modeling dataset.arXiv preprint arXiv:2210.07370,

Machel Reid, Victor Zhong, Suchin Gururangan, and Luke Zettlemoyer. M2d2: A massively multi-domain language modeling dataset.arXiv preprint arXiv:2210.07370,

work page arXiv

[18] [18]

arXiv preprint arXiv:2506.01048 , year=

ODC-By,https://github.com/allenai/pes2o. Wei Song, Zhenya Huang, Cheng Cheng, Weibo Gao, Bihan Xu, Guanhao Zhao, Fei Wang, and Runze Wu. Irt-router: Effective and interpretable multi-llm routing via item response theory.CoRR, abs/2506.01048,

work page arXiv

[19] [19]

arXiv preprint arXiv:2408.12320 , year=

Dimitris Stripelis, Zijian Hu, Jipeng Zhang, Zhaozhuo Xu, Alay Dilipbhai Shah, Han Jin, Yuhang Yao, Salman Avestimehr, and Chaoyang He. Polyrouter: A multi-llm querying system.CoRR, abs/2408.12320,

work page arXiv

[20] [20]

LLaMA: Open and Efficient Foundation Language Models

Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models. arXiv preprint arXiv:2302.13971,

work page internal anchor Pith review Pith/arXiv arXiv

[21] [21]

R2-Router: A New Paradigm for LLM Routing with Reasoning

Jiaqi Xue, Qian Lou, Jiarong Xing, and Heng Huang. R2-router: A new paradigm for LLM routing with reasoning. CoRR, abs/2602.02823,

work page internal anchor Pith review Pith/arXiv arXiv

[22] [22]

Disrouter: Distributed self-routing for LLM selections.CoRR, abs/2510.19208,

10 IR3DE: A Linear Router for Large Language Models Hang Zheng, Hongshen Xu, Yongkai Lin, Shuai Fan, Lu Chen, and Kai Yu. Disrouter: Distributed self-routing for LLM selections.CoRR, abs/2510.19208,

work page arXiv

[23] [23]

Instruction-Following Evaluation for Large Language Models

Jeffrey Zhou, Tianjian Lu, Swaroop Mishra, Siddhartha Brahma, Sujoy Basu, Yi Luan, Denny Zhou, and Le Hou. Instruction-following evaluation for large language models.arXiv preprint arXiv:2311.07911,

work page internal anchor Pith review Pith/arXiv arXiv