pith. sign in

arxiv: 2607.00053 · v1 · pith:OA5S4NELnew · submitted 2026-06-30 · 💻 cs.SE · cs.AI

SWE-Router: Routing in Multi-turn Agentic Software Engineering Tasks

Pith reviewed 2026-07-02 18:17 UTC · model grok-4.3

classification 💻 cs.SE cs.AI
keywords LLM routingmulti-turn agentssoftware engineering taskscost efficiencyBayes optimalitytrajectory-based routingvalue-based decisionagentic harnesses
0
0 comments X

The pith

SWE-Router conditions routing decisions on a partial trajectory from cheap-model exploration to achieve Bayes-optimal outcomes in multi-turn agentic software engineering tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing routers decide based only on the initial task description, which cannot separate easy fixes from complex refactors that look similar on the surface. SWE-Router lets a cheap model run a few exploratory turns, then reads the resulting partial trajectory to decide whether to keep going cheaply or escalate to an expensive model. The paper proves that conditioning on this trajectory is Bayes-optimal: it never performs worse than routing from the prompt alone and improves whenever the exploration reveals useful information. Experiments across pairs of weak and strong models show large gains in cost efficiency while preserving most of the stronger model's success rate on SWE tasks.

Core claim

SWE-Router is a value-based temporal routing method that first lets a cheap model generate a partial trajectory through exploratory turns, then uses that trajectory to decide continuation or escalation. A Bayes-optimality theorem shows that conditioning the routing decision on the partial trajectory never harms performance and is strictly better when exploration is informative. Across contemporary weak-strong LLM pairs this yields substantially better cost efficiency on SWE tasks while retaining the majority of the stronger model's performance.

What carries the argument

The partial trajectory produced by a small number of exploratory turns with the cheap model, which supplies the value estimate used to make the routing decision.

If this is right

  • Routing decisions become informed by actual interaction history rather than the prompt alone.
  • Cost efficiency improves across model pairs while most performance of the stronger model is retained.
  • The Bayes-optimality result guarantees that adding trajectory information cannot degrade routing quality.
  • A released multi-LLM trajectory dataset enables further study of trajectory-level routing methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same trajectory-conditioning logic could apply to other multi-turn agentic domains where difficulty only becomes visible through interaction.
  • Organizations could dynamically allocate model resources to handle more tasks under fixed compute budgets.
  • Similar partial-trajectory value estimates might reduce waste in any sequential LLM usage pattern that involves repeated calls.

Load-bearing premise

A small number of exploratory turns with the cheap model produces a partial trajectory whose value estimate is sufficiently accurate and low-cost to justify the routing decision.

What would settle it

A controlled experiment in which routing decisions made after the exploratory turns produce either higher total cost or lower task success rate than a task-description-only router or always using the strong model.

Figures

Figures reproduced from arXiv: 2607.00053 by Ilija Bogunovic, Jiahua Tang, Lorenz Wolf, Sangwoong Yoon, Seongho Son, Shuhan Wang.

Figure 1
Figure 1. Figure 1: SWE-Router overview. A weak (cheap) model m1 runs for several exploratory turns; a learned value head reads the resulting partial trajectory and predicts whether m1 will eventually solve the task. If the predicted probability exceeds a cost-adjusted threshold, m1 continues; otherwise the strong (expensive) model m2 is invoked from the original prompt q. Conditioning on the partial trajectory — rather than … view at source ↗
Figure 2
Figure 2. Figure 2: Cost-vs.-resolved routing curves on the held-out SWE-Bench Verified for the two (weak, strong) pairs. Each colored curve is the threshold sweep of the corresponding router; the gray band is the random-assignment reference, computed as the per-cost-level mean (±1σ) over 400 Monte-Carlo permutations of the per-instance routing decision; × and ⋆ mark the all-weak and all-strong endpoints. The curves of SWE-Ro… view at source ↗
read the original abstract

Large language models (LLMs) embedded in multi-turn agentic harnesses are reshaping software engineering (SWE), but routing every task to a frontier model is wasteful when many issues admit cheap fixes. Existing LLM routers operate on the task description alone, which inherits an information-theoretic Bayes-error floor in agentic settings: a similar issue can hide either a localized typo or a multi-module refactor, and the prompt does not separate the two. We introduce SWE-Router, a value-based temporal approach that lets a cheap model run for a few exploratory turns and reads the resulting partial trajectory before deciding whether to continue cheaply or to escalate to an expensive model. We provide a Bayes-optimality theorem showing that conditioning on the partial trajectory never harms routing and is strictly better whenever exploration is informative. Across the LLM pairs of weak and strong models spanning the contemporary cost--capability frontier, we show that SWE-Router greatly improves the cost efficiency of SWE tasks, while maintaining the majority of the performances of the stronger model. We additionally release a multi-LLM trajectory dataset which allows reproduction of our trajectory-level routing.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces SWE-Router, a value-based temporal routing approach for multi-turn agentic software engineering tasks. A cheap model performs a small number of exploratory turns to produce a partial trajectory; the router then decides whether to continue with the cheap model or escalate to a stronger model. The central theoretical claim is a Bayes-optimality theorem asserting that conditioning on the partial trajectory never harms routing performance and is strictly better whenever the trajectory is informative. Empirically, the method is reported to improve cost efficiency across pairs of weak and strong LLMs while preserving most of the stronger model's task performance; a multi-LLM trajectory dataset is also released.

Significance. If the theorem is correctly derived and the empirical gains survive scrutiny of value-estimate accuracy, the work would offer a principled, trajectory-aware alternative to task-only routing and could materially reduce the cost of deploying agentic LLM systems on SWE benchmarks. The dataset release supports reproducibility and follow-on research.

major comments (3)
  1. [Abstract / Theorem] Abstract and theorem section: the Bayes-optimality claim is load-bearing yet the manuscript supplies no derivation, proof sketch, or explicit statement of the posterior-value estimator. Without these details it is impossible to verify that the practical routing rule inherits the stated optimality guarantee once the cost of the exploratory turns is subtracted.
  2. [Method / Routing rule] Value-estimate accuracy (weakest assumption identified in the stress-test note): the routing policy presupposes that a small fixed number of cheap-model turns yields a sufficiently accurate and low-variance estimate of continuation value. The paper provides no error analysis, variance bounds, or ablation on the number of exploratory turns, leaving open the possibility that noisy estimates produce net-negative routing decisions even when the theoretical statement remains formally true.
  3. [Experiments] Empirical results: the abstract asserts "greatly improves the cost efficiency" and "maintains the majority of the performances," but the available text contains neither quantitative tables, baseline comparisons (task-only routing, oracle routing), nor error bars. These data are required to substantiate the central empirical claim.
minor comments (2)
  1. [Notation / Method] Clarify the precise functional form of the value estimator and the decision threshold that trades off expected gain against exploration cost.
  2. [Data release] The dataset release is welcome; the paper should state the exact license, format, and reproduction instructions.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive report. We address each major comment below, indicating planned revisions where appropriate.

read point-by-point responses
  1. Referee: [Abstract / Theorem] Abstract and theorem section: the Bayes-optimality claim is load-bearing yet the manuscript supplies no derivation, proof sketch, or explicit statement of the posterior-value estimator. Without these details it is impossible to verify that the practical routing rule inherits the stated optimality guarantee once the cost of the exploratory turns is subtracted.

    Authors: We agree that the main text would benefit from greater self-containment. The full derivation appears in Appendix A, but we will insert a concise proof sketch into Section 3 together with an explicit statement of the posterior-value estimator (the conditional expectation of continuation value given the partial trajectory). This will make the inheritance of the optimality guarantee, net of exploratory cost, directly verifiable from the main body. revision: yes

  2. Referee: [Method / Routing rule] Value-estimate accuracy (weakest assumption identified in the stress-test note): the routing policy presupposes that a small fixed number of cheap-model turns yields a sufficiently accurate and low-variance estimate of continuation value. The paper provides no error analysis, variance bounds, or ablation on the number of exploratory turns, leaving open the possibility that noisy estimates produce net-negative routing decisions even when the theoretical statement remains formally true.

    Authors: The Bayes-optimality theorem is agnostic to estimation error, yet we concur that empirical validation of the assumption is necessary. We will add (i) an ablation varying the number of exploratory turns (1, 3, 5) and (ii) a variance analysis of the value estimates across the released dataset. These additions will quantify the regimes in which the routing decisions remain net-positive. revision: yes

  3. Referee: [Experiments] Empirical results: the abstract asserts "greatly improves the cost efficiency" and "maintains the majority of the performances," but the available text contains neither quantitative tables, baseline comparisons (task-only routing, oracle routing), nor error bars. These data are required to substantiate the central empirical claim.

    Authors: Section 4 of the manuscript already contains tables reporting cost and performance metrics for multiple weak–strong LLM pairs, with explicit comparison to task-only routing. To address the concern we will (i) add error bars computed over repeated runs and (ii) include an oracle-routing baseline. We will also ensure the experimental section is fully present in the revision. revision: partial

Circularity Check

0 steps flagged

No significant circularity; theorem applies standard decision theory without self-referential reduction

full rationale

The paper's central Bayes-optimality claim is a general result that additional observations (partial trajectory) cannot harm and can only improve routing decisions under standard Bayesian updating; this is independent of any fitted parameters or prior self-citations. No equations or sections in the abstract or reader's summary reduce a prediction to a self-defined input, a fitted value renamed as output, or a load-bearing self-citation chain. The empirical routing policy is presented as an application of this theorem plus a value-based temporal approach, with no indication that the value estimates are constructed by definition from the routing outcomes themselves. The derivation chain remains self-contained against external benchmarks of decision theory.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are detailed beyond the implicit domain assumption that partial trajectories carry decision-relevant information.

axioms (1)
  • domain assumption Partial trajectories from cheap-model exploration are informative about task complexity and repair cost
    Central premise enabling the value-based routing decision and the strict improvement claim in the theorem.

pith-pipeline@v0.9.1-grok · 5737 in / 1216 out tokens · 23300 ms · 2026-07-02T18:17:42.667124+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

83 extracted references · 40 canonical work pages · 11 internal anchors

  1. [1]

    KaShun SHUM and Binyuan Hui and Jiawei Chen and Lei Zhang and X. W. and Jiaxi Yang and Yuzhen Huang and Junyang Lin and Junxian He , booktitle=. 2026 , url=

  2. [2]

    arXiv preprint arXiv:2502.08773 , year=

    Universal model routing for efficient llm inference , author=. arXiv preprint arXiv:2502.08773 , year=

  3. [3]

    Richard Zhuang and Tianhao Wu and Zhaojin Wen and Andrew Li and Jiantao Jiao and Kannan Ramchandran , booktitle=. Embed. 2025 , url=

  4. [4]

    Gonzalez and M Waleed Kadous and Ion Stoica , booktitle=

    Isaac Ong and Amjad Almahairi and Vincent Wu and Wei-Lin Chiang and Tianhao Wu and Joseph E. Gonzalez and M Waleed Kadous and Ion Stoica , booktitle=. Route. 2025 , url=

  5. [5]

    The Eleventh International Conference on Learning Representations , year=

    ReAct: Synergizing Reasoning and Acting in Language Models , author=. The Eleventh International Conference on Learning Representations , year=

  6. [6]

    2025 , url=

    John Yang and Kilian Lieret and Carlos E Jimenez and Alexander Wettig and Kabir Khandpur and Yanzhe Zhang and Binyuan Hui and Ofir Press and Ludwig Schmidt and Diyi Yang , booktitle=. 2025 , url=

  7. [7]

    SWE-Bench Pro: Can AI Agents Solve Long-Horizon Software Engineering Tasks?

    Swe-bench pro: Can ai agents solve long-horizon software engineering tasks? , author=. arXiv preprint arXiv:2509.16941 , year=

  8. [8]

    2024 , url=

    Jimenez, Carlos E and Yang, John and Wettig, Alexander and Yao, Shunyu and Pei, Kexin and Press, Ofir and Narasimhan, Karthik , booktitle=. 2024 , url=

  9. [9]

    and Wettig, Alexander and Lieret, Kilian and Yao, Shunyu and Narasimhan, Karthik and Press, Ofir , booktitle =

    Yang, John and Jimenez, Carlos E. and Wettig, Alexander and Lieret, Kilian and Yao, Shunyu and Narasimhan, Karthik and Press, Ofir , booktitle =. SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering , url =. doi:10.52202/079017-1601 , editor =

  10. [10]

    Evaluating Large Language Models Trained on Code

    Evaluating Large Language Models Trained on Code , author=. arXiv preprint arXiv:2107.03374 , year=

  11. [11]

    ArXiv , year=

    Evaluating Large Language Models Trained on Code , author=. ArXiv , year=

  12. [12]

    Program Synthesis with Large Language Models

    Program Synthesis with Large Language Models , author=. arXiv preprint arXiv:2108.07732 , year=

  13. [13]

    Neural Information Processing Systems , year=

    Measuring Coding Challenge Competence With APPS , author=. Neural Information Processing Systems , year=

  14. [14]

    ArXiv , year=

    SWE-smith: Scaling Data for Software Engineering Agents , author=. ArXiv , year=

  15. [15]

    Introducing swe-bench verified | openai ,author=

  16. [16]

    arXiv preprint arXiv:2404.02605 , year=

    Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving , author=. arXiv preprint arXiv:2404.02605 , year=

  17. [17]

    Neural Information Processing Systems , year=

    CrossCodeEval: A Diverse and Multilingual Benchmark for Cross-File Code Completion , author=. Neural Information Processing Systems , year=

  18. [18]

    RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems

    RepoBench: Benchmarking Repository-Level Code Auto-Completion Systems , author=. arXiv preprint arXiv:2306.03091 , year=

  19. [19]

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions

    BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions , author=. arXiv preprint arXiv:2406.15877 , year=

  20. [20]

    Swe-bench multimodal: Do ai systems generalize to visual software domains? arXiv preprint arXiv:2410.03859, 2024 c

    SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains? , author=. arXiv preprint arXiv:2410.03859 , year=

  21. [21]

    Neural Information Processing Systems , year=

    SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering , author=. Neural Information Processing Systems , year=

  22. [22]

    ACM SIGSOFT International Symposium on Software Testing and Analysis , year=

    AutoCodeRover: Autonomous Program Improvement , author=. ACM SIGSOFT International Symposium on Software Testing and Analysis , year=

  23. [23]

    OpenHands: An Open Platform for AI Software Developers as Generalist Agents

    OpenHands: An Open Platform for AI Software Developers as Generalist Agents , author=. arXiv preprint arXiv:2407.16741 , year=

  24. [24]

    AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation

    AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation , author=. arXiv preprint arXiv:2312.13010 , year=

  25. [25]

    International Conference on Machine Learning , year=

    Executable Code Actions Elicit Better LLM Agents , author=. International Conference on Machine Learning , year=

  26. [26]

    Agentless: Demystifying LLM-based Software Engineering Agents

    Agentless: Demystifying LLM-based Software Engineering Agents , author=. arXiv preprint arXiv:2407.01489 , year=

  27. [27]

    Proceedings of the ACM on Software Engineering , year=

    CodePlan: Repository-level Coding using LLMs and Planning , author=. Proceedings of the ACM on Software Engineering , year=

  28. [28]

    Benchmark Data Contamination of Large Language Models: A Survey

    Benchmark Data Contamination of Large Language Models: A Survey , author=. arXiv preprint arXiv:2406.04244 , year=

  29. [29]

    Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

    Investigating Data Contamination in Modern Benchmarks for Large Language Models , author=. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=. 2024 , address=

  30. [30]

    arXiv preprint arXiv:2502.14425 (2025)

    A Survey on Data Contamination for Large Language Models , author=. arXiv preprint arXiv:2502.14425 , year=

  31. [31]

    arXiv preprint arXiv:2410.06992 , year=

    SWE-Bench+: Enhanced Coding Benchmark for LLMs , author=. arXiv preprint arXiv:2410.06992 , year=

  32. [32]

    Swe-bench goes live!, 2025

    SWE-bench Goes Live! , author=. arXiv preprint arXiv:2505.23419 , year=

  33. [33]

    2009 IEEE 31st International Conference on Software Engineering , pages=

    Predicting faults using the complexity of code changes , author=. 2009 IEEE 31st International Conference on Software Engineering , pages=. 2009 , organization=

  34. [34]

    Empirical Software Engineering , volume=

    Evaluating code complexity triggers, use of complexity measures and the influence of code complexity on maintenance time , author=. Empirical Software Engineering , volume=. 2017 , publisher=

  35. [35]

    Advances in neural information processing systems , volume=

    Language models are few-shot learners , author=. Advances in neural information processing systems , volume=

  36. [36]

    ArXiv , year=

    LiveBench: A Challenging, Contamination-Free LLM Benchmark , author=. ArXiv , year=

  37. [37]

    ArXiv , year=

    Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards , author=. ArXiv , year=

  38. [38]

    ArXiv , year=

    SWE-Perf: Can Language Models Optimize Code Performance on Real-World Repositories? , author=. ArXiv , year=

  39. [39]

    ArXiv , year=

    A Careful Examination of Large Language Model Performance on Grade School Arithmetic , author=. ArXiv , year=

  40. [40]

    arXiv preprint arXiv:2311.09783 , year=

    Investigating the Impact of Data Contamination of Large Language Models in Text-to-SQL Translation , author=. arXiv preprint arXiv:2311.09783 , year=

  41. [41]

    Efficient Training-Free Online Routing for High-Volume Multi-LLM Serving , author=

  42. [42]

    arXiv preprint arXiv:2410.03834 , year=

    Graphrouter: A graph-based router for llm selections , author=. arXiv preprint arXiv:2410.03834 , year=

  43. [43]

    Advances in neural information processing systems , volume=

    Judging llm-as-a-judge with mt-bench and chatbot arena , author=. Advances in neural information processing systems , volume=

  44. [44]

    arXiv preprint arXiv:2505.16881 , year=

    CASTILLO: Characterizing Response Length Distributions of Large Language Models , author=. arXiv preprint arXiv:2505.16881 , year=

  45. [45]

    arXiv preprint arXiv:2407.10834 , year=

    Metallm: A high-performant and cost-efficient dynamic framework for wrapping llms , author=. arXiv preprint arXiv:2407.10834 , year=

  46. [46]

    arXiv preprint arXiv:2504.09858 , year=

    Reasoning models can be effective without thinking , author=. arXiv preprint arXiv:2504.09858 , year=

  47. [47]

    arXiv preprint arXiv:2503.10657 , year=

    Routereval: A comprehensive benchmark for routing llms to explore model-level scaling up in llms , author=. arXiv preprint arXiv:2503.10657 , year=

  48. [48]

    Lingjiao Chen and Matei Zaharia and James Zou , journal=. Frugal. 2024 , url=

  49. [49]

    arXiv preprint arXiv:2408.12320 , year=

    Tensoropera router: A multi-model router for efficient llm inference , author=. arXiv preprint arXiv:2408.12320 , year=

  50. [50]

    Proceedings of the 17th ACM International Conference on Web Search and Data Mining , pages=

    Fly-swat or cannon? cost-effective language model choice via meta-modeling , author=. Proceedings of the 17th ACM International Conference on Web Search and Data Mining , pages=

  51. [51]

    Advances in Neural Information Processing Systems , volume=

    Routerdc: Query-based router by dual contrastive learning for assembling large language models , author=. Advances in Neural Information Processing Systems , volume=

  52. [52]

    arXiv preprint arXiv:2308.11601 , year=

    Tryage: Real-time, intelligent routing of user prompts to large language models , author=. arXiv preprint arXiv:2308.11601 , year=

  53. [53]

    Cost-aware contrastive routing for llms.arXiv preprint arXiv:2508.12491,

    Cost-Aware Contrastive Routing for LLMs , author=. arXiv preprint arXiv:2508.12491 , year=

  54. [54]

    Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing

    Hybrid llm: Cost-efficient and quality-aware query routing , author=. arXiv preprint arXiv:2404.14618 , year=

  55. [55]

    AutoMix: Automatically Mixing Language Models

    Automix: Automatically mixing language models , author=. arXiv preprint arXiv:2310.12963 , year=

  56. [56]

    arXiv preprint arXiv:2306.02561 , year=

    Llm-blender: Ensembling large language models with pairwise ranking and generative fusion , author=. arXiv preprint arXiv:2306.02561 , year=

  57. [57]

    IEEE Transactions on Knowledge and Data Engineering , year=

    A survey on mixture of experts in large language models , author=. IEEE Transactions on Knowledge and Data Engineering , year=

  58. [58]

    The Twelfth International Conference on Learning Representations , year=

    Large Language Model Cascades with Mixture of Thought Representations for Cost-Efficient Reasoning , author=. The Twelfth International Conference on Learning Representations , year=

  59. [59]

    FusionFactory: Fusing LLM Capabilities with Multi-LLM Log Data

    FusionFactory: Fusing LLM Capabilities with Multi-LLM Log Data , author=. arXiv preprint arXiv:2507.10540 , year=

  60. [60]

    arXiv preprint arXiv:2504.07113 , year=

    How Robust Are Router-LLMs? Analysis of the Fragility of LLM Routing Capabilities , author=. arXiv preprint arXiv:2504.07113 , year=

  61. [61]

    arXiv preprint arXiv:2502.03261 , year=

    Carrot: A cost aware rate optimal router , author=. arXiv preprint arXiv:2502.03261 , year=

  62. [62]

    arXiv preprint arXiv:2509.06274 , year=

    IPR: Intelligent Prompt Routing with User-Controlled Quality-Cost Trade-offs , author=. arXiv preprint arXiv:2509.06274 , year=

  63. [63]

    arXiv preprint arXiv:2502.20576 , year=

    OmniRouter: Budget and Performance Controllable Multi-LLM Routing , author=. arXiv preprint arXiv:2502.20576 , year=

  64. [64]

    arXiv preprint arXiv:2510.00202 , year=

    RouterArena: An Open Platform for Comprehensive Comparison of LLM Routers , author=. arXiv preprint arXiv:2510.00202 , year=

  65. [65]

    arXiv preprint arXiv:2506.01048 , year=

    IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory , author=. arXiv preprint arXiv:2506.01048 , year=

  66. [66]

    2025 , howpublished=

    vLLM Semantic Router , author=. 2025 , howpublished=

  67. [67]

    2026 , note =

    LLM Router – Chat UI Configuration , author =. 2026 , note =

  68. [68]

    2025 , author =

    RequestyAI: Unified LLM Gateway Routing , howpublished =. 2025 , author =

  69. [69]

    arXiv preprint arXiv:2505.19435 , year=

    Route to Reason: Adaptive Routing for LLM and Reasoning Strategy Selection , author=. arXiv preprint arXiv:2505.19435 , year=

  70. [70]

    arXiv preprint arXiv:2506.05901 , year=

    Route-and-Reason: Scaling Large Language Model Reasoning with Reinforced Model Router , author=. arXiv preprint arXiv:2506.05901 , year=

  71. [71]

    2026 , eprint=

    Think When Needed: Model-Aware Reasoning Routing for LLM-based Ranking , author=. 2026 , eprint=

  72. [72]

    arXiv preprint arXiv:2510.08731 , year=

    When to Reason: Semantic Router for vLLM , author=. arXiv preprint arXiv:2510.08731 , year=

  73. [73]

    2025 , note =

    Auto Router , author =. 2025 , note =

  74. [74]

    arXiv preprint arXiv:2310.01542 , year=

    Fusing models with complementary expertise , author=. arXiv preprint arXiv:2310.01542 , year=

  75. [75]

    2023 , publisher =

    OpenHermes 2.5: An Open Dataset of Synthetic Data for Generalist LLM Assistants , author =. 2023 , publisher =

  76. [76]

    2025 , author =

    NotDiamond: Routing Between AI Models , howpublished =. 2025 , author =

  77. [77]

    Model router for Microsoft Foundry , author =

  78. [78]

    RouterBench: A Benchmark for Multi-LLM Routing System

    Routerbench: A benchmark for multi-llm routing system , author=. arXiv preprint arXiv:2403.12031 , year=

  79. [79]

    Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference , year=

    Faster, Cheaper, Just as Good: Cost-and Latency-Constrained Routing for LLMs , author=. Sparsity in LLMs (SLLM): Deep Dive into Mixture of Experts, Quantization, Hardware, and Inference , year=

  80. [80]

    The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

    Router-r1: Teaching llms multi-round routing and aggregation via reinforcement learning , author=. The Thirty-ninth Annual Conference on Neural Information Processing Systems , year=

Showing first 80 references.