hub Canonical reference

RouteLLM: Learning to Route LLMs with Preference Data

Isaac Ong, Amjad Almahairi, Vincent Wu, Wei-Lin Chiang, Tianhao Wu, Joseph E. Gonzalez · 2024 · cs.LG · arXiv 2406.18665

Canonical reference. 80% of citing Pith papers cite this work as background.

49 Pith papers citing it

Background 80% of classified citations

abstract

Large language models (LLMs) exhibit impressive capabilities across a wide range of tasks, yet the choice of which model to use often involves a trade-off between performance and cost. More powerful models, though effective, come with higher expenses, while less capable models are more cost-effective. To address this dilemma, we propose several efficient router models that dynamically select between a stronger and a weaker LLM during inference, aiming to optimize the balance between cost and response quality. We develop a training framework for these routers leveraging human preference data and data augmentation techniques to enhance performance. Our evaluation on widely recognized benchmarks shows that our approach significantly reduces costs (by over 2 times in certain cases) without compromising the quality of responses. Interestingly, our router models also demonstrate significant transfer learning capabilities, maintaining their performance even when the strong and weak models are changed at test time. This highlights the potential of these routers to provide a cost-effective yet high-performance solution for deploying LLMs.
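
The abstract describes routers that pick between a stronger and a weaker LLM per query. One way to picture this is a score-and-threshold rule, sketched below. It is illustrative only and is not taken from the paper: the names `route`, `toy_win_prob`, and `strong_call_fraction`, the "win probability" framing, and the word-count heuristic are all assumptions. A real router would replace `toy_win_prob` with a model trained on preference data.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RoutingDecision:
    model: str       # "strong" or "weak"
    win_prob: float  # estimated chance the strong model's answer is preferred


def route(query: str,
          win_prob_fn: Callable[[str], float],
          threshold: float) -> RoutingDecision:
    """Send the query to the strong model only if the score clears the threshold."""
    p = win_prob_fn(query)
    return RoutingDecision("strong" if p >= threshold else "weak", p)


def toy_win_prob(query: str) -> float:
    """Hypothetical stand-in scorer: treats longer queries as harder.

    A learned router would go here; this heuristic only keeps the
    sketch self-contained.
    """
    return min(1.0, len(query.split()) / 20.0)


def strong_call_fraction(queries: Sequence[str],
                         win_prob_fn: Callable[[str], float],
                         threshold: float) -> float:
    """Fraction of queries routed to the strong model (a rough proxy for cost)."""
    if not queries:
        raise ValueError("queries must be non-empty")
    decisions = [route(q, win_prob_fn, threshold) for q in queries]
    return sum(d.model == "strong" for d in decisions) / len(decisions)


if __name__ == "__main__":
    sample = ["hi there", " ".join(["word"] * 15)]
    for t in (0.0, 0.5, 1.1):
        print(t, strong_call_fraction(sample, toy_win_prob, t))
```

Under these assumptions, raising the threshold sends fewer queries to the strong model, which lowers cost at some risk to quality. Sweeping the threshold traces out the cost-quality trade-off the abstract refers to.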

citation-role summary

background 8 baseline 2

citation-polarity summary

background 8 baseline 2

representative citing papers

FlowCompile: An Optimizing Compiler for Structured LLM Workflows

cs.CL · 2026-05-13 · unverdicted · novelty 8.0

FlowCompile performs compile-time design space exploration on structured LLM workflows to produce reusable high-quality configuration sets that outperform routing baselines with up to 6.4x speedup.

DepthAgent: Towards Better Universal Depth Estimation via Sample-wise Expert Selection

cs.CV · 2026-05-22 · unverdicted · novelty 7.0

A reinforcement-learned vision-language agent adaptively selects and fuses monocular depth experts per sample for better performance across camera geometries.

DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

DecisionBench supplies a fixed task suite, model pool, delegation interface, and multi-axis metrics to evaluate emergent delegation, showing similar quality across awareness conditions but 15-31 point headroom under perfect delegation.

KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving

cs.DC · 2026-05-13 · conditional · novelty 7.0

KVServe delivers up to 9.13x job completion time speedup and 32.8x time-to-first-token reduction by making KV cache compression service-aware and adaptive in disaggregated LLM serving.

A Regime Theory of Controller Class Selection for LLM Action Decisions

cs.AI · 2026-05-07 · unverdicted · novelty 7.0

A regime theory selects the optimal controller class for LLM action decisions from a nested lattice of four classes using three data-estimable bottlenecks, with a Bernstein-tight threshold and empirical matches on multiple benchmarks.

Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost

cs.AI · 2026-05-07 · conditional · novelty 7.0

Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.

MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents

cs.MA · 2026-05-05 · unverdicted · novelty 7.0

MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.

When Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIs

cs.PF · 2026-05-04 · unverdicted · novelty 7.0 · 2 refs

Hosted open-weight LLM APIs function as time-varying heterogeneous services rather than fixed model artifacts, with demand concentrated, supply-use mismatches, and task-specific routing yielding major cost and throughput gains.

When Alignment Isn't Enough: Response-Path Attacks on LLM Agents

cs.CR · 2026-05-04 · unverdicted · novelty 7.0

A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.

Model Routing as a Trust Problem: Route Receipts for Adaptive AI Systems

cs.AI · 2026-05-03 · conditional · novelty 7.0

The paper introduces route receipts as a portable runtime record of routing decisions to make adaptive AI systems more transparent and trustworthy.

Credo: Declarative Control of LLM Pipelines via Beliefs and Policies

cs.AI · 2026-04-15 · unverdicted · novelty 7.0

Credo proposes representing LLM agent state as beliefs and regulating pipeline behavior with declarative policies stored in a database for adaptive, auditable control.

Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference

cs.CL · 2026-02-23 · unverdicted · novelty 7.0

Pyramid MoA is a hierarchical Mixture-of-Agents system with a decision-theoretic router that achieves up to 42.9% compute savings while nearly matching oracle accuracy on MBPP, GSM8K, MMLU, HumanEval, and MATH.

ScrapeGraphAI-100k: Dataset for Schema-Constrained LLM Generation

cs.IR · 2026-02-16 · unverdicted · novelty 7.0

ScrapeGraphAI-100k releases 93,695 real telemetry examples pairing web page content with prompts, schemas, and LLM responses to support training and benchmarking of schema-constrained generation.

HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools

cs.CL · 2026-05-16 · unverdicted · novelty 6.0

HyDRA routes queries to cost-effective LLMs by predicting multi-dimensional capability requirements with a multi-head encoder and applying shortfall matching against configuration-defined model profiles, delivering up to 72.5% cost savings on coding benchmarks while remaining decoupled from specific…

GAR: Carbon-Aware Routing for LLM Inference via Constrained Optimization

cs.AI · 2026-05-12 · unverdicted · novelty 6.0

GAR routes LLM inference requests via constrained multi-objective optimization to cut per-request CO2 emissions while respecting accuracy floors and p95 latency SLOs.

LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer?

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

LatentRouter routes image-question queries to the best MLLM by predicting counterfactual performance via latent communication between learned query capsules and model capability tokens.

Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge

cs.AI · 2026-05-11 · unverdicted · novelty 6.0

RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.

Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs

cs.AI · 2026-05-09 · unverdicted · novelty 6.0

A critique-and-routing controller cast as a finite-horizon MDP with policy-gradient optimization outperforms one-shot routing baselines on reasoning benchmarks while using the strongest agent for under 25% of calls.

ModelLens: Finding the Best for Your Task from Myriads of Models

cs.LG · 2026-05-08 · unverdicted · novelty 6.0

ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.

Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

A small RL-trained policy for stepwise model routing between LLM sizes improves the accuracy-cost tradeoff on math benchmarks over handcrafted strategies and matches large process reward model methods.

Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training

cs.AI · 2026-05-04 · conditional · novelty 6.0

Average token log-probability provides a zero-shot confidence signal for small LLMs that matches supervised baselines in-distribution and outperforms them out-of-distribution, with a new retrieval-conditional variant improving further at lower latency.

AgentFloor: How Far Up the tool use Ladder Can Small Open-Weight Models Go?

cs.AI · 2026-05-01 · unverdicted · novelty 6.0

Small open-weight models match GPT-5 on routine agent tool-use tasks but lag on long-horizon planning, supporting tiered routing to reduce costs in agentic systems.

ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation

cs.AI · 2026-04-26 · unverdicted · novelty 6.0

ClawTrace enables cost-aware LLM agent skill distillation by tracing per-step costs and generating preserve, prune, and repair patches, with ablations showing reduced regressions and prune rules transferring to cut costs by 32%.

RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment

cs.CL · 2026-04-24 · unverdicted · novelty 6.0

RouteLMT learns to route MT requests to large or small LLMs by predicting marginal quality gain from small-model token representations, yielding a better quality-budget Pareto frontier than baselines.

citing papers explorer

Showing 49 of 49 citing papers.

FlowCompile: An Optimizing Compiler for Structured LLM Workflows cs.CL · 2026-05-13 · unverdicted · none · ref 3 · internal anchor
FlowCompile performs compile-time design space exploration on structured LLM workflows to produce reusable high-quality configuration sets that outperform routing baselines with up to 6.4x speedup.
DepthAgent: Towards Better Universal Depth Estimation via Sample-wise Expert Selection cs.CV · 2026-05-22 · unverdicted · none · ref 45 · internal anchor
A reinforcement-learned vision-language agent adaptively selects and fuses monocular depth experts per sample for better performance across camera geometries.
DecisionBench: A Benchmark for Emergent Delegation in Long-Horizon Agentic Workflows cs.AI · 2026-05-18 · unverdicted · none · ref 27 · internal anchor
DecisionBench supplies a fixed task suite, model pool, delegation interface, and multi-axis metrics to evaluate emergent delegation, showing similar quality across awareness conditions but 15-31 point headroom under perfect delegation.
KVServe: Service-Aware KV Cache Compression for Communication-Efficient Disaggregated LLM Serving cs.DC · 2026-05-13 · conditional · none · ref 36 · internal anchor
KVServe delivers up to 9.13x job completion time speedup and 32.8x time-to-first-token reduction by making KV cache compression service-aware and adaptive in disaggregated LLM serving.
A Regime Theory of Controller Class Selection for LLM Action Decisions cs.AI · 2026-05-07 · unverdicted · none · ref 21 · internal anchor
A regime theory selects the optimal controller class for LLM action decisions from a nested lattice of four classes using three data-estimable bottlenecks, with a Bernstein-tight threshold and empirical matches on multiple benchmarks.
Post Reasoning: Improving the Performance of Non-Thinking Models at No Cost cs.AI · 2026-05-07 · conditional · none · ref 86 · internal anchor
Post-Reasoning boosts LLM accuracy by reversing the usual answer-after-reasoning order, delivering mean relative gains of 17.37% across 117 model-benchmark pairs with zero extra cost.
MemFlow: Intent-Driven Memory Orchestration for Small Language Model Agents cs.MA · 2026-05-05 · unverdicted · none · ref 27 · internal anchor
MemFlow routes queries by intent to tiered memory operations, nearly doubling accuracy of a 1.7B SLM on long-horizon benchmarks compared to full-context baselines.
When Is the Same Model Not the Same Service? A Measurement Study of Hosted Open-Weight LLM APIs cs.PF · 2026-05-04 · unverdicted · none · ref 17 · 2 links · internal anchor
Hosted open-weight LLM APIs function as time-varying heterogeneous services rather than fixed model artifacts, with demand concentrated, supply-use mismatches, and task-specific routing yielding major cost and throughput gains.
When Alignment Isn't Enough: Response-Path Attacks on LLM Agents cs.CR · 2026-05-04 · unverdicted · none · ref 79 · internal anchor
A malicious relay can strategically rewrite aligned LLM outputs in BYOK agent architectures to achieve up to 99.1% attack success on benchmarks like AgentDojo and ASB.
Model Routing as a Trust Problem: Route Receipts for Adaptive AI Systems cs.AI · 2026-05-03 · conditional · none · ref 15 · internal anchor
The paper introduces route receipts as a portable runtime record of routing decisions to make adaptive AI systems more transparent and trustworthy.
Credo: Declarative Control of LLM Pipelines via Beliefs and Policies cs.AI · 2026-04-15 · unverdicted · none · ref 7 · internal anchor
Credo proposes representing LLM agent state as beliefs and regulating pipeline behavior with declarative policies stored in a database for adaptive, auditable control.
Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference cs.CL · 2026-02-23 · unverdicted · none · ref 11 · internal anchor
Pyramid MoA is a hierarchical Mixture-of-Agents system with a decision-theoretic router that achieves up to 42.9% compute savings while nearly matching oracle accuracy on MBPP, GSM8K, MMLU, HumanEval, and MATH.
ScrapeGraphAI-100k: Dataset for Schema-Constrained LLM Generation cs.IR · 2026-02-16 · unverdicted · none · ref 16 · internal anchor
ScrapeGraphAI-100k releases 93,695 real telemetry examples pairing web page content with prompts, schemas, and LLM responses to support training and benchmarking of schema-constrained generation.
HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools cs.CL · 2026-05-16 · unverdicted · none · ref 12 · internal anchor
HyDRA routes queries to cost-effective LLMs by predicting multi-dimensional capability requirements with a multi-head encoder and applying shortfall matching against configuration-defined model profiles, delivering up to 72.5% cost savings on coding benchmarks while remaining decoupled from specific…
GAR: Carbon-Aware Routing for LLM Inference via Constrained Optimization cs.AI · 2026-05-12 · unverdicted · none · ref 11 · internal anchor
GAR routes LLM inference requests via constrained multi-objective optimization to cut per-request CO2 emissions while respecting accuracy floors and p95 latency SLOs.
LatentRouter: Can We Choose the Right Multimodal Model Before Seeing Its Answer? cs.AI · 2026-05-11 · unverdicted · none · ref 27 · internal anchor
LatentRouter routes image-question queries to the best MLLM by predicting counterfactual performance via latent communication between learned query capsules and model capability tokens.
Reasoning Is Not Free: Robust Adaptive Cost-Efficient Routing for LLM-as-a-Judge cs.AI · 2026-05-11 · unverdicted · none · ref 19 · internal anchor
RACER routes between reasoning and non-reasoning LLM judges via constrained distributionally robust optimization to achieve better accuracy-cost trade-offs under distribution shift.
Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs cs.AI · 2026-05-09 · unverdicted · none · ref 22 · internal anchor
A critique-and-routing controller cast as a finite-horizon MDP with policy-gradient optimization outperforms one-shot routing baselines on reasoning benchmarks while using the strongest agent for under 25% of calls.
ModelLens: Finding the Best for Your Task from Myriads of Models cs.LG · 2026-05-08 · unverdicted · none · ref 25 · internal anchor
ModelLens learns a performance-aware latent space from 1.62M leaderboard records to rank unseen models on unseen datasets without forward passes on the target.
Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning cs.AI · 2026-05-07 · unverdicted · none · ref 12 · internal anchor
A small RL-trained policy for stepwise model routing between LLM sizes improves the accuracy-cost tradeoff on math benchmarks over handcrafted strategies and matches large process reward model methods.
Zero-Shot Confidence Estimation for Small LLMs: When Supervised Baselines Aren't Worth Training cs.AI · 2026-05-04 · conditional · none · ref 1 · internal anchor
Average token log-probability provides a zero-shot confidence signal for small LLMs that matches supervised baselines in-distribution and outperforms them out-of-distribution, with a new retrieval-conditional variant improving further at lower latency.
AgentFloor: How Far Up the tool use Ladder Can Small Open-Weight Models Go? cs.AI · 2026-05-01 · unverdicted · none · ref 10 · internal anchor
Small open-weight models match GPT-5 on routine agent tool-use tasks but lag on long-horizon planning, supporting tiered routing to reduce costs in agentic systems.
ClawTrace: Cost-Aware Tracing for LLM Agent Skill Distillation cs.AI · 2026-04-26 · unverdicted · none · ref 21 · internal anchor
ClawTrace enables cost-aware LLM agent skill distillation by tracing per-step costs and generating preserve, prune, and repair patches, with ablations showing reduced regressions and prune rules transferring to cut costs by 32%.
RouteLMT: Learned Sample Routing for Hybrid LLM Translation Deployment cs.CL · 2026-04-24 · unverdicted · none · ref 13 · internal anchor
RouteLMT learns to route MT requests to large or small LLMs by predicting marginal quality gain from small-model token representations, yielding a better quality-budget Pareto frontier than baselines.
Phase-Scheduled Multi-Agent Systems for Token-Efficient Coordination cs.AI · 2026-04-19 · unverdicted · none · ref 16 · internal anchor
PSMAS reduces token use in LLM multi-agent systems by 27.3% on average via phase-based temporal scheduling and context compression, with task performance staying within 2.1 points of full activation.
Adaptive Test-Time Compute Allocation for Reasoning LLMs via Constrained Policy Optimization cs.LG · 2026-04-16 · unverdicted · none · ref 30 · internal anchor
A Lagrangian-relaxation plus imitation-learning pipeline adaptively allocates test-time compute to LLMs, outperforming uniform baselines by up to 12.8% relative accuracy on MATH while staying within a fixed average budget.
Local-Splitter: A Measurement Study of Seven Tactics for Reducing Cloud LLM Token Usage on Coding-Agent Workloads cs.DC · 2026-04-14 · unverdicted · none · ref 18 · internal anchor
Combining local routing with prompt compression saves 45-79% cloud tokens on edit and explanation workloads, while a fuller set including draft-review saves 51% on RAG-heavy tasks.
RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving cs.NI · 2026-04-13 · unverdicted · none · ref 11 · internal anchor
Joint resource allocation and routing for multi-model LLM serving can produce up to 87% variation in achievable output quality across setups on the same GPU cluster.
Triage: Routing Software Engineering Tasks to Cost-Effective LLM Tiers via Code Quality Signals cs.SE · 2026-04-08 · unverdicted · none · ref 7 · internal anchor
Triage routes coding tasks to cost-effective LLM tiers based on code quality metrics to maintain verification quality at lower cost.
Select-then-Solve: Paradigm Routing as Inference-Time Optimization for LLM Agents cs.CL · 2026-04-08 · conditional · none · ref 3 · internal anchor
A learned embedding-based router selecting among six reasoning paradigms improves LLM agent accuracy from 47.6% to 53.1% on average, beating the best fixed paradigm by 2.8pp.
Policy-Governed LLM Routing with Intent Matching for Instrument Laboratories cs.CY · 2026-04-03 · conditional · none · ref 9 · internal anchor
A governed LLM routing system for lab tutoring raises challenge-alignment from 0.90 to 0.98, boosts productive-struggle time, and cuts token costs by two-thirds while preserving answer accuracy.
GlimpRouter: Efficient Collaborative Inference by Glimpsing One Token of Thoughts cs.AI · 2026-01-08 · unverdicted · none · ref 15 · internal anchor
GlimpRouter uses the entropy of the first token in each reasoning step to decide whether to invoke a large model, yielding 10.7% higher accuracy and 25.9% lower latency than a standalone large model on AIME25.
Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs cs.CV · 2025-05-21 · unverdicted · none · ref 31 · internal anchor
Chain-of-Focus enables VLMs to adaptively search and zoom on important image areas via a two-stage SFT and RL pipeline on a custom 3K-sample dataset, yielding 5% gains on the V* benchmark across resolutions from 224 to 4K.
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling cs.LG · 2024-07-31 · unverdicted · none · ref 48 · internal anchor
Repeated sampling scales problem coverage log-linearly with sample count, improving SWE-bench Lite performance from 15.9% to 56% using 250 samples.
UCCI: Calibrated Uncertainty for Cost-Optimal LLM Cascade Routing cs.LG · 2026-05-11 · unverdicted · none · ref 42 · internal anchor
UCCI calibrates LLM uncertainty to error probabilities with isotonic regression for cost-optimal cascade routing, delivering 31% cost savings at maintained accuracy on a 75k-query NER task.
Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation cs.AI · 2026-05-07 · unverdicted · none · ref 74 · internal anchor
RGAO combines retrieval-based complexity assessment with a formal budget algebra to enable dynamic topology selection in multi-agent code generation with provable conservation.
Agentic AI Systems Should Be Designed as Marginal Token Allocators cs.AI · 2026-05-02 · unverdicted · none · ref 30 · internal anchor
Agentic AI systems should be designed as marginal token allocators that balance benefit against cost, latency, and risk across their layers rather than as unit-priced text generators.
TRACES: Tagging Reasoning Steps for Adaptive Cost-Efficient Early-Stopping cs.CL · 2026-04-22 · unverdicted · none · ref 2 · internal anchor
TRACES tags reasoning steps to enable adaptive early stopping, cutting token use by 20-50% on MATH500, GSM8K, AIME, MMLU and GPQA with comparable accuracy.
A-IO: Adaptive Inference Orchestration for Memory-Bound NPUs cs.DC · 2026-04-10 · unverdicted · none · ref 10 · internal anchor
A-IO adaptively orchestrates LLM inference on NPUs to address memory bottlenecks, model scaling paradoxes, and synchronization costs in speculative decoding.
AgentOpt v0.1 Technical Report: Client-Side Optimization for LLM-Based Agent cs.LG · 2026-04-07 · unverdicted · none · ref 21 · internal anchor
AgentOpt introduces a framework-agnostic package that uses algorithms like UCB-E to find cost-effective model assignments in multi-step LLM agent pipelines, cutting evaluation budgets by 62-76% while maintaining near-optimal accuracy on benchmarks.
Scoring, Reasoning, and Selecting the Best! Ensembling Large Language Models via a Peer-Review Process cs.CL · 2025-12-29 · unverdicted · none · ref 38 · internal anchor
LLM-PeerReview ensembles LLMs by scoring responses with LLM-as-Judge and selecting the best via averaging or truth inference, beating Smoothie-Global by 6.9-7.3 points on four datasets.
Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models cs.CL · 2025-03-20 · accept · none · ref 135 · internal anchor
A survey organizing techniques to achieve efficient reasoning in LLMs by shortening chain-of-thought outputs.
Computational Challenges in Token Economics: Bridging Economic Theory and AI System Design cs.AI · 2026-05-17 · unverdicted · none · ref 40 · internal anchor
The paper defines Computational Token Economics and introduces the Token Economics Trilemma as a framework for trade-offs in granularity, real-time performance, and optimality, while outlining a research agenda for three challenge areas.
INAR-VL: Input-Aware Routing for Edge-Cloud Vision-Language Inference cs.LG · 2026-05-13 · unverdicted · none · ref 14 · internal anchor
INAR-VL routes 36% of visual question answering requests to the edge using lightweight complexity signals, cutting latency 24% and energy 26% while retaining 97% of cloud accuracy.
Qualixar OS: A Universal Operating System for AI Agent Orchestration cs.AI · 2026-04-07 · unverdicted · none · ref 19 · internal anchor
Qualixar OS provides a runtime for multi-agent AI systems with support for 12 topologies, LLM-driven team design, dynamic routing, consensus judging, content attribution, and protocol bridging, achieving 100% accuracy on a custom 20-task suite at $0.000039 mean cost per task.
Harnessing Multiple Large Language Models: A Survey on LLM Ensemble cs.CL · 2025-02-25 · unverdicted · none · ref 38 · internal anchor
A systematic survey of LLM ensemble methods organized into a taxonomy of ensemble-before-inference, ensemble-during-inference, and ensemble-after-inference stages, with review of benchmarks, applications, and future directions.
Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents cs.LG · 2026-05-14 · unreviewed · ref 12 · internal anchor
A Two-Dimensional Framework for AI Agent Design Patterns: Cognitive Function and Execution Topology cs.AI · 2026-03-16 · unreviewed · ref 16 · internal anchor
Optimal Query Allocation in Extractive QA with LLMs: A Learning-to-Defer Framework with Theoretical Guarantees cs.CL · 2024-10-21 · unreviewed · ref 32 · internal anchor
