RouterBench: A Benchmark for Multi-LLM Routing System

Mixed citation behavior. The most common role is background (60% of classified citations).

41 Pith papers cite it.
abstract

As the range of applications for Large Language Models (LLMs) continues to grow, the demand for effective serving solutions becomes increasingly critical. Despite the versatility of LLMs, no single model can optimally address all tasks and applications, particularly when balancing performance with cost. This limitation has led to the development of LLM routing systems, which combine the strengths of various models to overcome the constraints of individual LLMs. Yet, the absence of a standardized benchmark for evaluating the performance of LLM routers hinders progress in this area. To bridge this gap, we present RouterBench, a novel evaluation framework designed to systematically assess the efficacy of LLM routing systems, along with a comprehensive dataset comprising over 405k inference outcomes from representative LLMs to support the development of routing strategies. We further propose a theoretical framework for LLM routing, and deliver a comparative analysis of various routing approaches through RouterBench, highlighting their potentials and limitations within our evaluation framework. This work not only formalizes and advances the development of LLM routing systems but also sets a standard for their assessment, paving the way for more accessible and economically viable LLM deployments. The code and data are available at https://github.com/withmartian/routerbench.

citation-role summary

background: 3 · baseline: 2

representative citing papers

Online Pandora's Box for Contextual LLM Cascading

cs.AI · 2026-06-05 · unverdicted · novelty 7.0

Introduces a parametric reservation-index policy with GMM estimation and UCB exploration for contextual LLM cascading under output-mediated feedback, claiming dimension-dependent square-root regret.

When to Think Deeply: Inhibitory Deliberation for LLM Reasoning

cs.CL · 2026-06-04 · unverdicted · novelty 7.0

IDPR is a response-conditioned inhibitory deliberation method that trains a controller on fast-slow outcome pairs to decide when to override LLM fast answers, improving accuracy from 47.90% to 48.92% with slow reasoning invoked on only 8.20% of a 5,000-example math test set.

Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents

cs.LG · 2026-05-14 · unverdicted · novelty 7.0 · 2 refs

LQM-ContextRoute routes LLM tool calls via latency-quality matching in a contextual bandit, improving F1 by 2.18 pp, accuracy by up to 18 pp, and NDCG by 2.91-3.22 pp over SW-UCB on web-search, StrategyQA, and retriever benchmarks.

Efficient Ensemble Selection from Binary and Pairwise Feedback

cs.GT · 2026-05-10 · unverdicted · novelty 7.0

The paper develops efficient algorithms for ensemble selection from binary and pairwise feedback, achieving (1-1/e) guarantees with query savings for coverage and PTAS-style results via submodular relaxation for theta-winning committees.

Triaging Threats to Specialized Guardrails

cs.CR · 2026-05-29 · unverdicted · novelty 6.0

Introduces GuardZoo benchmark and RouteGuard router-expert system showing monolithic guardrails suffer task interference while specialized routing improves threat detection and generalization.

Natural Language Query to Configuration for Retrieval Agents

cs.AI · 2026-05-26 · unverdicted · novelty 6.0

BRANE maps queries to optimal retrieval pipeline configurations using LLM-derived features and per-configuration correctness predictors, improving the cost-quality Pareto frontier on three benchmarks.

Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

cs.AI · 2026-05-16 · unverdicted · novelty 6.0 · 2 refs

ECC calibrates semantic embeddings with model comparisons via Bradley-Terry profiles and mixture weights to cluster queries by latent LLM capabilities, claiming 17-18 point gains in ranking quality over baselines.

Privacy-Preserving LLMs Routing

cs.CR · 2026-04-17 · unverdicted · novelty 6.0

PPRoute achieves plaintext-level LLM routing quality with MPC-based privacy and a 20x speedup over naive encrypted implementations via MPC-friendly encoders, multi-step training, and O(1) communication Top-k search.
