A router that decomposes uncertainty to flexibly route queries between cheap models and oracles while providing regret bounds and supporting abstention in classification tasks with multiple annotations.
arXiv preprint arXiv:2506.22716
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 6
roles
background 1
polarities
background 1
representative citing papers
A critique-and-routing controller cast as a finite-horizon MDP with policy-gradient optimization outperforms one-shot routing baselines on reasoning benchmarks while using the strongest agent for under 25% of calls.
BoundaryRouter routes queries to LLM or agent using early experience memory from a seed set, cutting inference time 60.6% versus always using agents and raising performance 28.6% versus always using direct LLM inference.
Early entropy dynamics during LLM decoding mark when explicit reasoning becomes beneficial, enabling the training-free EDRM router that selects strategies per instance and yields 41-55% token savings with accuracy gains across 15 benchmarks.
citing papers explorer
- Flexible Routing via Uncertainty Decomposition
  A router that decomposes uncertainty to flexibly route queries between cheap models and oracles while providing regret bounds and supporting abstention in classification tasks with multiple annotations.
- Iterative Critique-and-Routing Controller for Multi-Agent Systems with Heterogeneous LLMs
  A critique-and-routing controller cast as a finite-horizon MDP with policy-gradient optimization outperforms one-shot routing baselines on reasoning benchmarks while using the strongest agent for under 25% of calls.
- Learning Agent Routing From Early Experience
  BoundaryRouter routes queries to LLM or agent using early experience memory from a seed set, cutting inference time 60.6% versus always using agents and raising performance 28.6% versus always using direct LLM inference.
- When Do LLMs Reason? A Dynamical Systems View via Entropy Phase Transitions
  Early entropy dynamics during LLM decoding mark when explicit reasoning becomes beneficial, enabling the training-free EDRM router that selects strategies per instance and yields 41-55% token savings with accuracy gains across 15 benchmarks.
- Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents
- RouterWise: Joint Resource Allocation and Routing for Latency-Aware Multi-Model LLM Serving