LQM-ContextRoute routes tool calls by expected quality per service cycle using contextual bandits and LLM-as-judge feedback, yielding +2.18 pp F1, up to +18 pp accuracy, and +2.91-3.22 pp NDCG gains over SW-UCB on web-search, StrategyQA, and retriever benchmarks.
Thompson Sampling contextual bandit over heterogeneous tools (PubMed, drug databases, calculator, web) with a composite reward that includes latency.
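To make the one-line description above concrete, here is a minimal sketch of Thompson Sampling over a small set of tools with a latency-penalised reward. It is not the cited work's implementation. Everything specific is an assumption for illustration: the tool names, the `latency_weight` penalty, the discretised context label (a full contextual bandit would typically use a feature vector), the Beta posteriors with fractional updates, and the simulated quality and latency numbers in `run_demo`.

```python
import random
from collections import defaultdict

# Hypothetical arm names, loosely mirroring the tool list in the description.
TOOLS = ["pubmed", "drug_db", "calculator", "web"]


def composite_reward(quality, latency_s, latency_weight=0.1):
    """Quality in [0, 1] minus a linear latency penalty, clipped to [0, 1].

    The linear penalty and its weight are assumptions, not the paper's formula.
    """
    return max(0.0, min(1.0, quality - latency_weight * latency_s))


class TabularThompsonRouter:
    """Thompson Sampling with one Beta posterior per (context, tool) pair."""

    def __init__(self, tools, seed=0):
        self.tools = list(tools)
        self.rng = random.Random(seed)
        # Beta(1, 1) prior, i.e. uniform, for every (context, tool) pair.
        self.params = defaultdict(lambda: [1.0, 1.0])

    def select(self, context):
        # Draw one sample from each tool's posterior and pick the largest.
        samples = {
            t: self.rng.betavariate(*self.params[(context, t)])
            for t in self.tools
        }
        return max(samples, key=samples.get)

    def update(self, context, tool, reward):
        # Fractional update: a reward in [0, 1] is split between alpha and beta.
        ab = self.params[(context, tool)]
        ab[0] += reward
        ab[1] += 1.0 - reward

    def posterior_mean(self, context, tool):
        a, b = self.params[(context, tool)]
        return a / (a + b)


def run_demo(rounds=500, seed=0):
    """Simulate one context with made-up (quality, latency in seconds) per tool."""
    simulated = {
        "pubmed": (0.5, 2.0),
        "drug_db": (0.9, 1.0),
        "calculator": (0.1, 0.0),
        "web": (0.6, 3.0),
    }
    router = TabularThompsonRouter(TOOLS, seed=seed)
    for _ in range(rounds):
        tool = router.select("dosage")
        quality, latency_s = simulated[tool]
        router.update("dosage", tool, composite_reward(quality, latency_s))
    return router
```

Calling `run_demo()` and comparing `posterior_mean("dosage", t)` across tools shows the router concentrating on the tool with the best quality-latency trade-off in this simulated setting.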
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Latency-Quality Routing for Functionally Equivalent Tools in LLM Agents