TACO combines Differential Answer-Probe Reward (DAPR) and Outcome-Gated Advantage Routing (OGAR) to assign credit to tool calls in agentic visual reasoning, producing accuracy gains on multimodal benchmarks.
RadialRouter: Structured representation for efficient and robust large language models routing. arXiv preprint arXiv:2506.03880, 2025
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
SeqRoute applies offline RL with CQL and Hindsight Budget Relabeling to sequential LLM routing under global budgets, claiming 6.0-73.5% cost reduction, maintained or improved quality, and under 1% bankruptcy rate.
citing papers explorer
- SeqRoute: Global Budget-Aware Sequential LLM Routing via Offline Reinforcement Learning