Triage: Routing Software Engineering Tasks to Cost-Effective LLM Tiers via Code Quality Signals
2 Pith papers cite this work.
abstract
Context: AI coding agents route every task to a single frontier large language model (LLM), paying premium inference cost even when many tasks are routine. Objectives: We propose Triage, a framework that uses code health metrics (indicators of software maintainability) as a routing signal to assign each task to the cheapest model tier whose output passes the same verification gate as the expensive model. Methods: Triage defines three capability tiers (light, standard, and heavy, mirroring, e.g., Haiku, Sonnet, and Opus) and routes tasks based on pre-computed code health sub-factors and task metadata. We design an evaluation comparing three routing policies on SWE-bench Lite (300 tasks across three model tiers): heuristic thresholds, a trained ML classifier, and a perfect-hindsight oracle. Results: We analytically derive two falsifiable conditions under which the tier-dependent asymmetry (medium LLMs benefit from clean code while frontier models do not) yields cost-effective routing: (1) the light-tier pass rate on healthy code must exceed the inter-tier cost ratio, and (2) code health must discriminate the required model tier with at least a small effect size ($\hat{p} \geq 0.56$). Conclusion: Triage transforms a diagnostic code quality metric into an actionable model-selection signal. We present a rigorous evaluation protocol to test the cost-quality trade-off and identify which code health sub-factors drive routing decisions.
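As a rough illustration of the heuristic-threshold policy and of the condition that the light-tier pass rate must exceed the inter-tier cost ratio, the following minimal Python sketch may help. Everything specific in it is an assumption made for illustration and is not taken from the paper: the 1-10 health scale, the threshold values, the tier costs, the function names, and the cost model in which a failed light-tier attempt is retried on the heavy tier. Under that assumed cost model, trying the light tier first costs `cost_light + (1 - pass_rate) * cost_heavy` in expectation, which is below `cost_heavy` exactly when `pass_rate > cost_light / cost_heavy`.

```python
# Hypothetical per-task costs for the three tiers named in the abstract
# (arbitrary units; not figures from the paper).
TIER_COST = {"light": 1.0, "standard": 4.0, "heavy": 20.0}


def route_by_health(code_health: float, healthy: float = 8.0, unhealthy: float = 4.0) -> str:
    """Heuristic-threshold policy: healthier code is sent to a cheaper tier.

    The health scale and both thresholds are assumed values for illustration.
    """
    if code_health >= healthy:
        return "light"
    if code_health >= unhealthy:
        return "standard"
    return "heavy"


def expected_cost_light_first(pass_rate: float, cost_light: float, cost_heavy: float) -> float:
    """Expected cost when a failed light-tier attempt is retried on the heavy tier.

    This escalation rule is an assumed cost model, not the paper's derivation.
    """
    return cost_light + (1.0 - pass_rate) * cost_heavy


def light_first_is_cheaper(pass_rate: float, cost_light: float, cost_heavy: float) -> bool:
    """True when the light-tier pass rate exceeds the light-to-heavy cost ratio."""
    return pass_rate > cost_light / cost_heavy


if __name__ == "__main__":
    c_light, c_heavy = TIER_COST["light"], TIER_COST["heavy"]
    for health in (9.0, 5.0, 2.0):
        print(health, "->", route_by_health(health))
    for p in (0.10, 0.04):
        print(p, light_first_is_cheaper(p, c_light, c_heavy),
              expected_cost_light_first(p, c_light, c_heavy))
```

With these assumed costs the cost ratio is 1/20 = 0.05, so a light-tier pass rate of 0.10 satisfies the condition while 0.04 does not. The second condition (the effect-size threshold on $\hat{p}$) is not modeled here.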
years
2026: 2
citing papers explorer
- TwinRouterBench: Fast Static and Live Dynamic Evaluation for Realistic Agentic LLM Routing
  TwinRouterBench supplies step-level static evaluation with 970 prefixes and verified tiers, plus a dynamic harness for live SWE-bench agent runs, enabling deterministic scoring for agentic LLM routing.
- HyDRA: Hybrid Dynamic Routing Architecture for Heterogeneous LLM Pools
  HyDRA routes queries to cost-effective LLMs by predicting multi-dimensional capability requirements with a multi-head encoder and applying shortfall matching against configuration-defined model profiles, delivering up to 72.5 percent cost savings on coding benchmarks while remaining decoupled from specific…