SWE-Router introduces trajectory-conditioned value-based routing for LLM agents on SWE tasks, with a Bayes-optimality theorem and empirical cost savings while retaining most strong-model performance.
Irt-router: Effective and interpretable multi-llm routing via item response theory
4 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
LLM routers across 21 methods on 5 benchmarks converge to similar accuracy below oracle due to learning global performance trends rather than fine-grained query signals.
IR3DE is a ridge regression router for domain-expert LLMs that matches or exceeds baselines in language modeling and reasoning tasks while allowing dynamic expert addition or removal without retraining.
MedIRT applies Item Response Theory to medical LLM benchmarks to separate latent competency from item difficulty and discrimination, producing more stable rankings and revealing domain heterogeneity than accuracy alone.
citing papers explorer
-
Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks
MedIRT applies Item Response Theory to medical LLM benchmarks to separate latent competency from item difficulty and discrimination, producing more stable rankings and revealing domain heterogeneity than accuracy alone.