IRT-Router: Effective and Interpretable Multi-LLM Routing via Item Response Theory

Wei Song, Zhenya Huang, Cheng Cheng, Weibo Gao, Bihan Xu, GuanHao Zhao, Fei Wang, Runze Wu · 2025 · arXiv 2506.01048

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

The Routing Plateau: Understanding and Breaking the Accuracy Limits of LLM Routers

cs.LG · 2026-05-27 · unverdicted · novelty 6.0

Across 21 methods and 5 benchmarks, LLM routers converge to similar accuracy below the oracle because they learn global performance trends rather than fine-grained query signals.

Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks

cs.CL · 2025-09-29 · conditional · novelty 5.0

MedIRT applies Item Response Theory to medical LLM benchmarks to separate latent competency from item difficulty and discrimination, producing more stable rankings, and revealing more domain heterogeneity, than accuracy alone.

citing papers explorer

Showing 2 of 2 citing papers.

The Routing Plateau: Understanding and Breaking the Accuracy Limits of LLM Routers · cs.LG · 2026-05-27 · unverdicted · none · ref 56
Measuring Competency, Not Performance: Item-Aware Evaluation Across Medical Benchmarks · cs.CL · 2025-09-29 · conditional · none · ref 28
