Switchcraft routes agentic tool-calling queries to the lowest-cost model that preserves correctness, reaching 82.9% accuracy and 84% cost reduction on five benchmarks.
Tensoropera router: A multi-model router for efficient llm inference
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
A well-tuned kNN router matches or exceeds state-of-the-art learned routers on new standardized benchmarks spanning instruction, QA, reasoning, and the first multi-modal visual routing dataset, due to locality of model performance in embedding space.
A systematic survey of LLM ensemble methods organized into a taxonomy of ensemble-before-inference, ensemble-during-inference, and ensemble-after-inference stages, with review of benchmarks, applications, and future directions.
citing papers explorer
-
Switchcraft: AI Model Router for Agentic Tool Calling
Switchcraft routes agentic tool-calling queries to the lowest-cost model that preserves correctness, reaching 82.9% accuracy and 84% cost reduction on five benchmarks.
-
Rethinking Predictive Modeling for LLM Routing: When Simple kNN Beats Complex Learned Routers
A well-tuned kNN router matches or exceeds state-of-the-art learned routers on new standardized benchmarks spanning instruction, QA, reasoning, and the first multi-modal visual routing dataset, due to locality of model performance in embedding space.
-
Harnessing Multiple Large Language Models: A Survey on LLM Ensemble
A systematic survey of LLM ensemble methods organized into a taxonomy of ensemble-before-inference, ensemble-during-inference, and ensemble-after-inference stages, with review of benchmarks, applications, and future directions.