Switchcraft routes agentic tool-calling queries to the lowest-cost model that preserves correctness, reaching 82.9% accuracy and 84% cost reduction on five benchmarks.
TensorOpera Router: A Multi-Model Router for Efficient LLM Inference. arXiv preprint arXiv:2408.12320.
4 Pith papers cite this work.
Citation roles: background (1)
Citation polarities: background (1)
Representative citing papers:
A well-tuned kNN router matches or exceeds state-of-the-art learned routers on new standardized benchmarks spanning instruction-following, QA, and reasoning tasks, plus the first multi-modal visual routing dataset; the authors attribute this to the locality of model performance in embedding space.
CAMI frames multi-index construction for semantic retrieval as a budgeted multi-objective portfolio problem and uses agent-guided search plus confidence-aware pruning to find high-recall configurations with reduced evaluation cost.
A systematic survey of LLM ensemble methods, organized into a taxonomy of ensemble-before-inference, ensemble-during-inference, and ensemble-after-inference stages, with a review of benchmarks, applications, and future directions.
Citing papers:
CAMI: Cost-Aware Agent-Guided Multi-Indexing for Semantic Retrieval