PACE builds proxy benchmarks from non-agentic instances via relevance and global selection plus regression to predict agentic scores with MAE under 4%, Spearman correlation above 0.80, and 85% ranking accuracy at under 1% cost.
Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
PACE: A Proxy for Agentic Capability Evaluation
PACE builds proxy benchmarks from non-agentic instances via relevance and global selection plus regression to predict agentic scores with MAE under 4%, Spearman correlation above 0.80, and 85% ranking accuracy at under 1% cost.