Model Consistency as a Cheap yet Predictive Proxy for LLM Elo Scores

Ramaswamy, Ashwin, Demeure, Nestor, Rrapaj, Ermal · 2025 · DOI 10.18653/v1/2025.emnlp-main.1534

1 Pith paper cites this work. Polarity classification is still in progress.

representative citing papers

PACE: A Proxy for Agentic Capability Evaluation

cs.AI · 2026-07-02 · unverdicted · novelty 6.0

PACE builds proxy benchmarks from non-agentic instances via relevance and global selection plus regression to predict agentic scores with MAE under 4%, Spearman correlation above 0.80, and 85% ranking accuracy at under 1% cost.

citing papers explorer

PACE: A Proxy for Agentic Capability Evaluation cs.AI · 2026-07-02 · unverdicted · none · ref 40
