Mapping global dynamics of benchmark creation and saturation in artificial intelligence

Adriano Barbosa-Silva, Simon Ott, Kathrin Blagec, Jan Brauner, Matthias Samwald · 2022 · arXiv 2203.04592

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge?

cs.CL · 2026-05-28 · unverdicted · novelty 6.0

SEAL revives saturated benchmarks via adaptive LLM meta-judging in elimination matches, matching full pairwise accuracy with roughly half the calls across code, math, QA, and agent tasks.

The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators

cs.LG · 2026-06-24 · unverdicted · novelty 5.0

RQGM enables co-evolution of agents and evaluators across epochs with non-stationary utilities, reporting gains in coding pass rates, paper acceptance, and proof grading over prior self-improving agents.

citing papers explorer

Showing 2 of 2 citing papers after filters.

SEAL: Can Saturated Benchmarks Be Revived by LLM-as-a-Meta-Judge? · cs.CL · 2026-05-28 · unverdicted · none · ref 2
The Red Queen Gödel Machine: Co-Evolving Agents and Their Evaluators · cs.LG · 2026-06-24 · unverdicted · none · ref 14
