Introduces the first community-governed unified JSON schema and crowdsourced repository for AI evaluation results, with converters and a database spanning 22,235 models and 2,273 benchmarks.
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
fields: cs.AI (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
representative citing papers
AgentBeats implements agentified evaluation of diverse AI agents through standardized interfaces, validated at scale in a five-month competition with 298 judges and 467 subjects plus a coding case study.
citing papers explorer
- Every Eval Ever: A Unifying Schema and Community Repository for AI Evaluation Results
  Introduces the first community-governed unified JSON schema and crowdsourced repository for AI evaluation results, with converters and a database spanning 22,235 models and 2,273 benchmarks.
- AgentBeats: Agentifying Agent Assessment for Openness, Standardization, and Reproducibility
  AgentBeats implements agentified evaluation of diverse AI agents through standardized interfaces, validated at scale in a five-month competition with 298 judges and 467 subjects plus a coding case study.