Auto-benchmarkcard: Automated synthesis of benchmark documentation

Aris Hofmann, Inge Vejsbjerg, Dhaval Salwala, Elizabeth Daly · 2026 · DOI 10.1609/aaai.v40i48.42352

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

EEG Benchmarking Needs a Task Specification Layer: NeuroDoc for Rulebook-Guided, Executable Benchmark Construction

cs.LG · 2026-06-22 · unverdicted · novelty 7.0

Introduces NeuroDoc and NeuroAudit to create a community-reviewed corpus of 53 EEG benchmark entries with 245 task definitions using a rulebook-guided task document and executable kernel.

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

cs.AI · 2026-06-08 · unverdicted · novelty 6.0

EvalCards is a composable reporting schema and monitoring tool for AI evaluations, derived from 52 papers and 10 interviews, and applied to 5,816 models and 101,843 results to surface reporting gaps.

citing papers explorer

Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting

cs.AI · 2026-06-08 · unverdicted · none · ref 53

EvalCards is a composable reporting schema and monitoring tool for AI evaluations, derived from 52 papers and 10 interviews, and applied to 5,816 models and 101,843 results to surface reporting gaps.
