Introduces NeuroDoc and NeuroAudit to create a community-reviewed corpus of 53 EEG benchmark entries with 245 task definitions using a rulebook-guided task document and executable kernel.
Auto-benchmarkcard: Automated synthesis of benchmark documentation
2 Pith papers cite this work. Polarity classification is still being indexed.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Evaluation Cards: An Interpretive Layer for AI Evaluation Reporting
EvalCards is a composable reporting schema and monitoring tool for AI evaluations, derived from 52 papers and 10 interviews, and applied to 5,816 models and 101,843 results to surface reporting gaps.