Introduces the 1GC-7RC benchmark to evaluate AI coding agents on seven diverse ML tasks under single-GPU time and access constraints.
Mlr-bench: Evaluating ai agents on open-ended machine learning research.Advances in Neural Information Processing Systems, 38, 2026
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
1GC-7RC: One Graphic Card -- Seven Research Challenges! How Good Are AI Agents at Doing Your Job?
Introduces the 1GC-7RC benchmark to evaluate AI coding agents on seven diverse ML tasks under single-GPU time and access constraints.