JunoBench is the first benchmark of 111 reproducible crashes in Python ML Jupyter notebooks from Kaggle, with verified fixes and rich annotations for bug research.
From bugs to benchmarks: A comprehensive survey of software defect datasets
3 Pith papers cite this work.
Fields: cs.SE (3)
Verdicts: UNVERDICTED (3)
Citing papers:
- What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study
  Post-release defects concentrate in older, frequently modified high-churn components and require longer and more complex fixes than pre-release defects.
- Reproducible Automated Program Repair Is Hard -- Experiences With the Defects4J Dataset
  21.6% of Defects4J defects are unsuitable and 7.1% have under-specified test suites for reproducible APR evaluation.