From bugs to benchmarks: A comprehensive survey of software defect datasets
3 Pith papers cite this work. Polarity classification of these citations is still being indexed.
Fields: cs.SE (3)
Verdicts: UNVERDICTED (3)
Citing papers
- JunoBench: A Benchmark Dataset of Crashes in Python Machine Learning Jupyter Notebooks
  JunoBench is the first benchmark of 111 reproducible crashes in Python ML Jupyter notebooks from Kaggle, with verified fixes and rich annotations for bug research.
- What Makes Software Bugs Escape Testing? Evidence from a Large-Scale Empirical Study
  Post-release defects concentrate in older, frequently modified high-churn components and require longer and more complex fixes than pre-release defects.
- Reproducible Automated Program Repair Is Hard -- Experiences With the Defects4J Dataset
  For reproducible automated program repair (APR) evaluation, 21.6% of Defects4J defects are unsuitable and 7.1% have under-specified test suites.