
arXiv: 2504.09651 · v4 · submitted 2025-04-13 · 💻 cs.SE

GitBugs: Bug Reports for Duplicate Detection, Retrieval Augmented Generation, Triage, and More

Pith reviewed 2026-05-22 20:19 UTC · model grok-4.3

classification 💻 cs.SE
keywords bug reports · duplicate detection · software engineering datasets · issue tracking systems · machine learning · open source projects · triage automation

The pith

GitBugs supplies over 150,000 standardized bug reports from nine projects to support duplicate detection and related machine learning tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper assembles GitBugs as a large collection of bug reports drawn from active open-source projects to fill gaps left by smaller or older datasets. It pulls records from three different trackers and converts them into uniform fields while adding fixed train-test splits. The result lets researchers run experiments on duplicate finding, automated triage, resolution prediction, and temporal trends without first building their own data pipeline. The authors also release notebooks and per-project statistics to make the resource immediately usable.

Core claim

We present GitBugs, a comprehensive and up-to-date dataset comprising over 150,000 bug reports from nine actively maintained open-source projects, including Firefox, Cassandra, and VS Code. GitBugs aggregates data from Github, Bugzilla and Jira issue trackers, offering standardized categorical fields for classification tasks and predefined train/test splits for duplicate bug detection.

What carries the argument

The GitBugs dataset, which pulls bug reports from GitHub, Bugzilla, and Jira trackers and converts them into uniform categorical fields plus fixed train-test splits.

If this is right

  • Duplicate detection models can be trained and evaluated on the supplied splits across multiple projects.
  • Retrieval-augmented generation systems can draw from the standardized bug text and metadata.
  • Automated triage and resolution-time prediction experiments become possible with the categorical fields.
  • Temporal analyses of bug resolution patterns are enabled by the date information in the records.
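
As a rough illustration of the first point, the sketch below ranks candidate reports by bag-of-words cosine similarity and scores the ranking with recall@k. It is a minimal baseline under stated assumptions, not the method used in the paper: the record ids, texts, and function names are hypothetical, and the real dataset's field names may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(query, corpus):
    """Return corpus ids sorted by decreasing similarity to the query text."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), bug_id)
              for bug_id, text in corpus.items()]
    # Ties are broken by id so the ranking is deterministic.
    return [bug_id for _, bug_id in sorted(scored, key=lambda s: (-s[0], s[1]))]


def recall_at_k(queries, corpus, k):
    """Fraction of queries whose known duplicate appears in the top k."""
    hits = sum(1 for text, dup_id in queries
               if dup_id in rank_candidates(text, corpus)[:k])
    return hits / len(queries) if queries else 0.0


# Hypothetical toy records; real reports would come from the released splits.
corpus = {
    "B1": "Crash when opening settings page on startup",
    "B2": "Dark theme colors are wrong in the editor",
    "B3": "Search results do not refresh after typing",
}
queries = [
    ("Application crashes on startup when settings page opens", "B1"),
    ("Editor shows wrong colors with dark theme", "B2"),
]
print(recall_at_k(queries, corpus, k=1))
```

A learned embedding model would normally replace the token counts; the evaluation loop stays the same.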

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Cross-project comparisons of duplicate rates may reveal differences in how trackers label related issues.
  • The fixed splits reduce the chance that published results overfit to particular data partitions.
  • Models trained on this collection could be tested for transfer to new projects not included in the nine.

Load-bearing premise

Aggregation from the three trackers produces correctly standardized fields and accurate duplicate labels without systematic extraction errors or loss of metadata.

What would settle it

A manual audit of a random sample of duplicate labels against the original tracker pages to measure mismatch rate or missing fields.
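
Such an audit could be organized roughly as follows: draw a reproducible random sample of report ids, check each one by hand against its tracker page, and report the mismatch rate with a confidence interval. The sketch assumes a Wilson score interval and hypothetical function names; neither comes from the paper.

```python
import math
import random


def draw_audit_sample(bug_ids, n, seed=0):
    """Pick up to n ids uniformly at random, reproducibly, for manual checking."""
    rng = random.Random(seed)
    pool = sorted(bug_ids)  # sort first so the draw does not depend on input order
    return rng.sample(pool, min(n, len(pool)))


def wilson_interval(mismatches, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n == 0:
        raise ValueError("empty sample")
    p = mismatches / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


# Hypothetical usage: 100 sampled reports, 5 found to mismatch the tracker page.
sample = draw_audit_sample(range(1000), 100)
low, high = wilson_interval(5, len(sample))
print(f"mismatch rate 5/{len(sample)}, 95% interval [{low:.3f}, {high:.3f}]")
```

The Wilson interval is chosen here because it stays well behaved when the observed mismatch count is zero or very small, which is the hoped-for outcome of an audit.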

Figures

Figures reproduced from arXiv: 2504.09651 by Aryan Jadon, Avinash Patil, Siru Tao.

Figure 1. Monthly bug report trends from 2020 to 2024 across [...]
Figure 2. Kernel density estimates of bug resolution times across [...]
Figure 3. Distribution of bug resolution times across projects [...]
Figure 4. Monthly bug report forecasts using ARIMA and [...]
Figure 7. Monthly distribution of bug reports across LDA-derived [...]
Figure 6. Scatter plot of actual vs. predicted bug fix times. The [...]
Figure 9. Histogram of cosine similarity scores between bug [...]
Original abstract

Bug reports provide critical insights into software quality, yet existing datasets often suffer from limited scope, outdated content, or insufficient metadata for machine learning. To address these limitations, we present GitBugs, a comprehensive and up-to-date dataset comprising over 150,000 bug reports from nine actively maintained open-source projects, including Firefox, Cassandra, and VS Code. GitBugs aggregates data from Github, Bugzilla and Jira issue trackers, offering standardized categorical fields for classification tasks and predefined train/test splits for duplicate bug detection. In addition, it includes exploratory analysis notebooks and detailed project-level statistics, such as duplicate rates and resolution times. GitBugs supports various software engineering research tasks, including duplicate detection, retrieval augmented generation, resolution prediction, automated triaging, and temporal analysis. The openly licensed dataset provides a valuable cross-project resource for benchmarking and advancing automated bug report analysis. Access the data and code at https://github.com/av9ash/gitbugs/.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents GitBugs, a dataset of over 150,000 bug reports from nine open-source projects (Firefox, Cassandra, VS Code, etc.) aggregated from GitHub, Bugzilla, and Jira. It claims to deliver standardized categorical fields, predefined train/test splits for duplicate detection, project-level statistics (duplicate rates, resolution times), and exploratory notebooks to support duplicate detection, RAG, resolution prediction, triaging, and temporal analysis. The resource is openly licensed with code and data at a GitHub repository.

Significance. If the standardization and duplicate labeling are accurate, GitBugs would be a useful large-scale, up-to-date, cross-project resource that improves on prior datasets in scope and metadata richness. The provision of ready train/test splits, analysis notebooks, and open licensing with reproducible code are concrete strengths that would facilitate benchmarking and adoption in software engineering ML research.

major comments (2)
  1. [§3] §3 (Data Collection and Standardization): the manuscript asserts that aggregation produces correctly standardized categorical fields and accurate duplicate labels but reports no quantitative validation (spot-check error rates, cross-tracker consistency metrics, or inter-annotator agreement). This directly undermines the central claim that the dataset is ready for downstream ML tasks without systematic extraction errors or metadata loss.
  2. [§4] §4 (Dataset Statistics and Splits): the reported duplicate rates and predefined train/test splits are presented as reliable for benchmarking, yet without any verification that duplicate relations from the three heterogeneous trackers were faithfully preserved, the splits cannot be guaranteed to be free of extraction artifacts.
minor comments (2)
  1. [Abstract] Abstract: 'Github' should be capitalized consistently as 'GitHub'.
  2. [§3] The manuscript would benefit from an explicit statement of the exact mapping rules used for status/priority/severity fields to allow independent verification.
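
To make the second minor comment concrete, an explicit mapping could be published as a lookup table that a reader can inspect and rerun. The sketch below is illustrative only: the table contents, the three target levels, and the function names are hypothetical and are not the rules used by GitBugs.

```python
# Hypothetical mapping tables: NOT the rules used by GitBugs.
PRIORITY_MAP = {
    "bugzilla": {"p1": "high", "p2": "high", "p3": "medium",
                 "p4": "low", "p5": "low"},
    "jira": {"blocker": "high", "critical": "high", "major": "medium",
             "minor": "low", "trivial": "low"},
}
UNKNOWN = "unknown"


def standardize_priority(tracker, raw_value):
    """Map a tracker-specific priority string to a shared label."""
    table = PRIORITY_MAP.get(tracker.lower(), {})
    if raw_value is None:
        return UNKNOWN
    return table.get(raw_value.strip().lower(), UNKNOWN)


def unmapped_values(tracker, raw_values):
    """List raw values with no rule, to expose silent metadata loss."""
    table = PRIORITY_MAP.get(tracker.lower(), {})
    return sorted({v for v in raw_values
                   if v is not None and v.strip().lower() not in table})


print(standardize_priority("Jira", " Major "))
print(unmapped_values("jira", ["Major", "Urgent", None]))
```

Reporting the output of a check like `unmapped_values` per project would let readers see how many raw values fell outside the published rules.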

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript describing the GitBugs dataset. We address the two major comments point by point below.

  1. Referee: [§3] §3 (Data Collection and Standardization): the manuscript asserts that aggregation produces correctly standardized categorical fields and accurate duplicate labels but reports no quantitative validation (spot-check error rates, cross-tracker consistency metrics, or inter-annotator agreement). This directly undermines the central claim that the dataset is ready for downstream ML tasks without systematic extraction errors or metadata loss.

    Authors: We agree that the original submission did not report quantitative validation metrics such as spot-check error rates. Standardization was performed via deterministic, rule-based field mappings derived from each tracker's public API documentation, with duplicate labels imported verbatim from the source 'duplicates' fields. Inter-annotator agreement does not apply, as the labels originate from the trackers themselves rather than new annotation. In the revised manuscript we will add a dedicated validation subsection presenting manual spot-check results on 100 randomly sampled reports per project, reporting per-field accuracy and duplicate-label fidelity. revision: yes

  2. Referee: [§4] §4 (Dataset Statistics and Splits): the reported duplicate rates and predefined train/test splits are presented as reliable for benchmarking, yet without any verification that duplicate relations from the three heterogeneous trackers were faithfully preserved, the splits cannot be guaranteed to be free of extraction artifacts.

    Authors: Each of the nine projects originates from a single tracker, so no cross-tracker duplicate relations exist. Duplicate pairs are kept together within the same split by construction during the per-project partitioning step; the splitting code is released in the repository. We acknowledge that an explicit verification step confirming preservation was not described. The revision will include a short verification paragraph and table confirming that all duplicate relations are co-located in the splits and that duplicate-rate statistics match the source data. revision: yes
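
One way to keep duplicate pairs in the same split, and to verify it afterwards, is to group reports by connected duplicate chains before partitioning. The sketch below uses a union-find structure for the grouping; it is an assumption about how such a step could work, not the splitting code released with the dataset, and all names are hypothetical. It also assumes every id in a duplicate pair appears in `bug_ids`.

```python
import random
from collections import defaultdict


def duplicate_groups(bug_ids, duplicate_pairs):
    """Union-find: bugs linked by any chain of duplicate pairs share a group."""
    parent = {b: b for b in bug_ids}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in duplicate_pairs:
        parent[find(a)] = find(b)

    groups = defaultdict(list)
    for b in bug_ids:
        groups[find(b)].append(b)
    return list(groups.values())


def split_by_group(bug_ids, duplicate_pairs, test_fraction=0.2, seed=0):
    """Assign whole duplicate groups to test until the target size is reached."""
    groups = duplicate_groups(bug_ids, duplicate_pairs)
    random.Random(seed).shuffle(groups)
    target = test_fraction * len(bug_ids)
    train, test = [], []
    for group in groups:
        (test if len(test) < target else train).extend(group)
    return train, test


def pairs_colocated(train, test, duplicate_pairs):
    """Verification step: True if no duplicate pair straddles the two splits."""
    train_set = set(train)
    return all((a in train_set) == (b in train_set) for a, b in duplicate_pairs)


ids = list(range(10))
pairs = [(0, 1), (1, 2), (5, 6)]
train, test = split_by_group(ids, pairs, test_fraction=0.3)
print(pairs_colocated(train, test, pairs))
```

Because groups are moved whole, the test split can overshoot the requested fraction by up to one group; a verification table like the one promised in the rebuttal would report the realized sizes.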

Circularity Check

0 steps flagged

No circularity: dataset resource paper with no derivations or predictions


The paper is a data-resource contribution that aggregates bug reports from GitHub, Bugzilla, and Jira into a standardized dataset of over 150,000 reports. No equations, predictions, fitted parameters, or first-principles derivations are present in the abstract or described structure. Claims about standardization and duplicate labels are presented as outcomes of the aggregation pipeline rather than results derived from prior outputs within the paper. The work is self-contained as a factual description of data collection and release, with no load-bearing steps that reduce to self-definition, self-citation chains, or renaming of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Dataset release paper; contains no free parameters, mathematical axioms, or invented entities.

pith-pipeline@v0.9.0 · 5698 in / 927 out tokens · 87101 ms · 2026-05-22T20:19:24.031941+00:00 · methodology

