arxiv: 2212.00548 · v1 · submitted 2022-12-01 · 💻 cs.SE

Duplicate Bug Report Detection: How Far Are We?

Pith reviewed 2026-05-24 09:59 UTC · model grok-4.3

classification 💻 cs.SE
keywords duplicate bug report detection · benchmark construction · empirical evaluation · issue tracking systems · software maintenance · performance comparison · data bias

The pith

Simpler duplicate bug report detection techniques outperform recent sophisticated ones on a corrected benchmark.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares research and industry Duplicate Bug Report Detection techniques after identifying two major biases that had skewed prior evaluations. Data age and the choice of issue tracking system both produce large differences in measured accuracy. The authors build a new benchmark that removes those biases and then run the comparison again. On this benchmark a basic technique beats most of the complex methods proposed in recent papers, and a simple method already used in practice matches the performance of one recent research tool. The result indicates that reported progress in the area rests partly on unrealistic test conditions.

Core claim

After preparing a benchmark that corrects for data age and issue-tracking-system choice, evaluation shows that a simpler technique outperforms recently proposed sophisticated DBRD techniques on most projects, while a simple technique already adopted in practice achieves comparable results to a recently proposed research tool.

What carries the argument

The new benchmark constructed by correcting for data age and issue-tracking-system choice, which is then used to re-evaluate and rank DBRD techniques under conditions closer to current deployment.

If this is right

  • New DBRD proposals should be compared against both recent research tools and the simple methods already used in practice.
  • Benchmark construction for DBRD must explicitly control for data age to avoid inflated accuracy numbers.
  • Industry tools provide a useful baseline that research should match or exceed before claiming improvement.
  • Reported gains from sophisticated features may shrink or disappear once age and platform biases are removed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar age and platform biases may affect other detection tasks such as duplicate question detection or code-clone detection.
  • Future work could test whether the simple techniques remain competitive when the task is extended to cross-project or cross-language settings.
  • The finding suggests that research effort might usefully shift from accuracy gains to other properties such as speed or ease of integration.

Load-bearing premise

The corrected benchmark gives a realistic estimate of how the compared techniques would perform if used on today's projects.

What would settle it

Re-running the same set of techniques on bug reports opened after the benchmark's latest date and checking whether the simple method still ranks highest on most projects.
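
The check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `BugReport` type, the cutoff date, the project names, the technique names, and every score below are hypothetical placeholders, not values from the paper.

```python
from datetime import date
from typing import Dict, List, NamedTuple


class BugReport(NamedTuple):
    # Hypothetical minimal record; a real benchmark carries many more fields.
    report_id: str
    project: str
    opened: date


def opened_after(reports: List[BugReport], cutoff: date) -> List[BugReport]:
    """Keep only reports opened strictly after the benchmark's latest date."""
    return [r for r in reports if r.opened > cutoff]


def projects_won(scores: Dict[str, Dict[str, float]], technique: str) -> List[str]:
    """Projects where `technique` has the strictly highest score (ties do not count)."""
    won = []
    for project, by_technique in scores.items():
        best = max(by_technique.values())
        leaders = [name for name, score in by_technique.items() if score == best]
        if leaders == [technique]:
            won.append(project)
    return won


# Made-up reports and an assumed cutoff date, for illustration only.
reports = [
    BugReport("r1", "proj_a", date(2021, 6, 1)),
    BugReport("r2", "proj_a", date(2022, 3, 15)),
    BugReport("r3", "proj_b", date(2022, 9, 30)),
]
fresh = opened_after(reports, cutoff=date(2022, 1, 1))

# Made-up scores per project on the fresh reports.
scores = {
    "proj_a": {"simple": 0.62, "deep": 0.58},
    "proj_b": {"simple": 0.40, "deep": 0.45},
    "proj_c": {"simple": 0.71, "deep": 0.69},
}
wins = projects_won(scores, "simple")
still_best_on_most = len(wins) > len(scores) / 2
```

Counting only strict wins is a deliberate choice in this sketch: a tie for first place is not treated as evidence that the simple method still ranks highest.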

Figures

Figures reproduced from arXiv: 2212.00548 by Bowen Xu, David Lo, DongGyun Han, Ferdian Thung, Ivana Clairine Irsan, Lingxiao Jiang, Ting Zhang, Venkatesh Vinayakarao.

Figure 1: An example of duplicate issue recommendation when typing “always loading” before submitting a …
Figure 2: An example of VSCodeBot duplicate issue recommendation (issue 75817) from Microsoft/VSCode repository
Figure 3: Examples of the predictions in the top-10 positions for 4 test BRs.
Figure 4: The workflow of retrieving the correct bucket.
Figure 5: Recall Rate@k in the test data of Eclipse, Mozilla, Hadoop, Spark, Kibana, and VSCode
Figure 6: REP compared to the other four approaches in terms of successful predictions
Figure 7: Recall Rate@k comparing the tools in research and in practice on the VSCode data
Figure 8: Recall Rate@k in the test data of Eclipse-Old, Mozilla-Old
Original abstract

Many Duplicate Bug Report Detection (DBRD) techniques have been proposed in the research literature. The industry uses some other techniques. Unfortunately, there is insufficient comparison among them, and it is unclear how far we have been. This work fills this gap by comparing the aforementioned techniques. To compare them, we first need a benchmark that can estimate how a tool would perform if applied in a realistic setting today. Thus, we first investigated potential biases that affect the fair comparison of the accuracy of DBRD techniques. Our experiments suggest that data age and issue tracking system choice cause a significant difference. Based on these findings, we prepared a new benchmark. We then used it to evaluate DBRD techniques to estimate better how far we have been. Surprisingly, a simpler technique outperforms recently proposed sophisticated techniques on most projects in our benchmark. In addition, we compared the DBRD techniques proposed in research with those used in Mozilla and VSCode. Surprisingly, we observe that a simple technique already adopted in practice can achieve comparable results as a recently proposed research tool. Our study gives reflections on the current state of DBRD, and we share our insights to benefit future DBRD research.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper identifies two sources of bias (data age and issue-tracking-system choice) that affect fair evaluation of duplicate bug report detection (DBRD) techniques, constructs a new benchmark that corrects for these biases, and uses it to compare research-proposed DBRD methods against each other and against techniques already deployed in Mozilla and VSCode. The central empirical finding is that simpler techniques outperform recently proposed sophisticated ones on most projects in the benchmark, and that a simple technique already in industrial use achieves results comparable to a recent research tool.

Significance. If the benchmark construction is accepted as yielding realistic performance estimates, the result directly challenges the incremental value of complex DBRD methods and supplies a reusable, bias-corrected resource for future work. The explicit comparison against deployed industrial techniques is a strength that grounds the claims in practice.

major comments (2)
  1. [Benchmark preparation] Benchmark construction section: the claim that correcting only for data age and ITS choice produces a realistic estimate of current-day performance rests on the assumption that these are the dominant biases; the paper should report the quantitative impact of each correction (e.g., change in MAP or recall@K before/after) and any sensitivity analysis on the age threshold chosen.
  2. [Evaluation] Evaluation results: the statement that a simpler technique outperforms sophisticated ones “on most projects” requires the per-project breakdown (including number of projects, effect sizes, and statistical significance tests) to be shown; without these, the aggregate claim cannot be assessed for robustness.
minor comments (2)
  1. [Abstract] Abstract and introduction should cite the specific prior DBRD papers whose techniques are re-evaluated so readers can immediately locate the baselines.
  2. [Evaluation] Notation for the performance metrics (MAP, recall@K, etc.) should be defined at first use and kept consistent across tables and text.
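
For readers who want the two metrics named in the report pinned down, the following is a minimal sketch of Recall Rate@k and MAP as they are commonly defined in retrieval evaluation. The paper's exact definitions may differ, and the report IDs and rankings are invented toy data.

```python
from typing import Dict, List, Set


def recall_rate_at_k(rankings: Dict[str, List[str]],
                     relevant: Dict[str, Set[str]],
                     k: int) -> float:
    """Fraction of query reports with at least one true duplicate in the top k."""
    if not rankings:
        return 0.0
    hits = sum(
        1 for query, ranked in rankings.items()
        if any(candidate in relevant[query] for candidate in ranked[:k])
    )
    return hits / len(rankings)


def mean_average_precision(rankings: Dict[str, List[str]],
                           relevant: Dict[str, Set[str]]) -> float:
    """Mean over queries of the average precision of each ranked list."""
    if not rankings:
        return 0.0
    total = 0.0
    for query, ranked in rankings.items():
        found = 0
        precision_sum = 0.0
        for position, candidate in enumerate(ranked, start=1):
            if candidate in relevant[query]:
                found += 1
                precision_sum += found / position
        if relevant[query]:
            total += precision_sum / len(relevant[query])
    return total / len(rankings)


# Hypothetical toy data: each query report maps to a ranked candidate list.
rankings = {
    "BR-10": ["BR-3", "BR-7", "BR-1"],
    "BR-11": ["BR-2", "BR-5", "BR-9"],
}
relevant = {"BR-10": {"BR-7"}, "BR-11": {"BR-8"}}

rr_at_1 = recall_rate_at_k(rankings, relevant, 1)
rr_at_2 = recall_rate_at_k(rankings, relevant, 2)
map_score = mean_average_precision(rankings, relevant)
```

In this toy case the only correct match sits at rank 2 for one of the two queries, so Recall Rate@1 is 0, Recall Rate@2 is 0.5, and MAP is 0.25.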

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment and constructive comments on our paper. We address the major comments below and will incorporate the suggested changes in the revised manuscript.

  1. Referee: Benchmark construction section: the claim that correcting only for data age and ITS choice produces a realistic estimate of current-day performance rests on the assumption that these are the dominant biases; the paper should report the quantitative impact of each correction (e.g., change in MAP or recall@K before/after) and any sensitivity analysis on the age threshold chosen.

    Authors: We agree that reporting the quantitative impact of each correction and including a sensitivity analysis would increase transparency. Our prior experiments identified data age and ITS choice as the biases producing statistically significant differences, but we will add explicit before/after metric tables (MAP, recall@K) for each correction step and a sensitivity analysis varying the age threshold in the revised benchmark-preparation section. revision: yes

  2. Referee: Evaluation results: the statement that a simpler technique outperforms sophisticated ones “on most projects” requires the per-project breakdown (including number of projects, effect sizes, and statistical significance tests) to be shown; without these, the aggregate claim cannot be assessed for robustness.

    Authors: We will expand the evaluation section to present the full per-project breakdown. The revision will report the exact number of projects, per-project performance values, effect sizes, and the results of statistical significance tests (e.g., paired Wilcoxon tests) so that the claim of outperformance “on most projects” can be directly verified. revision: yes
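
The rebuttal names paired Wilcoxon tests as one option for the per-project comparison. A minimal pure-Python sketch of the signed-rank test with an exact two-sided p-value follows. The function name and the per-project scores are hypothetical, and the exact enumeration is only practical for a small number of projects.

```python
from itertools import product
from typing import List, Tuple


def wilcoxon_signed_rank(a: List[float], b: List[float]) -> Tuple[float, float]:
    """Return (W, exact two-sided p-value) for paired samples a and b.

    W is the smaller of the positive and negative rank sums. Zero
    differences are dropped; tied absolute differences share average ranks.
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        return 0.0, 1.0

    # Rank the absolute differences (1-based), averaging ranks over ties.
    order = sorted(range(n), key=lambda idx: abs(diffs[idx]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for t in range(i, j + 1):
            ranks[order[t]] = average_rank
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    # Exact null distribution: every sign assignment is equally likely.
    total = sum(ranks)
    extreme = 0
    for signs in product((0, 1), repeat=n):
        plus = sum(r for r, s in zip(ranks, signs) if s)
        if min(plus, total - plus) <= w:
            extreme += 1
    return w, extreme / 2 ** n


# Made-up per-project scores (in percent) for two hypothetical techniques.
simple_rr = [61.0, 58.0, 70.0, 66.0, 52.0, 49.0]
complex_rr = [55.0, 57.0, 66.0, 68.0, 47.0, 46.0]
w_stat, p_value = wilcoxon_signed_rank(simple_rr, complex_rr)
```

With only six paired values, even a five-to-one split in favour of one technique gives an exact p-value of 0.09375 here, which illustrates why a claim about "most projects" is hard to assess without the per-project numbers.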

Circularity Check

0 steps flagged

No significant circularity; empirical evaluation on external data

The paper performs an empirical study: it identifies biases (data age, issue-tracking system) via experiments on project data, constructs a benchmark accordingly, then measures accuracy of existing DBRD techniques (research and industry) on that benchmark. No derivation chain, equations, fitted parameters renamed as predictions, self-definitional steps, or load-bearing self-citations appear; the central claims are direct observations from performance metrics on real bug-report corpora. The structure is self-contained against external benchmarks and does not reduce any result to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work is an empirical benchmarking study and introduces no new free parameters, ad-hoc axioms, or invented entities beyond standard assumptions of information-retrieval evaluation.

axioms (1)
  • Domain assumption: standard information-retrieval metrics (precision, recall, etc.) and similarity measures are appropriate for evaluating DBRD techniques.
    The abstract relies on these metrics to declare one technique superior without additional justification.
