Not All Bugs Are the Same: Understanding, Characterizing, and Classifying the Root Cause of Bugs

Andy Zaidman; Fabio Palomba; Filomena Ferrucci; Gemma Catolino

arXiv: 1907.11031 · v1 · pith:BBVBUMR3 · submitted 2019-07-25 · cs.SE


Gemma Catolino, Fabio Palomba, Andy Zaidman, Filomena Ferrucci

Pith reviewed 2026-05-24 16:03 UTC · model grok-4.3


Keywords: root cause analysis, bug classification, bug reports, software bugs, taxonomy, empirical study, machine learning, bug triage


The pith

Analysis of 1,280 bug reports from 119 projects identifies nine common root causes, which a classifier using only the report text assigns with 64% F-measure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors manually review bug reports across Mozilla, Apache, and Eclipse projects to create a taxonomy of why bugs occur. This produces nine recurring root cause categories that appear in the studied systems. They then build a machine learning model that reads the text of a report and assigns it to one of the nine categories. The model reaches 64% F-Measure and 74% AUC-ROC overall on the collected data. If the approach holds, developers could receive an immediate suggestion of the likely cause before beginning any investigation or triage.

Core claim

Examination of 1,280 bug reports from 119 projects in three ecosystems shows nine main root causes that are common across the systems. A classification model trained on the textual content of the reports is able to assign new bugs to these categories, achieving 64% F-Measure and 74% AUC-ROC overall.
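An "overall" AUC-ROC for a multi-category classifier has to be aggregated from per-category scores, and this summary does not say how the paper does it. A common choice, assumed in the sketch below, is one-vs-rest AUC per category with an unweighted mean. Each binary AUC is computed as the probability that a randomly chosen report of the category scores higher than a randomly chosen report outside it (the Mann-Whitney formulation, ties counting half). The category names and scores are hypothetical toy values, not data from the study.

```python
from itertools import product


def binary_auc(scores, is_positive):
    """Probability that a random positive item outscores a random negative one (ties count half)."""
    pos = [s for s, y in zip(scores, is_positive) if y]
    neg = [s for s, y in zip(scores, is_positive) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative item")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(score_rows, true_labels, classes):
    """Unweighted mean of one-vs-rest AUCs; score_rows[i][c] is item i's score for class c."""
    aucs = []
    for c in classes:
        scores = [row[c] for row in score_rows]
        is_positive = [label == c for label in true_labels]
        aucs.append(binary_auc(scores, is_positive))
    return sum(aucs) / len(aucs)


# Hypothetical classifier scores for four reports over three toy categories.
classes = ["functional", "config", "gui"]
score_rows = [
    {"functional": 0.55, "config": 0.10, "gui": 0.35},
    {"functional": 0.40, "config": 0.50, "gui": 0.10},
    {"functional": 0.30, "config": 0.30, "gui": 0.40},
    {"functional": 0.50, "config": 0.20, "gui": 0.30},
]
true_labels = ["functional", "config", "gui", "gui"]
print(round(macro_ovr_auc(score_rows, true_labels, classes), 3))  # 0.917
```

Here "functional" and "config" are separated perfectly (AUC 1.0), while "gui" loses one of four pairwise comparisons (AUC 0.75). The pairwise loop is quadratic in the number of reports; a rank-based computation yields the same value more efficiently.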

What carries the argument

A taxonomy of nine root cause categories derived from manual inspection of bug report text, used to label data and train a supervised text classifier.
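A minimal sketch of how labeled report text becomes a category prediction, assuming a bag-of-words representation and a multinomial Naive Bayes classifier. This is a stand-in chosen for brevity, not the authors' model. The tokenizer (lowercasing and splitting on non-alphanumeric characters, no stemming), the class name `NaiveBayesText`, and the three toy categories and training reports are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on non-alphanumeric characters; no stemming or stop-word removal."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


class NaiveBayesText:
    """Multinomial Naive Bayes over bag-of-words counts with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(counts.values()) for c, counts in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.class_docs.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c, docs in self.class_docs.items():
            # Log prior plus the smoothed log likelihood of each known word under class c.
            score = math.log(docs / n_docs)
            for w in tokenize(text):
                if w in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[c][w] + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical training reports and labels (three toy categories, not the paper's nine).
train_texts = [
    "NullPointerException when saving the document",
    "crash with exception in parser on empty input",
    "button label overlaps the toolbar icon",
    "dialog window renders off screen",
    "wrong default value in config file breaks startup",
    "setting in properties file is ignored",
]
train_labels = ["program-anomaly", "program-anomaly", "gui", "gui", "configuration", "configuration"]
model = NaiveBayesText().fit(train_texts, train_labels)
print(model.predict("exception thrown while parsing input"))  # program-anomaly
print(model.predict("toolbar icon missing on screen"))  # gui
```

A realistic pipeline would add steps such as stop-word removal, stemming, and term weighting. The reference list includes Porter's stemmer [70] and Salton and Buckley's term weighting [74], which suggests, but does not confirm, that the authors used similar preprocessing.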

If this is right

Bug triage can begin with an automatic suggestion of root cause type rather than starting from raw text.
The nine categories supply a shared language for comparing bug patterns across different projects and ecosystems.
Fixing effort or prevention techniques can be studied separately for each root cause type.
The same labeling process can be repeated on new projects to extend or refine the taxonomy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the nine categories prove stable, they could serve as a target for static analysis tools that detect likely causes before code is committed.
Projects outside the three ecosystems might require only a small number of additional categories rather than an entirely new taxonomy.
Accuracy might rise if the model also receives metadata such as component or reporter experience in addition to report text.

Load-bearing premise

The text in a bug report is enough for analysts to agree on which of the nine root cause categories applies, and these nine categories describe bugs beyond the 119 projects examined.

What would settle it

A replication in which multiple independent analysts label the same 1,280 reports and obtain low agreement on categories, or a new collection of bug reports from additional projects where many cases fall outside the nine categories.
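The agreement half of that test is usually quantified with a chance-corrected statistic. The reference list cites work on Krippendorff's alpha [2]; for the simpler case of exactly two annotators, Cohen's kappa is a common choice, and the sketch below computes it. The function name and the annotator labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who each assign exactly one category per item."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty list of items")
    n = len(labels_a)
    # Observed agreement: share of items on which both annotators chose the same category.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement if each annotator labeled at random with their own category frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when both annotators use one identical category")
    return (observed - expected) / (1 - expected)


# Hypothetical labels from two annotators for the same six bug reports.
annotator_1 = ["functional", "functional", "config", "gui", "functional", "gui"]
annotator_2 = ["functional", "config", "config", "gui", "functional", "functional"]
print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.478
```

The annotators agree on 4 of 6 reports (0.667), but chance alone would produce 13/36 agreement given their label frequencies, so kappa is 11/23. A kappa near 0 means agreement no better than chance.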

Figures

Figures reproduced from arXiv: 1907.11031 by Andy Zaidman, Fabio Palomba, Filomena Ferrucci, Gemma Catolino.

**Figure 1.** Bug reported and reopened in Apache HBase.

**Figure 2.** Distribution of root causes extracted from the 1,139 analyzed bug reports. The most frequent is the Functional Issue, which covers 41.3% of the dataset. This was somewhat expected: it is reasonable to believe that most of the problems raised relate to developers actively implementing new features or enhancing existing ones.

**Figure 3.** RQ2: box plots reporting the Delay Before Response (DBR) for each identified bug root cause (axis categories: conf.-issue, network-issue, db-issue, gui-issue, perf.-issue, perm.-depr.-issue, sec.-issue, program-issue, test-issue).

**Figure 4.** RQ2: box plots reporting the Delay Before Assigned (DBA) for each identified bug root cause.

**Figure 5.** RQ2: box plots reporting the Delay Before Change (DBC) for each identified bug root cause.

**Figure 6.** RQ2: box plots reporting the Duration of Bug Fixing (DBF) for each identified bug root cause.

**Figure 7.** RQ2: box plots reporting the Delay After Change (DAC) for each identified bug root cause. The accompanying text singles out configuration issues as an exception: they take up to 33 hours to be integrated, which the authors consider expected given previous findings in the literature [6, 53, 82], since configuration-related discussions generally trigger more developer comments.

Original abstract

Modern version control systems such as Git or SVN include bug tracking mechanisms, through which developers can highlight the presence of bugs through bug reports, i.e., textual descriptions reporting the problem and what are the steps that led to a failure. In past and recent years, the research community deeply investigated methods for easing bug triage, that is, the process of assigning the fixing of a reported bug to the most qualified developer. Nevertheless, only a few studies have reported on how to support developers in the process of understanding the type of a reported bug, which is the first and most time-consuming step to perform before assigning a bug-fix operation. In this paper, we target this problem in two ways: first, we analyze 1,280 bug reports of 119 popular projects belonging to three ecosystems such as Mozilla, Apache, and Eclipse, with the aim of building a taxonomy of the root causes of reported bugs; then, we devise and evaluate an automated classification model able to classify reported bugs according to the defined taxonomy. As a result, we found nine main common root causes of bugs over the considered systems. Moreover, our model achieves high F-Measure and AUC-ROC (64% and 74% on overall, respectively).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper derives a nine-category root-cause taxonomy from 1,280 bug reports across three ecosystems and trains a classifier on it, but supplies no evidence on labeling reliability or baselines.


The paper manually reviews 1,280 bug reports from 119 projects in Mozilla, Apache, and Eclipse to produce nine common root-cause categories, then builds a text-based classifier that reaches 64% F-measure and 74% AUC-ROC overall. The scale of the corpus and the multi-ecosystem scope are the clearest positives; most prior bug classification work stays inside one project or one language, so this breadth is a step forward. The classifier attempt also shows they want the taxonomy to be usable rather than purely descriptive.

Referee Report

3 major / 2 minor

Summary. The paper manually analyzes 1,280 bug reports from 119 projects across Mozilla, Apache, and Eclipse to derive a taxonomy of nine root causes of bugs, then trains and evaluates a classifier on textual features of the reports, claiming overall F-Measure of 64% and AUC-ROC of 74%.

Significance. If the taxonomy proves reproducible and the classifier metrics hold under proper validation, the work would offer a concrete, multi-ecosystem taxonomy and a practical starting point for automated root-cause classification to support bug triage. The scale of the manual analysis across three ecosystems is a strength that could support broader applicability claims.

major comments (3)

[Abstract] Abstract: performance numbers (64% F-Measure, 74% AUC-ROC) are stated without any description of the labeling procedure, number of annotators, inter-rater agreement, disagreement resolution, feature engineering, train-test split, or baseline comparisons. These omissions make the numbers unverifiable and directly undermine the central claim that the model achieves the reported performance on the derived taxonomy.
[Taxonomy construction / manual analysis] Taxonomy construction section (manual analysis of 1,280 reports): no information is supplied on annotator count, annotation guidelines, or inter-rater agreement. Because the nine categories are defined from these labels and then used as ground truth for the classifier, the absence of reliability metrics is load-bearing for both the taxonomy and all downstream results.
[Evaluation / results] Evaluation section: the claim that the nine categories are 'main common root causes' across the considered systems requires evidence that the categories are stable and not artifacts of individual annotator interpretation; without agreement statistics or a reproducibility check, the generalization statement in the abstract cannot be assessed.
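One detail the first major comment lists as missing is the train-test split. With a dominant category (Figure 2 reports the Functional Issue at 41.3%), a stratified split is a common way to keep each fold's category mix close to that of the whole dataset. The sketch below deals each category's items round-robin into k folds; the function names, the seed, and the toy labels are assumptions for illustration, not the authors' protocol.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k, seed=0):
    """Split item indices into k folds that roughly preserve each class's share."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        # Deal this class's items round-robin, continuing where the previous class stopped.
        for j, idx in enumerate(idxs):
            folds[(offset + j) % k].append(idx)
        offset += len(idxs)
    return folds


def cross_validation_splits(labels, k, seed=0):
    """Yield (train_indices, test_indices) once per fold; every item is tested exactly once."""
    folds = stratified_folds(labels, k, seed)
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield sorted(train), sorted(test)


# Hypothetical, imbalanced toy labels: half of the items belong to one category.
labels = ["functional"] * 6 + ["config"] * 3 + ["gui"] * 3
for train, test in cross_validation_splits(labels, k=3):
    print(len(train), [labels[i] for i in test])  # each fold: 8 ['functional', 'functional', 'config', 'gui']
```

Every test fold keeps the 2:1:1 proportions of the full toy dataset, so a per-category score computed on one fold is not distorted by a fold that happens to contain no minority-category reports.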

minor comments (2)

[Evaluation] Clarify whether the nine categories are mutually exclusive or allow multi-label assignment, and report per-category performance to show whether the overall F-Measure is driven by a few dominant classes.
[Taxonomy presentation] The manuscript should include a table listing the nine root-cause categories with brief definitions and example bug-report excerpts to make the taxonomy concrete for readers.
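To illustrate the first minor comment, the sketch below computes per-category precision, recall, and F1, then contrasts macro averaging (every category counts equally) with support-weighted averaging. Which average underlies the reported 64% is not stated in this summary. The function names and toy labels are hypothetical; they show how a classifier that always predicts the majority category can look acceptable under weighted F1 and poor under macro F1.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {category: (precision, recall, f1)}, each computed one-vs-rest."""
    scores = {}
    for c in sorted(set(y_true) | set(y_pred)):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_and_weighted_f1(y_true, y_pred):
    """Macro F1 counts every category equally; weighted F1 weights each by its true support."""
    scores = per_class_scores(y_true, y_pred)
    support = Counter(y_true)
    macro = sum(f1 for _, _, f1 in scores.values()) / len(scores)
    weighted = sum(scores[c][2] * support[c] for c in scores) / len(y_true)
    return macro, weighted


# Hypothetical: 8 functional and 2 security reports; the classifier always answers "functional".
y_true = ["functional"] * 8 + ["security"] * 2
y_pred = ["functional"] * 10
print(per_class_scores(y_true, y_pred)["security"])  # (0.0, 0.0, 0.0)
macro, weighted = macro_and_weighted_f1(y_true, y_pred)
print(round(macro, 3), round(weighted, 3))  # 0.444 0.711
```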

Simulated Author's Rebuttal

3 responses · 1 unresolved

We thank the referee for the detailed feedback on methodological transparency. We address each major comment below and will revise the manuscript accordingly where the original study design permits.


Referee: [Abstract] Abstract: performance numbers (64% F-Measure, 74% AUC-ROC) are stated without any description of the labeling procedure, number of annotators, inter-rater agreement, disagreement resolution, feature engineering, train-test split, or baseline comparisons. These omissions make the numbers unverifiable and directly undermine the central claim that the model achieves the reported performance on the derived taxonomy.

Authors: We agree that the abstract omits key methodological details. In the revised version we will expand the abstract to briefly describe the labeling procedure, annotator involvement, train-test split, feature engineering, and baseline comparisons, while retaining full details in the body. This directly addresses verifiability of the reported metrics. revision: yes
Referee: [Taxonomy construction / manual analysis] Taxonomy construction section (manual analysis of 1,280 reports): no information is supplied on annotator count, annotation guidelines, or inter-rater agreement. Because the nine categories are defined from these labels and then used as ground truth for the classifier, the absence of reliability metrics is load-bearing for both the taxonomy and all downstream results.

Authors: We acknowledge the omission in the taxonomy construction section. The revision will add a description of the annotation guidelines and annotator count (primarily one author with co-author review and disagreement resolution via discussion). A formal inter-rater agreement statistic was not computed in the original study and therefore cannot be supplied. revision: partial
Referee: [Evaluation / results] Evaluation section: the claim that the nine categories are 'main common root causes' across the considered systems requires evidence that the categories are stable and not artifacts of individual annotator interpretation; without agreement statistics or a reproducibility check, the generalization statement in the abstract cannot be assessed.

Authors: We agree that stability evidence would strengthen the generalization claim. The revised evaluation section will add discussion of how the nine categories emerged consistently across the three ecosystems and any available reproducibility considerations from the manual analysis. revision: partial

Standing simulated objections (not resolved)

Absence of computed inter-rater agreement statistics for the manual labeling, which prevents supplying quantitative reliability metrics for the taxonomy.

Circularity Check

0 steps flagged

No circularity: taxonomy and classifier derived from independent manual labeling process


The paper's derivation consists of manual analysis of 1,280 bug reports to induce a 9-category taxonomy, followed by training and evaluating a supervised classifier on those labels. No equations, fitted parameters renamed as predictions, self-citations, or ansatzes are present in the provided text. The taxonomy construction and model evaluation are standard empirical steps that do not reduce to each other by construction; the central claims rest on the (unreported) labeling process and performance metrics rather than any definitional loop or self-referential citation chain.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Empirical study relying on manual labeling of bug-report text; no mathematical derivations or invented physical entities.

axioms (1)

Domain assumption: bug report text contains enough information for accurate root-cause labeling by human readers.
The entire taxonomy and subsequent classifier rest on this premise.

pith-pipeline@v0.9.0 · 5760 in / 1104 out tokens · 24309 ms · 2026-05-24T16:03:04.658391+00:00 · methodology


Reference graph

Works this paper leans on

103 extracted references · 103 canonical work pages · 3 internal anchors

[1]

Akila, V., Zayaraz, G., and Govindasamy, V. 2015. Eﬀective bug triage–a framework. Procedia Computer Science 48, 114–120

work page 2015
[2]

, Villaneau, J

Antoine, J.-Y. , Villaneau, J. , and Lefeuvre, A. 2014. Weighted krippendorﬀ’s alpha is a more reliable metrics for multi- coders ordinal annotations: experimental studies on emotion, opinion and coreference annotation. In EACL 2014. 10–p

work page 2014
[3]

, Ayari, K

Antoniol, G. , Ayari, K. , Di Penta, M. , Khomh, F. , and Gu´eh´eneuc, Y.-G. 2008. Is it a bug or an enhancement?: a text-based approach to classify change requests. In Proceedings of the 2008 conference of the center for advanced studies on collab- orative research: meeting of minds . ACM, 23

work page 2008
[4]

Anvik, J. 2006. Automating bug report assignment. In Proc. Int’l Conference on Software Engineering (ICSE). ACM, 937–940. 18

work page 2006
[5]

, Hiew, L

Anvik, J. , Hiew, L. , and Murphy, G. C. 2006. Who should ﬁx this bug? In Proceedings of the International Conference on Software Engineering (ICSE). ACM, 361–370

work page 2006
[6]

and Murphy, G

Anvik, J. and Murphy, G. C. 2011. Reducing the eﬀort of bug report triage: Recommenders for development-oriented deci- sions. ACM Transactions on Software Engineering and Method- ology (TOSEM) 20, 3, 10

work page 2011
[7]

and Venolia, G

Aranda, J. and Venolia, G. 2009. The secret life of bugs: Go- ing past the errors and omissions in software repositories. In Pro- ceedings of the International Conference on Software Engineering (ICSE). IEEE Computer Society, 298–308

work page 2009
[8]

, Krsul, I

Aslam, T. , Krsul, I. , and Spafford, E. H. 1996. Use of a taxonomy of security faults

work page 1996
[9]

Baeza-Yates, R. A. and Ribeiro-Neto, B. 1999. Modern In- formation Retrieval . Addison-Wesley Longman Publishing Co., Inc

work page 1999
[10]

Baldi, P., Brunak, S., Chauvin, Y., Andersen, C. A. , and Nielsen, H. 2000. Assessing the accuracy of prediction algorithms for classiﬁcation: an overview. Bioinformatics 16, 5, 412–424

work page 2000
[11]

Bauer, M. W. 2007. Content analysis. an introduction to its methodology–by klaus krippendorﬀ from words to numbers. nar- rative, data and social science–by roberto franzosi. The British Journal of Sociology 58, 2, 329–331

work page 2007
[12]

E., Di Penta, M

Bavota, G., Linares-Vasquez, M., Bernal-Cardenas, C. E., Di Penta, M. , Oliveto, R. , and Poshyvanyk, D. 2015. The impact of api change-and fault-proneness on the user ratings of android apps. IEEE Transactions on Software Engineering 41, 4, 384–407

work page 2015
[13]

Bell, J., Legunsen, O., Hilton, M., Eloussi, L., Yung, T., and Marinov, D. 2018. Deﬂaker: Automatically detecting ﬂaky tests. In Proceedings of the International Conference on Software Engineering (ICSE). ACM

work page 2018
[14]

, Gousios, G

Beller, M. , Gousios, G. , Panichella, A. , Proksch, S. , Amann, S., and Zaidman, A. Developer testing in the ide: Pat- terns, beliefs, and behavior. IEEE Transactions on Software En- gineering (TSE). To Appear

work page
[15]

Beller, M., Gousios, G., Panichella, A., and Zaidman, A

work page
[16]

In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE)

When, how, and why developers (do not) test in their ides. In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering (ESEC/FSE). ACM, 179–190

work page 2015
[17]

, Gousios, G

Beller, M. , Gousios, G. , and Zaidman, A. 2017. Oops, my tests broke the build: An explorative analysis of Travis CI with GitHub. In Mining Software Repositories (MSR), 2017 IEEE/ACM 14th International Conference on . IEEE, 356–367

work page 2017
[18]

Beller, M., Spruit, N., Spinellis, D., and Zaidman, A. 2018. On the dichotomy of debugging behavior among programmers. In Proceedings of the 40th International Conference on Software Engineering (ICSE). ACM, 572–583

work page 2018
[19]

and Bengio, Y

Bergstra, J. and Bengio, Y. 2012. Random search for hyper-parameter optimization. Journal of Machine Learning Re- search 13, Feb, 281–305

work page 2012
[20]

, and Zimmermann, T

Bettenburg, N., Just, S., Schr¨oter, A., Weiß, C., Prem- raj, R. , and Zimmermann, T. 2007. Quality of bug reports in eclipse. In Proceedings of the 2007 OOPSLA workshop on eclipse technology eXchange. ACM, 21–25

work page 2007
[21]

Bezemer, C.-P., McIntosh, S., Adams, B., German, D. M. , and Hassan, A. E. 2017. An empirical study of unspeciﬁed de- pendencies in make-based build systems. Empirical Software En- gineering 22, 6, 3117–3148

work page 2017
[22]

Blei, D. M. , Ng, A. Y. , and Jordan, M. I. 2003. Latent dirichlet allocation. Journal of machine Learning research 3, Jan, 993–1022

work page 2003
[23]

, Premraj, R

Breu, S. , Premraj, R. , Sillito, J. , and Zimmermann, T

work page
[24]

In Proceedings of the ACM confer- ence on Computer Supported Cooperative Work (CSCW)

Information needs in bug reports: improving cooperation between developers and users. In Proceedings of the ACM confer- ence on Computer Supported Cooperative Work (CSCW) . ACM, 301–310

work page
[25]

Bruning, S., Weissleder, S., and Malek, M. 2007. A fault taxonomy for service-oriented architecture. In High Assurance Systems Engineering Symposium, 2007. HASE’07. 10th IEEE . IEEE, 367–368

work page 2007
[26]

and Abran, A

Buglione, L. and Abran, A. 2006. Introducing root-cause analysis and orthogonal defect classiﬁcation at lower cmmi matu- rity levels. Proc. MENSURA 910, 29–40

work page 2006
[27]

Catolino, G., Palomba, F., Zaidman, A., and Ferrucci, F

work page
[28]

com/s/dcb95c70c4472b2ac935

Not all bugs are created equal: Understanding and classify- ing the root cause of bugs - online appendix https://figshare. com/s/dcb95c70c4472b2ac935

work page
[29]

M., Bishop, J., Steyn, J., Baresi, L., and Guinea, S

Chan, K. M., Bishop, J., Steyn, J., Baresi, L., and Guinea, S. 2007. A fault taxonomy for web service composition. In In- ternational Conference on Service-Oriented Computing. Springer, 363–375

work page 2007
[30]

Chawla, N. V. , Bowyer, K. W. , Hall, L. O. , and Kegelmeyer, W. P. 2002. Smote: synthetic minority over- sampling technique. Journal of artiﬁcial intelligence research 16 , 321–357

work page 2002
[31]

, Bhandari, I

Chillarege, R. , Bhandari, I. S. , Chaar, J. K. , Halliday, M. J. , Moebus, D. S. , Ray, B. K. , and Wong, M.-Y. 1992. Orthogonal defect classiﬁcation-a concept for in-process measure- ments. IEEE Transactions on software Engineering 18, 11, 943– 956

work page 1992
[32]

Chowdhury, G. G. 2003. Natural language processing. Annual review of information science and technology 37, 1, 51–89

work page 2003
[33]

The evolu- tion and decay of statically detected source code vulnerabilities

Di Penta, M., Cerulo, L., and Aversano, L.2008. The evolu- tion and decay of statically detected source code vulnerabilities. In Eighth IEEE International Working Conference on Source Code Analysis and Manipulation . IEEE, 101–110

work page 2008
[34]

, Denger, C

Freimut, B. , Denger, C. , and Ketterer, M. 2005. An in- dustrial case study of implementing and validating defect classiﬁ- cation for process improvement and quality management. In Soft- ware Metrics, 2005. 11th IEEE International Symposium . IEEE, 10–pp

work page 2005
[35]

word2vec Explained: deriving Mikolov et al.'s negative-sampling word-embedding method

Goldberg, Y. and Levy, O. 2014. word2vec explained: Deriv- ing mikolov et al.’s negative-sampling word-embedding method. arXiv preprint arXiv:1402.3722

work page internal anchor Pith review Pith/arXiv arXiv 2014
[36]

Gousios, G., Zaidman, A., Storey, M.-A., and Van Deursen, A. 2015. Work practices and challenges in pull-based develop- ment: the integrator’s perspective. In Proceedings of the 37th In- ternational Conference on Software Engineering-Volume 1. IEEE Press, 358–368

work page 2015
[37]

Hall, T., Beecham, S., Bowes, D., Gray, D., and Counsell, S. 2011. Developing fault-prediction models: What the research can show industry. IEEE software 28, 6, 96–99

work page 2011
[38]

Topic Modelling of Empirical Text Corpora: Validity, Reliability, and Reproducibility in Comparison to Semantic Maps

Hecking, T. and Leydesdorff, L. 2018. Topic modelling of empirical text corpora: Validity, reliability, and reproducibility in comparison to semantic maps. arXiv preprint arXiv:1806.01045

work page internal anchor Pith review Pith/arXiv arXiv 2018
[39]

, Rodriguez, D

Hern´andez-Gonz´alez, J. , Rodriguez, D. , Inza, I. , Harri- son, R., and Lozano, J. A. 2018. Learning to classify software defects from crowds: a novel approach. Applied Soft Comput- ing 62 , 579–591

work page 2018
[40]

Herzig, K., Just, S., and Zeller, A. 2013. It’s not a bug, it’s a feature: how misclassiﬁcation impacts bug prediction. In Pro- ceedings of the International Conference on Software Engineering (ICSE). IEEE, 392–401

work page 2013
[41]

and Weimer, W

Hooimeijer, P. and Weimer, W. 2007. Modeling bug report quality. In Proceedings of the international conference on Auto- mated software engineering (ASE) . ACM, 34–43

work page 2007
[42]

Huang, L., Ng, V., Persing, I., Chen, M., Li, Z., Geng, R., and Tian, J. 2015. Autoodc: Automated generation of orthogonal defect classiﬁcations. Automated Software Engineering 22, 1, 3– 46

work page 2015
[43]

Javed, M. Y. , Mohsin, H. , et al. 2012. An automated ap- proach for software bug classiﬁcation. In Complex, Intelligent and Software Intensive Systems (CISIS), 2012 Sixth International Conference on. IEEE, 414–419

work page 2012
[44]

Jeong, G., Kim, S., and Zimmermann, T. 2009. Improving bug triage with bug tossing graphs. In Proceedings of the joint meeting of the European software engineering conference & the symposium on The foundations of software engineering (ESEC/FSE) . ACM, 111–120

work page 2009
[45]

, Adamoli, A

Jovic, M. , Adamoli, A. , and Hauswirth, M. 2011. Catch me if you can: performance bug detection in the wild. In ACM SIGPLAN Notices. Vol. 46. ACM, 155–170. 19

work page 2011
[46]

and Sureka, A

Lal, S. and Sureka, A. 2012. Comparison of seven bug report types: A case-study of google chrome browser project. In Software Engineering Conference (APSEC), 2012 19th Asia-Paciﬁc. Vol. 1. IEEE, 517–526

work page 2012
[47]

and Mikolov, T

Le, Q. and Mikolov, T. 2014. Distributed representations of sentences and documents. In International Conference on Ma- chine Learning. 1188–1196

work page 2014
[48]

E., and Stoll, D

Leszak, M., Perry, D. E., and Stoll, D. 2002. Classiﬁcation and evaluation of defects in a project retrospective. Journal of Systems and Software 61, 3, 173–187

work page 2002
[49]

, Holden, K

Lidwell, W. , Holden, K. , and Butler, J. 2010. Universal Principles of Design, Revised and Updated: 125 Ways to Enhance Usability, Inﬂuence Perception, Increase Appeal, Make Better De- sign Decisions, and Teach through Design 2nd Ed. Rockport Pub- lishers

work page 2010
[50]

and Accorsi, R

Lowis, L. and Accorsi, R. 2011. Vulnerability analysis in soa- based business processes. IEEE Transactions on Services Com- puting 4, 3, 230–242

work page 2011
[51]

Luo, Q., Hariri, F., Eloussi, L., and Marinov, D. 2014. An empirical analysis of ﬂaky tests. In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM, 643–653

work page 2014
[52]

, Ray, B., and Kim, M

McDonnell, T. , Ray, B., and Kim, M. 2013. An empirical study of api stability and adoption in the android ecosystem. In Proc. Int’l Conf. on Software Maintenance (ICSM). IEEE, 70–79

work page 2013
[53]

Memon, A. M. 2002. GUI testing: Pitfalls and process. Com- puter 35, 8, 87–88

work page 2002
[54]

N., Fritz, T., Murphy, G

Meyer, A. N., Fritz, T., Murphy, G. C., and Zimmermann, T. 2014. Software developers’ perceptions of productivity. In Pro- ceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering . ACM, 19–29

work page 2014
[55]

Mileva, Y. M. , Dallmeier, V. , Burger, M. , and Zeller, A. 2009. Mining trends of library usage. In Proceedings of the joint international and annual ERCIM workshops on Princi- ples of software evolution (IWPSE) and software evolution (Evol) workshops. ACM, 57–62

work page 2009
[56]

, Fielding, R

Mockus, A. , Fielding, R. T. , and Herbsleb, J. D. 2002. Two case studies of open source software development: Apache and mozilla. ACM Transactions on Software Engineering and Methodology (TOSEM) 11, 3, 309–346

work page 2002
[57]

and H ˚akansson, A

Moradian, E. and H ˚akansson, A. 2006. Possible attacks on xml web services. IJCSNS International Journal of Computer Science and Network Security 6, 1B, 154–170

work page 2006
[58]

and Cubranic, D

Murphy, G. and Cubranic, D. 2004. Automatic bug triage using text categorization. In Proceedings of the International Conference on Software Engineering & Knowledge Engineering (SEKE). 92–97

work page 2004
[59]

Nagwani, N., Verma, S., and Mehta, K. K. 2013. Generating taxonomic terms for software bug classiﬁcation by utilizing topic models based on latent dirichlet allocation. InICT and Knowledge Engineering (ICT&KE), 2013 11th International Conference on . IEEE, 1–5

work page 2013
[60]

Nasrabadi, N. M. 2007. Pattern recognition and machine learning. Journal of electronic imaging 16, 4, 049901

work page 2007
[61]

Ostrand, T. J. and Weyuker, E. J.1984. Collecting and cate- gorizing software error data in an industrial environment. Journal of Systems and Software 4, 4, 289–300

work page 1984
[62]

, Bavota, G., Oliveto, R., Di Penta, M

Palomba, F., Linares-V´asquez, M. , Bavota, G., Oliveto, R., Di Penta, M. , Poshyvanyk, D., and De Lucia, A. 2018. Crowdsourcing user reviews to support the evolution of mobile apps. Journal of Systems and Software 137 , 143–162

work page 2018
[63]

, Salza, P

Palomba, F. , Salza, P. , Ciurumelea, A. , Panichella, S. , Gall, H., Ferrucci, F., and De Lucia, A. 2017. Recommend- ing and localizing change requests for mobile apps based on user reviews. In Proceedings of the 39th international conference on software engineering. IEEE Press, 106–117

work page 2017
[64]

and Zaidman, A

Palomba, F. and Zaidman, A. 2017. Does refactoring of test smells induce ﬁxing ﬂaky tests? In Software Maintenance and Evolution (ICSME), 2017 IEEE International Conference on . IEEE, 1–12

work page 2017
[65]

Panichella, A., Dit, B., Oliveto, R., Di Penta, M., Poshy- vanyk, D., and De Lucia, A. 2013. How to eﬀectively use topic models for software engineering tasks? an approach based on ge- netic algorithms. In Proceedings of the 2013 International Con- ference on Software Engineering. IEEE Press, 522–531

work page 2013
[66]

, Nijholt, A., and Huang, T

Pantic, M., Pentland, A. , Nijholt, A., and Huang, T. S

work page
[67]

In Artiﬁcal Intelligence for Human Comput- ing

Human computing and machine understanding of human behavior: a survey. In Artiﬁcal Intelligence for Human Comput- ing. Springer, 47–71

work page
[68]

, Spadini, D

Pascarella, L. , Spadini, D. , Palomba, F. , Bruntink, M. , and Bacchelli, A. 2018. Information needs in contemporary code review. Proceedings of the ACM on Human-Computer In- teraction 2, CSCW, 135

work page 2018
[69]

Peng, J., Heisterkamp, D. R. , and Dai, H. 2001. Lda/svm driven nearest neighbor classiﬁcation. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on . Vol. 1. IEEE, I–I

work page 2001
[70]

Porter, M. F. 1980. An algorithm for suﬃx stripping. Pro- gram 14, 3, 130–137

work page 1980
[71]

Ray, B., Hellendoorn, V., Godhane, S., Tu, Z., Bacchelli, A., and Devanbu, P. 2016. On the naturalness of buggy code. In Proceedings of the International Conference on Software En- gineering (ICSE). ACM, 428–439

work page 2016
[72]

, Tang, L

Refaeilzadeh, P. , Tang, L. , and Liu, H. 2009. Cross- validation. In Encyclopedia of database systems . Springer, 532– 538

work page 2009
[73]

Robbes, R., Lungu, M., and R¨othlisberger, D. 2012. How do developers react to api deprecation?: the case of a smalltalk ecosystem. In Proceedings of the ACM SIGSOFT 20th Interna- tional Symposium on the Foundations of Software Engineering . ACM, 56

work page 2012
[74]

and Buckley, C.1988

Salton, G. and Buckley, C.1988. Term-weighting approaches in automatic text retrieval. Information processing & manage- ment 24, 5, 513–523

work page 1988
[75]

, DUva, C., De Lucia, A., and Ferrucci, F

Salza, P., Palomba, F., Di Nucci, D. , DUva, C., De Lucia, A., and Ferrucci, F. 2018. Do developers update third-party libraries in mobile apps?

work page 2018
[76]

, Premraj, R

Schr¨oter, A., Zimmermann, T. , Premraj, R. , and Zeller, A. 2006. If your bug database could talk. In Proceedings of the 5th international symposium on empirical software engineering . Vol. 2. 18–20

work page 2006
[77]

Shokripour, R., Anvik, J., Kasirun, Z. M., and Zamani, S. 2013. Why so complicated? Simple term filtering and weighting for location-based bug report assignment recommendation. In Mining Software Repositories (MSR), 2013 10th IEEE Working Conference on. IEEE, 2–11

[79]

Stone, M. 1974. Cross-validatory choice and assessment of statistical predictions. Journal of the Royal Statistical Society. Series B (Methodological), 111–147

[80]

Sun, C., Lo, D., Wang, X., Jiang, J., and Khoo, S.-C. 2010. A discriminative model approach for accurate duplicate bug report retrieval. In Proceedings of the 32nd ACM/IEEE International Conference on Software Engineering-Volume 1. ACM, 45–54
