Confident Learning-based Network for Detecting Bug-Inducing Commits on SZZ with Noisy Labels

Qiyun Zhao; Weihao Sun

arxiv: 2605.27880 · v1 · pith:GGD5LGS2new · submitted 2026-05-27 · 💻 cs.SE

Pith reviewed 2026-06-29 11:39 UTC · model grok-4.3

classification 💻 cs.SE

keywords: bug-inducing commits, SZZ labels, confident learning, graph convolutional networks, JIT defect prediction, noisy labels, software quality, defect detection

The pith

BIC-Hunter cleans noisy SZZ labels with confident learning and uses graph convolutions on homogeneous commit graphs to raise recall in bug-inducing commit detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that real-world SZZ data contains annotation noise and weak semantic links that hurt just-in-time defect prediction, and that a two-part model can address both problems. A denoising stage applies confident learning to spot and repair inaccurate labels, while a second stage builds homogeneous graphs of code commits and runs graph convolutional networks to recover missing context. If the approach holds, training data becomes more trustworthy and models identify the commits that introduce defects more accurately on large open-source collections. Experiments on a dataset merged from three open-source datasets report gains of 6.16 percent, 7.13 percent, and 5.53 percent on Recall@1, Recall@2, and Recall@3, together with an 8.43 to 32.82 percent lift in the MFR index.

Core claim

The BIC-Hunter model mitigates inaccurate annotations and inconsistencies in SZZ-labeled data through a confident-learning denoising component and constructs homogeneous graphs processed by graph convolutional networks to capture semantic relationships among commits, thereby improving the accuracy of bug-inducing commit identification over prior methods.

What carries the argument

Confident learning applied to label correction, paired with homogeneous graph construction and graph convolutional networks for semantic context analysis.
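To make the label-correction idea concrete, here is a minimal sketch of the thresholding rule described in the confident learning paper [23]: each class gets a threshold equal to the mean predicted probability of that class over the examples currently labeled with it, and an example is flagged when it confidently belongs to a class other than its given label. The function names, the binary encoding (1 = bug-inducing, 0 = not), and the toy probabilities are assumptions for illustration only; the material here does not describe BIC-Hunter's own denoising code at this level of detail.

```python
from typing import List, Optional, Sequence


def class_thresholds(labels: Sequence[int],
                     probs: Sequence[Sequence[float]],
                     n_classes: int) -> List[float]:
    """Per-class threshold: mean predicted probability of class j
    over the examples whose given (possibly noisy) label is j."""
    thresholds = []
    for j in range(n_classes):
        own = [p[j] for y, p in zip(labels, probs) if y == j]
        # A class with no examples gets an unreachable threshold.
        thresholds.append(sum(own) / len(own) if own else float("inf"))
    return thresholds


def confident_class(p: Sequence[float],
                    thresholds: Sequence[float]) -> Optional[int]:
    """Class the example confidently belongs to, or None if no class
    probability reaches its threshold."""
    candidates = [j for j, t in enumerate(thresholds) if p[j] >= t]
    if not candidates:
        return None
    return max(candidates, key=lambda j: p[j])


def find_label_issues(labels: Sequence[int],
                      probs: Sequence[Sequence[float]],
                      n_classes: int = 2) -> List[int]:
    """Indices whose confident class disagrees with the given label.

    `probs` are assumed to be out-of-sample predicted probabilities
    (for example from cross-validation), one row per example.
    """
    thresholds = class_thresholds(labels, probs, n_classes)
    issues = []
    for idx, (y, p) in enumerate(zip(labels, probs)):
        c = confident_class(p, thresholds)
        if c is not None and c != y:
            issues.append(idx)
    return issues
```

With six toy examples labeled `[1, 1, 1, 0, 0, 0]`, where the third and sixth have predicted probabilities that strongly contradict their labels, `find_label_issues` returns those two indices. Whether flagged items are then relabeled or dropped is a separate design choice that this sketch leaves open.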

If this is right

Training sets for JIT defect prediction become more reliable after noise correction.
Graph convolutional networks on commit graphs recover semantic links that earlier models missed.
Overall robustness to real-world label noise increases.
Recall@1, Recall@2, Recall@3 and MFR all rise by the reported margins on merged open-source data.
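For readers unfamiliar with graph convolutions, the sketch below shows one layer of the standard propagation rule from Kipf and Welling [22], $H' = \mathrm{ReLU}(\hat{D}^{-1/2}(A + I)\hat{D}^{-1/2} H W)$, written in plain Python. It is not the weighted GCN variant the paper describes; the helper names and the tiny two-node graph are assumptions chosen only to show how each node's features are mixed with those of its neighbors.

```python
import math
from typing import List

Matrix = List[List[float]]


def normalize_adjacency(adj: Matrix) -> Matrix:
    """Symmetric normalization with self-loops: D^-1/2 (A + I) D^-1/2."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]  # always >= 1 because of self-loops
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain dense matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, features: Matrix, weights: Matrix) -> Matrix:
    """One GCN layer: aggregate neighbor features, project, apply ReLU."""
    propagated = matmul(normalize_adjacency(adj), features)
    projected = matmul(propagated, weights)
    return [[max(0.0, v) for v in row] for row in projected]
```

On a graph with two connected nodes whose scalar features are 1.0 and 3.0, and an identity weight, both nodes end up with the averaged value 2.0, which is the smoothing effect that lets a node borrow context from related nodes.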

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same denoising-plus-graph pattern could be tested on other noisy-label tasks such as vulnerability or smell detection.
Replacing the homogeneous graph with a heterogeneous one that also links files and developers might further improve context capture.
The method supplies a concrete way to measure how much label noise currently limits existing SZZ-based detectors.

Load-bearing premise

Confident learning can reliably spot and fix inaccurate SZZ annotations without adding new biases that lower downstream detection performance.

What would settle it

On a hold-out set of commits whose inducing status has been independently verified by multiple developers, measure whether applying the confident-learning stage increases final Recall@K compared with training on the raw noisy labels.
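The comparison described above could be scored with a small helper like the following sketch. It uses the standard information-retrieval definition of Recall@K (the share of true items that appear in the top K of a ranked list, averaged over queries) and assumes that MFR denotes mean first rank, that is, the average 1-based position of the first true item. Neither definition is spelled out in the material here, so both, along with all function names and the toy data, are assumptions.

```python
from typing import Hashable, List, Optional, Sequence, Set


def recall_at_k(ranked: Sequence[Hashable], relevant: Set[Hashable], k: int) -> float:
    """Fraction of the relevant items found in the top k (ranked items assumed unique)."""
    found = sum(1 for item in ranked[:k] if item in relevant)
    return found / len(relevant)


def mean_recall_at_k(ranked_lists: Sequence[Sequence[Hashable]],
                     relevant_sets: Sequence[Set[Hashable]],
                     k: int) -> float:
    """Average Recall@K over queries; queries with no relevant items are skipped."""
    scores = [recall_at_k(r, rel, k)
              for r, rel in zip(ranked_lists, relevant_sets) if rel]
    return sum(scores) / len(scores)


def first_rank(ranked: Sequence[Hashable], relevant: Set[Hashable]) -> Optional[int]:
    """1-based position of the first relevant item, or None if absent."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return pos
    return None


def mean_first_rank(ranked_lists: Sequence[Sequence[Hashable]],
                    relevant_sets: Sequence[Set[Hashable]]) -> float:
    """Average first rank over the queries where a relevant item was ranked."""
    ranks: List[Optional[int]] = [first_rank(r, rel)
                                  for r, rel in zip(ranked_lists, relevant_sets)]
    found = [r for r in ranks if r is not None]
    return sum(found) / len(found)
```

Running the same helper on rankings from a model trained on raw SZZ labels and on rankings from a model trained on denoised labels, both evaluated against the independently verified hold-out set, would give the paired numbers the test calls for.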

Figures

Figures reproduced from arXiv: 2605.27880 by Qiyun Zhao, Weihao Sun.

**Figure 2.** The workflow of the SZZ algorithm. Excerpt from the accompanying text: … although not problematic at the time, eventually lead to issues that will require future changes to fix the introduced bugs. Effectively identifying and preventing software bugs is a key objective in the software engineering community. Therefore, Śliwerski et al. [10] proposed the SZZ algorithm to identify changes that introduce bugs. To accurately identify bug-inducing …

**Figure 3.** The framework of BIC-Hunter. Excerpt from the accompanying text (Section 3.2, Data Denoising Component): To obtain higher-quality learning data and reduce noise in the model input, a denoising component based on confident learning is used to process the initial data. It calculates a confidence level for each data item and ranks the data by these confidence scores. First, we input the del…

**Figure 4.** The Process of Denoising Data Using Confidence Learning.

**Figure 5.** Graph Construction. Excerpt from the accompanying text (Section 3.3, Graph Convolutional Network Component): In this section, we construct a weighted Graph Convolutional Network (GCN) model built upon the GCN framework. The model takes high-quality data as input and generates probability outputs, enabling the calculation of the root-cause probability for each deletion node. The module is divided into the following parts. In Section 3.3.1, we p…

Original abstract

The Just-In-Time (JIT) defect prediction model serves as a critical tool for ensuring the quality of software development and enhancing software performance. It assists development teams in promptly identifying and addressing potential issues by predicting whether code submissions may introduce defects. However, due to the existence of data noise and insufficient semantic connections in real-world scenarios, existing approaches face challenges in accurately identifying the code commits that introduce defects and capturing the potential semantic relationships. To address these challenges, we propose the BIC-Hunter (Bug-Inducing Commits Hunter) model, which mitigates data noise and improves semantic understanding, thereby enhancing the accuracy of bug-inducing commit identification. The BIC-Hunter model consists of two components: a data denoising component and a semantic relationship capturing component. Specifically, the data denoising component addresses the challenges posed by inaccurate annotations and inconsistencies in real-world data, enhancing the reliability of training data and improving overall model robustness. The semantic relationship capturing component constructs homogeneous graphs and applies graph convolutional networks to facilitate a more comprehensive analysis of code context, enabling the identification of defects caused by code commits and enhancing the confidence in pinpointing their root causes. Experimental studies on a large-scale dataset integrated from three open-source datasets show that BIC-Hunter exhibits outstanding performance. BIC-Hunter outperforms the state-of-the-art by 6.16%, 7.13%, and 5.53% on Recall@1, Recall@2, and Recall@3, respectively, while the MFR index increases by 8.43% to 32.82%. These results demonstrate the superior capability of our method in identifying bug-inducing commits.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

BIC-Hunter applies confident learning to clean SZZ noise and GCNs to model commit graphs, claiming 5-7% recall lifts, but the abstract supplies no experimental details to assess whether those gains are real.

The paper's main move is to pair confident learning for fixing noisy SZZ labels with graph convolutional networks that run over homogeneous commit graphs. The result is BIC-Hunter, which the abstract reports as beating prior work by 6.16%, 7.13%, and 5.53% on Recall@1, Recall@2, and Recall@3, plus an 8.43% to 32.82% improvement in MFR, on a dataset merged from three open-source datasets.

What it does reasonably is name the two practical problems—label noise and weak semantic context—and pick established tools that target each one. Confident learning was designed for annotation errors, so the fit is logical. The graph step tries to capture relationships that simpler features miss. The numbers are specific enough that a referee could check them if the code and data splits are released.

The soft spots are all in the missing evidence. The abstract gives performance numbers but no baselines, no dataset sizes or integration steps, no statistical tests, and no ablation showing that the denoising step actually helps the final task rather than just relabeling data. The central assumption—that confident learning removes inaccurate SZZ annotations without injecting new bias—is stated but not demonstrated. Without those checks the claimed gains cannot be evaluated.

This work is aimed at the narrow group of researchers who already follow SZZ-based just-in-time defect prediction and want to see one more denoising-plus-graph variant. A reader outside that subfield or anyone looking for a new framework will find little to use.

I would not bring it to a general reading group and would not cite it unless the full experiments prove reproducible. It still deserves a serious referee because the problem is concrete, the techniques are off-the-shelf, and a reviewer can quickly decide whether the missing validation details are present in the full manuscript. Send it out rather than desk reject, but expect the first round to focus on experimental rigor.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes BIC-Hunter, a model for identifying bug-inducing commits (BICs) that combines a confident-learning-based data-denoising component to mitigate noisy SZZ labels with a semantic-relationship component that builds homogeneous graphs and applies graph convolutional networks. Experiments on a large-scale dataset formed by integrating three open-source datasets report that BIC-Hunter outperforms prior work by 6.16%, 7.13%, and 5.53% on Recall@1, Recall@2, and Recall@3, respectively, while improving the MFR index by 8.43% to 32.82%.

Significance. If the reported gains are shown to be robust after proper controls for baselines, dataset construction, and component validation, the work would address two recognized difficulties in just-in-time defect prediction: label noise arising from the SZZ algorithm and limited modeling of semantic context among commits. The combination of confident learning for denoising and GCNs on homogeneous graphs is a plausible direction, but its practical impact cannot yet be judged from the supplied information.

major comments (1)

[Abstract] Performance numbers are stated without any description of the baselines, the construction or size of the integrated dataset, statistical significance testing, or ablation/validation results for the denoising and graph components; consequently the central empirical claim cannot be assessed.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review. The single major comment concerns the abstract's lack of detail on baselines, dataset, significance testing, and ablations. We address this below.

Referee: [Abstract] Performance numbers are stated without any description of the baselines, the construction or size of the integrated dataset, statistical significance testing, or ablation/validation results for the denoising and graph components; consequently the central empirical claim cannot be assessed.

Authors: We agree the abstract is concise and omits these specifics, which are instead provided in the body (dataset construction and size in Section 4.1, baselines in Section 4.2, significance testing in Section 5.3, and component ablations in Section 5.4). To make the central claim more self-contained, we will revise the abstract to briefly note the dataset integrated from three open-source datasets, the SOTA baselines compared, and that full validation details appear in the experiments section. revision: yes

Circularity Check

0 steps flagged

No significant circularity

The abstract and available description present an empirical ML model (BIC-Hunter) with two components: confident learning for denoising SZZ labels and GCN on homogeneous graphs for semantic capture. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains are quoted or present. Performance metrics are reported as experimental outcomes on integrated datasets rather than derived results. Without load-bearing mathematical steps that reduce to inputs by construction, the derivation chain (such as it is) is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract supplies no information on free parameters, axioms, or invented entities.

Reference graph

Works this paper leans on

55 extracted references · 9 canonical work pages · 6 internal anchors

[1]

Classifying software changes: Clean or buggy? IEEE Transactions on software engineering, 34(2):181–196, 2008

Sunghun Kim, E James Whitehead, and Yi Zhang. Classifying software changes: Clean or buggy? IEEE Transactions on software engineering, 34(2):181–196, 2008

2008
[2]

A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering, 39(6):757–773, 2012

Yasutaka Kamei, Emad Shihab, Bram Adams, Ahmed E Hassan, Audris Mockus, Anand Sinha, and Naoyasu Ubayashi. A large-scale empirical study of just-in-time quality assurance. IEEE Transactions on Software Engineering, 39(6):757–773, 2012

2012
[3]

Towards reliable online just-in-time software defect prediction. IEEE Transactions on Software Engineering, 49(3):1342–1358, 2022

George G Cabral and Leandro L Minku. Towards reliable online just-in-time software defect prediction. IEEE Transactions on Software Engineering, 49(3):1342–1358, 2022

2022
[4]

Bridging expert knowledge with deep learning techniques for just-in-time defect prediction. Empirical Software Engineering, 30(1):1–44, 2025

Xin Zhou, DongGyun Han, and David Lo. Bridging expert knowledge with deep learning techniques for just-in-time defect prediction. Empirical Software Engineering, 30(1):1–44, 2025

2025
[5]

The impact of mislabeled changes by szz on just-in-time defect prediction. IEEE transactions on software engineering, 47(8):1559–1586, 2019

Yuanrui Fan, Xin Xia, Daniel Alencar Da Costa, David Lo, Ahmed E Hassan, and Shanping Li. The impact of mislabeled changes by szz on just-in-time defect prediction. IEEE transactions on software engineering, 47(8):1559–1586, 2019

2019
[6]

Do developers introduce bugs when they do not communicate? the case of eclipse and mozilla

Mario Luca Bernardi, Gerardo Canfora, Giuseppe A Di Lucca, Massimiliano Di Penta, and Damiano Distante. Do developers introduce bugs when they do not communicate? the case of eclipse and mozilla. In 2012 16th European Conference on Software Maintenance and Reengineering, pages 139–148. IEEE, 2012

2012
[7]

How long does a bug survive? an empirical study

Gerardo Canfora, Michele Ceccarelli, Luigi Cerulo, and Massimiliano Di Penta. How long does a bug survive? an empirical study. In 2011 18th Working Conference on Reverse Engineering, pages 191–200. IEEE, 2011

2011
[8]

Identifying failure inducing developer pairs within developer networks

Jordan Ell. Identifying failure inducing developer pairs within developer networks. In 2013 35th International Conference on Software Engineering (ICSE), pages 1471–1473. IEEE, 2013

2013
[9]

Bug introducing changes: A case study with android

Muhammad Asaduzzaman, Michael C Bullock, Chanchal K Roy, and Kevin A Schneider. Bug introducing changes: A case study with android. In 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pages 116–119. IEEE, 2012

2012
[10]

When do changes induce fixes? ACM sigsoft software engineering notes, 30(4):1–5, 2005

Jacek Śliwerski, Thomas Zimmermann, and Andreas Zeller. When do changes induce fixes? ACM sigsoft software engineering notes, 30(4):1–5, 2005

2005
[11]

Evaluating szz implementations through a developer-informed oracle

Giovanni Rosa, Luca Pascarella, Simone Scalabrino, Rosalia Tufano, Gabriele Bavota, Michele Lanza, and Rocco Oliveto. Evaluating szz implementations through a developer-informed oracle. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pages 436–447. IEEE, 2021

2021
[12]

Problems with szz and features: An empirical study of the state of practice of defect prediction data collection. Empirical Software Engineering, 27(2):42, 2022

Steffen Herbold, Alexander Trautsch, Fabian Trautsch, and Benjamin Ledel. Problems with szz and features: An empirical study of the state of practice of defect prediction data collection. Empirical Software Engineering, 27(2):42, 2022. (Running footer carried over from the source PDF: ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 2, Article 3. Publication date: May 2025.)

2022
[13]

An empirical study on the use of szz for identifying inducing changes of non-functional bugs. Empirical Software Engineering, 26(4):71, 2021

Sophia Quach, Maxime Lamothe, Yasutaka Kamei, and Weiyi Shang. An empirical study on the use of szz for identifying inducing changes of non-functional bugs. Empirical Software Engineering, 26(4):71, 2021

2021
[14]

The impact of refactoring changes on the szz algorithm: An empirical study

Edmilson Campos Neto, Daniel Alencar Da Costa, and Uirá Kulesza. The impact of refactoring changes on the szz algorithm: An empirical study. In 2018 IEEE 25th international conference on software analysis, evolution and reengineering (SANER), pages 380–390. IEEE, 2018

2018
[15]

Evaluating szz implementations: An empirical study on the linux kernel. IEEE Transactions on Software Engineering, 2024

Yunbo Lyu, Hong Jin Kang, Ratnadira Widyasari, Julia Lawall, and David Lo. Evaluating szz implementations: An empirical study on the linux kernel. IEEE Transactions on Software Engineering, 2024

2024
[16]

Automatic identification of bug-introducing changes

Sunghun Kim, Thomas Zimmermann, Kai Pan, E James Whitehead Jr, et al. Automatic identification of bug-introducing changes. In 21st IEEE/ACM international conference on automated software engineering (ASE’06), pages 81–90. IEEE, 2006

2006
[17]

Revisiting and improving szz implementations

Edmilson Campos Neto, Daniel Alencar Da Costa, and Uirá Kulesza. Revisiting and improving szz implementations. In 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 1–12. IEEE, 2019

2019
[18]

Neural szz algorithm

Lingxiao Tang, Lingfeng Bao, Xin Xia, and Zhongdong Huang. Neural szz algorithm. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1024–1035. IEEE, 2023

2023
[19]

A comprehensive evaluation of szz variants through a developer-informed oracle. Journal of Systems and Software, 202:111729, 2023

Giovanni Rosa, Luca Pascarella, Simone Scalabrino, Rosalia Tufano, Gabriele Bavota, Michele Lanza, and Rocco Oliveto. A comprehensive evaluation of szz variants through a developer-informed oracle. Journal of Systems and Software, 202:111729, 2023

2023
[20]

Exploring and exploiting the correlations between bug-inducing and bug-fixing commits

Ming Wen, Rongxin Wu, Yepang Liu, Yongqiang Tian, Xuan Xie, Shing-Chi Cheung, and Zhendong Su. Exploring and exploiting the correlations between bug-inducing and bug-fixing commits. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 326–337, 2019

2019
[21]

Heterogeneous graph attention network

Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. Heterogeneous graph attention network. In The world wide web conference, pages 2022–2032, 2019

2019
[22]

Semi-Supervised Classification with Graph Convolutional Networks

Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016

2016
[23]

Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial Intelligence Research, 70:1373–1411, 2021

Curtis Northcutt, Lu Jiang, and Isaac Chuang. Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial Intelligence Research, 70:1373–1411, 2021

2021
[24]

https://github.com/Vandbs/BIC-Hunter

Our project package. https://github.com/Vandbs/BIC-Hunter
[25]

Predicting risk of software changes. Bell Labs Technical Journal, 5(2):169–180, 2000

Audris Mockus and David M Weiss. Predicting risk of software changes. Bell Labs Technical Journal, 5(2):169–180, 2000

2000
[26]

Class noise vs. attribute noise: A quantitative study

Xingquan Zhu and Xindong Wu. Class noise vs. attribute noise: A quantitative study. Artificial intelligence review, 22:177–210, 2004

2004
[27]

Hipikat: Recommending pertinent software development artifacts

Davor Cubranic and Gail C Murphy. Hipikat: Recommending pertinent software development artifacts. In 25th International Conference on Software Engineering, 2003. Proceedings., pages 408–418. IEEE, 2003

2003
[28]

Analyzing and relating bug report data for feature tracking

Michael Fischer, Martin Pinzger, and Harald Gall. Analyzing and relating bug report data for feature tracking. In WCRE, volume 3, page 90, 2003

2003
[29]

A framework for evaluating the results of the szz approach for identifying bug-introducing changes. IEEE Transactions on Software Engineering, 43(7):641–657, 2016

Daniel Alencar Da Costa, Shane McIntosh, Weiyi Shang, Uirá Kulesza, Roberta Coelho, and Ahmed E Hassan. A framework for evaluating the results of the szz approach for identifying bug-introducing changes. IEEE Transactions on Software Engineering, 43(7):641–657, 2016

2016
[30]

Identifying bug-inducing changes for code additions

Emre Sahal and Ayse Tosun. Identifying bug-inducing changes for code additions. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pages 1–2, 2018

2018
[31]

V-szz: automatic identification of version ranges affected by cve vulnerabilities

Lingfeng Bao, Xin Xia, Ahmed E Hassan, and Xiaohu Yang. V-szz: automatic identification of version ranges affected by cve vulnerabilities. In Proceedings of the 44th International Conference on Software Engineering, pages 2352–2364, 2022

2022
[32]

Pr-szz: How pull requests can support the tracing of defects in software repositories

Peter Bludau and Alexander Pretschner. Pr-szz: How pull requests can support the tracing of defects in software repositories. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 1–12. IEEE, 2022

2022
[33]

Identifying root cause of bugs by capturing changed code lines with relational graph neural networks

Jiaqi Zhang, Shikai Guo, Hui Li, Chenchen Li, Yu Chai, and Rong Chen. Identifying root cause of bugs by capturing changed code lines with relational graph neural networks. https://arxiv.org/abs/2505.00990

work page arXiv
[34]

Detecting the root cause code lines in bug-fixing commits by heterogeneous graph learning

Liguo Ji, Shikai Guo, Lehuan Zhang, Hui Li, Yu Chai, and Rong Chen. Detecting the root cause code lines in bug-fixing commits by heterogeneous graph learning. http://export.arxiv.org/abs/2505.01022

work page arXiv
[35]

Getting defect prediction into industrial practice: the elff tool

David Bowes, Steve Counsell, Tracy Hall, Jean Petric, and Thomas Shippey. Getting defect prediction into industrial practice: the elff tool. In 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 44–47. IEEE, 2017

2017
[36]

CodeBERT: A Pre-Trained Model for Programming and Natural Languages

Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. Codebert: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155, 2020

2020
[37]

From ranknet to lambdarank to lambdamart: An overview. Learning, 11(23-581):81, 2010

Christopher JC Burges. From ranknet to lambdarank to lambdamart: An overview. Learning, 11(23-581):81, 2010

2010
[38]

Yahoo! learning to rank challenge overview

Olivier Chapelle and Yi Chang. Yahoo! learning to rank challenge overview. In Proceedings of the learning to rank challenge, pages 1–24. PMLR, 2011

2011
[39]

Adapting deep ranknet for personalized search

Yang Song, Hongning Wang, and Xiaodong He. Adapting deep ranknet for personalized search. In Proceedings of the 7th ACM international conference on Web search and data mining, pages 83–92, 2014

2014
[40]

On application of learning to rank for e-commerce search

Shubhra Kanti Karmaker Santu, Parikshit Sondhi, and ChengXiang Zhai. On application of learning to rank for e-commerce search. In Proceedings of the 40th international ACM SIGIR conference on research and development in information retrieval, pages 475–484, 2017

2017
[41]

Regminer: towards constructing a large regression dataset from code evolution history

Xuezhi Song, Yun Lin, Siang Hwee Ng, Yijian Wu, Xin Peng, Jin Song Dong, and Hong Mei. Regminer: towards constructing a large regression dataset from code evolution history. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 314–326, 2022

2022
[42]

Radial basis functions. Acta numerica, 9:1–38, 2000

Martin Dietrich Buhmann. Radial basis functions. Acta numerica, 9:1–38, 2000

2000
[43]

A weighted gcn with logical adjacency matrix for relation extraction

Li Zhou, Tingyu Wang, Hong Qu, Li Huang, and Yuguo Liu. A weighted gcn with logical adjacency matrix for relation extraction. In ECAI 2020, pages 2314–2321. IOS Press, 2020

2020
[44]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

2014
[45]

Isolation forest

Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In 2008 eighth IEEE international conference on data mining, pages 413–422. IEEE, 2008

2008
[46]

Addressing the curse of imbalanced training sets: one-sided selection

Miroslav Kubat, Stan Matwin, et al. Addressing the curse of imbalanced training sets: one-sided selection. In ICML, volume 97, page 179. Citeseer, 1997

1997
[47]

Dealing with noise in defect prediction

Sunghun Kim, Hongyu Zhang, Rongxin Wu, and Liang Gong. Dealing with noise in defect prediction. In Proceedings of the 33rd International Conference on Software Engineering, pages 481–490, 2011

2011
[48]

Random space division sampling for label-noisy classification or imbalanced classification. IEEE Transactions on Cybernetics, 52(10):10444–10457, 2021

Shuyin Xia, Yong Zheng, Guoyin Wang, Ping He, Heng Li, and Zizhong Chen. Random space division sampling for label-noisy classification or imbalanced classification. IEEE Transactions on Cybernetics, 52(10):10444–10457, 2021

2021
[49]

An effective, efficient, and scalable confidence-based instance selection framework for transformer-based text classification

Washington Cunha, Celso França, Guilherme Fonseca, Leonardo Rocha, and Marcos André Gonçalves. An effective, efficient, and scalable confidence-based instance selection framework for transformer-based text classification. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 665–674, 2023

2023
[50]

Granular ball sampling for noisy label classification or imbalanced classification. IEEE Transactions on Neural Networks and Learning Systems, 34(4):2144–2155, 2021

Shuyin Xia, Shaoyuan Zheng, Guoyin Wang, Xinbo Gao, and Binggui Wang. Granular ball sampling for noisy label classification or imbalanced classification. IEEE Transactions on Neural Networks and Learning Systems, 34(4):2144–2155, 2021

2021
[51]

Deepergcn: All you need to train deeper gcns. arXiv preprint arXiv:2006.07739, 2020

Guohao Li, Chenxin Xiong, Ali Thabet, and Bernard Ghanem. Deepergcn: All you need to train deeper gcns. arXiv preprint arXiv:2006.07739, 2020

2020
[52]

Graph Attention Networks

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017

2017
[53]

Relational Graph Attention Networks

Dan Busbridge, Dane Sherburn, Pietro Cavallo, and Nils Y Hammerla. Relational graph attention networks. arXiv preprint arXiv:1904.05811, 2019

2019
[54]

Topology Adaptive Graph Convolutional Networks

Jian Du, Shanghang Zhang, Guanhang Wu, José MF Moura, and Soummya Kar. Topology adaptive graph convolutional networks. arXiv preprint arXiv:1710.10370, 2017

2017
[55]

Modeling relational data with graph convolutional networks

Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The semantic web: 15th international conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, proceedings 15, pages 593–607. Springer, 2018

2018

[1] [1]

Classifying software changes: Clean or buggy?IEEE Transactions on software engineering, 34(2):181–196, 2008

Sunghun Kim, E James Whitehead, and Yi Zhang. Classifying software changes: Clean or buggy?IEEE Transactions on software engineering, 34(2):181–196, 2008

2008

[2] [2]

A large-scale empirical study of just-in-time quality assurance.IEEE Transactions on Software Engineering, 39(6):757–773, 2012

Yasutaka Kamei, Emad Shihab, Bram Adams, Ahmed E Hassan, Audris Mockus, Anand Sinha, and Naoyasu Ubayashi. A large-scale empirical study of just-in-time quality assurance.IEEE Transactions on Software Engineering, 39(6):757–773, 2012

2012

[3] [3]

Towards reliable online just-in-time software defect prediction.IEEE Transactions on Software Engineering, 49(3):1342–1358, 2022

George G Cabral and Leandro L Minku. Towards reliable online just-in-time software defect prediction.IEEE Transactions on Software Engineering, 49(3):1342–1358, 2022

2022

[4] [4]

Bridging expert knowledge with deep learning techniques for just-in-time defect prediction.Empirical Software Engineering, 30(1):1–44, 2025

Xin Zhou, DongGyun Han, and David Lo. Bridging expert knowledge with deep learning techniques for just-in-time defect prediction.Empirical Software Engineering, 30(1):1–44, 2025

2025

[5] [5]

The impact of mislabeled changes by szz on just-in-time defect prediction.IEEE transactions on software engineering, 47(8):1559–1586, 2019

Yuanrui Fan, Xin Xia, Daniel Alencar Da Costa, David Lo, Ahmed E Hassan, and Shanping Li. The impact of mislabeled changes by szz on just-in-time defect prediction.IEEE transactions on software engineering, 47(8):1559–1586, 2019

2019

[6] [6]

Do developers introduce bugs when they do not communicate? the case of eclipse and mozilla

Mario Luca Bernardi, Gerardo Canfora, Giuseppe A Di Lucca, Massimiliano Di Penta, and Damiano Distante. Do developers introduce bugs when they do not communicate? the case of eclipse and mozilla. In2012 16th European Conference on Software Maintenance and Reengineering, pages 139–148. IEEE, 2012

2012

[7] Gerardo Canfora, Michele Ceccarelli, Luigi Cerulo, and Massimiliano Di Penta. How long does a bug survive? An empirical study. In 2011 18th Working Conference on Reverse Engineering, pages 191–200. IEEE, 2011.

[8] Jordan Ell. Identifying failure inducing developer pairs within developer networks. In 2013 35th International Conference on Software Engineering (ICSE), pages 1471–1473. IEEE, 2013.

[9] Muhammad Asaduzzaman, Michael C Bullock, Chanchal K Roy, and Kevin A Schneider. Bug introducing changes: A case study with Android. In 2012 9th IEEE Working Conference on Mining Software Repositories (MSR), pages 116–119. IEEE, 2012.

[10] Jacek Śliwerski, Thomas Zimmermann, and Andreas Zeller. When do changes induce fixes? ACM SIGSOFT Software Engineering Notes, 30(4):1–5, 2005.

[11] Giovanni Rosa, Luca Pascarella, Simone Scalabrino, Rosalia Tufano, Gabriele Bavota, Michele Lanza, and Rocco Oliveto. Evaluating SZZ implementations through a developer-informed oracle. In 2021 IEEE/ACM 43rd International Conference on Software Engineering (ICSE), pages 436–447. IEEE, 2021.

[12] Steffen Herbold, Alexander Trautsch, Fabian Trautsch, and Benjamin Ledel. Problems with SZZ and features: An empirical study of the state of practice of defect prediction data collection. Empirical Software Engineering, 27(2):42, 2022.

ACM Trans. Softw. Eng. Methodol., Vol. 1, No. 2, Article 3. Publication date: May 2025.

[13] Sophia Quach, Maxime Lamothe, Yasutaka Kamei, and Weiyi Shang. An empirical study on the use of SZZ for identifying inducing changes of non-functional bugs. Empirical Software Engineering, 26(4):71, 2021.

[14] Edmilson Campos Neto, Daniel Alencar Da Costa, and Uirá Kulesza. The impact of refactoring changes on the SZZ algorithm: An empirical study. In 2018 IEEE 25th International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 380–390. IEEE, 2018.

[15] Yunbo Lyu, Hong Jin Kang, Ratnadira Widyasari, Julia Lawall, and David Lo. Evaluating SZZ implementations: An empirical study on the Linux kernel. IEEE Transactions on Software Engineering, 2024.

[16] Sunghun Kim, Thomas Zimmermann, Kai Pan, E James Jr, et al. Automatic identification of bug-introducing changes. In 21st IEEE/ACM International Conference on Automated Software Engineering (ASE’06), pages 81–90. IEEE, 2006.

[17] Edmilson Campos Neto, Daniel Alencar Da Costa, and Uirá Kulesza. Revisiting and improving SZZ implementations. In 2019 ACM/IEEE International Symposium on Empirical Software Engineering and Measurement (ESEM), pages 1–12. IEEE, 2019.

[18] Lingxiao Tang, Lingfeng Bao, Xin Xia, and Zhongdong Huang. Neural SZZ algorithm. In 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1024–1035. IEEE, 2023.

[19] Giovanni Rosa, Luca Pascarella, Simone Scalabrino, Rosalia Tufano, Gabriele Bavota, Michele Lanza, and Rocco Oliveto. A comprehensive evaluation of SZZ variants through a developer-informed oracle. Journal of Systems and Software, 202:111729, 2023.

[20] Ming Wen, Rongxin Wu, Yepang Liu, Yongqiang Tian, Xuan Xie, Shing-Chi Cheung, and Zhendong Su. Exploring and exploiting the correlations between bug-inducing and bug-fixing commits. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 326–337, 2019.

[21] Xiao Wang, Houye Ji, Chuan Shi, Bai Wang, Yanfang Ye, Peng Cui, and Philip S Yu. Heterogeneous graph attention network. In The World Wide Web Conference, pages 2022–2032, 2019.

[22] Thomas N Kipf and Max Welling. Semi-supervised classification with graph convolutional networks. arXiv preprint arXiv:1609.02907, 2016.

[23] Curtis Northcutt, Lu Jiang, and Isaac Chuang. Confident learning: Estimating uncertainty in dataset labels. Journal of Artificial Intelligence Research, 70:1373–1411, 2021.

[24] Our project package. https://github.com/Vandbs/BIC-Hunter

[25] Audris Mockus and David M Weiss. Predicting risk of software changes. Bell Labs Technical Journal, 5(2):169–180, 2000.

[26] Xingquan Zhu and Xindong Wu. Class noise vs. attribute noise: A quantitative study. Artificial Intelligence Review, 22:177–210, 2004.

[27] Davor Cubranic and Gail C Murphy. Hipikat: Recommending pertinent software development artifacts. In 25th International Conference on Software Engineering, 2003. Proceedings., pages 408–418. IEEE, 2003.

[28] Michael Fischer, Martin Pinzger, and Harald Gall. Analyzing and relating bug report data for feature tracking. In WCRE, volume 3, page 90, 2003.

[29] Daniel Alencar Da Costa, Shane McIntosh, Weiyi Shang, Uirá Kulesza, Roberta Coelho, and Ahmed E Hassan. A framework for evaluating the results of the SZZ approach for identifying bug-introducing changes. IEEE Transactions on Software Engineering, 43(7):641–657, 2016.

[30] Emre Sahal and Ayse Tosun. Identifying bug-inducing changes for code additions. In Proceedings of the 12th ACM/IEEE International Symposium on Empirical Software Engineering and Measurement, pages 1–2, 2018.

[31] Lingfeng Bao, Xin Xia, Ahmed E Hassan, and Xiaohu Yang. V-SZZ: Automatic identification of version ranges affected by CVE vulnerabilities. In Proceedings of the 44th International Conference on Software Engineering, pages 2352–2364, 2022.

[32] Peter Bludau and Alexander Pretschner. PR-SZZ: How pull requests can support the tracing of defects in software repositories. In 2022 IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER), pages 1–12. IEEE, 2022.

[33] Jiaqi Zhang, Shikai Guo, Hui Li, Chenchen Li, Yu Chai, and Rong Chen. Identifying root cause of bugs by capturing changed code lines with relational graph neural networks. https://arxiv.org/abs/2505.00990

[34] Liguo Ji, Shikai Guo, Lehuan Zhang, Hui Li, Yu Chai, and Rong Chen. Detecting the root cause code lines in bug-fixing commits by heterogeneous graph learning. http://export.arxiv.org/abs/2505.01022

[35] David Bowes, Steve Counsell, Tracy Hall, Jean Petric, and Thomas Shippey. Getting defect prediction into industrial practice: The ELFF tool. In 2017 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW), pages 44–47. IEEE, 2017.

[36] Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, et al. CodeBERT: A pre-trained model for programming and natural languages. arXiv preprint arXiv:2002.08155, 2020.

[37] Christopher JC Burges. From RankNet to LambdaRank to LambdaMART: An overview. Learning, 11(23-581):81, 2010.

[38] Olivier Chapelle and Yi Chang. Yahoo! learning to rank challenge overview. In Proceedings of the Learning to Rank Challenge, pages 1–24. PMLR, 2011.

[39] Yang Song, Hongning Wang, and Xiaodong He. Adapting deep RankNet for personalized search. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, pages 83–92, 2014.

[40] Shubhra Kanti Karmaker Santu, Parikshit Sondhi, and ChengXiang Zhai. On application of learning to rank for e-commerce search. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 475–484, 2017.

[41] Xuezhi Song, Yun Lin, Siang Hwee Ng, Yijian Wu, Xin Peng, Jin Song Dong, and Hong Mei. RegMiner: Towards constructing a large regression dataset from code evolution history. In Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 314–326, 2022.

[42] Martin Dietrich Buhmann. Radial basis functions. Acta Numerica, 9:1–38, 2000.

[43] Li Zhou, Tingyu Wang, Hong Qu, Li Huang, and Yuguo Liu. A weighted GCN with logical adjacency matrix for relation extraction. In ECAI 2020, pages 2314–2321. IOS Press, 2020.

[44] Diederik P Kingma. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.

[45] Fei Tony Liu, Kai Ming Ting, and Zhi-Hua Zhou. Isolation forest. In 2008 Eighth IEEE International Conference on Data Mining, pages 413–422. IEEE, 2008.

[46] Miroslav Kubat, Stan Matwin, et al. Addressing the curse of imbalanced training sets: One-sided selection. In ICML, volume 97, page 179. Citeseer, 1997.

[47] Sunghun Kim, Hongyu Zhang, Rongxin Wu, and Liang Gong. Dealing with noise in defect prediction. In Proceedings of the 33rd International Conference on Software Engineering, pages 481–490, 2011.

[48] Shuyin Xia, Yong Zheng, Guoyin Wang, Ping He, Heng Li, and Zizhong Chen. Random space division sampling for label-noisy classification or imbalanced classification. IEEE Transactions on Cybernetics, 52(10):10444–10457, 2021.

[49] Washington Cunha, Celso França, Guilherme Fonseca, Leonardo Rocha, and Marcos André Gonçalves. An effective, efficient, and scalable confidence-based instance selection framework for transformer-based text classification. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 665–674, 2023.

[50] Shuyin Xia, Shaoyuan Zheng, Guoyin Wang, Xinbo Gao, and Binggui Wang. Granular ball sampling for noisy label classification or imbalanced classification. IEEE Transactions on Neural Networks and Learning Systems, 34(4):2144–2155, 2021.

[51] Guohao Li, Chenxin Xiong, Ali Thabet, and Bernard Ghanem. DeeperGCN: All you need to train deeper GCNs. arXiv preprint arXiv:2006.07739, 2020.

[52] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, and Yoshua Bengio. Graph attention networks. arXiv preprint arXiv:1710.10903, 2017.

[53] Dan Busbridge, Dane Sherburn, Pietro Cavallo, and Nils Y Hammerla. Relational graph attention networks. arXiv preprint arXiv:1904.05811, 2019.

[54] Jian Du, Shanghang Zhang, Guanhang Wu, José MF Moura, and Soummya Kar. Topology adaptive graph convolutional networks. arXiv preprint arXiv:1710.10370, 2017.

[55] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In The Semantic Web: 15th International Conference, ESWC 2018, Heraklion, Crete, Greece, June 3–7, 2018, Proceedings 15, pages 593–607. Springer, 2018.