Reducing Labeling Effort in Architecture Technical Debt Detection through Active Learning and Explainable AI
Self-Admitted Technical Debt (SATD) refers to technical compromises explicitly admitted by developers in natural language artifacts, such as code comments, commit messages, and issue trackers. Among its types, Architecture Technical Debt (ATD) is particularly difficult to detect due to its abstract and context-dependent nature. Manual annotation of ATD is costly, time-consuming, and challenging to scale. To reduce labeling effort, this study combines keyword-based filtering, active learning, and explainable AI for ATD detection. We refined an existing dataset of ATD-related Jira issues to obtain an expert-validated seed set used to extract representative keywords. These keywords were then applied to identify more than 103k candidate issues across 10 open-source projects. To assess the reliability of keyword-based filtering, we qualitatively evaluated a statistically representative sample of labeled issues. Building on the resulting dataset, we applied active learning with multiple query strategies to prioritize informative samples for annotation. The results show that Breaking Ties achieved the best performance, with an F1-score of 0.72 and a 49% reduction in annotation effort. To improve transparency, we used SHAP and LIME to explain ATD classification results. Expert evaluation showed that both methods provided useful explanations, with LIME generally preferred for its clarity and ease of use.
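The abstract names Breaking Ties as the best-performing query strategy but does not describe how it was implemented. As commonly defined, Breaking Ties ranks unlabeled samples by the margin between the classifier's two highest class probabilities and sends the smallest-margin samples to annotators first. The following is a minimal sketch of that idea only. The function name `breaking_ties_query`, the `batch_size` parameter, and the toy probabilities are illustrative assumptions, not taken from the study.

```python
def breaking_ties_query(probs, batch_size):
    """Return indices of the samples with the smallest top-two probability margin.

    probs: list of per-sample class-probability lists (at least two classes each).
    batch_size: number of samples to send for annotation (hypothetical parameter).
    """
    margins = []
    for index, row in enumerate(probs):
        # The two largest class probabilities for this sample.
        top, runner_up = sorted(row, reverse=True)[:2]
        margins.append((top - runner_up, index))
    # Smallest margin first: these are the predictions the model is least sure of.
    margins.sort()
    return [index for _, index in margins[:batch_size]]


# Toy predicted probabilities for [not-ATD, ATD]; values are made up for illustration.
probs = [
    [0.95, 0.05],
    [0.55, 0.45],
    [0.70, 0.30],
    [0.51, 0.49],
]
selected = breaking_ties_query(probs, 2)
print(selected)
```

In an active learning loop, the selected issues would be labeled by an expert, added to the training set, and the classifier retrained before the next query round.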
Forward citations
Cited by 1 Pith paper
The Dangers of Non-Self-Fixed Architecture Technical Debt and Its Impact on Time-to-Fix
Non-self-fixed architectural technical debt persists longer than self-fixed debt in Apache projects, with repayment speed linked to the spread of changes across developers.