pith. sign in

arxiv: 2508.16860 · v2 · pith:ZCPZR74Enew · submitted 2025-08-23 · 💻 cs.SE · cs.AI· cs.LG

TriagerX: Dual Transformers for Bug Triaging Tasks with Content and Interaction Based Rankings

Pith reviewed 2026-05-18 21:46 UTC · model grok-4.3

classification 💻 cs.SE cs.AIcs.LG
keywords bug triagingtransformer modelsdeveloper recommendationpretrained language modelsinteraction historysoftware engineeringrecommendation systems
0
0 comments X

The pith

TriagerX improves bug triaging by using two transformers for content-based developer rankings and then refining those rankings with historical interaction data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TriagerX as a way to fix shortcomings in pretrained language models for recommending developers to fix bugs. Single transformer models can focus on irrelevant parts of a bug report, and they ignore how developers have interacted with similar bugs in the past. TriagerX runs two transformers in parallel, takes recommendations from the last three layers of each to build a stronger content ranking, and then adjusts that ranking using an interaction-based method drawn from fixed bugs. Tested on five datasets, the system beats nine other transformer approaches and often raises Top-1 and Top-3 accuracy by more than 10 percent. It was also deployed at a large company for both developer and component recommendations.

Core claim

TriagerX is a dual-transformer architecture for bug triaging that generates a robust content-based ranking by collecting recommendations from the last three layers of each of two transformers, then refines this ranking with an interaction-based methodology considering historical developer interactions with similar fixed bugs, leading to superior performance over single-transformer baselines.

What carries the argument

Dual-transformer architecture with last-three-layer aggregation for content ranking, followed by interaction-based refinement that scores developers according to their history with similar fixed bugs.

If this is right

  • TriagerX raises Top-1 and Top-3 developer recommendation accuracy by more than 10 percent on five datasets compared with nine prior transformer methods.
  • The same model delivers up to 10 percent gains for component recommendations and 54 percent gains for developer recommendations on an industrial dataset.
  • The system was successfully deployed in a large partner’s development environment where both developer and component suggestions are required.
  • Recommendations remain useful even when teams change or developers leave, because components act as proxies for team assignments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same dual-plus-interaction pattern could be tested on related tasks such as recommending code reviewers or assigning issues to teams.
  • Adding richer interaction signals, for example commit patterns or discussion threads, might strengthen the refinement stage further.
  • Performance is likely highest in projects that already contain many resolved bugs, since the interaction ranking depends on historical examples.

Load-bearing premise

The last three layers from each of the two transformers supply independent and complementary signals, and past interactions with similar fixed bugs supply reliable refinement signals that hold up outside the training data.

What would settle it

On a new bug dataset, if stripping away the interaction-based refinement step leaves Top-1 and Top-3 accuracy no better than the strongest single-transformer baseline, the added value of the dual-plus-interaction design would be falsified.

Figures

Figures reproduced from arXiv: 2508.16860 by Gias Uddin, Lan Xia, Longyu Zhang, Md Afif Al Mamun.

Figure 1
Figure 1. Figure 1: TriagerX updated ranking with IBR for Listing 1. [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 3
Figure 3. Figure 3: CBR with 2 embedding modules and 3 classifiers. [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: CNNs capture local patterns and hierarchical features [PITH_FULL_IMAGE:figures/full_fig_p003_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: TriagerX complete framework with both CBR and IBR. [PITH_FULL_IMAGE:figures/full_fig_p004_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Top-1 test accuracy of TriagerX CBR model compared [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Class-wise Top-1 accuracy on the OpenJ9 dataset. [PITH_FULL_IMAGE:figures/full_fig_p007_7.png] view at source ↗
Figure 9
Figure 9. Figure 9: Impact of the number of classifiers on Top-k accuracy [PITH_FULL_IMAGE:figures/full_fig_p008_9.png] view at source ↗
Figure 11
Figure 11. Figure 11: Effect of similarity threshold over Top-1 accuracy. [PITH_FULL_IMAGE:figures/full_fig_p008_11.png] view at source ↗
Figure 14
Figure 14. Figure 14: Runtime GPU usage, parameters, inference time, and [PITH_FULL_IMAGE:figures/full_fig_p009_14.png] view at source ↗
read the original abstract

Pretrained Language Models or PLMs are transformer-based architectures that can be used in bug triaging tasks. PLMs can better capture token semantics than traditional Machine Learning (ML) models that rely on statistical features (e.g., TF-IDF, bag of words). However, PLMs may still attend to less relevant tokens in a bug report, which can impact their effectiveness. In addition, the model can be sub-optimal with its recommendations when the interaction history of developers around similar bugs is not taken into account. We designed TriagerX to address these limitations. First, to assess token semantics more reliably, we leverage a dual-transformer architecture. Unlike current state-of-the-art (SOTA) baselines that employ a single transformer architecture, TriagerX collects recommendations from two transformers with each offering recommendations via its last three layers. This setup generates a robust content-based ranking of candidate developers. TriagerX then refines this ranking by employing a novel interaction-based ranking methodology, which considers developers' historical interactions with similar fixed bugs. Across five datasets, TriagerX surpasses all nine transformer-based methods, including SOTA baselines, often improving Top-1 and Top-3 developer recommendation accuracy by over 10%. We worked with our large industry partner to successfully deploy TriagerX in their development environment. The partner required both developer and component recommendations, with components acting as proxies for team assignments-particularly useful in cases of developer turnover or team changes. We trained TriagerX on the partner's dataset for both tasks, and it outperformed SOTA baselines by up to 10% for component recommendations and 54% for developer recommendations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces TriagerX, a dual-transformer architecture for bug triaging that produces content-based developer rankings by aggregating recommendations from the last three layers of each of two transformers, then refines the ranking via a novel interaction-based component that incorporates developers' historical interactions with similar fixed bugs. It reports consistent outperformance over nine transformer-based baselines (including SOTA methods) on five datasets, with Top-1 and Top-3 accuracy gains often exceeding 10%, and describes successful deployment at an industry partner for both developer and component recommendations.

Significance. If the reported gains prove robust, the work would advance software engineering research on PLM-based recommendations by showing how dual content encoders can be combined with interaction history to address token-attention limitations and improve practical triaging. The industrial deployment for component recommendations (as proxies for team assignments) adds applied value, particularly for handling developer turnover.

major comments (3)
  1. [§3.2] §3.2 (Dual-Transformer Architecture): The decision to aggregate only the last three layers from each transformer for the content ranking lacks motivation, ablation, or comparison to other layer selections. This choice is load-bearing for the claim of 'robust content-based ranking' because the complementarity of the two transformers rests on it; without evidence that three layers are optimal or that other depths yield inferior results, the architectural novelty is difficult to assess.
  2. [§4.3] §4.3 (Evaluation and Interaction Ranking): The manuscript does not specify whether dataset splits are temporal (to prevent leakage) or random, nor does it detail the similarity function used to retrieve 'similar fixed bugs' for the interaction-based refinement. If similarity is computed from the same content embeddings as the primary ranking or if splits permit future data to influence training, the >10% gains on Top-1/Top-3 metrics could be inflated rather than reflecting genuine complementarity of the interaction component.
  3. [Results] Results tables (e.g., Table 2 or equivalent): No statistical significance tests (paired t-test, McNemar, or bootstrap) or details on hyper-parameter tuning and validation folds are reported. Given that the central claim is consistent outperformance across five datasets, the absence of these controls makes it impossible to rule out post-hoc selection or overfitting as explanations for the observed margins.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'often improving ... by over 10%' is vague; specifying the exact datasets, metrics, and range of improvements (or providing a summary table reference) would improve clarity for readers.
  2. [§2] §2 (Related Work): Several citations to recent PLM papers could be supplemented with foundational bug-triaging references (e.g., earlier ML-based assignee recommendation studies) to better situate the novelty of the dual + interaction design.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment point by point below, indicating revisions where the manuscript will be updated to improve clarity and rigor.

read point-by-point responses
  1. Referee: [§3.2] §3.2 (Dual-Transformer Architecture): The decision to aggregate only the last three layers from each transformer for the content ranking lacks motivation, ablation, or comparison to other layer selections. This choice is load-bearing for the claim of 'robust content-based ranking' because the complementarity of the two transformers rests on it; without evidence that three layers are optimal or that other depths yield inferior results, the architectural novelty is difficult to assess.

    Authors: The last three layers were selected because deeper transformer layers encode higher-level semantic representations suited to bug report semantics, while avoiding dilution from shallower syntactic features. We acknowledge the absence of supporting ablation. In the revision we will add an ablation study comparing last-1, last-2, last-3, and last-4 layers on all five datasets, demonstrating that three layers yield the strongest and most stable content rankings. revision: yes

  2. Referee: [§4.3] §4.3 (Evaluation and Interaction Ranking): The manuscript does not specify whether dataset splits are temporal (to prevent leakage) or random, nor does it detail the similarity function used to retrieve 'similar fixed bugs' for the interaction-based refinement. If similarity is computed from the same content embeddings as the primary ranking or if splits permit future data to influence training, the >10% gains on Top-1/Top-3 metrics could be inflated rather than reflecting genuine complementarity of the interaction component.

    Authors: All splits are strictly temporal (training on bugs reported before a cutoff date, testing on later bugs) to eliminate leakage. The similarity function for retrieving similar fixed bugs combines BM25 textual similarity with metadata overlap (component, priority, severity) and is computed independently of the dual-transformer content embeddings. We will document these choices explicitly in the revised §4.3 to confirm the interaction component supplies orthogonal signal. revision: yes

  3. Referee: [Results] Results tables (e.g., Table 2 or equivalent): No statistical significance tests (paired t-test, McNemar, or bootstrap) or details on hyper-parameter tuning and validation folds are reported. Given that the central claim is consistent outperformance across five datasets, the absence of these controls makes it impossible to rule out post-hoc selection or overfitting as explanations for the observed margins.

    Authors: Hyperparameters were selected via grid search with 5-fold cross-validation performed inside the training set of each dataset. We will add paired t-tests and McNemar tests on Top-1/Top-3 accuracy across repeated runs in the revised results section, reporting p-values to establish statistical significance of the observed margins. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on external benchmarks

full rationale

The paper introduces TriagerX as a dual-transformer model for content-based developer ranking followed by an interaction-based refinement step using historical bug interactions. Its central claims consist of measured accuracy improvements (Top-1/Top-3 gains >10% across five datasets versus nine baselines). These are obtained by training on one data partition and evaluating on held-out test reports, not by any algebraic reduction, fitted parameter renamed as prediction, or self-referential definition. No equations, uniqueness theorems, or ansatzes are presented that would make the reported rankings equivalent to the model's own inputs by construction. Design choices such as extracting signals from the last three layers of each transformer are architectural decisions justified by the goal of capturing token semantics, not circularly defined quantities. The results therefore remain self-contained against external baselines and do not trigger any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The approach relies on standard transformer assumptions plus the empirical claim that last-three-layer outputs from two models plus interaction counts are sufficient signals. No new physical or mathematical entities are postulated.

free parameters (2)
  • choice of last three layers
    The decision to use exactly the final three layers of each transformer is a modeling choice that affects the content ranking.
  • interaction history weighting
    How much the interaction-based ranking adjusts the content ranking is controlled by an implicit or explicit weighting parameter.
axioms (2)
  • domain assumption Transformer models capture token semantics better than TF-IDF or bag-of-words features.
    Stated in the abstract as the motivation for using PLMs.
  • domain assumption Historical developer interactions with similar fixed bugs are predictive of future assignments.
    Core premise of the interaction-based ranking stage.

pith-pipeline@v0.9.0 · 5843 in / 1462 out tokens · 37190 ms · 2026-05-18T21:46:50.384434+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

71 extracted references · 71 canonical work pages · 9 internal anchors

  1. [1]

    Automatic bug triage using text categorization,

    D. Cubranic and G. C. Murphy, “Automatic bug triage using text categorization,” in International Conference on Software Engineering and Knowledge Engineering , 2004

  2. [2]

    Who should fix this bug?

    J. Anvik, L. Hiew, and G. C. Murphy, “Who should fix this bug?” in Proceedings of the 28th International Conference on Software Engineering, ser. ICSE ’06. New York, NY , USA: Association for Computing Machinery, 2006, p. 361–370

  3. [3]

    Reducing the effort of bug report triage: Recommenders for development-oriented deci- sions,

    J. Anvik and G. C. Murphy, “Reducing the effort of bug report triage: Recommenders for development-oriented deci- sions,” ACM Trans. Softw. Eng. Methodol. , vol. 20, no. 3, aug 2011. TABLE XI: Top-k accuracy of all considered baselines on different datasets compared to TriagerX CBR. Dataset Method Top-1 Top-3 Top-5 Top-10 Top-20 GoogleChromium TriagerX CB...

  4. [4]

    Ap- plying deep learning based automatic bug triager to industrial projects,

    S.-R. Lee, M.-J. Heo, C.-G. Lee, M. Kim, and G. Jeong, “Ap- plying deep learning based automatic bug triager to industrial projects,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering , ser. ESEC/FSE 2017. New York, NY , USA: Association for Computing Machinery, 2017, p. 926–931

  5. [5]

    DeepTriage: Exploring the Effectiveness of Deep Learning for Bug Triaging

    S. Mani, A. Sankaran, and R. Aralikatte, “Deeptriage: Exploring the effectiveness of deep learning for bug triaging,” CoRR, vol. abs/1801.01275, 2018

  6. [6]

    Applying convolutional neural networks with different word representation techniques to recommend bug fixers,

    S. F. A. Zaidi, F. M. Awan, M. Lee, H. Woo, and C.-G. Lee, “Applying convolutional neural networks with different word representation techniques to recommend bug fixers,”IEEE Access, vol. 8, pp. 213 729–213 747, 2020

  7. [7]

    A light bug triage framework for applying large pre-trained language model,

    J. Lee, K. Han, and H. Yu, “A light bug triage framework for applying large pre-trained language model,” in Proceedings of the 37th IEEE/ACM International Conference on Automated Software Engineering , ser. ASE ’22. New York, NY , USA: Association for Computing Machinery, 2023

  8. [8]

    Improving bug triaging with high confidence predictions at ericsson,

    A. Sarkar, P. C. Rigby, and B. Bartalos, “Improving bug triaging with high confidence predictions at ericsson,” in 2019 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2019, pp. 81–91

  9. [9]

    A comparative study of transformer-based neural text representation techniques on bug triaging,

    A. K. Dipongkor and K. Moran, “A comparative study of transformer-based neural text representation techniques on bug triaging,” in 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE) , 2023, pp. 1012–1023

  10. [10]

    BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

    J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: pre-training of deep bidirectional transformers for language understanding,” CoRR, vol. abs/1810.04805, 2018

  11. [11]

    Deep contextualized word rep- resentations,

    M. E. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, and L. Zettlemoyer, “Deep contextualized word rep- resentations,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), M. Walker, H. Ji, and A. Stent, Eds. New Or- leans, Lo...

  12. [12]

    Efficient Estimation of Word Representations in Vector Space

    T. Mikolov, “Efficient estimation of word representations in vector space,” arXiv preprint arXiv:1301.3781, vol. 3781, 2013

  13. [13]

    Distilling the knowledge in a neural network,

    G. Hinton, O. Vinyals, and J. Dean, “Distilling the knowledge in a neural network,” 2015

  14. [14]

    Does knowledge distillation really work?

    S. D. Stanton, P. Izmailov, P. Kirichenko, A. A. Alemi, and A. G. Wilson, “Does knowledge distillation really work?” inAd- vances in Neural Information Processing Systems , A. Beygelz- imer, Y . Dauphin, P. Liang, and J. W. Vaughan, Eds., 2021

  15. [15]

    Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference

    R. T. McCoy, E. Pavlick, and T. Linzen, “Right for the wrong reasons: Diagnosing syntactic heuristics in natural language inference,” arXiv preprint arXiv:1902.01007 , 2019

  16. [16]

    Effects of human adversarial and affable samples on bert generalization,

    A. Elangovan, J. He, Y . Li, and K. Verspoor, “Effects of human adversarial and affable samples on bert generalization,” arXiv preprint arXiv:2310.08008, 2023

  17. [17]

    Ensemble learning for heterogeneous large language models with deep parallel collaboration,

    Y . Huang, X. Feng, B. Li, Y . Xiang, H. Wang, T. Liu, and B. Qin, “Ensemble learning for heterogeneous large language models with deep parallel collaboration,” Advances in Neural Information Processing Systems , vol. 37, pp. 119 838–119 860, 2024

  18. [18]

    An ensemble method for bug triaging using large language models,

    A. Kumar Dipongkor, “An ensemble method for bug triaging using large language models,” in Proceedings of the 2024 IEEE/ACM 46th International Conference on Software Engi- neering: Companion Proceedings , 2024, pp. 438–440

  19. [19]

    When does diversity help generalization in classification ensembles?

    Y . Bian and H. Chen, “When does diversity help generalization in classification ensembles?” IEEE Transactions on Cybernet- ics, vol. 52, no. 9, pp. 9059–9075, 2022

  20. [20]

    Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy,

    L. I. Kuncheva and C. J. Whitaker, “Measures of diversity in classifier ensembles and their relationship with the ensemble accuracy,”Mach. Learn., vol. 51, no. 2, p. 181–207, may 2003

  21. [21]

    Utilizing a multi-developer network-based developer recommendation algorithm to fix bugs effectively,

    G. Yang, T. Zhang, and B. Lee, “Utilizing a multi-developer network-based developer recommendation algorithm to fix bugs effectively,” inProceedings of the 29th Annual ACM Symposium on Applied Computing , ser. SAC ’14. New York, NY , USA: Association for Computing Machinery, 2014, p. 1134–1139

  22. [22]

    Sentimental analysis of movie reviews using soft voting ensemble-based machine learning,

    A. Athar, S. Ali, M. M. Sheeraz, S. Bhattachariee, and H.- C. Kim, “Sentimental analysis of movie reviews using soft voting ensemble-based machine learning,” in 2021 Eighth Inter- national Conference on Social Network Analysis, Management and Security (SNAMS) , 2021, pp. 01–05

  23. [23]

    An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier,

    S. Kumari, D. Kumar, and M. Mittal, “An ensemble approach for classification and prediction of diabetes mellitus using soft voting classifier,”International Journal of Cognitive Computing in Engineering, vol. 2, pp. 40–46, 2021

  24. [24]

    Catastrophic forgetting in connectionist net- works,

    R. M. French, “Catastrophic forgetting in connectionist net- works,” Trends in cognitive sciences, vol. 3, no. 4, pp. 128–135, 1999

  25. [25]

    Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

    S. Ioffe and C. Szegedy, “Batch normalization: Accelerating deep network training by reducing internal covariate shift,” CoRR, vol. abs/1502.03167, 2015

  26. [26]

    Dropout: A simple way to prevent neural networks from overfitting,

    N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: A simple way to prevent neural networks from overfitting,” Journal of Machine Learning Re- search, vol. 15, no. 56, pp. 1929–1958, 2014

  27. [27]

    Sentence-bert: Sentence em- beddings using siamese bert-networks,

    N. Reimers and I. Gurevych, “Sentence-bert: Sentence em- beddings using siamese bert-networks,” in Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 11 2019

  28. [28]

    Time weight collaborative filtering,

    Y . Ding and X. Li, “Time weight collaborative filtering,” in Proceedings of the 14th ACM international conference on Information and knowledge management , 2005, pp. 485–492

  29. [29]

    Performance evaluation of time-based recommendation system in collaborative filtering technique,

    G. Jain, T. Mahara, and S. Sharma, “Performance evaluation of time-based recommendation system in collaborative filtering technique,” Procedia Computer Science , vol. 218, pp. 1834– 1844, 2023

  30. [30]

    Cost- aware triage ranking algorithms for bug reporting systems,

    J.-w. Park, M.-W. Lee, J. Kim, S.-w. Hwang, and S. Kim, “Cost- aware triage ranking algorithms for bug reporting systems,” Knowledge and Information Systems , vol. 48, pp. 679–705, 2016

  31. [31]

    Adptriage: Approximate dynamic programming for bug triage,

    H. Jahanshahi, M. Cevik, K. Mousavi, and A. Bas ¸ar, “Adptriage: Approximate dynamic programming for bug triage,” IEEE Trans. Softw. Eng., vol. 49, no. 10, p. 4594–4609, Aug. 2023

  32. [32]

    Decoupled Weight Decay Regularization

    I. Loshchilov and F. Hutter, “Fixing weight decay regularization in adam,” CoRR, vol. abs/1711.05101, 2017

  33. [33]

    MPNet: Masked And Permuted Pre-Training For Language Understanding,

    K. Song, X. Tan, T. Qin, J. Lu, and T. Liu, “Mpnet: Masked and permuted pre-training for language understanding,” CoRR, vol. abs/2004.09297, 2020

  34. [34]

    Multiple word embeddings for increased diver- sity of representation,

    B. Lester, D. Pressel, A. Hemmeter, S. R. Choudhury, and S. Bangalore, “Multiple word embeddings for increased diver- sity of representation,” arXiv preprint arXiv:2009.14394, 2020

  35. [35]

    Comparison and combination of sentence embeddings derived from different supervision signals,

    H. Tsukagoshi, R. Sasano, and K. Takeda, “Comparison and combination of sentence embeddings derived from different supervision signals,” in Proceedings of the 11th Joint Con- ference on Lexical and Computational Semantics , V . Nastase, E. Pavlick, M. T. Pilehvar, J. Camacho-Collados, and A. Ra- ganato, Eds. Seattle, Washington: Association for Computa- t...

  36. [36]

    Explaining adaboost,

    R. E. Schapire, “Explaining adaboost,” in Empirical inference: festschrift in honor of vladimir N. Vapnik . Springer, 2013, pp. 37–52

  37. [37]

    Smote: synthetic minority over-sampling technique,

    N. V . Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “Smote: synthetic minority over-sampling technique,” J. Artif. Int. Res., vol. 16, no. 1, p. 321–357, jun 2002

  38. [38]

    Focal loss for dense object detection,

    T.-Y . Lin, P. Goyal, R. Girshick, K. He, and P. Doll ´ar, “Focal loss for dense object detection,” in Proceedings of the IEEE international conference on computer vision , 2017, pp. 2980– 2988

  39. [39]

    Developer activity motivated bug triaging: Via convolutional neural network,

    S. Guo, X. Zhang, X. Yang, R. Chen, C. Guo, H. Li, and T. Li, “Developer activity motivated bug triaging: Via convolutional neural network,” Neural Process. Lett. , vol. 51, no. 3, p. 2589–2606, jun 2020

  40. [40]

    Efficient bug triage for industrial environments,

    W. Zhang, “Efficient bug triage for industrial environments,” in 2020 IEEE International Conference on Software Maintenance and Evolution (ICSME) , 2020, pp. 727–735

  41. [41]

    Automatic software bug triage system (bts) based on latent semantic indexing and support vector machine,

    S. N. Ahsan, J. Ferzund, and F. Wotawa, “Automatic software bug triage system (bts) based on latent semantic indexing and support vector machine,” in 2009 Fourth International Conference on Software Engineering Advances , 2009, pp. 216– 221

  42. [42]

    Bug prioritization to facilitate bug report triage,

    J. Kanwal and O. Maqbool, “Bug prioritization to facilitate bug report triage,” Journal of Computer Science and Technology , vol. 27, pp. 397 – 412, 2012

  43. [43]

    Automated bug triaging in an industrial context,

    V . Ded´ık and B. Rossi, “Automated bug triaging in an industrial context,” in 2016 42th Euromicro Conference on Software Engi- neering and Advanced Applications (SEAA), 2016, pp. 363–367

  44. [44]

    Easy over hard: a case study on deep learning,

    W. Fu and T. Menzies, “Easy over hard: a case study on deep learning,” in Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering , ser. ESEC/FSE 2017. New York, NY , USA: Association for Computing Machinery, 2017, p. 49–60

  45. [45]

    Automated bug assignment: Ensemble-based ma- chine learning in large scale industrial contexts,

    L. Jonsson, M. Borg, D. Broman, K. Sandahl, S. Eldh, and P. Runeson, “Automated bug assignment: Ensemble-based ma- chine learning in large scale industrial contexts,” Empirical Softw. Engg., vol. 21, no. 4, p. 1533–1578, aug 2016

  46. [46]

    Triaging incoming change requests: Bug or commit history, or code authorship?

    M. Linares-V ´asquez, K. Hossen, H. Dang, H. Kagdi, M. Geth- ers, and D. Poshyvanyk, “Triaging incoming change requests: Bug or commit history, or code authorship?” in 2012 28th IEEE International Conference on Software Maintenance (ICSM) , 2012, pp. 451–460

  47. [47]

    Why so complicated? simple term filtering and weighting for location- based bug report assignment recommendation,

    R. Shokripour, J. Anvik, Z. M. Kasirun, and S. Zamani, “Why so complicated? simple term filtering and weighting for location- based bug report assignment recommendation,” in 2013 10th Working Conference on Mining Software Repositories (MSR) , 2013, pp. 2–11

  48. [48]

    Automatic bug triage using hierarchical attention networks,

    H. He and S. Yang, “Automatic bug triage using hierarchical attention networks,” in 2021 IEEE 21st International Confer- ence on Software Quality, Reliability and Security Companion 14 (QRS-C), 2021, pp. 1043–1049

  49. [49]

    A multi-label, dual-output deep neural network for automated bug triaging,

    C. A. Choquette-Choo, D. Sheldon, J. Proppe, J. Alphonso- Gibbs, and H. Gupta, “A multi-label, dual-output deep neural network for automated bug triaging,” in 2019 18th IEEE Inter- national Conference On Machine Learning And Applications (ICMLA), 2019, pp. 937–944

  50. [50]

    Fuzzy set and cache-based approach for bug triaging,

    A. Tamrawi, T. T. Nguyen, J. M. Al-Kofahi, and T. N. Nguyen, “Fuzzy set and cache-based approach for bug triaging,” in Pro- ceedings of the 19th ACM SIGSOFT Symposium and the 13th European Conference on Foundations of Software Engineering , ser. ESEC/FSE ’11. New York, NY , USA: Association for Computing Machinery, 2011, p. 365–375

  51. [51]

    Effective bug triage based on historical bug-fix information,

    H. Hu, H. Zhang, J. Xuan, and W. Sun, “Effective bug triage based on historical bug-fix information,” in 2014 IEEE 25th International Symposium on Software Reliability Engineering , 2014, pp. 122–132

  52. [52]

    Learning to rank for bug report assignee recommendation,

    Y . Tian, D. Wijedasa, D. Lo, and C. Le Goues, “Learning to rank for bug report assignee recommendation,” in 2016 IEEE 24th International Conference on Program Comprehension (ICPC) , 2016, pp. 1–10

  53. [53]

    Latent dirichlet allocation,

    D. M. Blei, A. Y . Ng, and M. I. Jordan, “Latent dirichlet allocation,” J. Mach. Learn. Res. , vol. 3, no. null, p. 993–1022, mar 2003

  54. [54]

    Dretom: developer recommendation based on topic models for bug resolution,

    X. Xie, W. Zhang, Y . Yang, and Q. Wang, “Dretom: developer recommendation based on topic models for bug resolution,” in Proceedings of the 8th International Conference on Predictive Models in Software Engineering , ser. PROMISE ’12. New York, NY , USA: Association for Computing Machinery, 2012, p. 19–28

  55. [55]

    A novel developer ranking algorithm for automatic bug triage using topic model and developer relations,

    T. Zhang, G. Yang, B. Lee, and E. K. Lua, “A novel developer ranking algorithm for automatic bug triage using topic model and developer relations,” in 2014 21st Asia-Pacific Software Engineering Conference, vol. 1, 2014, pp. 223–230

  56. [56]

    Towards semi-automatic bug triage and severity prediction based on topic model and multi- feature of bug reports,

    G. Yang, T. Zhang, and B. Lee, “Towards semi-automatic bug triage and severity prediction based on topic model and multi- feature of bug reports,” in 2014 IEEE 38th Annual Computer Software and Applications Conference , 2014, pp. 97–106

  57. [57]

    Improving automated bug triaging with specialized topic model,

    X. Xia, D. Lo, Y . Ding, J. M. Al-Kofahi, T. N. Nguyen, and X. Wang, “Improving automated bug triaging with specialized topic model,” IEEE Transactions on Software Engineering , vol. 43, no. 3, pp. 272–297, 2017

  58. [58]

    Developer prioritization in bug repositories,

    J. Xuan, H. Jiang, Z. Ren, and W. Zou, “Developer prioritization in bug repositories,” in 2012 34th International Conference on Software Engineering (ICSE) , 2012, pp. 25–35

  59. [59]

    Heterogeneous network analysis of developer contribution in bug repositories,

    W. Zhang, S. Wang, Y . Yang, and Q. Wang, “Heterogeneous network analysis of developer contribution in bug repositories,” in 2013 International Conference on Cloud and Service Com- puting, 2013, pp. 98–105

  60. [60]

    Improving bug triage with bug tossing graphs,

    G. Jeong, S. Kim, and T. Zimmermann, “Improving bug triage with bug tossing graphs,” in Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on The Foundations of Software Engineering, ser. ESEC/FSE ’09. New York, NY , USA: Association for Computing Machinery, 2009, p. 111–120

  61. [61]

    Fine-grained incremental learning and multi-feature tossing graphs to improve bug triag- ing,

    P. Bhattacharya and I. Neamtiu, “Fine-grained incremental learning and multi-feature tossing graphs to improve bug triag- ing,” in 2010 IEEE International Conference on Software Main- tenance, 2010, pp. 1–10

  62. [62]

    Bug triaging based on tossing sequence modeling,

    S. Xi, Y . Yao, X. Xiao, F. Xu, and J. Lu, “Bug triaging based on tossing sequence modeling,” Journal of Computer Science and Technology, vol. 34, pp. 942 – 956, 2019

  63. [63]

    Reducing bug triaging confusion by learning from mistakes with a bug tossing knowledge graph,

    Y . Su, Z. Xing, X. Peng, X. Xia, C. Wang, X. Xu, and L. Zhu, “Reducing bug triaging confusion by learning from mistakes with a bug tossing knowledge graph,” in 2021 36th IEEE/ACM International Conference on Automated Software Engineering (ASE), 2021, pp. 191–202

  64. [64]

    Assigning bug reports using a vocabulary-based expertise model of developers,

    D. Matter, A. Kuhn, and O. Nierstrasz, “Assigning bug reports using a vocabulary-based expertise model of developers,” in 2009 6th IEEE International Working Conference on Mining Software Repositories, 2009, pp. 131–140

  65. [65]

    Bug report assignee recommendation using activity profiles,

    H. Naguib, N. Narayan, B. Br ¨ugge, and D. Helal, “Bug report assignee recommendation using activity profiles,” in 2013 10th Working Conference on Mining Software Repositories (MSR) , 2013, pp. 22–30

  66. [66]

    Sustriage: Sustainable bug triage with multi-modal ensemble learning,

    W. Zhang, J. Zhao, and S. Wang, “Sustriage: Sustainable bug triage with multi-modal ensemble learning,” in IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent Technology, ser. WI-IAT ’21. New York, NY , USA: Association for Computing Machinery, 2022, p. 441–448

  67. [67]

    Shallow-deep networks: Understanding and mitigating network overthinking,

    Y . Kaya, S. Hong, and T. Dumitras, “Shallow-deep networks: Understanding and mitigating network overthinking,” in Inter- national conference on machine learning . PMLR, 2019, pp. 3301–3310

  68. [68]

    HuggingFace's Transformers: State-of-the-art Natural Language Processing

    T. Wolf, L. Debut, V . Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, and J. Brew, “Huggingface’s transformers: State-of-the-art natural language processing,” CoRR, vol. abs/1910.03771, 2019

  69. [69]

    RoBERTa: A Robustly Optimized BERT Pretraining Approach

    Y . Liu, M. Ott, N. Goyal, J. Du, M. Joshi, D. Chen, O. Levy, M. Lewis, L. Zettlemoyer, and V . Stoyanov, “Roberta: A robustly optimized BERT pretraining approach,” CoRR, vol. abs/1907.11692, 2019

  70. [70]

    DeBERTa: Decoding-enhanced BERT with Disentangled Attention

    P. He, X. Liu, J. Gao, and W. Chen, “Deberta: Decoding- enhanced BERT with disentangled attention,” CoRR, vol. abs/2006.03654, 2020

  71. [71]

    CodeBERT: A pre- trained model for programming and natural languages,

    Z. Feng, D. Guo, D. Tang, N. Duan, X. Feng, M. Gong, L. Shou, B. Qin, T. Liu, D. Jiang, and M. Zhou, “CodeBERT: A pre- trained model for programming and natural languages,” in Find- ings of the Association for Computational Linguistics: EMNLP 2020, T. Cohn, Y . He, and Y . Liu, Eds. Online: Association for Computational Linguistics, Nov. 2020, pp. 1536–1547