An Improved Quantum Software Challenges Classification Approach using Transfer Learning and Explainable AI

Arif Ali Khan; Javed Ali Khan; Mobashir Husain; Muhammad Azeem Akbar; Muhammad Sohail Khan; Nek Dil Khan; Shahid Hussain

arxiv: 2509.21068 · v1 · submitted 2025-09-25 · 💻 cs.SE


Nek Dil Khan, Javed Ali Khan, Mobashir Husain, Muhammad Sohail Khan, Arif Ali Khan, Muhammad Azeem Akbar, Shahid Hussain

Pith reviewed 2026-05-18 13:59 UTC · model grok-4.3

classification 💻 cs.SE

keywords quantum software engineering · challenge classification · transfer learning · BERT · Stack Overflow · explainable AI · grounded theory · software challenges


The pith

Transformer models classify quantum software challenges from Stack Overflow posts at 95 percent accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that fine-tuning pre-trained transformer models on actual developer discussions from quantum-tagged posts sorts those posts into six recurring challenge types with notably higher accuracy than conventional neural networks. A sympathetic reader would care because quantum software developers currently rely on scattered forum threads to navigate tooling problems, conceptual gaps, and error handling; an automated classifier could surface relevant threads faster and reduce repeated questions. The work processes the original 2829 posts without artificial data expansion and adds explanations that show which words steer each prediction. If the approach holds, platform maintainers gain a practical way to tag and group content while researchers obtain a reusable labeled dataset for further study of quantum developer pain points.

Core claim

The authors extract 2829 questions using quantum-related tags, apply content analysis and grounded theory to define six challenge categories, and annotate the posts through human review plus ChatGPT validation to create ground truth. Fine-tuned BERT and DistilBERT models then reach an average 95 percent accuracy in assigning posts to these categories, while fine-tuned feedforward, convolutional, and LSTM networks reach 89, 86, and 84 percent respectively. The transformer method therefore gains a six-point edge over the strongest baseline while operating directly on the unaltered discussions, and SHAP explanations reveal the linguistic features that drive each classification.
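To make the six-point figure concrete, the short check below recomputes the gaps from the accuracies quoted in the abstract. It uses only those four numbers; the variable names are illustrative and nothing here comes from the authors' code.

```python
# Accuracies (percent) as quoted in the abstract.
transformer_acc = 95
baseline_acc = {"FNN": 89, "CNN": 86, "LSTM": 84}

# Gap in percentage points between the transformers and each baseline.
gaps = {name: transformer_acc - acc for name, acc in baseline_acc.items()}

# The strongest baseline is the one the headline gap is measured against.
best_baseline = max(baseline_acc, key=baseline_acc.get)

print(gaps)           # {'FNN': 6, 'CNN': 9, 'LSTM': 11}
print(best_baseline)  # FNN
```

The headline "6" is therefore a difference in percentage points against FNN only; as a relative improvement over 89 percent it is about 6.7 percent, and the margins over CNN and LSTM are larger.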

What carries the argument

Fine-tuned transformer models such as BERT and DistilBERT paired with SHAP value explanations that map word patterns in the posts to one of six categories: Tooling, Theoretical, Learning, Conceptual, Errors, and API Usage.
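As a rough illustration of what a SHAP value attributes to each word, the sketch below computes exact Shapley values for a toy scoring function over three tokens. Everything in it is an assumption made for illustration: the tokens, the weights, and the `score` function are invented, and the SHAP library approximates this quantity for real models rather than enumerating every coalition as done here.

```python
from itertools import combinations
from math import factorial

# Hypothetical per-token weights for an "Errors" score (invented for illustration).
WEIGHTS = {"traceback": 2.0, "qiskit": 0.5, "install": 1.0}

def score(present):
    """Toy model: sum of token weights plus one interaction bonus."""
    total = sum(WEIGHTS[t] for t in present)
    # In this toy, "traceback" and "install" together are extra indicative.
    if "traceback" in present and "install" in present:
        total += 1.0
    return total

def shapley_values(tokens, value_fn):
    """Exact Shapley value of each token, enumerating all coalitions."""
    n = len(tokens)
    result = {}
    for tok in tokens:
        others = [t for t in tokens if t != tok]
        phi = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain tok.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                phi += weight * (value_fn(s | {tok}) - value_fn(s))
        result[tok] = phi
    return result

tokens = ["traceback", "qiskit", "install"]
phi = shapley_values(tokens, score)
print(phi)
```

In this toy the interaction bonus is shared equally between the two tokens that trigger it, and the attributions sum to the difference between the score of the full post and the score of an empty one. That additivity is the property that lets a maintainer read SHAP values as each word's share of a prediction.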

If this is right

Quantum vendors and forum operators can apply the trained models to automatically organize and surface discussions for quicker developer access.
The six-category taxonomy provides a stable reference frame for tracking which challenges appear most often over time.
SHAP explanations allow maintainers to inspect and adjust the model when particular linguistic cues produce unexpected assignments.
The same transfer-learning pipeline can be reused on other specialized software domains once comparable labeled discussions exist.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Deploying the classifier inside live forums could generate suggested tags or related threads while a question is being written.
Periodic retraining on newer posts would test whether the original six categories continue to capture emerging issues as quantum tools mature.
Comparing predictions against direct developer surveys would measure how well the model aligns with self-reported challenge priorities.

Load-bearing premise

The six categories identified through content analysis and grounded theory together with the human-plus-ChatGPT annotations supply an accurate and unbiased ground-truth labeling of the extracted posts.

What would settle it

A new round of independent annotations by a different set of quantum developers on a held-out collection of posts that produces substantially different category assignments or drops transformer accuracy below 90 percent.

Original abstract

Quantum Software Engineering (QSE) is a research area practiced by tech firms. Quantum developers face challenges in optimizing quantum computing and QSE concepts. They use Stack Overflow (SO) to discuss challenges and label posts with specialized quantum tags, which often refer to technical aspects rather than developer posts. Categorizing questions based on quantum concepts can help identify frequent QSE challenges. We conducted studies to classify questions into various challenges. We extracted 2829 questions from Q&A platforms using quantum-related tags. Posts were analyzed to identify frequent challenges and develop a novel grounded theory. Challenges include Tooling, Theoretical, Learning, Conceptual, Errors, and API Usage. Through content analysis and grounded theory, discussions were annotated with common challenges to develop a ground truth dataset. ChatGPT validated human annotations and resolved disagreements. Fine-tuned transformer algorithms, including BERT, DistilBERT, and RoBERTa, classified discussions into common challenges. We achieved an average accuracy of 95% with BERT DistilBERT, compared to fine-tuned Deep and Machine Learning (D&ML) classifiers, including Feedforward Neural Networks (FNN), Convolutional Neural Networks (CNN), and Long Short-Term Memory networks (LSTM), which achieved accuracies of 89%, 86%, and 84%, respectively. The Transformer-based approach outperforms the D&ML-based approach with a 6% increase in accuracy by processing actual discussions, i.e., without data augmentation. We applied SHAP (SHapley Additive exPlanations) for model interpretability, revealing how linguistic features drive predictions and enhancing transparency in classification. These findings can help quantum vendors and forums better organize discussions for improved access and readability. However,empirical evaluation studies with actual developers and vendors are needed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper builds a practical taxonomy of six quantum software challenges from 2829 Stack Overflow posts and shows transformers hitting 95% accuracy, but the labeling lacks agreement metrics.


The paper introduces a grounded-theory taxonomy for quantum software engineering challenges and demonstrates that standard transformers can classify developer posts into those categories at 95% accuracy, outperforming some deep learning baselines by 6% on real data without augmentation.

They extracted 2829 posts from Stack Overflow using quantum tags. Content analysis and grounded theory produced six categories: Tooling, Theoretical, Learning, Conceptual, Errors, and API Usage. Humans annotated the posts, with ChatGPT helping to validate and resolve disagreements. Fine-tuned BERT, DistilBERT, and RoBERTa then classified the discussions. The best models hit 95% accuracy compared to 89%, 86%, and 84% for FNN, CNN, and LSTM. SHAP explanations highlight the linguistic features driving the predictions.

This work does a good job creating a labeled resource for an emerging subfield and applying transfer learning without data augmentation. The focus on real discussions is a reasonable choice.

The labeling process is the weakest part. No inter-annotator agreement metrics are given, and there are no details on data splits or statistical significance. Subjective categories like these can have high noise, so the performance numbers need those checks to be convincing.

This is for quantum software researchers and forum maintainers who want better ways to tag and surface developer questions. A reader interested in domain-specific classification tasks will get practical value. The paper has enough new data and results to deserve a serious referee. I would recommend sending it to peer review.

Referee Report

3 major / 2 minor

Summary. The manuscript extracts 2829 quantum-tagged posts from Q&A platforms, derives six challenge categories (Tooling, Theoretical, Learning, Conceptual, Errors, API Usage) via content analysis and grounded theory, creates ground-truth labels through human annotation validated and reconciled by ChatGPT, and fine-tunes transformer models (BERT, DistilBERT, RoBERTa) to classify posts. It reports an average accuracy of 95% for the transformer models, a 6% improvement over fine-tuned FNN (89%), CNN (86%), and LSTM (84%) baselines on the unaugmented data, and applies SHAP to interpret linguistic features driving predictions.

Significance. If the reported performance holds under proper validation, the work provides a practical, interpretable tool for organizing developer discussions on quantum software challenges, which could benefit forums and vendors. Credit is due for evaluating on real (non-augmented) data and for incorporating SHAP explanations to enhance transparency. The central performance claim, however, rests on the unverified reliability of the human-plus-ChatGPT labels.

major comments (3)

[Section 3] Section 3 (Annotation and Ground-Truth Creation): No inter-annotator agreement statistics (Cohen’s kappa, Fleiss’ kappa, or raw agreement percentages) or breakdown of disagreement rates resolved by ChatGPT are reported. Because the 95% accuracy and 6% lift are measured against these labels, the absence of agreement metrics directly undermines interpretability of the headline result.
[Section 4] Section 4 (Experimental Setup and Evaluation): The manuscript provides no information on the train/test split ratio, class-balance statistics across the six categories in the 2829-post corpus, or any statistical significance testing (e.g., McNemar’s test or bootstrap confidence intervals) for the reported accuracy differences. These omissions are load-bearing for the claim that transformers outperform the D&ML baselines.
[Section 5] Section 5 (Results): The comparison to FNN, CNN, and LSTM baselines does not specify whether identical hyperparameter search, early-stopping criteria, or data-preprocessing steps were applied, making it difficult to attribute the 6% gap solely to model architecture rather than experimental differences.
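The agreement statistic requested in the first major comment can be computed in a few lines. The sketch below implements Cohen's kappa for two annotators; the two label lists are invented for illustration and are not taken from the paper's dataset.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labeling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of each annotator's marginal label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label
    return (observed - expected) / (1 - expected)

# Hypothetical annotations over six posts (invented for illustration).
annotator_1 = ["Tooling", "Errors", "Errors", "Learning", "Tooling", "API Usage"]
annotator_2 = ["Tooling", "Errors", "Conceptual", "Learning", "Tooling", "Errors"]

kappa = cohens_kappa(annotator_1, annotator_2)
print(round(kappa, 3))  # 0.556
```

Here raw agreement is 4 of 6 posts, yet kappa is only about 0.56 once chance agreement is discounted. That gap is why raw percentages alone would not settle the referee's concern.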

minor comments (2)

[Abstract] Abstract: Typo in the final sentence (“However,empirical” should read “However, empirical”).
[Section 3] The description of how the six categories were finalized from the grounded-theory analysis could be clarified with a brief example of a post-to-category mapping.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each of the major comments point by point below and outline the revisions we will implement to enhance the manuscript's rigor and transparency.


Referee: [Section 3] Section 3 (Annotation and Ground-Truth Creation): No inter-annotator agreement statistics (Cohen’s kappa, Fleiss’ kappa, or raw agreement percentages) or breakdown of disagreement rates resolved by ChatGPT are reported. Because the 95% accuracy and 6% lift are measured against these labels, the absence of agreement metrics directly undermines interpretability of the headline result.

Authors: We acknowledge the importance of reporting inter-annotator agreement to validate the ground-truth labels. Our process involved multiple human annotators performing content analysis, with ChatGPT used to validate annotations and resolve any disagreements. In the revised version, we will include inter-annotator agreement statistics such as Cohen's kappa and Fleiss' kappa where applicable, along with a detailed breakdown of disagreement rates and how they were resolved by ChatGPT. This addition will strengthen the credibility of our labeled dataset.
revision: yes

Referee: [Section 4] Section 4 (Experimental Setup and Evaluation): The manuscript provides no information on the train/test split ratio, class-balance statistics across the six categories in the 2829-post corpus, or any statistical significance testing (e.g., McNemar’s test or bootstrap confidence intervals) for the reported accuracy differences. These omissions are load-bearing for the claim that transformers outperform the D&ML baselines.

Authors: We agree that these details are crucial for assessing the robustness of our results. We will update Section 4 to specify the train/test split ratio employed, present the class-balance statistics for the six categories in the corpus of 2829 posts, and incorporate statistical significance testing, including McNemar's test or bootstrap confidence intervals, to confirm the significance of the performance improvements.
revision: yes

Referee: [Section 5] Section 5 (Results): The comparison to FNN, CNN, and LSTM baselines does not specify whether identical hyperparameter search, early-stopping criteria, or data-preprocessing steps were applied, making it difficult to attribute the 6% gap solely to model architecture rather than experimental differences.

Authors: We appreciate this observation regarding experimental fairness. All models were evaluated under consistent conditions, including the same data preprocessing pipeline and hyperparameter tuning approaches. In the revised manuscript, we will provide explicit details on the hyperparameter search, early-stopping criteria, and preprocessing steps applied uniformly to the transformer models and the D&ML baselines (FNN, CNN, LSTM). This will clarify that the performance differences are due to the model architectures.
revision: yes
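The significance testing discussed in the exchange on Section 4 could take the form of an exact McNemar test on paired predictions from two classifiers. The sketch below is one minimal way to do that; the labels and predictions are toy values invented for illustration, not results from the paper.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on the discordant pairs of two classifiers."""
    # b: A correct and B wrong; c: A wrong and B correct.
    b = sum(a == t and p != t for t, a, p in zip(y_true, pred_a, pred_b))
    c = sum(a != t and p == t for t, a, p in zip(y_true, pred_a, pred_b))
    n = b + c
    if n == 0:
        return b, c, 1.0  # the classifiers never disagree on correctness
    # Under the null hypothesis, discordant pairs split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(min(b, c) + 1)) * 0.5 ** n
    return b, c, min(1.0, 2 * tail)

# Toy data: ten posts, all with true class 1 (invented for illustration).
y_true = [1] * 10
pred_a = [1] * 6 + [0] + [1] * 3   # hypothetical transformer predictions
pred_b = [0] * 6 + [1] + [1] * 3   # hypothetical baseline predictions

b, c, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(b, c, p_value)  # 6 1 0.125
```

Even though model A wins six of the seven disagreements in this toy case, the p-value of 0.125 is not significant at the usual 0.05 level. The test depends on the number of discordant pairs, which is why the test-set size matters for the paper's comparison.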

Circularity Check

0 steps flagged

No circularity: standard supervised classification on independently derived labels


The paper first extracts 2829 posts via quantum tags, then applies content analysis and grounded theory to identify six challenge categories and produce human-plus-ChatGPT annotations as ground truth. Transformer models (BERT, DistilBERT) are subsequently fine-tuned on this labeled set and evaluated on held-out posts to report 95% accuracy versus 89/86/84% for FNN/CNN/LSTM baselines. No equations, fitted parameters, or self-citations reduce the reported accuracies to the input labels by construction; the taxonomy precedes modeling and the performance numbers measure generalization on unseen data rather than tautological reproduction of the annotation process.
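The held-out evaluation described above depends on how the posts were split, which the abstract does not report. The sketch below shows one common option, a stratified hold-out split that keeps class proportions similar in both parts. The 20 percent test fraction, the seed, and the toy label counts are all assumptions for illustration, not the authors' setup.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), splitting each class at about the same ratio."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, test_idx = [], []
    for label in sorted(by_class):
        idx = by_class[label]
        rng.shuffle(idx)
        # Keep at least one test item per class when the class has more than one post.
        n_test = max(1, round(len(idx) * test_fraction)) if len(idx) > 1 else 0
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)

# Toy label distribution (invented for illustration).
labels = ["Tooling"] * 10 + ["Errors"] * 5 + ["Learning"] * 5
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 16 4
```

Stratifying matters most when the six categories are imbalanced, since a purely random split can leave a rare category almost absent from the test set.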

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central performance claim rests on the assumption that the manually derived six-category taxonomy plus ChatGPT-validated labels constitute reliable ground truth; no free parameters are explicitly fitted in the abstract beyond standard transformer fine-tuning hyperparameters, and no new physical or mathematical entities are introduced.

free parameters (1)

fine-tuning hyperparameters
Learning rate, batch size, and number of epochs for BERT-family models are chosen but not reported in the abstract; these affect the 95% accuracy figure.

axioms (2)

domain assumption: Pre-trained transformer models can be successfully fine-tuned for multi-class text classification on domain-specific forum posts
Invoked when the authors apply BERT, DistilBERT, and RoBERTa to the annotated quantum posts.
ad hoc to paper: ChatGPT can reliably resolve annotation disagreements and validate human labels for QSE challenge categories
Used to create the ground-truth dataset described in the abstract.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

We achieved an average accuracy of 95% with BERT DistilBERT... The Transformer-based approach outperforms the D&ML-based approach with a 6% increase in accuracy by processing actual discussions, i.e., without data augmentation.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Challenges include Tooling, Theoretical, Learning, Conceptual, Errors, and API Usage... ChatGPT validated human annotations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

[1] Nielsen M A, Chuang I L. Quantum computation and quantum information. Cambridge university press, 2010

[2] Preskill J. Reliable quantum computers. Proceedings of the Royal Society of London. Series A: Mathematical, Physical and Engineering Sciences, 1998, 454(1969): 385–410

[3] Arute F, Arya K, Babbush R, Bacon D, Bardin J C, Barends R, Biswas R, Boixo S, Brandao F G, Buell D A, others. Quantum supremacy using a programmable superconducting processor. Nature, 2019, 574(7779): 505–510

[4] IBM Quantum. Ibm quantum computing roadmap,

[6] Rigetti Computing. Why quantum, 2020. Accessed: 2025-03-28

[7] Piattini M, Peterssen G, Pérez-Castillo R, Hevia J L, Serrano M A, Hernández G, De Guzmán I G R, Paradela C A, Polo M, Murina E, others. The talavera manifesto for quantum software engineering and programming. In: QANSWER. 2020, 1–5

[8] Murillo J M, Garcia-Alonso J, Moguel E, Barzen J, Leymann F, Ali S, Yue T, Arcaini P, Pérez-Castillo R, Guzmán G.-R. d I, others. Quantum software engineering: Roadmap and challenges ahead. ACM Transactions on Software Engineering and Methodology, 2025, 34(5): 1–48

[9] Mandal A K, Nadim M, Roy C K, Roy B, Schneider K A. Quantum software engineering and potential of quantum computing in software engineering research: a review. Automated Software Engineering, 2025, 32(1): 27

[10] Zhao J. Quantum software engineering: Landscapes and horizons. arXiv preprint arXiv:2007.07047, 2020

[11] Svore K, Geller A, Troyer M, Azariah J, Granade C, Heim B, Kliuchnikov V, Mykhailova M, Paz A, Roetteler M. Q# enabling scalable quantum computing and development with a high-level dsl. In: Proceedings of the real world domain specific languages workshop

[12] JavadiAbhari A, Patil S, Kudrow D, Heckey J, Lvov A, Chong F T, Martonosi M. Scaffcc: Scalable compilation and analysis of quantum programs. Parallel Computing, 2015, 45: 2–17

[13]

IBM Quantum . Qiskit — ibm quantum computing,


[14] Accessed: 2025-03-28

[15] Li H, Khomh F, Openja M, others. Understanding quantum software engineering challenges an empirical study on stack exchange forums and github issues. In: 2021 IEEE International Conference on Software Maintenance and Evolution (ICSME). 2021, 343–354

[16] Yue T, Ali S, Arcaini P. Towards quantum software requirements engineering. In: 2023 IEEE International Conference on Quantum Computing and Engineering (QCE). 2023, 161–164

[17] Ahmad A, Khan A A, Waseem M, Fahmideh M, Mikkonen T. Towards process centered architecting for quantum software systems. In: 2022 IEEE international conference on quantum software (QSW). 2022, 26–31

[18] Khan A A, Ye B, Akbar M A, Khan J A, Mougouei D, Ma X. Mining q&a platforms for empirical evidence on quantum software programming. arXiv preprint arXiv:2503.05240, 2025

[19] Akbar M A, Khan A A, Rafi S. A systematic decision-making framework for tackling quantum software engineering challenges. Automated Software Engineering, 2023, 30(2): 22

[20] Ali S, Yue T. Quantum software testing: A brief introduction. In: 2023 IEEE/ACM 45th International Conference on Software Engineering: Companion Proceedings (ICSE-Companion). 2023, 332–333

[21] Ahmad A, Feng C, Ge S, Yousif A. A survey on mining stack overflow: question and answering (q&a) community. Data Technologies and Applications, 2018, 52(2): 190–247

[22] Dieu d M J, Liang P, Shahin M. How do oss developers reuse architectural solutions from q&a sites: An empirical study. IEEE Transactions on Software Engineering, 2025

[23] Khan A A, Khan J A, Akbar M A, Zhou P, Fahmideh M. Insights into software development approaches: mining q&a repositories. Empirical Software Engineering, 2024, 29(1): 8

[24] Li T, Zhang X, Wang Y, Zhou Q, Wang Y, Dong F. Machine learning for requirements engineering (ml4re): A systematic literature review complemented by practitioners’ voices from stack overflow. Information and Software Technology, 2024, 172: 107477

[25] Beyer S, Macho C, Di Penta M, Pinzger M. What kind of questions do developers ask on stack overflow? a comparison of automated approaches to classify posts into question categories. Empirical Software Engineering, 2020, 25: 2258–2301

[26] Husain M, Khan M S, Khan J A, Khan N D, Khan A, Akbar M A. Exploring developers discussion forums for quantum software engineering: A fine-grained classification approach using large language model (chatgpt). In: Proceedings of the 33rd ACM International Conference on the Foundations of Software Engineering. 2025, 1742–1755

[27] Gill S S, Kumar A, Singh H, Singh M, Kaur K, Usman M, Buyya R. Quantum computing: A taxonomy, systematic review and future directions. Software: Practice and Experience, 2022, 52(1): 66–114

[28] Vietz D, Barzen J, Leymann F, Wild K. On decision support for quantum application developers: categorization, comparison, and analysis of existing technologies. In: International Conference on Computational Science. 2021, 127–141

[29] Haghparast M, Mikkonen T, Nurminen J K, Stirbu V. Quantum software engineering challenges from developers’ perspective: Mapping research challenges to the proposed workflow model. In: 2023 IEEE International Conference on Quantum Computing and Engineering (QCE). 2023, 173–176

[30] Treude C, Barzilay O, Storey M A. How do programmers ask and answer questions on the web? (nier track). In: Proceedings of the 33rd international conference on software engineering. 2011, 804–807

[31] Iftikhar H U, Rehman A U, Kalugina O A, Umer Q, Khan H A. Deep learning-based correct answer prediction for developer forums. IEEE Access, 2021, 9: 128166–128177

[32] Zhu W, Zhang H, Hassan A E, Godfrey M W. An empirical study of question discussions on stack overflow. Empirical Software Engineering, 2022, 27(6): 148

[33] Khan J A, Yasin A, Fatima R, Vasan D, Khan A A, Khan A W. Valuating requirements arguments in the online user’s forum for requirements decision-making: the crowdre-varg framework. Software: Practice and Experience, 2022, 52(12): 2537–2573

[34] Ali Khan J, Liu L, Wen L. Requirements knowledge acquisition from online user forums. Iet Software, 2020, 14(3): 242–253

[35] Beyer S, Pinzger M. A manual categorization of android app development issues on stack overflow. In: 2014 IEEE International Conference on Software Maintenance and Evolution. 2014, 531–535

[36] Strauss A, Corbin J. Basics of qualitative research techniques. 1998

[37] Neuendorf K A. The content analysis guidebook. sage, 2017

[38] Khan N D, Khan J A, Li J, Ullah T, Zhao Q. Mining software insights: uncovering the frequently occurring issues in low-rating software applications. PeerJ Computer Science, 2024, 10: e2115

[39] Khan N D, Khan J A, Li J, Ullah T, Zhao Q. Leveraging large language model chatgpt for enhanced understanding of end-user emotions in social media feedbacks. Expert Systems with Applications, 2025, 261: 125524

[40] Rosen C, Shihab E. What are mobile developers asking about? a large scale study using stack overflow. Empirical Software Engineering, 2016, 21: 1192–1223

[41] Allamanis M, Sutton C. Why, when, and what: analyzing stack overflow questions by topic, type, and code. In: 2013 10th Working conference on mining software repositories (MSR). 2013, 53–56

[42] Uddin G, Khomh F. Automatic mining of opinions expressed about apis in stack overflow. IEEE Transactions on Software Engineering, 2019, 47(3): 522–559

[43] Baquero J F, Camargo J E, Restrepo-Calle F, Aponte J H, González F A. Predicting the programming language: Extracting knowledge from stack overflow posts. In: Advances in Computing: 12th Colombian Conference, CCC 2017, Cali, Colombia, September 19-22, 2017, Proceedings 12. 2017, 199–210

[44] Tan Y, Xu S, Wang Z, Zhang T, Xu Z, Luo X. Bug severity prediction using question-and-answer pairs from stack overflow. Journal of Systems and Software, 2020, 165: 110567

[45] Devlin J, Chang M W, Lee K, Toutanova K. Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). 2019, 4171–4186

[46] González-Carvajal S, Garrido-Merchán E C. Comparing bert against traditional machine learning text classification. arXiv preprint arXiv:2005.13012, 2020

Front. Comput. Sci., 2025, 0(0): 1–40

[47] Sanh V, Debut L, Chaumond J, Wolf T. Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108, 2019

[48] Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, Levy O, Lewis M, Zettlemoyer L, Stoyanov V. Roberta: A robustly optimized bert pretraining approach. arXiv preprint arXiv:1907.11692, 2019

[49] Yousuf M M, Sofi S A. Bug classification in quantum software: A rule-based framework and its evaluation. arXiv preprint arXiv:2506.10397, 2025

[50] Aktar M S, Liang P, Waseem M, Tahir A, Ahmad A, Zhang B, Li Z. Architecture decisions in quantum software systems: An empirical study on stack exchange and github. Information and Software Technology, 2025, 177: 107587

[51] Upadhyay K, Chhetri V, Siddique A, Farooq U. Analyzing the evolution and maintenance of quantum software repositories. arXiv preprint arXiv:2501.06894, 2025

NEK DIL KHAN received his B.Sc. degree in software engineering from the University of Science and Technology Bannu, Khyber Pakhtunkhwa, Pakistan. He continued to pursue his passion and earned h...

He has published over 130 research articles in well-reputed journals and international conferences. He taught and designed several core and advanced courses in software engineering and has been recognized with excellence in teaching, excellence in instructional technology, and excellence in academic advising awards. His multidisciplinary research integ...