arxiv: 2601.08367 · v2 · submitted 2026-01-13 · 🪐 quant-ph · cs.SE

A Methodological Analysis of Empirical Studies in Quantum Software Testing

Pith reviewed 2026-05-16 15:17 UTC · model grok-4.3

keywords: quantum software testing · empirical studies · methodological analysis · quantum software engineering · systematic review · testing practices · research methodology · experimental design

The pith

Empirical studies in quantum software testing show highly diverse designs and reporting that make results hard to interpret and compare.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper conducts a systematic review of 59 empirical studies on quantum software testing drawn from a pool of 384 papers. It examines these studies through ten research questions focused on test objects, baselines, setups, configurations, and tool support. The analysis reveals inconsistent practices in how experiments are designed and reported. These inconsistencies create barriers to understanding individual findings and synthesizing knowledge across the literature. The authors conclude by offering recommendations aimed at improving future empirical work in the area.

Core claim

The design and reporting of empirical studies in QST remain highly diverse, and a shared methodological understanding has yet to emerge, making it difficult to interpret results and compare findings across studies.

What carries the argument

Systematic examination of 59 primary studies organized around ten research questions on methodological dimensions including objects under test, baseline comparison, testing setup, experimental configuration, and tool and artifact support.

If this is right

  • Consistent baseline comparisons would allow clearer assessment of whether new testing techniques outperform existing ones.
  • Better documentation of experimental configurations would improve reproducibility of QST results.
  • Shared artifacts and tools would reduce duplication of effort in setting up quantum testing experiments.
  • Standardized reporting practices would facilitate meta-analyses that combine findings from multiple studies.
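
As a rough illustration of the point about documenting experimental configurations, the sketch below records one run's settings in a machine-readable form. It is not taken from the paper: the class name `ExperimentConfig`, its field names, and all values (including the program label `qft_3q`) are hypothetical, chosen only to mirror dimensions the review examines, such as shot counts, experimental repetitions, execution backends, and baselines.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical record of one QST experiment's settings."""
    program: str       # circuit or program under test (illustrative label)
    backend: str       # simulator or hardware name, as the study would report it
    shots: int         # measurements per circuit execution
    repetitions: int   # independent repeats of the whole experiment
    seed: int          # random seed, where the backend supports one
    baseline: str      # technique the result is compared against

    def __post_init__(self):
        # Reject settings that could not describe a real run.
        if self.shots <= 0 or self.repetitions <= 0:
            raise ValueError("shots and repetitions must be positive")


# Illustrative values only; none of these come from the reviewed studies.
config = ExperimentConfig(
    program="qft_3q",
    backend="ideal_simulator",
    shots=1024,
    repetitions=30,
    seed=42,
    baseline="random_testing",
)

# Sorted keys give a stable layout that is easy to diff between studies.
record = json.dumps(asdict(config), indent=2, sort_keys=True)
print(record)
```

Publishing a record like this alongside results would let a reader see at a glance which settings differ between two studies.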

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Without methodological convergence, progress in quantum software engineering may remain slower than in classical software testing.
  • The review points toward opportunities for community-developed benchmarks that multiple studies could adopt.
  • Lessons from similar analyses in classical software engineering could be adapted to accelerate standardization in QST.

Load-bearing premise

The 59 selected studies represent the broader literature and the ten research questions capture the key methodological dimensions without missing critical aspects.

What would settle it

A follow-up review of a larger set of QST studies that documents uniform designs and reporting standards across most papers would undermine the claim of persistent diversity.

Figures

Figures reproduced from arXiv: 2601.08367 by Jianjun Zhao, Minqi Shao, Qichen Wang, Yuechen Li.

Figure 1: Bibliometric analysis of the 59 primary studies
Figure 2: Example of PUTs and the corresponding CUT for a 3-qubit Quantum Fourier Transform
Figure 3: Quantum algorithms and subroutines used in the primary studies
Figure 4: Boxplots for the numbers of adopted quantum programs
Figure 5: Statistics related to the buggy variants for fault detection
Figure 6: Combinations of mutation actions and mutation targets adopted in primary studies, where the ...
Figure 7: Numbers of mutant- and version-level buggy variants
Figure 8: Quantity statistics for primary studies in terms of the scalability issue
Figure 9: Complexity measures for CUTs involved in primary studies
Figure 10: Key steps in a universal test process for QST
Figure 11: Quantity statistics for the test cases adopted in the primary studies
Figure 12: The number of primary studies adopting effectiveness or cost metrics
Figure 13: Quantity statistics for the baselines adopted in the primary studies
Figure 14: The roles of statistical repetitions (i.e., shots and experimental repetitions) in QST experiments
Figure 15: Visualization for shot counts configured in the primary studies
Figure 16: Histogram about experimental repetitions configured in experiments
Figure 17: Execution Backends adopted in the primary studies for running the CUTs
original abstract

In quantum software engineering (QSE), quantum software testing (QST) has attracted increasing attention as quantum software systems grow in scale and complexity. Since QST evaluates quantum programs through execution under designed test inputs, empirical studies are widely used to assess the effectiveness of testing approaches. However, the design and reporting of empirical studies in QST remain highly diverse, and a shared methodological understanding has yet to emerge, making it difficult to interpret results and compare findings across studies. This paper presents a methodological analysis of empirical studies in QST through a systematic examination of 59 primary studies identified from a literature pool of size 384. We organize our analysis around ten research questions that cover key methodological dimensions of QST empirical studies, including objects under test, baseline comparison, testing setup, experimental configuration, and tool and artifact support. Through cross-study analysis along these dimensions, we characterize current empirical practices in QST, identify recurring limitations and inconsistencies, and highlight open methodological challenges. Based on our findings, we derive insights and recommendations to inform the design, execution, and reporting of future empirical studies in QST.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper conducts a systematic methodological analysis of empirical studies in quantum software testing (QST). From an initial pool of 384 papers, 59 primary studies are selected and analyzed using ten research questions covering objects under test, baseline comparisons, testing setups, experimental configurations, and tool support. The central claim is that the design and reporting of these studies are highly diverse, lacking a shared methodological understanding, which hinders interpretation and comparison of results. The authors identify limitations and provide recommendations for future studies.

Significance. This work is significant for the emerging field of quantum software engineering as it provides a comprehensive overview of current empirical practices in QST. By highlighting inconsistencies and diversity in methodologies, it can help establish better standards for designing, executing, and reporting empirical studies. The systematic review approach, drawing from 384 papers down to 59, is a strength, offering a broad perspective that could guide researchers in improving reproducibility and comparability in QST research.

major comments (2)
  1. [Section 3 (Methodology)] The description of the literature search and selection process lacks explicit details on the inclusion and exclusion criteria, as well as how inter-rater reliability was ensured during the screening of the 384 papers to 59 studies. This is critical for assessing the representativeness of the selected studies and the robustness of the cross-study analysis.
  2. [Section 4 (Analysis)] While the ten research questions are outlined, the paper should clarify how these questions were derived and whether they comprehensively cover all key methodological dimensions, such as statistical power analysis or handling of quantum-specific noise, to avoid potential gaps in the characterization of practices.
minor comments (2)
  1. [Abstract] The abstract mentions 'a literature pool of size 384' but does not specify the time period or databases searched; adding this would improve clarity.
  2. [Throughout] Some figures or tables summarizing the findings across the 59 studies could benefit from clearer labeling of categories to facilitate quick comparison.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of our paper and the constructive feedback. We address each major comment below and will incorporate revisions to improve clarity and transparency.

point-by-point responses
  1. Referee: [Section 3 (Methodology)] The description of the literature search and selection process lacks explicit details on the inclusion and exclusion criteria, as well as how inter-rater reliability was ensured during the screening of the 384 papers to 59 studies. This is critical for assessing the representativeness of the selected studies and the robustness of the cross-study analysis.

    Authors: We agree that additional explicit details on the selection process would strengthen the methodological transparency. In the revised manuscript, we will expand Section 3 to provide the complete set of inclusion and exclusion criteria applied during screening. We will also describe the inter-rater reliability process, including independent screening by multiple authors, discussion of disagreements, and any quantitative measures used to ensure consistency. revision: yes

  2. Referee: [Section 4 (Analysis)] While the ten research questions are outlined, the paper should clarify how these questions were derived and whether they comprehensively cover all key methodological dimensions, such as statistical power analysis or handling of quantum-specific noise, to avoid potential gaps in the characterization of practices.

    Authors: The ten research questions were systematically derived by adapting established methodological dimensions from empirical software engineering literature (e.g., objects under test, baselines, and experimental setups) to the quantum software testing context, informed by an initial scoping of the literature. We will add a dedicated paragraph in Section 4 explaining this derivation process. While the questions address the primary dimensions observed across the 59 studies, we acknowledge that aspects such as statistical power analysis and explicit handling of quantum noise were not covered because they were rarely reported in the primary studies. We will revise the discussion to note this as a limitation and recommend these as priorities for future methodological work. revision: partial
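
The first response above mentions quantitative measures of screening consistency without naming one. A common choice for two raters is Cohen's kappa, which corrects raw agreement for agreement expected by chance. The sketch below is a minimal illustration under that assumption; the function name `cohen_kappa` and the ten include/exclude decisions are hypothetical and are not data from the review.

```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: fraction of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, computed from each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


# Hypothetical screening decisions for ten papers (illustrative only).
decisions_a = ["include", "exclude", "exclude", "include", "exclude",
               "exclude", "include", "exclude", "exclude", "exclude"]
decisions_b = ["include", "exclude", "include", "include", "exclude",
               "exclude", "exclude", "exclude", "exclude", "exclude"]

kappa = cohen_kappa(decisions_a, decisions_b)
print(f"kappa = {kappa:.3f}")
```

In this made-up example the raters agree on 8 of 10 papers, yet kappa is noticeably lower than 0.8 because both raters exclude most papers, so much of that agreement would occur by chance.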

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper conducts a systematic methodological review of 59 primary studies selected from 384 papers in quantum software testing. It defines ten research questions covering objects under test, baselines, setups, configurations, and tooling, then performs cross-study characterization without any equations, fitted parameters, predictions, or derivations. No self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear; the central claim of high diversity in empirical practices rests on direct analysis of external literature rather than internal reduction to the paper's own inputs. This is a standard descriptive survey with no circular structure.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on standard systematic literature review practices in empirical software engineering; no free parameters, invented entities, or ad-hoc axioms beyond the assumption that the chosen dimensions cover the methodological space.

axioms (1)
  • domain assumption: Systematic literature review methodology is appropriate and sufficient to characterize empirical practices in QST
    Standard approach in software engineering research for synthesizing study designs


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Probabilistic Condition, Decision and Path Coverage of Circuit-based Quantum Programs

    quant-ph · 2026-04 · unverdicted · novelty 6.0

    Quantum circuits show high average condition (97.56%) and decision (97.63%) coverage but lower path coverage (71.84%), with probabilistic versions adding confidence levels (averages 88.87%, 88.65%, 37.18%); mutation t...
