pith. machine review for the scientific record.

arxiv: 2605.13279 · v1 · submitted 2026-05-13 · 💻 cs.SE

Recognition: unknown

Robust Mutation Analysis of Quantum Programs Under Noise


Pith reviewed 2026-05-14 18:19 UTC · model grok-4.3

classification 💻 cs.SE
keywords mutation analysis · quantum programs · quantum software testing · hardware noise · distance metrics · IBM devices · mutant detection

The pith

Noise alters behavioral distances between quantum programs and mutants, requiring noise-specific thresholds for detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines mutation analysis for quantum programs when run under realistic hardware noise rather than ideal conditions. Using 41 programs executed on noiseless and noisy simulators that emulate three IBM devices, the authors measure how noise changes the apparent differences between correct programs and their mutants. They compare several distance metrics and find that while density-matrix approaches discriminate best in simulation, output-distribution metrics remain usable in practice when thresholds are adjusted to each noise profile. The work concludes that quantum software testing must incorporate device-specific noise to avoid misclassifying faults.

Core claim

Our results show that noise significantly alters the behavioral distance between programs and mutants, making equivalent mutants harder to distinguish from real faults. Density-matrix metrics achieve the best discrimination, with misclassification rates up to 16.77%, but output-distribution metrics reach up to 73.03% accuracy and 74.89% F1-score. Noise-specific thresholds further improve detection compared to noiseless thresholds, and noise effects correlate more with algorithm and circuit characteristics than with mutation types.

What carries the argument

Behavioral distance metrics (density-matrix versus output-distribution) applied to circuit executions under emulated IBM-device noise profiles, together with noise-specific versus noiseless detection thresholds.
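To make the two metric families concrete, here is a minimal Python sketch; it is not taken from the paper or its supplementary material. It computes the Hellinger distance (an output-distribution metric, named in the caption of Figure 4) and the trace distance (a density-matrix metric, named in the text around Figure 7) for a toy single-qubit pair. The function names, the |+> versus |-> example, and the closed-form single-qubit trace distance are illustrative assumptions.

```python
import math


def hellinger(p, q):
    """Hellinger distance between two outcome distributions given as dicts."""
    keys = set(p) | set(q)
    total = sum(
        (math.sqrt(p.get(k, 0.0)) - math.sqrt(q.get(k, 0.0))) ** 2 for k in keys
    )
    return math.sqrt(total / 2.0)


def trace_distance_1q(rho, sigma):
    """Trace distance between two single-qubit density matrices (2x2 nested lists).

    The difference of two density matrices is Hermitian and traceless, so for
    one qubit its eigenvalues are +r and -r with r = sqrt(a^2 + |b|^2), and the
    trace distance 0.5 * (|+r| + |-r|) equals r.
    """
    a = (rho[0][0] - sigma[0][0]).real
    b = rho[0][1] - sigma[0][1]
    return math.sqrt(a * a + abs(b) ** 2)


# Hypothetical "program" and "mutant" states: |+> and |->.
# They differ only by a relative phase.
rho_plus = [[0.5 + 0j, 0.5 + 0j], [0.5 + 0j, 0.5 + 0j]]
rho_minus = [[0.5 + 0j, -0.5 + 0j], [-0.5 + 0j, 0.5 + 0j]]

# Computational-basis measurement statistics are identical for both states.
dist_plus = {"0": 0.5, "1": 0.5}
dist_minus = {"0": 0.5, "1": 0.5}

d_output = hellinger(dist_plus, dist_minus)
d_density = trace_distance_1q(rho_plus, rho_minus)
print(d_output, d_density)
```

In this toy case the output distributions coincide while the density matrices are maximally far apart, which is one simple way a density-matrix metric can separate programs that an output-distribution metric cannot. The paper's own account of why density-matrix metrics discriminate best may differ.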

If this is right

  • Equivalent mutants become harder to separate from actual faults once noise is present.
  • Output-distribution metrics remain practical for hardware where density-matrix information is unavailable.
  • Thresholds derived from noiseless runs underperform compared with thresholds tuned to each device's noise profile.
  • Noise impact depends more on the underlying algorithm and circuit structure than on the type of mutation applied.
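The threshold point can be illustrated with a small Python sketch under stated assumptions. The distances, the two threshold values, and the choice of "non-equivalent mutant" as the positive class are all hypothetical; none of them come from the paper. The sketch only shows how a threshold calibrated for noiseless runs can flag every mutant once noise pushes equivalent mutants away from distance 0.

```python
def evaluate(samples, threshold):
    """Classify each mutant as non-equivalent when distance > threshold.

    samples: list of (distance, is_non_equivalent) pairs.
    Returns (accuracy, f1) with non-equivalent mutants as the positive class.
    """
    tp = fp = tn = fn = 0
    for distance, is_non_equivalent in samples:
        flagged = distance > threshold
        if flagged and is_non_equivalent:
            tp += 1
        elif flagged and not is_non_equivalent:
            fp += 1
        elif not flagged and is_non_equivalent:
            fn += 1
        else:
            tn += 1
    accuracy = (tp + tn) / len(samples)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


# Hypothetical distances measured on a noisy simulator.
samples = [
    (0.04, False), (0.06, False), (0.05, False), (0.08, False),  # equivalent
    (0.07, True), (0.20, True), (0.35, True), (0.15, True),      # non-equivalent
]

NOISELESS_THRESHOLD = 0.01        # assumed: tuned on noiseless runs
NOISE_SPECIFIC_THRESHOLD = 0.065  # assumed: tuned for this noise profile

acc_ideal, f1_ideal = evaluate(samples, NOISELESS_THRESHOLD)
acc_noisy, f1_noisy = evaluate(samples, NOISE_SPECIFIC_THRESHOLD)
print(f"noiseless threshold:      accuracy={acc_ideal:.3f} f1={f1_ideal:.3f}")
print(f"noise-specific threshold: accuracy={acc_noisy:.3f} f1={f1_noisy:.3f}")
```

With these made-up numbers the noiseless threshold labels all eight mutants as faults, while the noise-specific threshold misclassifies only one equivalent mutant.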

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Quantum testing frameworks will need built-in support for device-calibrated noise models rather than generic assumptions.
  • Simulator results should be cross-checked against actual hardware runs to confirm the reported accuracy differences.
  • The same noise-aware adjustment principle could apply to other quantum program comparison tasks such as equivalence checking.

Load-bearing premise

The three emulated IBM-device noise profiles accurately represent real hardware behavior and the 41 programs plus mutation operators represent typical quantum software.

What would settle it

Run the identical mutation-analysis experiments on physical IBM quantum hardware and check whether the observed misclassification rates and accuracy figures match those obtained from the simulators.

Figures

Figures reproduced from arXiv: 2605.13279 by Eñaut Mendiluze Usandizaga, Mohammad Reza Mousavi, Paolo Arcaini, Shaukat Ali, Sophie Fortz.

Figure 1. Quantum circuit example consisting of four qubits (i.e., …
Figure 2. Experimental Platform Overview. … its main components interact and integrate into a coherent execution pipeline. The experimental workflow proceeds through five main steps (detailed in the remainder of this section), each associated with distinct computational and time requirements: A – Mutant Generation and Selection: We generate mutants from each CUT, and we select a subset of them to reduce the workload o…
Figure 3. Distances between the theoretical output of 41 benchmark programs and the output obtained by …
Figure 4. Visualization of the threshold distribution for the Hellinger distance metric.
Figure 5. (RQ1.1) Relationship between the original and mutated programs under various noise conditions, measured in terms of distance metrics. … model (displayed as rows), we compare the distances obtained in the corresponding noisy simulator against those from the noiseless simulator. The x-axis represents the distance between the CUT and its mutants in the noiseless simulator, while the y-axis shows the same distan…
Figure 6. (RQ1.1) Distance between all original and mutated programs, evaluated under various noise conditions. … Sect. 2.4.1, Fidelity equals 1 for perfect similarity, so its boxplots must be interpreted with vertical symmetry (i.e., values closer to 1 indicate greater similarity). One clear observation is that under noise, the distribution of non-equivalent mutants (Fig. 6a) becomes more concentrated (i.e., exhibits…
Figure 7. (RQ1.2) Comparison of distances between original and mutated programs under different noise conditions. Each sub-figure uses a different distance metric, with mutants grouped as equivalent or non-equivalent. … Using Trace distance (Fig. 7a), the noiseless simulator yields the expected outcome: most equivalent mutants remain perfectly undetected (with a distance of 0), while non-equivalent mutants are …
Figure 8. (RQ1.3) Confusion matrices comparing the detection of equivalent and non-equivalent mutants across all distance metrics. The Y-axis corresponds to the true nature of the mutant (i.e., the expected result), while the X-axis represents the results of our mutant detection. Each sub-figure evaluates the alignment between noisy and noiseless detections using a different noise model and/or threshold strategy. … te…
Figure 9. (RQ1.4) Comparison of distances between original and mutated programs under different noise conditions. Each sub-figure uses a different distance metric, with mutants grouped as equivalent or non-equivalent. Horizontal lines indicate the mutant detection thresholds. … Both accuracy and F1-score are consistent across the three noise models (Kyiv, Brisbane, and Sherbrooke). The only exception is for the expect…
Original abstract

Mutation analysis has long been used in classical software testing and has recently been adopted for assessing the robustness of quantum software testing techniques. However, existing studies assume ideal, noiseless execution, overlooking the impact of quantum hardware noise. In this paper, we present an empirical study of noise-aware mutation analysis for quantum programs. We analyze how noise affects mutant detection using 41 quantum programs, executed on noiseless and noisy simulators emulating three IBM devices with different noise profiles. We compare several distance metrics and thresholding strategies to evaluate mutant detection under realistic noise. Our results show that noise significantly alters the behavioral distance between programs and mutants, making equivalent mutants harder to distinguish from real faults. Density-matrix metrics achieve the best discrimination, with misclassification rates up to 16.77%, but are not accessible on real hardware. Among practical alternatives, output-distribution metrics reach up to 73.03% accuracy and 74.89% F1-score. Noise-specific thresholds further improve detection compared to noiseless thresholds. We also find that noise effects correlate more with algorithm and circuit characteristics than with mutation types. Overall, our results highlight the need to adapt mutation analysis, and more generally quantum program comparison, to the noise profiles of target quantum devices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents an empirical study of noise-aware mutation analysis for quantum programs. Using 41 quantum programs executed on noiseless and noisy simulators emulating three IBM devices, it compares distance metrics (density-matrix and output-distribution) and thresholding strategies. The central claims are that noise significantly alters behavioral distances (making equivalent mutants harder to distinguish), density-matrix metrics achieve the lowest misclassification (up to 16.77%), output-distribution metrics reach 73.03% accuracy and 74.89% F1-score, and noise-specific thresholds improve detection over noiseless ones, with noise effects correlating more with algorithm/circuit characteristics than mutation types.

Significance. If the results hold under realistic conditions, the work is significant for quantum software engineering: it provides concrete evidence that standard noiseless mutation analysis assumptions break under hardware noise and supplies practical metrics and thresholds that could inform testing tools for near-term devices. The use of multiple emulated noise profiles and explicit metric comparisons adds value for reproducibility in the field.

major comments (3)
  1. [Methods] Experimental setup (methods section describing simulators): The central claim that noise-specific thresholds improve detection rests entirely on Qiskit emulations of three IBM-device noise profiles (depolarizing and thermal relaxation channels). No discussion or sensitivity analysis addresses whether omitted effects such as crosstalk, spectator errors, or non-Markovian noise would alter the reported distance changes or threshold superiority, weakening the practical applicability of the noise-specific thresholds.
  2. [Results] Results section (accuracy/F1 numbers and threshold comparisons): The reported gains for noise-specific thresholds (e.g., output-distribution accuracy 73.03%, F1 74.89%) are presented without statistical significance tests, confidence intervals, or per-program variance across the 41 programs. This makes it impossible to assess whether the improvement over noiseless thresholds is robust or could be due to program selection.
  3. [Discussion] Discussion (correlation claim): The statement that noise effects correlate more with algorithm and circuit characteristics than mutation types is load-bearing for the broader interpretation but lacks the specific quantification method (e.g., correlation coefficients, regression details, or feature importance scores) used to reach this conclusion.
minor comments (2)
  1. [Abstract] Abstract: The phrase 'misclassification rates up to 16.77%' for density-matrix metrics should clarify whether this is the maximum across devices or an average, to avoid ambiguity.
  2. [Results] Table/figure captions: Ensure all tables reporting accuracy/F1 include the exact number of programs and runs per noise profile for clarity.
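Major comment 2 asks for confidence intervals and per-program variance. One common way to provide them is a paired percentile bootstrap over programs, sketched below in Python. This is an illustration of the kind of analysis the comment requests, not the authors' method; the per-program accuracies are invented placeholders and the helper name `bootstrap_ci` is hypothetical.

```python
import random
import statistics


def bootstrap_ci(diffs, n_resamples=5000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of paired differences."""
    rng = random.Random(seed)
    n = len(diffs)
    means = []
    for _ in range(n_resamples):
        # Resample programs with replacement and record the mean difference.
        resample = [diffs[rng.randrange(n)] for _ in range(n)]
        means.append(statistics.fmean(resample))
    means.sort()
    low = means[int((alpha / 2) * n_resamples)]
    high = means[int((1 - alpha / 2) * n_resamples) - 1]
    return low, high


# Placeholder per-program accuracies (NOT values from the paper).
acc_noiseless_thr = [0.55, 0.60, 0.48, 0.70, 0.52, 0.66, 0.58, 0.61]
acc_noise_specific_thr = [0.68, 0.71, 0.50, 0.74, 0.69, 0.72, 0.63, 0.75]

# Paired differences: same program, two thresholding strategies.
diffs = [b - a for a, b in zip(acc_noiseless_thr, acc_noise_specific_thr)]

low, high = bootstrap_ci(diffs)
print(f"mean gain = {statistics.fmean(diffs):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

An interval that excludes 0 would suggest the gain is not driven by a few programs; reporting it per noise profile would also address the variance question.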

Circularity Check

0 steps flagged

No circularity: empirical simulation results derive directly from metric computations on program outputs

full rationale

The paper conducts an empirical study by executing 41 quantum programs and their mutants on noiseless and noisy simulators emulating IBM devices, then computes behavioral distances using density-matrix and output-distribution metrics, derives misclassification rates, accuracy, and F1-scores from those direct comparisons, and evaluates threshold strategies. No equations, parameters, or claims reduce by construction to fitted inputs or self-citations; all reported effects of noise on distances follow from the simulation runs themselves. The central findings are falsifiable against external hardware and do not rely on self-referential definitions.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central claims rest on empirical thresholds for mutant detection and distance metrics selected for the study; no new physical entities or unproven mathematical axioms are introduced.

free parameters (1)
  • noise-specific detection thresholds
    Thresholds are tuned per noise profile to improve detection over noiseless baselines, functioning as data-dependent parameters.

pith-pipeline@v0.9.0 · 5528 in / 1206 out tokens · 86776 ms · 2026-05-14T18:19:50.548000+00:00 · methodology

