arxiv: 2605.13279 · v1 · submitted 2026-05-13 · 💻 cs.SE


Robust Mutation Analysis of Quantum Programs Under Noise

Sophie Fortz, Eñaut Mendiluze Usandizaga, Shaukat Ali, Paolo Arcaini, Mohammad Reza Mousavi


Pith reviewed 2026-05-14 18:19 UTC · model grok-4.3


keywords: mutation analysis · quantum programs · quantum software testing · hardware noise · distance metrics · IBM devices · mutant detection


The pith

Noise alters behavioral distances between quantum programs and mutants, requiring noise-specific thresholds for detection.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines mutation analysis for quantum programs when run under realistic hardware noise rather than ideal conditions. Using 41 programs executed on noiseless and noisy simulators that emulate three IBM devices, the authors measure how noise changes the apparent differences between correct programs and their mutants. They compare several distance metrics and find that while density-matrix approaches discriminate best in simulation, output-distribution metrics remain usable in practice when thresholds are adjusted to each noise profile. The work concludes that quantum software testing must incorporate device-specific noise to avoid misclassifying faults.

Core claim

Our results show that noise significantly alters the behavioral distance between programs and mutants, making equivalent mutants harder to distinguish from real faults. Density-matrix metrics achieve the best discrimination, with misclassification rates up to 16.77%, but are not accessible on real hardware. Among practical alternatives, output-distribution metrics reach up to 73.03% accuracy and 74.89% F1-score. Noise-specific thresholds further improve detection compared to noiseless thresholds, and noise effects correlate more with algorithm and circuit characteristics than with mutation types.

What carries the argument

Behavioral distance metrics (density-matrix versus output-distribution) applied to circuit executions under emulated IBM-device noise profiles, together with noise-specific versus noiseless detection thresholds.
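As a rough illustration of a distance that can be computed from output distributions alone, the sketch below evaluates the Hellinger distance (the metric named in the Figure 4 caption) between two measurement-count dictionaries. The shot counts, the two-qubit scenario, and the function names are hypothetical and not taken from the paper. Density-matrix metrics are not shown because they need the full quantum state, which is only available in simulation.

```python
import math


def normalize(counts):
    """Turn raw shot counts into a probability distribution."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("counts must contain at least one shot")
    return {bits: n / total for bits, n in counts.items()}


def hellinger(counts_p, counts_q):
    """Hellinger distance between two count dictionaries.

    Returns 0.0 for identical distributions and 1.0 for distributions
    with no outcome in common.
    """
    p, q = normalize(counts_p), normalize(counts_q)
    outcomes = set(p) | set(q)
    squared = sum(
        (math.sqrt(p.get(o, 0.0)) - math.sqrt(q.get(o, 0.0))) ** 2
        for o in outcomes
    )
    return math.sqrt(squared / 2.0)


# Hypothetical shot counts: an original program and one of its mutants.
original = {"00": 512, "11": 512}
mutant = {"00": 256, "01": 256, "10": 256, "11": 256}

print(hellinger(original, original))  # distance of a program to itself
print(hellinger(original, mutant))    # distance to the mutant
```

Under noise, even the distance of a program to an equivalent mutant would generally be nonzero, which is why the detection threshold matters.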

If this is right

Equivalent mutants become harder to separate from actual faults once noise is present.
Output-distribution metrics remain practical for hardware where density-matrix information is unavailable.
Thresholds derived from noiseless runs underperform compared with thresholds tuned to each device's noise profile.
Noise impact depends more on the underlying algorithm and circuit structure than on the type of mutation applied.
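The threshold consequence above can be sketched with a toy example. The code assumes a simple decision rule (a mutant is flagged as non-equivalent when its distance exceeds the threshold), which is an assumption made for illustration and may differ from the paper's exact procedure. All distances, labels, and threshold values are hypothetical.

```python
def classify(distances, threshold):
    """Flag a mutant as non-equivalent (True) when its distance exceeds the threshold."""
    return [d > threshold for d in distances]


def accuracy_and_f1(predicted, actual):
    """Accuracy and F1-score, treating 'non-equivalent' as the positive class."""
    tp = sum(p and a for p, a in zip(predicted, actual))
    fp = sum(p and not a for p, a in zip(predicted, actual))
    fn = sum(not p and a for p, a in zip(predicted, actual))
    tn = sum(not p and not a for p, a in zip(predicted, actual))
    accuracy = (tp + tn) / len(actual)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Hypothetical distances measured on a noisy simulator.
# Label True means the mutant is truly non-equivalent.
noisy_distances = [0.08, 0.11, 0.09, 0.30, 0.42, 0.10]
is_non_equivalent = [False, False, False, True, True, True]

noiseless_threshold = 0.0        # any nonzero distance counts as a fault
noise_specific_threshold = 0.12  # hypothetical value tuned to one noise profile

for name, threshold in [("noiseless", noiseless_threshold),
                        ("noise-specific", noise_specific_threshold)]:
    acc, f1 = accuracy_and_f1(classify(noisy_distances, threshold), is_non_equivalent)
    print(f"{name}: accuracy={acc:.3f} f1={f1:.3f}")
```

In this toy data the noiseless threshold flags every mutant because noise makes all distances nonzero, while the tuned threshold misses one weakly deviating fault. Both kinds of error are the trade-off the thresholding strategies have to balance.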

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Quantum testing frameworks will need built-in support for device-calibrated noise models rather than generic assumptions.
Simulator results should be cross-checked against actual hardware runs to confirm the reported accuracy differences.
The same noise-aware adjustment principle could apply to other quantum program comparison tasks such as equivalence checking.

Load-bearing premise

The three emulated IBM-device noise profiles accurately represent real hardware behavior and the 41 programs plus mutation operators represent typical quantum software.

What would settle it

Run the identical mutation-analysis experiments on physical IBM quantum hardware and check whether the observed misclassification rates and accuracy figures match those obtained from the simulators.

Figures

Figures reproduced from arXiv: 2605.13279 by Eñaut Mendiluze Usandizaga, Mohammad Reza Mousavi, Paolo Arcaini, Shaukat Ali, Sophie Fortz.

**Figure 2.** Experimental Platform Overview. … its main components interact and integrate into a coherent execution pipeline. The experimental workflow proceeds through five main steps (detailed in the remainder of this section), each associated with distinct computational and time requirements: A – Mutant Generation and Selection: We generate mutants from each CUT, and we select a subset of them to reduce the workload o…

**Figure 3.** Distances between the theoretical output of 41 benchmark programs and the output obtained by …

**Figure 4.** Visualization of the threshold distribution for the Hellinger distance metric.

**Figure 5.** (RQ1.1) Relationship between the original and mutated programs under various noise conditions, measured in terms of distance metrics. … model (displayed as rows), we compare the distances obtained in the corresponding noisy simulator against those from the noiseless simulator. The x-axis represents the distance between the CUT and its mutants in the noiseless simulator, while the y-axis shows the same distan…

**Figure 6.** (RQ1.1) Distance between all original and mutated programs, evaluated under various noise conditions. … Sect. 2.4.1, Fidelity equals 1 for perfect similarity, so its boxplots must be interpreted with vertical symmetry (i.e., values closer to 1 indicate greater similarity). One clear observation is that under noise, the distribution of non-equivalent mutants (Fig. 6a) becomes more concentrated (i.e., exhibits…

**Figure 7.** (RQ1.2) Comparison of distances between original and mutated programs under different noise conditions. Each sub-figure uses a different distance metric, with mutants grouped as equivalent or nonequivalent. Using Trace distance (Fig. 7a), the noiseless simulator yields the expected outcome: most equivalent mutants remain perfectly undetected (with a distance of 0), while non-equivalent mutants are …

**Figure 8.** (RQ1.3) Confusion matrices comparing the detection of equivalent and non-equivalent mutants across all distance metrics. The Y-axis corresponds to the true nature of the mutant (i.e., the expected result), while the X-axis represents the results of our mutant detection. Each sub-figure evaluates the alignment between noisy and noiseless detections using a different noise model and/or threshold strategy. te…

**Figure 9.** (RQ1.4) Comparison of distances between original and mutated programs under different noise conditions. Each sub-figure uses a different distance metric, with mutants grouped as equivalent or nonequivalent. Horizontal lines indicate the mutant detection thresholds. Both accuracy and F1-score are consistent across the three noise models (Kyiv, Brisbane, and Sherbrooke). The only exception is for the expect…

Original abstract

Mutation analysis has long been used in classical software testing and has recently been adopted for assessing the robustness of quantum software testing techniques. However, existing studies assume ideal, noiseless execution, overlooking the impact of quantum hardware noise. In this paper, we present an empirical study of noise-aware mutation analysis for quantum programs. We analyze how noise affects mutant detection using 41 quantum programs, executed on noiseless and noisy simulators emulating three IBM devices with different noise profiles. We compare several distance metrics and thresholding strategies to evaluate mutant detection under realistic noise. Our results show that noise significantly alters the behavioral distance between programs and mutants, making equivalent mutants harder to distinguish from real faults. Density-matrix metrics achieve the best discrimination, with misclassification rates up to 16.77%, but are not accessible on real hardware. Among practical alternatives, output-distribution metrics reach up to 73.03% accuracy and 74.89% F1-score. Noise-specific thresholds further improve detection compared to noiseless thresholds. We also find that noise effects correlate more with algorithm and circuit characteristics than with mutation types. Overall, our results highlight the need to adapt mutation analysis, and more generally quantum program comparison, to the noise profiles of target quantum devices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives the first data on how real-device noise changes mutant detection in quantum programs, but the simulator results need hardware checks before the thresholds can be trusted.


The key point is that noise on quantum hardware shifts the distances between programs and mutants enough to break noiseless detection thresholds, and the authors show that device-specific thresholds improve accuracy on their emulated setups. They ran 41 programs through noiseless and noisy Qiskit simulators matched to three IBM devices, compared density-matrix and output-distribution metrics, and reported that output metrics reach 73% accuracy and 75% F1 under noise while noise-tuned thresholds beat the noiseless ones. Noise impact tracks algorithm and circuit traits more than mutation type. Density-matrix metrics discriminate best but cannot be used on actual hardware.

This is new work; earlier quantum mutation studies stayed in the ideal case. The concrete numbers and the comparison across metrics are the useful parts. The approach is straightforward and the claims follow directly from the simulation runs without obvious circularity.

The main soft spot is that all results rest on emulated noise profiles. Those profiles may miss crosstalk, spectator errors, or non-Markovian effects that would alter the distances and the relative performance of the thresholds on real machines. The set of 41 programs is also narrow, so it is unclear how far the findings generalize. The abstract gives no sign of statistical tests or error bars, which would be needed to judge whether the reported improvements are reliable.

This work is aimed at researchers doing empirical testing of quantum software on NISQ hardware. Someone already working on quantum mutation analysis or device-aware testing would find the metric comparisons and the call to adapt thresholds useful. I would send it for peer review. The empirical gap it fills is real, the methods are reproducible in principle, and the central observation about noise changing distances holds up from the numbers given even if the simulator fidelity remains an open question.

Referee Report

3 major / 2 minor

Summary. The paper presents an empirical study of noise-aware mutation analysis for quantum programs. Using 41 quantum programs executed on noiseless and noisy simulators emulating three IBM devices, it compares distance metrics (density-matrix and output-distribution) and thresholding strategies. The central claims are that noise significantly alters behavioral distances (making equivalent mutants harder to distinguish), density-matrix metrics achieve the lowest misclassification (up to 16.77%), output-distribution metrics reach 73.03% accuracy and 74.89% F1-score, and noise-specific thresholds improve detection over noiseless ones, with noise effects correlating more with algorithm/circuit characteristics than mutation types.

Significance. If the results hold under realistic conditions, the work is significant for quantum software engineering: it provides concrete evidence that standard noiseless mutation analysis assumptions break under hardware noise and supplies practical metrics and thresholds that could inform testing tools for near-term devices. The use of multiple emulated noise profiles and explicit metric comparisons adds value for reproducibility in the field.

major comments (3)

[Methods] Experimental setup (methods section describing simulators): The central claim that noise-specific thresholds improve detection rests entirely on Qiskit emulations of three IBM-device noise profiles (depolarizing and thermal relaxation channels). No discussion or sensitivity analysis addresses whether omitted effects such as crosstalk, spectator errors, or non-Markovian noise would alter the reported distance changes or threshold superiority, weakening the practical applicability of the noise-specific thresholds.
[Results] Results section (accuracy/F1 numbers and threshold comparisons): The reported gains for noise-specific thresholds (e.g., output-distribution accuracy 73.03%, F1 74.89%) are presented without statistical significance tests, confidence intervals, or per-program variance across the 41 programs. This makes it impossible to assess whether the improvement over noiseless thresholds is robust or could be due to program selection.
[Discussion] Discussion (correlation claim): The statement that noise effects correlate more with algorithm and circuit characteristics than mutation types is load-bearing for the broader interpretation but lacks the specific quantification method (e.g., correlation coefficients, regression details, or feature importance scores) used to reach this conclusion.
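The second major comment asks for confidence intervals and per-program variance. One common way to provide them, sketched here under stated assumptions, is a percentile bootstrap over programs. The per-program accuracy gains below are hypothetical placeholders, and nothing here reproduces the paper's data or its chosen statistical procedure.

```python
import random


def bootstrap_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `values`.

    Programs are resampled with replacement; the interval is read off the
    sorted resampled means.
    """
    rng = random.Random(seed)
    n = len(values)
    means = sorted(
        sum(rng.choices(values, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((alpha / 2) * n_resamples)
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))
    return means[lo_idx], means[hi_idx]


# Hypothetical per-program accuracy gain:
# (accuracy with noise-specific threshold) - (accuracy with noiseless threshold)
gains = [0.12, 0.05, -0.02, 0.20, 0.08, 0.00, 0.15, 0.03]

mean_gain = sum(gains) / len(gains)
low, high = bootstrap_ci(gains)
print(f"mean gain = {mean_gain:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

If the resulting interval excluded zero, that would suggest the improvement does not hinge on a few favorable programs. An interval straddling zero would support the referee's concern about program selection.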

minor comments (2)

[Abstract] Abstract: The phrase 'misclassification rates up to 16.77%' for density-matrix metrics should clarify whether this is the maximum across devices or an average, to avoid ambiguity.
[Results] Table/figure captions: Ensure all tables reporting accuracy/F1 include the exact number of programs and runs per noise profile for clarity.

Circularity Check

0 steps flagged

No circularity: empirical simulation results derive directly from metric computations on program outputs


The paper conducts an empirical study by executing 41 quantum programs and their mutants on noiseless and noisy simulators emulating IBM devices, then computes behavioral distances using density-matrix and output-distribution metrics, derives misclassification rates, accuracy, and F1-scores from those direct comparisons, and evaluates threshold strategies. No equations, parameters, or claims reduce by construction to fitted inputs or self-citations; all reported effects of noise on distances follow from the simulation runs themselves. The central findings are falsifiable against external hardware and do not rely on self-referential definitions.

Axiom & Free-Parameter Ledger

1 free parameter · 0 axioms · 0 invented entities

The central claims rest on empirical thresholds for mutant detection and distance metrics selected for the study; no new physical entities or unproven mathematical axioms are introduced.

free parameters (1)

noise-specific detection thresholds
Thresholds are tuned per noise profile to improve detection over noiseless baselines, functioning as data-dependent parameters.
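To make the "data-dependent parameter" point concrete, the sketch below tunes a threshold by sweeping candidate cut points and keeping the one with the highest accuracy on labeled distances. The sweep strategy, the decision rule (`distance > threshold` means non-equivalent), and all numbers are assumptions for illustration, not the paper's procedure.

```python
def best_threshold(distances, is_non_equivalent):
    """Return (threshold, accuracy) maximizing accuracy of `distance > threshold`.

    Candidates are 0.0, the midpoints between consecutive distinct distances,
    and the largest distance. Ties keep the smallest candidate.
    """
    ordered = sorted(set(distances))
    midpoints = [(a + b) / 2 for a, b in zip(ordered, ordered[1:])]
    candidates = [0.0] + midpoints + [ordered[-1]]
    best_t, best_acc = None, -1.0
    for t in candidates:
        correct = sum((d > t) == label
                      for d, label in zip(distances, is_non_equivalent))
        acc = correct / len(distances)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical labeled distances for the same six mutants under two conditions.
labels = [False, False, False, True, True, True]
profiles = {
    "noiseless": [0.0, 0.0, 0.0, 0.25, 0.40, 0.05],
    "noisy (hypothetical profile)": [0.08, 0.11, 0.09, 0.30, 0.42, 0.13],
}

for name, distances in profiles.items():
    threshold, acc = best_threshold(distances, labels)
    print(f"{name}: threshold={threshold:.3f} accuracy={acc:.3f}")
```

Because the threshold is fitted to the same kind of data it is evaluated on, tuning and evaluation would normally use separate mutant sets to avoid an optimistic estimate.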



Reference graph

Works this paper leans on

84 extracted references · 84 canonical work pages · 2 internal anchors

[1]

Konstantinos Adamopoulos, Mark Harman, and Robert M. Hierons. 2004. How to Overcome the Equivalent Mutant Problem and Achieve Tailored Selective Mutation Using Co-evolution. InGenetic and Evolutionary Computation - GECCO 2004, Genetic and Evolutionary Computation Conference, Seattle, W A, USA, June 26-30, 2004, Proceedings, Part II (Lecture Notes in Compu...

work page doi:10.1007/978-3-540-24855-2_155 2004
[2]

Shaukat Ali, Paolo Arcaini, Xinyi Wang, and Tao Yue. 2021. Assessing the Effectiveness of Input and Output Coverage Criteria for Testing Quantum Programs. In14th IEEE Conference on Software Testing, Verification and Validation, ICST 2021, Porto de Galinhas, Brazil, April 12-16, 2021. IEEE, Porto de Galinhas, Brazil, 13–23. doi:10.1109/ICST49551.2021. 00014

work page doi:10.1109/icst49551.2021 2021
[3]

R Alicki. 2004. Decoherence and the appearance of a classical world in quantum theory.Journal of Physics A: Mathematical and General37, 5 (2004), 1948–1949

work page 2004
[4]

Andrews, Lionel C

James H. Andrews, Lionel C. Briand, Yvan Labiche, and Akbar Siami Namin. 2006. Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria.IEEE Trans. Software Eng.32, 8 (2006), 608–624. doi:10.1109/TSE.2006.83

work page doi:10.1109/tse.2006.83 2006
[5]

Andrea Arcuri and Lionel C. Briand. 2014. A Hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering.Softw. Test. Verification Reliab.24, 3 (2014), 219–250. doi:10.1002/STVR.1486

work page doi:10.1002/stvr.1486 2014
[6]

Pablo Arnault, Pablo Arrighi, Steven Herbert, Evi Kasnetsi, and Tianyi Li. 2024. A typology of quantum algorithms. CoRRabs/2407.05178 (2024). arXiv:2407.05178 doi:10.48550/ARXIV.2407.05178

work page doi:10.48550/arxiv.2407.05178 2024
[7]

Fowler, Matteo Mariantoni, John M

Jeff P. Barnes, Colin J. Trout, Dennis Lucarelli, and B. D. Clader. 2017. Quantum error-correction failure distributions: Comparison of coherent and stochastic error models.Phys. Rev. A95 (Jun 2017), 062338. Issue 6. doi:10.1103/PhysRevA. 95.062338

work page doi:10.1103/physreva 2017
[8]

Khadeejah Bepari, Sarah Malik, Michael Spannowsky, and Simon Williams. 2021. Towards a quantum computing algorithm for helicity amplitudes and parton showers.Phys. Rev. D103 (Apr 2021), 076020. Issue 7. doi:10.1103/ PhysRevD.103.076020

work page 2021
[9]

Teresa Brecht, Wolfgang Pfaff, Chen Wang, Yiwen Chu, Luigi Frunzio, Michel H Devoret, and Robert J Schoelkopf

work page
[10]

Multilayer microwave integrated quantum circuits for scalable quantum computing.npj Quantum Information2, ACM Trans. Softw. Eng. Methodol., Vol. X, No. X, Article X. Publication date: February 2026. X:46 Sophie Fortz, Eñaut Mendiluze Usandizaga, Shaukat Ali, Paolo Arcaini, and Mohammad Reza Mousavi 1 (Feb. 2016), 16002. doi:10.1038/npjqi.2016.2

work page doi:10.1038/npjqi.2016.2 2026
[11]

Pascal Cerfontaine, René Otten, and Hendrik Bluhm. 2020. Self-Consistent Calibration of Quantum-Gate Sets.Phys. Rev. Appl.13 (Apr 2020), 044071. Issue 4. doi:10.1103/PhysRevApplied.13.044071

work page doi:10.1103/physrevapplied.13.044071 2020
[12]

Kausthubh Chandramouli, Kelly Mae Allen, Christopher Mori, Dror Baron, and Mário A. T. Figueiredo. 2025. Statistical Signal Processing for Quantum Error Mitigation.CoRRabs/2506.00683 (2025). arXiv:2506.00683 doi:10.48550/ARXIV. 2506.00683

work page internal anchor Pith review doi:10.48550/arxiv 2025
[13]

Isaac L Chuang, Raymond Laflamme, Peter W Shor, and Wojciech H Zurek. 1995. Quantum computers, factoring, and decoherence.Science270, 5242 (1995), 1633–1635

work page 1995
[14]

Cleophas and Aeilko H

Ton J. Cleophas and Aeilko H. Zwinderman. 2018.Bayesian Pearson Correlation Analysis. Springer International Publishing, Cham, 111–118. doi:10.1007/978-3-319-92747-3_11

work page doi:10.1007/978-3-319-92747-3_11 2018
[15]

Norman Cliff. 1993. Dominance statistics: Ordinal analyses to answer ordinal questions.Psychological bulletin114, 3 (1993), 494. doi:10.1037/0033-2909.114.3.494

work page doi:10.1037/0033-2909.114.3.494 1993
[16]

William G Cochran. 1952. The𝜒2 test of goodness of fit.The Annals of mathematical statistics(1952), 315–345

work page 1952
[17]

2013.Statistical power analysis for the behavioral sciences

Jacob Cohen. 2013.Statistical power analysis for the behavioral sciences. Routledge, New York, NY, USA

work page 2013
[18]

James D. Cresser. 2011. Quantum Physics Notes

work page 2011
[19]

Andrew W Cross, Lev S Bishop, John A Smolin, and Jay M Gambetta. 2017. Open quantum assembly language.arXiv preprint arXiv:1707.03429(2017). doi:10.48550/arXiv.1707.03429

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1707.03429 2017
[20]

Samudra Dasgupta and Travis S. Humble. 2022. Characterizing the Reproducibility of Noisy Quantum Circuits.Entropy 24, 2 (2022), 244. doi:10.3390/E24020244

work page doi:10.3390/e24020244 2022
[21]

Michael A Fligner and Timothy J Killeen. 1976. Distribution-Free Two-Sample Tests for Scale.J. Amer. Statist. Assoc. 71, 353 (1976), 210–213. arXiv:https://www.tandfonline.com/doi/pdf/10.1080/01621459.1976.10481517 doi:10.1080/ 01621459.1976.10481517

work page doi:10.1080/01621459.1976.10481517 1976
[22]

Daniel Fortunato, José Campos, and Rui Abreu. 2022. Mutation Testing of Quantum Programs: A Case Study With Qiskit.IEEE Transactions on Quantum Engineering3 (2022), 1–17. doi:10.1109/TQE.2022.3195061

work page doi:10.1109/tqe.2022.3195061 2022
[23]

Daniel Fortunato, José Campos, and Rui Abreu. 2022. Mutation Testing of Quantum Programs Written in QISKit. In44th IEEE/ACM International Conference on Software Engineering: Companion Proceedings, ICSE Companion 2022, Pittsburgh, PA, USA, May 22-24, 2022. ACM/IEEE, Pittsburgh, PA, USA, 358–359. doi:10.1145/3510454.3528649

work page doi:10.1145/3510454.3528649 2022
[24]

Daniel Fortunato, José Campos, and Rui Abreu. 2022. QMutPy: a mutation testing tool for Quantum algorithms and applications in Qiskit. InISSTA ’22: 31st ACM SIGSOFT International Symposium on Software Testing and Analysis, Virtual Event, South Korea, July 18 - 22, 2022, Sukyoung Ryu and Yannis Smaragdakis (Eds.). ACM, Virtual Event, South Korea, 797–800. ...

work page doi:10.1145/3533767.3543296 2022
[25]

Robust Mutation Analysis of Quantum Programs Under Noise

Sophie Fortz and Eñaut Mendiluze. 2026. Supplementary material for the paper “Robust Mutation Analysis of Quantum Programs Under Noise”. https://github.com/sfortz/Noise-aware_Mutants

work page 2026
[26]

Gordon Fraser and Andreas Zeller. 2012. Mutation-Driven Generation of Unit Tests and Oracles.IEEE Trans. Software Eng.38, 2 (2012), 278–292. doi:10.1109/TSE.2011.93

work page doi:10.1109/tse.2011.93 2012
[27]

Langford, and Michael A

Alexei Gilchrist, Nathan K. Langford, and Michael A. Nielsen. 2005. Distance measures to compare real and ideal quantum processes.Phys. Rev. A71 (Jun 2005), 062310. Issue 6. doi:10.1103/PhysRevA.71.062310

work page doi:10.1103/physreva.71.062310 2005
[28]

Stanton A. Glantz. 2012.Primer of Biostatistics(7th edition ed.). McGraw Hill, New York. https://www.accessscience. com/content/book/9780071781503

work page arXiv 2012
[29]

Frattini, Shruti Puri, Shantanu O

Alexander Grimm, Nicholas E. Frattini, Shruti Puri, Shantanu O. Mundhada, Steven Touzard, Mazyar Mirrahimi, Steven M. Girvin, Shyam Shankar, and Michel H. Devoret. 2020. Stabilization and operation of a Kerr-cat qubit.Nature 584, 7820 (Aug. 2020), 205–209. doi:10.1038/s41586-020-2587-z

work page doi:10.1038/s41586-020-2587-z 2020
[30]

Bernhard J. M. Grün, David Schuler, and Andreas Zeller. 2009. The Impact of Equivalent Mutants. InSecond International Conference on Software Testing Verification and Validation, ICST 2009, Denver, Colorado, USA, April 1-4, 2009, Workshops Proceedings. IEEE Computer Society, Denver, Colorado, USA, 192–199. doi:10.1109/ICSTW.2009.37

work page doi:10.1109/icstw.2009.37 2009
[31]

Lisa Hales and Sean Hallgren. 2000. An Improved Quantum Fourier Transform Algorithm and Applications. In41st Annual Symposium on Foundations of Computer Science, FOCS 2000, Redondo Beach, California, USA, November 12-14,

work page 2000
[32]

doi:10.1109/SFCS.2000.892139

IEEE Computer Society, Redondo Beach, California, USA, 515–525. doi:10.1109/SFCS.2000.892139

work page doi:10.1109/sfcs.2000.892139 2000
[33]

Flammia, and Joel J

Robin Harper, Steven T. Flammia, and Joel J. Wallman. 2020. Efficient learning of quantum noise.Nature Physics16, 12 (Dec. 2020), 1184–1188. doi:10.1038/s41567-020-0992-8

work page doi:10.1038/s41567-020-0992-8 2020
[34]

Shahin Honarvar, Mohammad Reza Mousavi, and Rajagopal Nagarajan. 2020. Property-based Testing of Quantum Programs in Q#. InICSE ’20: 42nd International Conference on Software Engineering, Workshops, Seoul, Republic of Korea, 27 June - 19 July, 2020. ACM, Seoul, Republic of Korea, 430–435. doi:10.1145/3387940.3391459

work page doi:10.1145/3387940.3391459 2020
[35]

Yipeng Huang and Margaret Martonosi. 2019. Statistical assertions for validating patterns and finding bugs in quantum programs. InProceedings of the 46th International Symposium on Computer Architecture, ISCA 2019, Phoenix, AZ, USA, June 22-26, 2019, Srilatha Bobbie Manne, Hillery C. Hunter, and Erik R. Altman (Eds.). ACM, Phoenix, AZ, USA, 541–553. doi:1...

work page doi:10.1145/3307650.3322213 2019
[36]

Schuster, and Jens Koch

Ziwen Huang, Yao Lu, Eliot Kapit, David I. Schuster, and Jens Koch. 2018. Universal stabilization of single-qubit states using a tunable coupler.Phys. Rev. A97 (Jun 2018), 062345. Issue 6. doi:10.1103/PhysRevA.97.062345

work page doi:10.1103/physreva.97.062345 2018
[37]

Shih-Han Hung, Kesha Hietala, Shaopeng Zhu, Mingsheng Ying, Michael Hicks, and Xiaodi Wu. 2019. Quantitative robustness analysis of quantum programs.Proc. ACM Program. Lang.3, POPL (2019), 31:1–31:29. doi:10.1145/3290344

work page doi:10.1145/3290344 2019
[38]

Yue Jia and Mark Harman. 2011. An Analysis and Survey of the Development of Mutation Testing.IEEE Trans. Software Eng.37, 5 (2011), 649–678. doi:10.1109/TSE.2010.62

work page doi:10.1109/tse.2010.62 2011
[39]

Shelby Kimmel, Guang Hao Low, and Theodore J. Yoder. 2015. Robust calibration of a universal single-qubit gate set via robust phase estimation.Phys. Rev. A92 (Dec 2015), 062315. Issue 6. doi:10.1103/PhysRevA.92.062315

work page doi:10.1103/physreva.92.062315 2015
[40]

Jonas Klamroth, Max Scheerer, and Oliver Denninger. 2025. Detecting and Tolerating Faults in Hybrid Quantum Software Systems Using Architectural Redundancy. InIEEE International Conference on Quantum Software, QSW 2025, Helsinki, Finland, July 7-12, 2025, Rong N. Chang, Carl K. Chang, Jingwei Yang, Nimanthi Atukorala, Dan Chen, Sumi Helal, Sasu Tarkoma, Q...

work page doi:10.1109/qsw67625.2025.00028 2025
[41]

Junyu Liu, Frederik Wilde, Antonio Anna Mele, Liang Jiang, and Jens Eisert. 2022. Noise can be helpful for variational quantum algorithms.CoRRabs/2210.06723 (2022). arXiv:2210.06723 doi:10.48550/ARXIV.2210.06723

work page doi:10.48550/arxiv.2210.06723 2022
[42]

Chakram, N

Yao Lu, S. Chakram, N. Leung, N. Earnest, R. K. Naik, Ziwen Huang, Peter Groszkowski, Eliot Kapit, Jens Koch, and David I. Schuster. 2017. Universal Stabilization of a Parametrically Coupled Qubit.Phys. Rev. Lett.119 (Oct 2017), 150502. Issue 15. doi:10.1103/PhysRevLett.119.150502

work page doi:10.1103/physrevlett.119.150502 2017
[43]

Lech Madeyski, Wojciech Orzeszyna, Richard Torkar, and Mariusz Jozala. 2014. Overcoming the Equivalent Mutant Problem: A Systematic Literature Review and a Comparative Experiment of Second Order Mutation.IEEE Trans. Software Eng.40, 1 (2014), 23–42. doi:10.1109/TSE.2013.44

work page doi:10.1109/tse.2013.44 2014
[44]

A. P. Majtey, P. W. Lamberti, and D. P. Prato. 2005. Jensen-Shannon divergence as a measure of distinguishability between mixed quantum states.Phys. Rev. A72 (Nov 2005), 052310. Issue 5. doi:10.1103/PhysRevA.72.052310

work page doi:10.1103/physreva.72.052310 2005
[45]

Henry B Mann and Donald R Whitney. 1947. On a test of whether one of two random variables is stochastically larger than the other.The annals of mathematical statistics(1947), 50–60

work page 1947
[46]

Kane Meissel and Esther S Yao. 2024. Using Cliff’s delta as a non-parametric effect size measure: an accessible web app and R tutorial.Practical Assessment, Research, and Evaluation29, 1 (2024). doi:10.7275/pare.1977

work page doi:10.7275/pare.1977 2024
[47]

Eñaut Mendiluze, Shaukat Ali, Paolo Arcaini, and Tao Yue. 2021. Muskit: A Mutation Analysis Tool for Quantum Software Testing. In36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15-19, 2021. IEEE, Melbourne, Australia, 1266–1270. doi:10.1109/ASE51524.2021.9678563

work page doi:10.1109/ase51524.2021.9678563 2021
[48]

Asmar Muqeet, Shaukat Ali, and Paolo Arcaini. 2024. Approximating Stochastic Quantum Noise Through Genetic Programming. InSearch-Based Software Engineering - 16th International Symposium, SSBSE 2024, Porto de Galinhas, Brazil, July 15, 2024, Proceedings (Lecture Notes in Computer Science, Vol. 14767), Gunel Jahangirova and Foutse Khomh (Eds.). Springer, P...

work page doi:10.1007/978-3-031-64573-0_5 2024
[49]

Asmar Muqeet, Shaukat Ali, and Paolo Arcaini. 2024. Quantum Program Testing Through Commuting Pauli Strings on IBM’s Quantum Computers. InProceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering, ASE 2024, Sacramento, CA, USA, October 27 - November 1, 2024, Vladimir Filkov, Baishakhi Ray, and Minghui Zhou (Eds.). ACM, Sa...

work page doi:10.1145/3691620.3695275 2024
[50]

Asmar Muqeet, Shaukat Ali, and Paolo Arcaini. 2025. Tool: QUIET: A Tool for Sampling-Based Quantum Noise Error Mitigation.IEEE Softw.42, 6 (2025), 28–34. doi:10.1109/MS.2025.3532106

work page doi:10.1109/ms.2025.3532106 2025
[51]

Asmar Muqeet, Shaukat Ali, Tao Yue, and Paolo Arcaini. 2024. A Machine Learning-Based Error Mitigation Approach for Reliable Software Development on IBM’s Quantum Computers. InCompanion Proceedings of the 32nd ACM International Conference on the Foundations of Software Engineering, FSE 2024, Porto de Galinhas, Brazil, July 15-19, 2024, Marcelo d’Amorim (E...

work page doi:10.1145/3663529.3663830 2024
[52]

Asmar Muqeet, Tao Yue, Shaukat Ali, and Paolo Arcaini. 2024. Mitigating Noise in Quantum Software Testing Using Machine Learning.IEEE Trans. Software Eng.50, 11 (2024), 2947–2961. doi:10.1109/TSE.2024.3462974

work page doi:10.1109/tse.2024.3462974 2024
[53]

Nielsen and Isaac L

Michael A. Nielsen and Isaac L. Chuang. 2016.Quantum Computation and Quantum Information (10th Anniver- sary edition). Cambridge University Press, Cambridge, UK. https://www.cambridge.org/de/academic/subjects/ physics/quantum-physics-quantum-information-and-quantum-computation/quantum-computation-and-quantum- information-10th-anniversary-edition?format=HB

work page 2016
[54]

2025.Testing and analysis of quantum software

Matteo Paltenghi. 2025.Testing and analysis of quantum software. Ph. D. Dissertation. University of Stuttgart, Germany. https://nbn-resolving.org/urn:nbn:de:bsz:93-opus-ds-167500

work page 2025
[55]

Mike Papadakis, Márcio Eduardo Delamaro, and Yves Le Traon. 2014. Mitigating the effects of equivalent mutants with mutant classification strategies.Sci. Comput. Program.95 (2014), 298–319. doi:10.1016/J.SCICO.2014.05.012

work page doi:10.1016/j.scico.2014.05.012 2014
[56]

Mike Papadakis, Yue Jia, Mark Harman, and Yves Le Traon. 2015. Trivial Compiler Equivalence: A Large Scale Empirical Study of a Simple, Fast and Effective Equivalent Mutant Detection Technique. In37th IEEE/ACM International Conference on Software Engineering, ICSE 2015, Florence, Italy, May 16-24, 2015, Volume 1, Antonia Bertolino, Gerardo ACM Trans. Soft...

work page doi:10.1109/icse.2015.103 2015
[57]

Mike Papadakis, Marinos Kintis, Jie Zhang, Yue Jia, Yves Le Traon, and Mark Harman. 2019. Chapter Six - Mutation Testing Advances: An Analysis and Survey.Adv. Comput.112 (2019), 275–378. doi:10.1016/BS.ADCOM.2018.03.015

work page doi:10.1016/bs.adcom.2018.03.015 2019
[58]

Mike Papadakis, Donghwan Shin, Shin Yoo, and Doo-Hwan Bae. 2018. Are mutation scores correlated with real fault detection?: a large scale empirical study on the relationship between mutants and real faults. InProceedings of the 40th International Conference on Software Engineering, ICSE 2018, Gothenburg, Sweden, May 27 - June 03, 2018, Michel Chaudron, Iv...

work page doi:10.1145/3180155.3180183 2018
[59]

Tirthak Patel and Devesh Tiwari. 2021. Qraft: reverse your Quantum circuit and know the correct program output. In ASPLOS ’21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Virtual Event, USA, April 19-23, 2021, Tim Sherwood, Emery D. Berger, and Christos Kozyrakis (Eds.). ACM, Virtual Event, U...

[60]

Kai Petersen and Çigdem Gencel. 2013. Worldviews, Research Methods, and their Relationship to Validity in Empirical Software Engineering Research. In 2013 Joint Conference of the 23rd International Workshop on Software Measurement and the 8th International Conference on Software Process and Product Measurement, Ankara, Turkey, October 23-26, 2013. IEEE Com...

[61]

Gabriel Pontolillo, Mohammad Reza Mousavi, and Marek Grzesiuk. 2025. QuCheck: A Property-based Testing Framework for Quantum Programs in Qiskit. CoRR abs/2503.22641 (2025). arXiv:2503.22641 doi:10.48550/ARXIV.2503.22641

[62]

Gabriel Pontolillo, Asmar Muqeet, Shaukat Ali, and Mohammad Reza Mousavi. 2025. From Ideal to Noisy: Adapting Property-Based Testing for Real-World Noisy Quantum Computers. In 2025 IEEE International Conference on Quantum Computing and Engineering (QCE), Vol. 01. 405–416. doi:10.1109/QCE65121.2025.00053

[63]

Gabriel Joseph Pontolillo and Mohammad Reza Mousavi. 2024. Delta Debugging for Property-Based Regression Testing of Quantum Programs. In Proceedings of the 5th ACM/IEEE International Workshop on Quantum Software Engineering, Q-SE 2024, Lisbon, Portugal, 16 April 2024. ACM, Lisbon, Portugal, 1–8. doi:10.1145/3643667.3648219

[64]

J. S. Pratt and J. H. Eberly. 2001. Qubit cross talk and entanglement decay. Phys. Rev. B 64 (Oct 2001), 195314. Issue 19. doi:10.1103/PhysRevB.64.195314

[65]

Nils Quetschlich, Lukas Burgholzer, and Robert Wille. 2023. MQT Bench: Benchmarking Software and Design Automation Tools for Quantum Computing. Quantum 7 (2023), 1062. doi:10.22331/Q-2023-07-20-1062

[66]

Timothy C. Ralph. 2012. Howard Wiseman and Gerard Milburn: Quantum measurement and control. Quantum Inf. Process. 11, 1 (2012), 313–315. doi:10.1007/S11128-011-0277-3

[67]

Salonik Resch and Ulya R. Karpuzcu. 2022. Benchmarking Quantum Computers and the Impact of Quantum Noise. ACM Comput. Surv. 54, 7 (2022), 142:1–142:35. doi:10.1145/3464420

[68]

Diego Riste, Stefano Poletto, M-Z Huang, Alessandro Bruno, Visa Vesterinen, O-P Saira, and Leonardo DiCarlo. 2015. Detecting bit-flip errors in a logical qubit using stabilizer measurements. Nature Communications 6, 1 (April 2015). doi:10.1038/ncomms7983

[70]

Per Runeson and Martin Höst. 2009. Guidelines for conducting and reporting case study research in software engineering. Empir. Softw. Eng. 14, 2 (2009), 131–164. doi:10.1007/S10664-008-9102-8

[71]

Mohan Sarovar, Timothy Proctor, Kenneth Rudinger, Kevin C. Young, Erik Nielsen, and Robin Blume-Kohout. 2020. Detecting crosstalk errors in quantum information processors. Quantum 4 (2020), 321. doi:10.22331/Q-2020-09-11-321

[72]

Runzhou Tao, Yunong Shi, Jianan Yao, John Hui, Frederic T. Chong, and Ronghui Gu. 2021. Gleipnir: toward practical error analysis for Quantum programs. In PLDI ’21: 42nd ACM SIGPLAN International Conference on Programming Language Design and Implementation, Virtual Event, Canada, June 20-25, 2021, Stephen N. Freund and Eran Yahav (Eds.). ACM, Virtual Event...

[73]

Ewa Tomczak and Maciej Tomczak. 2014. The need to report effect size estimates revisited. An overview of some recommended measures of effect size. TRENDS in Sport Sciences 21, 1 (2014)

[74]

Eñaut Mendiluze Usandizaga, Shaukat Ali, Tao Yue, and Paolo Arcaini. 2025. Quantum circuit mutants: Empirical analysis and recommendations. Empir. Softw. Eng. 30, 3 (2025), 100. doi:10.1007/S10664-025-10643-Z

[75]

Xinyi Wang, Paolo Arcaini, Tao Yue, and Shaukat Ali. 2021. Quito: a Coverage-Guided Test Generator for Quantum Programs. In 36th IEEE/ACM International Conference on Automated Software Engineering, ASE 2021, Melbourne, Australia, November 15-19, 2021. IEEE, Melbourne, Australia, 1237–1241. doi:10.1109/ASE51524.2021.9678798

[76]

Xinyi Wang, Paolo Arcaini, Tao Yue, and Shaukat Ali. 2021. Generating Failing Test Suites for Quantum Programs With Search. In Search-Based Software Engineering - 13th International Symposium, SSBSE 2021, Bari, Italy, October 11-12, 2021, Proceedings (Lecture Notes in Computer Science, Vol. 12914), Una-May O’Reilly and Xavier Devroey (Eds.). Springer, Bari...

[77]

Dominic Widdows, Jyoti Rani, and Emmanuel M. Pothos. 2023. Quantum Circuit Components for Cognitive Decision-Making. Entropy 25, 4 (2023), 548. doi:10.3390/E25040548

[78]

Ellis Wilson, Frank Mueller, Lindsay Bassman, and Costin Iancu. 2021. Empirical evaluation of circuit approximations on noisy quantum devices. In International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2021, St. Louis, Missouri, USA, November 14-19, 2021, Bronis R. de Supinski, Mary W. Hall, and Todd Gamblin (Eds.). AC...

[79]

Nicolas Wittler, Federico Roy, Kevin Pack, Max Werninghaus, Anurag Saha Roy, Daniel J. Egger, Stefan Filipp, Frank K. Wilhelm, and Shai Machnes. 2021. Integrated Tool Set for Control, Calibration, and Characterization of Quantum Devices Applied to Superconducting Qubits. Phys. Rev. Appl. 15 (Mar 2021), 034080. Issue 3. doi:10.1103/PhysRevApplied.15.034080

[80]

William K. Wootters and Wojciech H. Zurek. 1982. A single quantum cannot be cloned. Nature 299, 5886 (01 10 1982), 802–803. doi:10.1038/299802a0

