{"paper":{"title":"Robust Mutation Analysis of Quantum Programs Under Noise","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Noise alters behavioral distances between quantum programs and mutants, requiring noise-specific thresholds for detection.","cross_cats":[],"primary_cat":"cs.SE","authors_text":"Eñaut Mendiluze Usandizaga, Mohammad Reza Mousavi, Paolo Arcaini, Shaukat Ali, Sophie Fortz","submitted_at":"2026-05-13T09:56:05Z","abstract_excerpt":"Mutation analysis has long been used in classical software testing and has recently been adopted for assessing the robustness of quantum software testing techniques. However, existing studies assume ideal, noiseless execution, overlooking the impact of quantum hardware noise. In this paper, we present an empirical study of noise-aware mutation analysis for quantum programs. We analyze how noise affects mutant detection using 41 quantum programs, executed on noiseless and noisy simulators emulating three IBM devices with different noise profiles. We compare several distance metrics and threshol"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Our results show that noise significantly alters the behavioral distance between programs and mutants, making equivalent mutants harder to distinguish from real faults. Density-matrix metrics achieve the best discrimination, with misclassification rates up to 16.77%, but output-distribution metrics reach up to 73.03% accuracy and 74.89% F1-score. 
Noise-specific thresholds further improve detection compared to noiseless thresholds.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the three IBM-device noise profiles emulated in the simulators accurately represent real hardware behavior and that the 41 chosen programs plus mutation operators are representative of broader quantum software.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Noise alters behavioral distances between quantum programs and mutants, requiring noise-specific thresholds for detection.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"d54c6ac1d68ed0190c03b74752eed4a40cb361f0d162fe269b8c24b61411c244"},"source":{"id":"2605.13279","kind":"arxiv","version":1},"verdict":{"id":"fcde720b-1f7b-47d9-8fe5-e2f57a2e92cd","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T18:19:50.548000Z","strongest_claim":"Our results show that noise significantly alters the behavioral distance between programs and mutants, making equivalent mutants harder to distinguish from real faults. Density-matrix metrics achieve the best discrimination, with misclassification rates up to 16.77%, but output-distribution metrics reach up to 73.03% accuracy and 74.89% F1-score. 
Noise-specific thresholds further improve detection compared to noiseless thresholds.","one_line_summary":"Noise from quantum hardware simulators significantly alters mutant detection distances, making equivalent mutants harder to separate from faults, with output-distribution metrics reaching 73.03% accuracy and 74.89% F1-score under device-specific thresholds.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the three IBM-device noise profiles emulated in the simulators accurately represent real hardware behavior and that the 41 chosen programs plus mutation operators are representative of broader quantum software.","pith_extraction_headline":"Noise alters behavioral distances between quantum programs and mutants, requiring noise-specific thresholds for detection."},"references":{"count":84,"sample":[{"doi":"10.1007/978-3-540-24855-2_155","year":2004,"title":"Konstantinos Adamopoulos, Mark Harman, and Robert M. Hierons. 2004. How to Overcome the Equivalent Mutant Problem and Achieve Tailored Selective Mutation Using Co-evolution. In Genetic and Evolutionary","work_id":"127f0ecf-681f-4aec-8e2c-57141d67a8d0","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1109/icst49551.2021","year":2021,"title":"Shaukat Ali, Paolo Arcaini, Xinyi Wang, and Tao Yue. 2021. Assessing the Effectiveness of Input and Output Coverage Criteria for Testing Quantum Programs. In 14th IEEE Conference on Software Testing, V","work_id":"62d5a0f1-f830-4091-9c4a-51a512f8e6a6","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2004,"title":"R Alicki. 2004. Decoherence and the appearance of a classical world in quantum theory. Journal of Physics A: Mathematical and General 37, 5 (2004), 1948–1949","work_id":"3534a94b-3658-4167-a287-a752a3c963c6","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1109/tse.2006.83","year":2006,"title":"James H. Andrews, Lionel C. Briand, Yvan Labiche, and Akbar Siami Namin. 
2006. Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria. IEEE Trans. Software Eng. 32, 8 (2006), 608–","work_id":"ee31d27d-d6db-4295-b508-687c0e0873e4","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.1002/stvr.1486","year":2014,"title":"Andrea Arcuri and Lionel C. Briand. 2014. A Hitchhiker’s guide to statistical tests for assessing randomized algorithms in software engineering. Softw. Test. Verification Reliab. 24, 3 (2014), 219–250.","work_id":"497ea68e-07ac-4fd1-a546-71c047754f58","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":84,"snapshot_sha256":"f19470d515ea8f19d7636438ebc01e6f74defc08242c7db2b77e9bd46126cf30","internal_anchors":2},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}