Auditing Empirical Comparisons in Quantum Software

Arif Ali Khan; Boshuai Ye; Maryam Tavassoli Sabzevari; Peng Liang

arxiv: 2607.00516 · v1 · pith:GBDJHCNOnew · submitted 2026-07-01 · 💻 cs.SE

Pith reviewed 2026-07-02 09:01 UTC · model grok-4.3

classification 💻 cs.SE

keywords: quantum software, empirical comparisons, auditing framework, materialization gap, reproducibility, benchmarking, CLAIMSTAB-QC, evidence classification

The pith

Only 8 of 455 reported quantum-software comparisons expose enough evidence for locked audit without proxy reconstruction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces CLAIMSTAB-QC, a source-bounded auditing framework that records baselines, metrics, relations, and admissible evidence for a reported comparison, then locks the design before checking outcomes. When applied to 455 comparative claims drawn from 119 quantum-software papers, the framework shows a steep materialization gap: 175 claims can be represented for planning, 79 become scalar-directional records, 53 produce lockable designs, and just 8 supply matched evidence sufficient to audit the original claim. Of those 8, the outcomes split into 2 Sustained, 4 Unresolved, and 2 Reversed. Controlled diagnostics on 24 additional benchmark comparisons indicate that simpler checks often preserve directions whose support weakens once the audit scope is locked.
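The funnel arithmetic can be checked directly. The following is a minimal sketch that uses only the stage counts quoted above; the stage labels and the `retention` helper are shorthand introduced here for illustration, not names from the paper.

```python
# Stage counts as reported in the abstract; the labels are informal shorthand.
funnel = [
    ("extracted claims", 455),
    ("representable for audit planning", 175),
    ("scalar-directional planning records", 79),
    ("lockable audit or diagnostic designs", 53),
    ("auditable without proxy reconstruction", 8),
]


def retention(stages):
    """Return (label, count, share of first stage, share of previous stage)."""
    total = stages[0][1]
    previous = total
    rows = []
    for label, count in stages:
        rows.append((label, count, count / total, count / previous))
        previous = count
    return rows


for label, count, of_total, of_previous in retention(funnel):
    print(f"{label:<42} {count:>4} {of_total:7.1%} {of_previous:7.1%}")
```

Read this way, the final stage retains 8 of 455 claims, or about 1.8% of the extracted corpus, and 53 - 8 = 45 lockable designs stop short of matched evidence.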

Core claim

CLAIMSTAB-QC classifies strict scalar-directional comparisons as Sustained, Unresolved, or Reversed inside a locked audit scope. Evaluation on 455 claims yields a materialization gap in which only 8 records expose matched evidence without proxy reconstruction, producing 2 Sustained, 4 Unresolved, and 2 Reversed outcomes; diagnostics show simpler checks can retain apparent directions that locked designs weaken.
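To make the three outcome labels concrete, here is a hedged sketch of one way a scalar-directional result could be mapped to Sustained, Unresolved, or Reversed. The interval-based rule, the `Outcome` enum, and `classify_direction` are assumptions made for illustration; the summary above does not state the paper's actual decision rule.

```python
from enum import Enum


class Outcome(Enum):
    SUSTAINED = "Sustained"
    UNRESOLVED = "Unresolved"
    REVERSED = "Reversed"


def classify_direction(ci_low: float, ci_high: float) -> Outcome:
    """Classify a reported direction from an interval on the effect.

    The effect is oriented so that positive values agree with the reported
    direction. Comparing the interval with zero is an assumed rule, not the
    rule documented by the paper.
    """
    if ci_low > ci_high:
        raise ValueError("ci_low must not exceed ci_high")
    if ci_low > 0:
        return Outcome.SUSTAINED   # whole interval supports the reported direction
    if ci_high < 0:
        return Outcome.REVERSED    # whole interval contradicts it
    return Outcome.UNRESOLVED      # interval includes zero


print(classify_direction(0.02, 0.10).value)
print(classify_direction(-0.03, 0.04).value)
print(classify_direction(-0.09, -0.01).value)
```

The point of the sketch is only that the label depends on evidence computed inside a fixed scope; any real rule would also need to specify how the interval is obtained.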

What carries the argument

CLAIMSTAB-QC, a source-bounded framework that records baselines, metric, relation, and admissible evidence, locks the comparison design, and reports a scoped relation outcome or explicit evidence boundary.
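The record-then-lock idea can be illustrated with an immutable record and a digest taken before any outcome is computed. This is a sketch under stated assumptions: the `ClaimCard` fields mirror the elements listed above, but the class, its field names, the SHA-256 digest, and the example values are all hypothetical and are not the paper's data model.

```python
import hashlib
import json
from dataclasses import asdict, dataclass
from typing import Tuple


@dataclass(frozen=True)  # frozen: fields cannot be reassigned after creation
class ClaimCard:
    baseline_a: str
    baseline_b: str
    metric: str
    relation: str                        # e.g. "A < B" on the metric
    admissible_evidence: Tuple[str, ...]

    def lock_digest(self) -> str:
        """Return a stable SHA-256 digest of the recorded design."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(payload).hexdigest()


def is_unchanged(card: ClaimCard, locked_digest: str) -> bool:
    """Check that the design used for the audit matches the locked one."""
    return card.lock_digest() == locked_digest


# Hypothetical example values, not drawn from any audited paper.
card = ClaimCard(
    baseline_a="compiler A",
    baseline_b="compiler B",
    metric="two-qubit gate count",
    relation="A < B",
    admissible_evidence=("Table 3", "supplementary CSV"),
)
locked = card.lock_digest()   # taken before outcomes are inspected
print(is_unchanged(card, locked))
```

Storing the digest separately from the outcome data is one simple way to show that the comparison design was not adjusted after the results were seen.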

If this is right

Most reported performance edges between compilers, optimizers, backends, or ansatzes cannot be verified from the evidence the papers expose.
Published comparisons that appear directional under informal checks frequently become Unresolved or Reversed once the audit scope is locked.
Benchmark-relevant comparisons require explicit recording of admissible evidence and locked designs before outcomes are computed.
Simpler post-hoc checks tend to preserve directions whose support weakens under the stricter locked-audit procedure.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Journals and conferences could require authors to supply a locked-audit record alongside each comparative claim.
The same framework could be applied to other empirical domains where tool comparisons depend on benchmark scope and noise assumptions.
Reproducibility efforts would gain from treating the comparison design itself as an auditable artifact rather than only the code or data.
Extending CLAIMSTAB-QC to multi-metric or non-directional relations would cover a larger fraction of the 455 claims.

Load-bearing premise

The 455 extracted comparative claims are representative of empirical comparisons in the quantum software literature and CLAIMSTAB-QC's evidence classification rules can be applied consistently from the information stated in the source papers.

What would settle it

Re-running the audit on the same 119 papers after authors supply the missing matched evidence for the 45 claims that reached lockable designs but lacked full evidence, and counting how many of the original directions remain Sustained.

Figures

Figures reproduced from arXiv: 2607.00516 by Arif Ali Khan, Boshuai Ye, Maryam Tavassoli Sabzevari, Peng Liang.

**Figure 1.** The CLAIMSTAB-QC workflow. A reported comparison is represented as a claim card, locked before outcome computation, evaluated through comparison records, and reported with an explicit evidence boundary.

Table II. Core CLAIMSTAB-QC concepts (excerpt captured alongside Figure 1).
Reported comparison: a source-paper statement comparing two baselines under a metric, scope, and outcome rule.
Claim card: fixed representation of a report…

**Figure 2.** Corpus extraction and materialization funnel. The 53 lockable designs…

**Figure 3.** Wilson 95% confidence intervals for the eight Tier-1 proxy-free scoped…

**Figure 4.** EX-C7 locked-cell grid. Each cell aggregates five transpiler seeds and…
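Figure 3 reports Wilson 95% confidence intervals. For readers unfamiliar with the method, the sketch below computes the standard Wilson score interval for a binomial proportion. The function name and the example counts are illustrative assumptions; the truncated caption does not say which proportions the paper's intervals cover.

```python
from math import sqrt


def wilson_interval(successes: int, n: int, z: float = 1.959964):
    """Wilson score interval for a binomial proportion.

    z defaults to the two-sided 95% normal quantile.
    """
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n
    denom = 1 + z * z / n
    centre = (p_hat + z * z / (2 * n)) / denom
    half_width = z * sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Illustrative counts only: 5 successes in 10 trials.
low, high = wilson_interval(5, 10)
print(f"{low:.3f} to {high:.3f}")
```

Unlike the simple normal-approximation interval, the Wilson interval stays inside [0, 1] and behaves sensibly for small samples, which matters when each record rests on few runs.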

Original abstract

Empirical quantum-software papers often report that one compiler, optimizer, backend, or ansatz outperforms another. Such comparisons are not properties of a tool alone: they can change with benchmark scope, circuit construction, compilation, sampling, backend or noise assumptions, optimizer choices, and resource budgets. Existing testing, benchmarking, and reproducibility methods help assess programs, tools, executions, and platforms, but they do not directly audit whether the reported comparison itself is supported by the evidence exposed in the source paper or accompanying materials. We present CLAIMSTAB-QC, a source-bounded framework for auditing empirical comparisons in quantum software. Given a reported comparison, the framework records the baselines, metric, relation, and admissible evidence; locks the comparison design before outcomes are computed; and reports either a scoped relation outcome or an explicit evidence boundary. For strict scalar-directional comparisons, the reported direction is classified as Sustained, Unresolved, or Reversed within the locked audit scope. We evaluate CLAIMSTAB-QC on 455 comparative claims from 119 quantum-software papers. The central finding is a materialization gap: 175 claims can be represented for audit planning, 79 become scalar-directional planning records, 53 yield lockable audit or diagnostic designs, and only 8 expose enough matched evidence to audit the original comparison without proxy reconstruction. These 8 records yield 2 Sustained, 4 Unresolved, and 2 Reversed outcomes. Controlled diagnostics over 24 benchmark-relevant comparisons further show that simpler checks can preserve apparent directions whose support weakens under locked audit designs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CLAIMSTAB-QC shows most quantum software comparisons lack enough source evidence to audit, but the 119-paper sample has no described selection method.

The main thing to know is that this paper's CLAIMSTAB-QC framework finds a steep materialization gap: only 8 of 455 claims from 119 papers can be audited directly from the source without reconstruction, and those 8 split into 2 sustained, 4 unresolved, and 2 reversed.

The framework is the actual new piece. It records the comparison elements first, locks the audit scope before outcomes are checked, and then classifies the result within that scope. The controlled diagnostics on 24 comparisons also make a useful point that simpler checks can preserve directions that weaken under locked designs.

This is a practical contribution for an applied field where comparisons depend on many moving parts like benchmarks and noise models. The structure forces explicit evidence boundaries, which is a step forward from informal reproducibility discussions.

The soft spot is the corpus. The abstract gives no search strategy, inclusion criteria, date bounds, or database for the 119 papers, so the drop from 455 claims to 8 auditable ones could be tied to how the papers were picked rather than a field-wide pattern. No information appears on claim extraction rules or agreement between reviewers either.

This is for researchers doing empirical work in quantum software or people focused on benchmarking and reproducibility methods. A reader looking for concrete ways to audit comparisons would get something usable from the framework description.

Send it to peer review. The core idea targets a real problem and the classification approach is worth discussion, but the evaluation needs clearer methodological details before the numbers can be treated as representative.

Referee Report

2 major / 1 minor

Summary. The paper introduces CLAIMSTAB-QC, a source-bounded framework that records baselines, metrics, relations, and admissible evidence for empirical comparisons in quantum software, locks the audit design, and classifies strict scalar-directional outcomes as Sustained, Unresolved, or Reversed. Applied to 455 comparative claims extracted from 119 papers, it reports a materialization gap: 175 claims representable for planning, 79 scalar-directional, 53 lockable designs, and only 8 fully auditable without proxies, yielding 2 Sustained, 4 Unresolved, and 2 Reversed. Controlled diagnostics on 24 comparisons illustrate that simpler checks can preserve directions that weaken under locked audits.

Significance. If the sampled corpus is representative, the materialization gap would demonstrate that most reported comparisons in quantum software lack sufficient exposed evidence for direct verification, with implications for reproducibility and benchmarking practices in the field. The framework itself is a constructive contribution that separates planning from outcome computation and applies to external papers without circularity or self-referential parameters.

major comments (2)

[Abstract and evaluation section] The selection of the 119 papers and extraction of the 455 claims is presented without any search strategy, inclusion/exclusion criteria, date bounds, database, or sampling justification (Abstract and the evaluation that produces the headline counts 175/79/53/8). This is load-bearing for the central claim of a literature-wide materialization gap, as the steep drop-off could be an artifact of an arbitrary or convenience corpus rather than a representative sample.
[Abstract and framework application] The manuscript provides no information on claim selection criteria, inter-rater reliability, or how CLAIMSTAB-QC's evidence classification rules handle ambiguous cases when reducing 455 claims to the reported counts (Abstract). Without these details the precise materialization numbers cannot be independently verified or reproduced.

minor comments (1)

[Abstract] The abstract uses the symbol 'o' in the chain 455 claims o 175 representable; this should be replaced by an explicit arrow or '→' for clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which identify key gaps in methodological transparency. We address each major comment below and will revise the manuscript to incorporate the requested details.

Referee: [Abstract and evaluation section] The selection of the 119 papers and extraction of the 455 claims is presented without any search strategy, inclusion/exclusion criteria, date bounds, database, or sampling justification (Abstract and the evaluation that produces the headline counts 175/79/53/8). This is load-bearing for the central claim of a literature-wide materialization gap, as the steep drop-off could be an artifact of an arbitrary or convenience corpus rather than a representative sample.

Authors: We agree that the absence of explicit corpus-construction details weakens the support for a literature-wide claim. The current manuscript describes the counts but does not document how the 119 papers were identified. In revision we will add a new subsection (likely 4.1) that reports: the database(s) queried (arXiv), the date range, the keyword combinations used to locate quantum-software papers containing empirical comparisons, the inclusion criteria applied to retain only papers with at least one explicit baseline-metric-relation statement, and any exclusion rules (e.g., purely theoretical or simulation-only works). We will also state that the sample is a convenience corpus of recent, publicly available papers rather than a probabilistically representative draw, and we will qualify the materialization-gap finding accordingly while retaining the illustrative value of the 8 fully auditable cases. revision: yes
Referee: [Abstract and framework application] The manuscript provides no information on claim selection criteria, inter-rater reliability, or how CLAIMSTAB-QC's evidence classification rules handle ambiguous cases when reducing 455 claims to the reported counts (Abstract). Without these details the precise materialization numbers cannot be independently verified or reproduced.

Authors: We concur that reproducibility of the headline counts requires documentation of the claim-extraction and classification process. The manuscript currently reports only the final tallies. In the revised version we will expand Section 4 to include: (i) the operational definition used to identify a “comparative claim” (explicit mention of two or more baselines, a scalar or directional metric, and a stated relation), (ii) whether extraction was performed by a single rater or multiple raters and, if the latter, any inter-rater agreement statistic, and (iii) concrete examples of ambiguous cases together with the exact rule from CLAIMSTAB-QC that resolved them (e.g., “when the paper states a direction but omits variance, the claim is classified as scalar-directional but not lockable”). These additions will allow an independent team to replicate the reduction from 455 to 8. revision: yes

Circularity Check

0 steps flagged

No circularity: framework application to external corpus yields independent counts

The paper defines CLAIMSTAB-QC as a source-bounded auditing procedure and applies its classification rules (representable claims, scalar-directional records, lockable designs, matched evidence) directly to 455 claims extracted from 119 external quantum-software papers. The resulting materialization gap (175→79→53→8) is produced by those rule applications on outside data; no equations, fitted parameters, or self-citation chains reduce the reported outcomes to quantities defined inside the present work. The evaluation is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

The central claim rests on the introduction of CLAIMSTAB-QC and the representativeness of the sampled claims; no free parameters, standard axioms, or independent evidence for the framework are supplied in the abstract.

invented entities (1)

CLAIMSTAB-QC (no independent evidence)
purpose: Source-bounded framework for auditing empirical comparisons
Newly defined in the paper to perform the audits described.

pith-pipeline@v0.9.1-grok · 5818 in / 1200 out tokens · 41497 ms · 2026-07-02T09:01:42.085609+00:00 · methodology


[1] [1]

M. A. Nielsen and I. L. Chuang,Quantum computation and quantum information. Cambridge University Press, 2010

2010

[2] [2]

Algorithms for quantum computation: discrete logarithms and factoring,

P. W. Shor, “Algorithms for quantum computation: discrete logarithms and factoring,” inProceedings 35th Annual Symposium on Foundations of Computer Science (FOCS). IEEE, 1994, pp. 124–134

1994

[3] [3]

A fast quantum mechanical algorithm for database search,

L. K. Grover, “A fast quantum mechanical algorithm for database search,” inProceedings of the 28th Annual ACM symposium on Theory of computing (STOC). ACM, 1996, pp. 212–219

1996

[4] [4]

Quantum computing in the NISQ era and beyond,

J. Preskill, “Quantum computing in the NISQ era and beyond,”Quantum, vol. 2, p. 79, 2018

2018

[5] [5]

A variational eigenvalue solver on a photonic quantum processor,

A. Peruzzo, J. McClean, P. Shadbolt, M.-H. Yung, X.-Q. Zhou, P. J. Love, A. Aspuru-Guzik, and J. L. O’brien, “A variational eigenvalue solver on a photonic quantum processor,”Nature Communications, vol. 5, no. 1, p. 4213, 2014

2014

[6] [6]

A Quantum Approximate Optimization Algorithm

E. Farhi, J. Goldstone, and S. Gutmann, “A quantum approximate optimization algorithm,”arXiv preprint arXiv:1411.4028, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014

[7] [7]

Cirq: A python framework for creating, editing, and invoking noisy intermediate-scale quantum (NISQ) circuits,

Cirq Developers, “Cirq: A python framework for creating, editing, and invoking noisy intermediate-scale quantum (NISQ) circuits,” https:// github.com/quantumlib/Cirq, 2022, quantum AI Team, Google

2022

[8] [8]

PennyLane: Automatic differentiation of hybrid quantum-classical computations

V . Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V . Ajith, M. S. Alam, G. Alonso-Linaje, B. AkashNarayanan, A. Asadiet al., “PennyLane: Automatic differentiation of hybrid quantum-classical computations,”arXiv preprint arXiv:1811.04968, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018

[9] [9]

Quantum computer benchmarking: An explorative systematic literature review,

T. Rohe, F. H. Ruiloba, S. Egger, S. von Beck, J. Stein, and C. Linnhoff- Popien, “Quantum computer benchmarking: An explorative systematic literature review,”arXiv preprint arXiv:2509.03078, 2025

work page arXiv 2025

[10] G. Li, Y. Ding, and Y. Xie, “Tackling the qubit mapping problem for NISQ-era quantum devices,” in Proceedings of the 24th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). ACM, 2019, pp. 1001–1014.

[11] M. D. Stefano, D. D. Nucci, F. Palomba, and A. D. Lucia, “An empirical study into the effects of transpilation on quantum circuit smells,” Empirical Software Engineering, vol. 29, no. 3, p. 61, 2024.

[12] M. Paltenghi and M. Pradel, “MorphQ: Metamorphic testing of the Qiskit quantum computing platform,” in Proceedings of the 45th IEEE/ACM International Conference on Software Engineering (ICSE). IEEE, 2023, pp. 2413–2424.

[13] P. D. Nation, A. A. Saki, S. Brandhofer, L. Bello, S. Garion, M. Treinish, and A. Javadi-Abhari, “Benchmarking the performance of quantum computing software for quantum circuit creation, manipulation and compilation,” Nature Computational Science, vol. 5, pp. 427–435, 2025.

[14] W. Mauerer and S. Scherzinger, “1-2-3 reproducibility for quantum software experiments,” in Proceedings of the 29th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER). IEEE, 2022, pp. 1247–1248.

[15] S. Dasgupta, “Stability of quantum computers,” arXiv preprint arXiv:2404.19082, 2024.

[16] M. Willsch, D. Willsch, F. Jin, H. De Raedt, and K. Michielsen, “Benchmarking the quantum approximate optimization algorithm,” Quantum Information Processing, vol. 19, no. 7, p. 197, 2020.

[17] B. Baheri, Q. Guan, V. Chaudhary, and A. Li, “Quantum noise in the flow of time: A temporal study of the noise in quantum computers,” in Proceedings of the 28th IEEE International Symposium on On-Line Testing and Robust System Design (IOLTS). IEEE, 2022, pp. 1–5.

[18] S. Dasgupta, T. S. Humble, and A. Danageozian, “Adaptive mitigation of time-varying quantum noise,” in Proceedings of the 4th IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2023, pp. 99–110.

[19] B. Ye, P. Liang, M. T. Sabzevari, and A. A. Khan, “CLAIMSTAB-QC: Audit evidence package,” 2026, artifact package to be released publicly after the review period.

[20] Y. Kharkov, A. Ivanova, E. Mikhantiev, and A. Kotelnikov, “Arline benchmarks: Automated benchmarking platform for quantum compilers,” arXiv preprint arXiv:2202.14025, 2022.

[21] E. B. Wilson, “Probable inference, the law of succession, and statistical inference,” Journal of the American Statistical Association, vol. 22, no. 158, pp. 209–212, 1927.

[22] L. D. Brown, T. T. Cai, and A. DasGupta, “Interval estimation for a binomial proportion,” Statistical Science, vol. 16, no. 2, pp. 101–133, 2001.

[23] D. G. Altman, D. Machin, T. N. Bryant, and M. J. Gardner, Eds., Statistics with Confidence: Confidence Intervals and Statistical Guidelines, 2nd ed. London: BMJ Books, 2000.

[24] A. Javadi-Abhari, M. Treinish, K. Krsulich, C. J. Wood, J. Lishman, J. Gacon, S. Martiel, P. D. Nation, L. S. Bishop, A. W. Cross et al., “Quantum computing with Qiskit,” arXiv preprint arXiv:2405.08810, 2024.

[25] C. R. Harris, K. J. Millman, S. J. Van Der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith et al., “Array programming with NumPy,” Nature, vol. 585, no. 7825, pp. 357–362, 2020.

[26] P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright et al., “SciPy 1.0: Fundamental algorithms for scientific computing in Python,” Nature Methods, vol. 17, no. 3, pp. 261–272, 2020.

[27] A. Meijer-van de Griend, “A comparison of quantum compilers using a DAG-based or phase polynomial-based intermediate representation,” arXiv preprint arXiv:2304.08814, 2023.

[28] B. Tan and J. Cong, “Optimal layout synthesis for quantum computing,” arXiv preprint arXiv:2007.15671, 2020.

[29] P. Christiansen, L. Binkowski, D. Ramacciotti, and S. Wilkening, “Quantum tree generator improves QAOA state-of-the-art for the knapsack problem,” arXiv preprint arXiv:2411.00518, 2024.

[30] E. Osaba, M. Petrič, I. Oregi, R. Seidel, A. Ruiz, S. Bock, and M.-A. Kourtis, “Eclipse Qrisp QAOA: description and preliminary comparison with Qiskit counterparts,” arXiv preprint arXiv:2405.20173, 2024.

[31] V. Gheorghiu, J. Huang, S. M. Li, M. Mosca, and P. Mukhopadhyay, “Reducing the CNOT count for Clifford+T circuits on NISQ architectures,” arXiv preprint arXiv:2011.12191, 2020.

[32] P. Rakyta, G. Morse, J. Nádori, Z. Majnay-Takács, O. Mencer, and Z. Zimborás, “Highly optimized quantum circuits synthesized via data-flow engines,” arXiv preprint arXiv:2211.07685, 2022.

[33] A. Li, S. Stein, S. Krishnamoorthy, and J. Ang, “QASMBench: A low-level quantum benchmark suite for NISQ evaluation and simulation,” ACM Transactions on Quantum Computing, vol. 4, no. 2, pp. 1–26, 2023.

[34] N. Quetschlich, L. Burgholzer, and R. Wille, “MQT Bench: Benchmarking software and design automation tools for quantum computing,” Quantum, vol. 7, p. 1062, 2023.

[35] J. Wurtz and P. J. Love, “MaxCut quantum approximate optimization algorithm performance guarantees for p > 1,” Physical Review A, vol. 103, no. 4, p. 042612, 2021.

[36] S. Steegen, F. Tuerlinckx, A. Gelman, and W. Vanpaemel, “Increasing transparency through a multiverse analysis,” Perspectives on Psychological Science, vol. 11, no. 5, pp. 702–712, 2016.

[37] U. Simonsohn, J. P. Simmons, and L. D. Nelson, “Specification curve analysis,” Nature Human Behaviour, vol. 4, no. 11, pp. 1208–1214, 2020.

[38] J. Wang, Q. Zhang, G. H. Xu, and M. Kim, “Qdiff: Differential testing of quantum software stacks,” in Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2021, pp. 692–704.

[39] E. Mendiluze, S. Ali, P. Arcaini, and T. Yue, “Muskit: A mutation analysis tool for quantum software testing,” in Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2021, pp. 1266–1270.

[40] X. Wang, P. Arcaini, T. Yue, and S. Ali, “Quito: a coverage-guided test generator for quantum programs,” in Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering (ASE). IEEE, 2021, pp. 1237–1241.

[41] L. J. Kitt and M. B. Cohen, “MorphQ++: A reproducibility study of metamorphic testing on quantum compilers,” in Proceedings of the 39th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW). ACM, 2024, pp. 8–14.

[42] M. Paltenghi and M. Pradel, “Qite: Assembly-level, cross-platform testing of quantum computing platforms,” arXiv preprint arXiv:2503.17322, 2025.

[43] G. Bisicchia, A. Bocci, J. García-Alonso, J. M. Murillo, and A. Brogi, “Qsimbench: An execution-level benchmark suite for quantum software engineering,” in Proceedings of the 6th IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2025, pp. 175–180.

[44] A. Muttakin, S. Mondal, and C. K. Roy, “The state of open science in software engineering research: A case study of ICSE artifacts,” arXiv preprint arXiv:2601.02066, 2026.

[45] V. Gierisch and W. Mauerer, “Qef: Reproducible and exploratory quantum software experiments,” arXiv preprint arXiv:2511.04563, 2025.

[46] E. Moguel, J. A. Parejo, A. Ruiz-Cortés, J. Garcia-Alonso, and J. M. Murillo, “Quantum software experiments: A reporting and laboratory package structure guidelines proposal,” in Proceedings of the 4th IEEE International Conference on Quantum Software (QSW). IEEE, 2025, pp. 185–194.

[47] S. Dasgupta and T. S. Humble, “Reproducibility in quantum computing,” in Proceedings of the 20th IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2021, pp. 458–461.

[48] M. Paltenghi and M. Pradel, “Bugs in quantum computing platforms: an empirical study,” Proceedings of the ACM on Programming Languages, vol. 6, no. OOPSLA1, pp. 1–27, 2022.

[49] M. De Stefano, F. Pecorelli, D. Di Nucci, F. Palomba, and A. De Lucia, “The quantum frontier of software engineering: A systematic mapping study,” Information and Software Technology, vol. 175, p. 107525, 2024.

[50] A. García de la Barrera, I. García-Rodríguez de Guzmán, M. Polo, and M. Piattini, “Quantum software testing: State of the art,” Journal of Software: Evolution and Process, vol. 35, no. 4, p. e2419, 2023.