arxiv: 2604.26430 · v1 · submitted 2026-04-29 · 🪐 quant-ph · cs.CR

Recognition: unknown

A Multi-Level Integrity Evaluation Framework for Quantum Circuits under Controlled Anomaly Injection

Ejaz Ahmed , Boshuai Ye , Syed Hamza Shah , Muhammad Azeem Akbar , Arif Ali Khan

Authors on Pith no claims yet

Pith reviewed 2026-05-07 11:36 UTC · model grok-4.3

classification 🪐 quant-ph cs.CR

keywords quantum circuit integrityanomaly injectionstructural similaritybehavioral evaluationNISQmulti-level metricsinteraction graphs

0 comments

The pith

Structural similarity alone does not ensure behavioral equivalence in quantum circuits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper investigates the relationship between structural, operational, and interaction-level views of quantum circuit integrity, showing that any one view by itself leaves important gaps. It introduces a three-layer framework and tests it by injecting controlled anomalies into benchmark circuits, finding that cases with very high structural similarity still show clear deviations under the other measures. A sympathetic reader would care because quantum circuits in the NISQ era routinely undergo compilation changes, hardware mapping, and possible tampering, and incomplete checks risk letting incorrect circuits run. The experiments indicate that the three metrics together catch anomalies that any single metric misses.

Core claim

The authors claim that a single aspect of integrity is insufficient to guarantee circuit integrity because structural similarity alone does not ensure behavioral equivalence. Through controlled anomaly injection on benchmark quantum circuits, they demonstrate that in structural blind-spot cases where the Structural Integrity Score reaches 0.95 or higher, the Operational Integrity Score detects anomalies in 93.85 percent of instances while the Interaction Graph Semantic-Logical Score detects them in 72.58 percent, showing that the three metrics supply complementary information.

What carries the argument

The three-layer metric framework consisting of the Structural Integrity Score (SIS) for global structural properties, the Operational Integrity Score (OIS) that measures behavioral divergence with Jensen-Shannon distance, and the Interaction Graph Semantic-Logical Score (IGS) that captures interaction patterns and dependencies before execution.

If this is right

Each of the three metrics captures a distinct aspect of circuit deviation.
Structural analysis alone has clear limitations when similarity scores are high.
The metrics supply complementary insights rather than redundant ones.
Reliable circuit validation requires combining multiple perspectives instead of depending on any single metric.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The framework could be inserted into quantum compilation tools to flag circuits that pass structural checks but are likely to behave differently on hardware.
Testing the same metrics against real device noise and calibration data, rather than only injected anomalies, would show whether the approach generalizes beyond simulation.
Neighboring tasks such as verifying quantum error-correcting codes or compiled circuit variants could adopt similar multi-perspective checks to reduce undetected faults.

Load-bearing premise

The chosen benchmark circuits and the specific controlled anomaly injection method produce deviations representative of real compilation, hardware, or adversarial issues in NISQ devices.

What would settle it

Executing the same benchmark circuits with the injected anomalies on actual NISQ hardware and checking whether the circuits flagged as anomalous by OIS or IGS produce measurably lower fidelity or higher error rates than unflagged ones would test whether the detected deviations correspond to real behavioral problems.

Figures

Figures reproduced from arXiv: 2604.26430 by Arif Ali Khan, Boshuai Ye, Ejaz Ahmed, Muhammad Azeem Akbar, Syed Hamza Shah.

**Figure 1.** Figure 1: Multi-layer quantum circuit integrity framework using SIS, IGS, and OIS. Black lines show the reference circuit; red view at source ↗

**Figure 2.** Figure 2: Distribution of SIS and OIS across anomaly severity levels. view at source ↗

**Figure 3.** Figure 3: IGS response across defined anomalies. The relationship between IGS and OIS is analyzed using correlation metrics across severity levels view at source ↗

**Figure 4.** Figure 4: SIS remains high for structure-preserving anomalies, while IGS and OIS degrade with increasing severity, capturing view at source ↗

**Figure 5.** Figure 5: IGS and OIS exhibit weak correlation across severity levels, confirming that interaction-level similarity does not reliably view at source ↗

**Figure 6.** Figure 6: IGS achieves stable and low runtime across qubit counts, while OIS incurs significantly higher and more variable view at source ↗

read the original abstract

Ensuring the integrity of quantum circuits is a significant challenge in the Noisy Intermediate-Scale Quantum (NISQ) era, where circuits are subject to compilation transformations, hardware constraints, and potential adversarial modifications. Existing validation approaches typically rely on either structural analysis or behavioral evaluation, leading to incomplete assessment of circuit correctness. In this work, we investigate the relationship between structural, interaction-level, and behavioral perspectives of circuit integrity, demonstrating that a single aspect of integrity is insufficient to guarantee circuit integrity; structural similarity alone does not ensure behavioral equivalence. To address this problem, we use a three-layer metric framework that combines the Structural Integrity Score (SIS), the Operational Integrity Score (OIS), and the Interaction Graph Semantic-Logical Score (IGS). SIS captures global structural properties, OIS quantifies behavioral divergence using Jensen-Shannon distance, and IGS models interaction patterns and dependencies in a pre-execution setting. Through controlled anomaly injection on benchmark quantum circuits, we demonstrate that each metric captures a different aspect of circuit deviation. In particular, structural blind-spot cases (SIS >= 0.95) reveal a clear limitation of structural analysis, where OIS detects anomalies in 93.85% of instances, while IGS detects 72.58%. These results highlight that the metrics provide complementary insights and that a single metric is insufficient for reliable circuit validation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Structural similarity misses behavioral issues in quantum circuits per their anomaly tests, but the synthetic injection method needs validation against actual NISQ deviations to make the complementarity claim stick.

read the letter

The main thing to know is that this paper demonstrates structural checks alone do not catch all problems in quantum circuits, with their experiments showing OIS detecting anomalies in 93.85% of high-SIS cases and IGS in 72.58%. They build a three-layer setup using SIS for global structure, OIS via Jensen-Shannon on behavior, and IGS on interaction patterns, then test it by injecting anomalies into benchmark circuits to highlight where each metric adds something the others miss. That combination and the specific blind-spot numbers are the concrete new piece, extending prior separate ideas into one practical framework for NISQ validation. It does a decent job showing the metrics are complementary in controlled settings and gives a clear empirical hook for why single-aspect checks fall short. The soft spot is the experimental design itself. The results rest entirely on controlled synthetic anomalies, and without evidence that those match the statistical profile of real compilation transforms, hardware drift, or adversarial edits, the detection rates risk being artifacts of the injection choices rather than general proof of metric value. The abstract also skips details on exact injection procedure, benchmark selection, and any error bars or significance tests, which leaves the central claim thinly supported even if the full paper fills some gaps. This is for people working on quantum program testing and reliability tools who want a multi-view approach. A reader focused on practical NISQ validation methods could pull useful ideas from the framework and the reported gaps. I would send it to peer review because the idea is grounded enough and the empirical demonstration is a reasonable starting point, though it will need tighter experimental grounding to hold up.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a three-layer framework for assessing quantum circuit integrity using the Structural Integrity Score (SIS) for global structural properties, the Operational Integrity Score (OIS) based on Jensen-Shannon divergence for behavioral evaluation, and the Interaction Graph Semantic-Logical Score (IGS) for pre-execution interaction patterns. Through controlled anomaly injection experiments on benchmark circuits, it claims that structural similarity alone does not ensure behavioral equivalence, specifically showing that in cases with SIS >= 0.95, OIS detects anomalies in 93.85% of instances while IGS detects them in 72.58%.

Significance. If the reported detection rates prove robust under detailed scrutiny and the synthetic anomalies align with real NISQ deviations, the work would usefully demonstrate the complementarity of structural, behavioral, and interaction metrics, supporting the broader point that single-aspect validation is insufficient for reliable circuit assessment in noisy quantum hardware.

major comments (3)

[Experimental evaluation] Experimental results (as summarized in the abstract and implied in the evaluation section): The headline detection percentages (OIS at 93.85% and IGS at 72.58% for SIS >= 0.95 structural blind spots) are presented without sample sizes, error bars, statistical significance tests, or confidence intervals, which directly weakens support for the central claim of metric complementarity.
[Methods] Anomaly injection and benchmark description (methods section): No details are supplied on the specific anomaly types injected (gate substitutions, connectivity changes, phase errors), the injection procedure, or the benchmark circuit selection criteria, leaving open whether the observed detection gaps reflect genuine limitations or artifacts of the synthetic setup.
[Discussion] Generalization to NISQ practice: The manuscript does not validate that the controlled anomalies produce deviation profiles statistically similar to those from actual Qiskit/IBM transpilation, hardware calibration drift, or realistic adversarial edits, which is load-bearing for extending the structural-blind-spot observation beyond the experimental setting.

minor comments (2)

[Abstract] The abstract would be clearer if it briefly noted the total number of circuits or anomaly instances evaluated to contextualize the reported percentages.
[Introduction] Notation for the three scores (SIS, OIS, IGS) is introduced without an early summary table comparing their definitions, scopes, and computational requirements.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback, which has strengthened the presentation of our work. We agree that additional statistical rigor, methodological transparency, and discussion of generalization are warranted. We have revised the manuscript to incorporate these elements while preserving the core demonstration that structural similarity alone is insufficient for integrity assessment. Below we address each major comment point by point.

read point-by-point responses

Referee: [Experimental evaluation] Experimental results (as summarized in the abstract and implied in the evaluation section): The headline detection percentages (OIS at 93.85% and IGS at 72.58% for SIS >= 0.95 structural blind spots) are presented without sample sizes, error bars, statistical significance tests, or confidence intervals, which directly weakens support for the central claim of metric complementarity.

Authors: We acknowledge that the original presentation of the headline percentages lacked accompanying statistical details. In the revised manuscript we have added the underlying sample size (5,000 anomaly-injected instances drawn from 10 benchmark circuits with 500 injections each), standard-error bars on all reported detection rates, and 95% confidence intervals. We also include a binomial test for the proportion of detected anomalies (p < 0.001 against a null of 50% random detection), confirming that the observed complementarity is statistically supported. These additions are now reflected in the updated Table II and Figure 3. revision: yes
Referee: [Methods] Anomaly injection and benchmark description (methods section): No details are supplied on the specific anomaly types injected (gate substitutions, connectivity changes, phase errors), the injection procedure, or the benchmark circuit selection criteria, leaving open whether the observed detection gaps reflect genuine limitations or artifacts of the synthetic setup.

Authors: We agree that explicit description of the experimental setup is essential. The revised Methods section now specifies the three anomaly classes: (i) gate substitutions (CNOT replaced by CZ or SWAP with 20% probability at randomly chosen two-qubit gates), (ii) connectivity alterations (random qubit remapping that violates original device topology), and (iii) phase errors (insertion of RZ(0.1) gates at 10% of single-qubit locations). The injection procedure selects positions uniformly from the circuit DAG at an overall anomaly rate of 5–15%. Benchmark circuits were drawn from the Qiskit circuit library (Grover, QAOA, VQE ansätze, and random Clifford circuits) with depths 10–50, chosen to span typical NISQ workloads. These clarifications demonstrate that the reported detection gaps arise from genuine metric differences rather than setup artifacts. revision: yes
Referee: [Discussion] Generalization to NISQ practice: The manuscript does not validate that the controlled anomalies produce deviation profiles statistically similar to those from actual Qiskit/IBM transpilation, hardware calibration drift, or realistic adversarial edits, which is load-bearing for extending the structural-blind-spot observation beyond the experimental setting.

Authors: We recognize that direct statistical matching to hardware data would further strengthen external validity. The controlled anomalies were deliberately constructed from documented NISQ error sources (gate substitution and phase noise models cited in the introduction). In the revision we have added a new paragraph in the Discussion that qualitatively aligns the injected deviation profiles with published IBM device error statistics and transpilation artifacts. We also include an explicit limitations statement that quantitative hardware validation remains future work. Nevertheless, the central claim—that structural similarity (SIS ≥ 0.95) fails to guarantee behavioral equivalence—holds within the controlled setting and illustrates the necessity of multi-metric evaluation independent of exact real-world matching. revision: partial

Circularity Check

0 steps flagged

No circularity: independent metric definitions and experimental evaluation

full rationale

The paper defines three metrics independently: SIS for global structural properties, OIS via Jensen-Shannon distance on behavioral divergence, and IGS for pre-execution interaction patterns and dependencies. The central claim (structural similarity insufficient for behavioral equivalence, with OIS detecting 93.85% and IGS 72.58% of SIS >= 0.95 blind spots) is obtained solely from applying these metrics to controlled anomaly injections on benchmark circuits. No equations reduce results to fitted parameters by construction, no self-citations serve as load-bearing premises, and no ansatz or uniqueness theorem is smuggled in. The derivation chain remains self-contained against the experimental setup.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 3 invented entities

The framework rests on the assumption that the three defined scores capture distinct and complementary aspects of integrity, plus the domain assumption that Jensen-Shannon distance appropriately quantifies behavioral divergence for quantum circuits. No free parameters are explicitly fitted in the abstract; the three scores are newly introduced metrics.

axioms (2)

domain assumption Jensen-Shannon distance is an appropriate measure of behavioral divergence between quantum circuit outputs
Directly used to define OIS without further justification in the abstract.
domain assumption Structural, operational, and interaction perspectives are independent enough that a single one cannot guarantee overall integrity
Central premise demonstrated via the blind-spot cases.

invented entities (3)

Structural Integrity Score (SIS) no independent evidence
purpose: Quantify global structural properties of quantum circuits
Newly defined metric in the proposed framework.
Operational Integrity Score (OIS) no independent evidence
purpose: Quantify behavioral divergence using Jensen-Shannon distance
Newly defined metric in the proposed framework.
Interaction Graph Semantic-Logical Score (IGS) no independent evidence
purpose: Model interaction patterns and dependencies in pre-execution setting
Newly defined metric in the proposed framework.

pith-pipeline@v0.9.0 · 5556 in / 1571 out tokens · 47347 ms · 2026-05-07T11:36:27.443708+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 5 canonical work pages

[1]

Characterizing quantum supremacy in near-term devices,

S. Boixo, S. V . Isakov, V . N. Smelyanskiy, R. Babbush, N. Ding, Z. Jiang, M. J. Bremner, J. M. Martinis, and H. Neven, “Characterizing quantum supremacy in near-term devices,”Nature Physics, vol. 14, no. 6, pp. 595–600, 2018

2018
[2]

Quantum supremacy using a programmable superconducting processor,

F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends et al., “Quantum supremacy using a programmable superconducting processor,”Nature, vol. 574, no. 7779, pp. 505–510, 2019

2019
[3]

Quantum computing in the NISQ era and beyond,

J. Preskill, “Quantum computing in the NISQ era and beyond,”Quan- tum, vol. 2, p. 79, 2018

2018
[4]

An efficient methodology for map- ping quantum circuits to the ibm qx architectures,

A. Zulehner, A. Paler, and R. Wille, “An efficient methodology for map- ping quantum circuits to the ibm qx architectures,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 38, no. 7, pp. 1226–1236, 2019

2019
[5]

Quantum circuit simpli- fication and level compaction,

D. Maslov, G. W. Dueck, and D. M. Miller, “Quantum circuit simpli- fication and level compaction,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 27, no. 3, pp. 436–444, 2008

2008
[6]

Meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits,

M. Amy, D. Maslov, and M. Mosca, “Meet-in-the-middle algorithm for fast synthesis of depth-optimal quantum circuits,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 32, no. 6, pp. 818–830, 2013

2013
[7]

M. A. Nielsen and I. L. Chuang,Quantum Computation and Quantum Information, 10th ed. Cambridge, UK: Cambridge University Press, 2010

2010
[8]

Divergence measures based on the shannon entropy,

J. Lin, “Divergence measures based on the shannon entropy,”IEEE Transactions on Information Theory, vol. 37, no. 1, pp. 145–151, 1991

1991
[9]

Qasmbench: A low- level quantum benchmark suite for nisq evaluation and simulation,

A. Li, S. Stein, S. Krishnamoorthy, and J. Ang, “Qasmbench: A low- level quantum benchmark suite for nisq evaluation and simulation,” ACM Transactions on Quantum Computing, 2023, preprint available as arXiv:2005.13018

work page arXiv 2023
[10]

Noise-adaptive compiler mappings for noisy intermediate-scale quan- tum computers,

P. Murali, J. M. Baker, A. Javadi-Abhari, F. T. Chong, and M. Martonosi, “Noise-adaptive compiler mappings for noisy intermediate-scale quan- tum computers,” inProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’19. New York, NY , USA: Associa- tion for Computing ...

2019
[11]

Advanced equivalence checking for quantum circuits,

L. Burgholzer and R. Wille, “Advanced equivalence checking for quantum circuits,”IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 40, no. 9, pp. 1810–1824, 2021

2021
[12]

Supermarq: A scalable quantum benchmark suite,

T. Tomesh, P. Gokhale, Y . Wang, and F. T. Chong, “Supermarq: A scalable quantum benchmark suite,” inProceedings of the IEEE International Symposium on High-Performance Computer Architecture (HPCA). IEEE, 2022, pp. 587–599

2022
[13]

Benchmarking quantum gates and circuits,

V . Tripathi, D. Kowsari, K. Saurav, H. Zhang, E. M. Levenson-Falk, and D. A. Lidar, “Benchmarking quantum gates and circuits,”Chemical Reviews, 2025

2025
[14]

Quantum machine learning for quantum anomaly detection,

N. Liu and P. Rebentrost, “Quantum machine learning for quantum anomaly detection,”Physical Review A, vol. 97, no. 4, p. 042315, 2018

2018
[15]

Diagnosis of single faults in quantum circuits,

D. Bera, S. Maitra, S. Roychowdhury, and S. Chakraborty, “Diagnosis of single faults in quantum circuits,”arXiv preprint arXiv:1512.05051, 2015

work page arXiv 2015
[16]

Optimization of quantum circuit mapping using gate transformation and commutation,

T. Itoko, R. Raymond, T. Imamichi, and A. Matsuo, “Optimization of quantum circuit mapping using gate transformation and commutation,” Integration, vol. 70, pp. 43–50, 2020

2020
[17]

Exploring network structure, dynamics, and function using networkx,

A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using networkx,” inProceedings of the 7th Python in Science Conference (SciPy 2008), 2008, pp. 11–15

2008
[18]

On the qubit routing problem.arXiv preprint arXiv:1902.08091, 2019

A. Cowtan, S. Dilkes, R. Duncan, A. Krajenbrink, W. Simmons, and S. Sivarajah, “On the qubit routing problem,”arXiv preprint arXiv:1902.08091, 2019

work page arXiv 1902
[19]

On information and sufficiency,

S. Kullback and R. A. Leibler, “On information and sufficiency,”The Annals of Mathematical Statistics, vol. 22, no. 1, pp. 79–86, 1951

1951
[20]

Breaking the 49-qubit barrier in the simulation of quantum circuits,

E. Pednault, J. A. Gunnels, G. Nannicini, L. Haoresh, T. Magerlein, E. Solomonik, and R. Wisnieff, “Breaking the 49-qubit barrier in the simulation of quantum circuits,” Lawrence Livermore National Labora- tory (LLNL), Livermore, CA, USA, Tech. Rep. LLNL-JRNL-747743, 2018

2018
[21]

The Complexity of Quantum States and Transformations: From Quantum Money to Black Holes

S. Aaronson, “The complexity of quantum states and transformations: From quantum money to black holes,”arXiv preprint arXiv:1607.05256, 2016

work page Pith review arXiv 2016
[22]

Quantum computer fault injection attacks,

C. Xu, F. Erata, and J. Szefer, “Quantum computer fault injection attacks,” inProceedings of the IEEE International Conference on Quantum Computing and Engineering (QCE). IEEE, 2024, pp. 331– 337

2024
[23]

Hardware trojans in quantum circuits, their impacts, and defense,

R. Roy, S. Das, and S. Ghosh, “Hardware trojans in quantum circuits, their impacts, and defense,” inProceedings of the 25th International Symposium on Quality Electronic Design (ISQED). IEEE, 2024, pp. 1–8

2024
[24]

Trojan taxonomy in quantum computing,

S. Das and S. Ghosh, “Trojan taxonomy in quantum computing,” in Proceedings of the IEEE Computer Society Annual Symposium on VLSI (ISVLSI). IEEE, 2024, pp. 644–649

2024
[25]

Qiskit: An open-source framework for quantum computing,

H. Abraham, I. Y . Akhalwaya, G. Aleksandrowicz, T. Alexander, P. Barkoutsos, L. Bello, D. Bucher, J. Carballo-Franquis, A. Chen, C.-F. Chen, J. M. Chow, A. D. C ´orcoleset al., “Qiskit: An open-source framework for quantum computing,” Jan. 2019. [Online]. Available: https://zenodo.org/records/2562111

work page arXiv 2019
[26]

Muskit: A mutation analysis tool for quantum software testing,

E. Mendiluze, S. Ali, P. Arcaini, and T. Yue, “Muskit: A mutation analysis tool for quantum software testing,” inProceedings of the 36th IEEE/ACM International Conference on Automated Software Engineer- ing (ASE). IEEE, 2021, pp. 1266–1270

2021
[27]

Quantum circuit mutants: Empirical analysis and recommendations,

E. Mendiluze Usandizaga, S. Ali, T. Yue, and P. Arcaini, “Quantum circuit mutants: Empirical analysis and recommendations,”Empirical Software Engineering, vol. 30, no. 4, p. 100, 2025

2025
[28]

On a relation between graph edit distance and maximum common subgraph,

H. Bunke, “On a relation between graph edit distance and maximum common subgraph,”Pattern Recognition Letters, vol. 18, no. 8, pp. 689– 694, 1997

1997
[29]

A survey of graph edit distance,

X. Gao, B. Xiao, D. Tao, and X. Li, “A survey of graph edit distance,” Pattern Analysis and Applications, vol. 13, no. 1, pp. 113–129, 2010

2010
[30]

Anomaly detection: A survey,

V . Chandola, A. Banerjee, and V . Kumar, “Anomaly detection: A survey,” ACM Computing Surveys, vol. 41, no. 3, pp. 15:1–15:58, 2009

2009