pith. machine review for the scientific record.

arxiv: 2604.20804 · v1 · submitted 2026-04-22 · 🪐 quant-ph

Recognition: unknown

Quantum hardware noise learning via differentiable Kraus representation on tensor networks

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 00:23 UTC · model grok-4.3

classification 🪐 quant-ph
keywords quantum noise learning · Kraus operators · tensor networks · superconducting quantum processors · noise characterization · quantum circuit simulation · error mitigation

The pith

A noise learning method using differentiable Kraus operators on tensor networks generalizes from one circuit to another on a superconducting processor.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a technique to learn the noise characteristics of quantum hardware by optimizing differentiable representations of noise channels against data from a single experiment. This is achieved through a parameterization that guarantees physical validity and efficient simulation via tensor networks. The key result is that parameters trained on a ripple-carry adder circuit successfully reproduce the output distribution of both the training circuit and an unrelated multiplier circuit on the ibm_fez device. This suggests the model has captured device-intrinsic noise properties rather than circuit-specific artifacts. Such a capability supports more reliable offline testing of quantum algorithms under realistic noise conditions.

Core claim

We present a method for learning quantum hardware noise from the measurement distribution of a single device experiment, using automatically differentiable Kraus operators obtained from a Stinespring-based parameterization. Circuits are simulated with a matrix product density operator forward model, with independent channels for each native gate type, for nearest-neighbor crosstalk interactions, and for state preparation and measurement operations. All channels are optimized end-to-end to match observed measurement distributions. On the ibm_fez processor, training on a ripple-carry adder reproduces the device output, and the same parameters track the distribution of an unrelated multiplier circuit, indicating that the method captures intrinsic device characteristics rather than overfitting to the training circuit.

What carries the argument

Differentiable Kraus operators obtained from a Stinespring parameterization, attached independently to gate types, crosstalk interactions, and SPAM, and simulated via a matrix product density operator forward model.
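The load-bearing construction here is that a Stinespring-style parameterization yields Kraus operators that are CPTP by construction, so gradient descent can run over unconstrained parameters. A minimal numpy/scipy sketch (not the authors' code; the dilation dimension, parameter layout, and function names are illustrative assumptions):

```python
import numpy as np
from scipy.linalg import expm

def kraus_from_params(params, d=2, k=4):
    """Build CPTP Kraus operators from unrestricted real parameters.

    A Hermitian generator on the dilated space (dim d*k) is exponentiated
    to a unitary U on system (x) environment; the Kraus operators are the
    d x d blocks K_i = (<i| (x) I_d) U (|0> (x) I_d).
    """
    n = d * k
    A = params.reshape(n, n)
    H = A + A.T                 # Hermitian (real symmetric) generator
    U = expm(1j * H)            # unitary on the dilated space
    # Slice out the first environment "column": tensor axes are
    # (env_out, sys_out, env_in, sys_in) with env_in fixed to |0>.
    V = U.reshape(k, d, k, d)[:, :, 0, :]
    return [V[i] for i in range(k)]

rng = np.random.default_rng(0)
K = kraus_from_params(rng.normal(size=64), d=2, k=4)

# CPTP check: sum_i K_i^dagger K_i = identity, for any parameter values
S = sum(k_.conj().T @ k_ for k_ in K)
assert np.allclose(S, np.eye(2), atol=1e-10)

# Applying the channel preserves the trace of a density matrix
rho = np.array([[1.0, 0.0], [0.0, 0.0]], dtype=complex)
rho_out = sum(k_ @ rho @ k_.conj().T for k_ in K)
assert np.isclose(np.trace(rho_out).real, 1.0)
```

Because complete positivity and trace preservation hold identically in the parameters, no projection or penalty term is needed during optimization; in the paper's setting the same construction is made automatically differentiable (e.g., in JAX) rather than evaluated with scipy.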

If this is right

  • The learned noise parameters reproduce the observed output distribution of the training circuit.
  • The same parameters accurately track the device distribution for an unrelated circuit without retraining.
  • The method captures intrinsic device noise characteristics rather than overfitting to specific circuits.
  • It enables offline noise-aware predictions for quantum algorithms such as QAOA with error detection schemes.
  • Generalization holds consistently across a range of benchmark circuits.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The framework could be extended to predict noise in even larger circuits or different hardware architectures by reusing the learned parameters.
  • This single-experiment learning might integrate with existing calibration routines to reduce overall characterization overhead.
  • It opens the possibility of using the model for designing error-corrected algorithms tailored to the specific device's noise profile.

Load-bearing premise

Independent noise channels for each native gate type, nearest-neighbor crosstalk interactions, and SPAM operations suffice to capture the device's full noise behavior and generalize across arbitrary circuits.
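The premise of one channel per native gate type (plus crosstalk and SPAM) can be illustrated with a toy dense-density-matrix simulator, a stand-in for the paper's MPDO forward model. The single-qubit example below uses placeholder depolarizing channels, not learned ones, and omits the nearest-neighbor crosstalk channels that would sit on qubit pairs:

```python
import numpy as np

def depolarizing(p):
    """Single-qubit depolarizing channel as Kraus operators (placeholder noise)."""
    X = np.array([[0, 1], [1, 0]], dtype=complex)
    Y = np.array([[0, -1j], [1j, 0]])
    Z = np.diag([1.0 + 0j, -1.0])
    return [np.sqrt(1 - p) * np.eye(2)] + [np.sqrt(p / 3) * P for P in (X, Y, Z)]

def apply_channel(rho, kraus):
    return sum(K @ rho @ K.conj().T for K in kraus)

def apply_gate(rho, U):
    return U @ rho @ U.conj().T

# one channel per *gate type*, shared by every instance of that gate
channels = {"h": depolarizing(0.01), "x": depolarizing(0.02)}
spam = depolarizing(0.03)   # lumped state-prep-and-measurement channel

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
X = np.array([[0, 1], [1, 0]], dtype=complex)

rho = np.array([[1, 0], [0, 0]], dtype=complex)    # |0><0|
for name, U in [("h", H), ("x", X)]:
    rho = apply_gate(rho, U)                       # ideal gate
    rho = apply_channel(rho, channels[name])       # noise channel for that gate type
rho = apply_channel(rho, spam)                     # SPAM channel before readout

probs = np.real(np.diag(rho))                      # simulated measurement distribution
assert np.isclose(probs.sum(), 1.0)
```

The premise under test is exactly the structure visible here: every instance of a gate type shares one channel, so a fit on one circuit can transfer to another only if noise really is tied to gate types (and neighbor pairs) rather than to circuit context.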

What would settle it

Observing a significant mismatch between the simulated distribution using the learned parameters and actual device measurements on a new circuit not used in training or evaluation would falsify the generalization claim.
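A falsification test of this kind needs a concrete mismatch measure. A minimal sketch of two standard choices named in the referee report, total variation and Hellinger distance, over bit-string distributions (the example distributions are hypothetical):

```python
import numpy as np

def total_variation(p, q):
    """TV distance: half the L1 difference between two distributions."""
    return 0.5 * np.abs(np.asarray(p) - np.asarray(q)).sum()

def hellinger(p, q):
    """Hellinger distance: Euclidean distance between sqrt-distributions / sqrt(2)."""
    return np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2)

device = np.array([0.48, 0.02, 0.03, 0.47])     # hypothetical measured bit-strings
simulated = np.array([0.50, 0.01, 0.02, 0.47])  # hypothetical model output

tv = total_variation(device, simulated)   # 0.5 * (0.02 + 0.01 + 0.01 + 0.00) = 0.02
hd = hellinger(device, simulated)
assert 0.0 <= tv <= 1.0 and 0.0 <= hd <= 1.0
```

Both metrics lie in [0, 1], so a pre-registered threshold on a held-out circuit would make "significant mismatch" operational rather than qualitative.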

Figures

Figures reproduced from arXiv: 2604.20804 by Ryo Sakai, Yu Yamashiro.

Figure 1
Figure 1: Noise model architecture on a linear four-qubit chain. Red boxes [PITH_FULL_IMAGE:figures/full_fig_p004_1.png] view at source ↗

Figure 3
Figure 3: Convergence of L_NLL and L_OT in training on the distribution of the adder n10 circuit on ibm_fez. Both runs use the same MPDO configuration (χ = 4, κ = 8). For the OT run, we set the entropic regularization to ε = 0.1 and solve with 1024 Sinkhorn iterations. Overall, both losses show similar convergence behavior, especially in the early and late iterations, so for the present dataset the two stra… view at source ↗

Figure 5
Figure 5: Generalization test: device vs. MPDO-simulated distributions for [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗

Figure 4
Figure 4: Measurement distribution overlay for the adder [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

Figure 6
Figure 6: Classical fidelity between device and simulated output distributions for [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗

Figure 7
Figure 7: Single-layer QAOA circuit with parity-check error detection. The [PITH_FULL_IMAGE:figures/full_fig_p009_7.png] view at source ↗

Figure 9
Figure 9: Expectation value of the merit factor for the accepted LABS QAOA [PITH_FULL_IMAGE:figures/full_fig_p009_9.png] view at source ↗
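Figure 3 compares an NLL-style loss against an entropic optimal-transport loss (ε = 0.1, 1024 Sinkhorn iterations). A minimal Sinkhorn sketch (not the paper's implementation; the Hamming-distance cost between bit-strings is an assumption made here for illustration):

```python
import numpy as np

def sinkhorn_cost(p, q, C, eps=0.1, iters=1024):
    """Entropic-regularized OT cost between distributions p, q with cost matrix C."""
    K = np.exp(-C / eps)                 # Gibbs kernel
    u = np.ones_like(p)
    for _ in range(iters):               # alternating marginal scaling
        v = q / (K.T @ u)
        u = p / (K @ v)
    P = u[:, None] * K * v[None, :]      # transport plan with marginals ~ p, q
    return float((P * C).sum())

# cost = Hamming distance between 2-bit outcomes (assumed, for illustration)
bits = [((b >> 1) & 1, b & 1) for b in range(4)]
C = np.array([[sum(x != y for x, y in zip(a, b)) for b in bits] for a in bits], float)

p = np.array([0.70, 0.10, 0.10, 0.10])   # hypothetical simulated distribution
q = np.array([0.60, 0.15, 0.15, 0.10])   # hypothetical device distribution
cost = sinkhorn_cost(p, q, C, eps=0.1, iters=1024)
assert cost >= 0.0
```

Unlike a pointwise NLL, the OT cost is sensitive to *how far apart* mismatched bit-strings are under the chosen metric, which is the usual motivation for trying it alongside likelihood losses.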
read the original abstract

We present a method for learning quantum hardware noise from a measurement distribution of a single device experiment. Each noise channel is represented by automatically differentiable Kraus operators obtained from a Stinespring-based parameterization that is completely positive and trace preserving by construction, and circuits are simulated with a matrix product density operator forward model. Independent channels are attached to each native gate type, to each nearest-neighbor crosstalk interaction, and to state preparation and measurement, and all channels are optimized end-to-end against a distance between the simulated and observed measurement distributions. On ibm_fez, a Heron-generation superconducting processor, training on a ripple-carry adder circuit reproduces the device output distribution, and the same learned parameters, applied without retraining, also track the device distribution of an unrelated multiplier circuit, indicating that the method captures intrinsic device characteristics rather than overfitting to the training circuit. A systematic evaluation across a range of benchmark circuits confirms that this generalization is consistent. We further use the learned model to perform an offline feasibility assessment of the quantum approximate optimization algorithm with an error detection scheme, demonstrating the kind of noise-aware prediction the framework is designed to enable.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces a noise-learning framework that represents each noise channel (per native gate type, nearest-neighbor crosstalk, and SPAM) via automatically differentiable Kraus operators obtained from a Stinespring parameterization, which is CPTP by construction. Circuits are simulated with a matrix-product density-operator forward pass, and all channel parameters are optimized end-to-end by minimizing a distance between the simulated and observed measurement distributions. On the ibm_fez Heron processor, parameters trained on a ripple-carry adder circuit are shown to reproduce the device output distribution and, without retraining, to track the distribution of an unrelated multiplier circuit; the same model is further evaluated on a suite of benchmark circuits and used for an offline QAOA feasibility study with error detection.

Significance. If the reported generalization holds under quantitative scrutiny, the method would provide a practical route to constructing transferable, device-specific noise models from modest experimental data. The combination of Stinespring parameterization, tensor-network simulation, and end-to-end differentiability is a technical strength that could scale to larger circuits and support noise-aware algorithm design.

major comments (3)
  1. [Abstract and results] The central claim that the learned parameters capture intrinsic device characteristics (rather than circuit-specific artifacts) rests on successful transfer from the ripple-carry adder to the multiplier and on consistent benchmark performance. Yet the manuscript supplies no numerical values for the distribution distance (e.g., total variation or Hellinger), no error bars across optimization runs or data splits, and no convergence diagnostics; without these metrics the strength of the generalization evidence cannot be assessed.
  2. [Section 3, model definition] The noise model attaches independent Kraus channels to each gate type and to nearest-neighbor crosstalk only. The manuscript does not test whether this structure is sufficient when the device exhibits qubit-position-dependent noise, time-varying effects, or correlations beyond nearest neighbors, which directly affects whether the observed transfer constitutes evidence of intrinsic capture or merely a good fit within the model's limited expressivity.
  3. [Optimization and data handling] The end-to-end fitting procedure is described, but no details are given on the choice of distance metric, regularization, data exclusion criteria, or whether the training circuit's gate set and depth are representative enough to constrain all free parameters (Kraus coefficients for every gate type and crosstalk pair). This leaves open the possibility that the optimization succeeds on the training distribution while under-constraining the model for truly arbitrary circuits.
minor comments (3)
  1. [Methods] Notation for the Stinespring dilation and the resulting Kraus operators should be introduced with an explicit equation reference in the methods section to aid reproducibility.
  2. [Figures] Figure captions for the benchmark results should state the exact number of shots per circuit and whether the plotted distributions are raw or post-processed.
  3. [Table] The manuscript would benefit from a short table listing the number of free parameters per channel type and the total parameter count for the ibm_fez model.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed review of our manuscript. We address each major comment point by point below, indicating where revisions have been made to strengthen the presentation and evidence.

read point-by-point responses
  1. Referee: [Abstract and results] The central claim that the learned parameters capture intrinsic device characteristics (rather than circuit-specific artifacts) rests on successful transfer from the ripple-carry adder to the multiplier and on consistent benchmark performance. Yet the manuscript supplies no numerical values for the distribution distance (e.g., total variation or Hellinger), no error bars across optimization runs or data splits, and no convergence diagnostics; without these metrics the strength of the generalization evidence cannot be assessed.

    Authors: We agree that quantitative metrics are required to properly evaluate the generalization evidence. In the revised manuscript we now report explicit total variation and Hellinger distances between the simulated and measured output distributions for both the ripple-carry adder (training) and multiplier (test) circuits. We also include error bars computed over five independent optimization runs that differ in random seed and in the random split of the experimental shots, together with convergence plots of the loss versus iteration count. These additions allow the reader to assess both the magnitude of the agreement and its statistical stability. revision: yes

  2. Referee: [Section 3, model definition] The noise model attaches independent Kraus channels to each gate type and to nearest-neighbor crosstalk only. The manuscript does not test whether this structure is sufficient when the device exhibits qubit-position-dependent noise, time-varying effects, or correlations beyond nearest neighbors, which directly affects whether the observed transfer constitutes evidence of intrinsic capture or merely a good fit within the model's limited expressivity.

    Authors: The per-gate-type and nearest-neighbor structure is chosen precisely to favor transferability across circuits that share the same native gate set, which is what the adder-to-multiplier transfer and the benchmark suite are intended to demonstrate. We acknowledge that the model does not explicitly incorporate qubit-position dependence, temporal drift, or longer-range correlations. In the revision we have added a dedicated paragraph in Section 3 that states these modeling assumptions, discusses their implications for the observed generalization, and outlines how the framework could be extended (e.g., by making Kraus parameters qubit-indexed) if future devices require it. The current results therefore constitute evidence of capture within the stated model class rather than a claim of universality. revision: partial

  3. Referee: [Optimization and data handling] The end-to-end fitting procedure is described, but no details are given on the choice of distance metric, regularization, data exclusion criteria, or whether the training circuit's gate set and depth are representative enough to constrain all free parameters (Kraus coefficients for every gate type and crosstalk pair). This leaves open the possibility that the optimization succeeds on the training distribution while under-constraining the model for truly arbitrary circuits.

    Authors: We have expanded the relevant subsection to supply the missing details. The distance metric minimized is the Kullback-Leibler divergence between the simulated and experimental bit-string probability distributions. No explicit regularization term is added; the Stinespring parameterization already guarantees complete positivity and trace preservation. Data exclusion is limited to discarding any shots flagged by the hardware as invalid (typically <0.1 % of the total). We further include a short analysis showing that the ripple-carry adder exercises every native gate type and every nearest-neighbor pair present on the device, and that the resulting parameter set yields consistent accuracy on a diverse collection of benchmark circuits whose gate counts and depths differ substantially from the training circuit. These clarifications address the concern about under-constrained parameters. revision: yes
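The rebuttal names the training objective as the Kullback-Leibler divergence between simulated and experimental bit-string distributions. A minimal sketch (the smoothing of empty bins is our assumption for numerical safety, not a detail stated in the rebuttal):

```python
import numpy as np

def kl_divergence(p_exp, q_sim, smooth=1e-12):
    """D_KL(p_exp || q_sim) over bit-string probabilities, with tiny smoothing
    so that empty bins do not produce infinities."""
    p = np.asarray(p_exp, float) + smooth
    q = np.asarray(q_sim, float) + smooth
    p, q = p / p.sum(), q / q.sum()
    return float(np.sum(p * np.log(p / q)))

experimental = np.array([0.5, 0.0, 0.1, 0.4])   # hypothetical shot frequencies
simulated = np.array([0.45, 0.05, 0.1, 0.4])    # hypothetical forward-model output

loss = kl_divergence(experimental, simulated)
assert loss >= 0.0                               # Gibbs' inequality
assert np.isclose(kl_divergence(simulated, simulated), 0.0)
```

In the paper's pipeline this scalar would be differentiated through the MPDO forward model with respect to the Kraus parameters; here it is evaluated on fixed arrays only.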

Circularity Check

0 steps flagged

No circularity: generalization tested on independent circuit with external device data

full rationale

The paper fits noise-channel parameters by minimizing a distance between simulated and observed measurement distributions on a ripple-carry adder training circuit, using a Stinespring-parameterized Kraus representation that is CPTP by construction and a matrix-product density operator simulator. The same fixed parameters are then applied without retraining to an unrelated multiplier circuit and to a suite of benchmark circuits, with the device output distributions serving as independent external targets. Because the test distributions are not part of the fitting objective and the model structure is fixed in advance, the reported transfer performance does not reduce to a tautology or self-definition; the central claim therefore rests on empirical generalization rather than on any of the enumerated circular patterns.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on fitting parameters of noise channels to experimental distributions; no new physical entities are postulated, and the parameterization draws on standard quantum information constructions.

free parameters (1)
  • Kraus operator parameters for each noise channel
    Automatically optimized end-to-end for native gates, nearest-neighbor crosstalk, and SPAM to match observed measurement distributions.
axioms (1)
  • standard math Stinespring dilation yields a completely positive trace-preserving (CPTP) channel by construction
    Invoked to guarantee physical validity of the learned noise operators.

pith-pipeline@v0.9.0 · 5489 in / 1434 out tokens · 51937 ms · 2026-05-10T00:23:05.901719+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

57 extracted references · 29 canonical work pages · 2 internal anchors

  1. I. L. Chuang and M. A. Nielsen, "Prescription for experimental determination of the dynamics of a quantum black box," J. Mod. Opt., vol. 44, p. 2455, 1997. arXiv: quant-ph/9610001
  2. G. Torlai, C. J. Wood, A. Acharya, G. Carleo, J. Carrasquilla, and L. Aolita, "Quantum process tomography with unsupervised learning and tensor networks," Nat. Commun., vol. 14, no. 1, p. 2858, 2023. arXiv: 2006.02424 [quant-ph]
  3. S. Mangini, M. Cattaneo, D. Cavalcanti, S. Filippov, M. A. C. Rossi, and G. García-Pérez, "Tensor network noise characterization for near-term quantum computers," Phys. Rev. Res., vol. 6, no. 3, p. 033217, 2024. arXiv: 2402.08556 [quant-ph]
  4. W.-g. Ma, Y.-H. Shi, K. Xu, and H. Fan, "Tomography-assisted noisy quantum circuit simulator using matrix product density operators," Phys. Rev. A, vol. 110, no. 3, p. 032604, 2024. arXiv: 2508.07610 [quant-ph]
  5. S. Filippov et al., "Matrix product channel: Variationally optimized quantum tensor network to mitigate noise and reduce errors for the variational quantum eigensolver," Dec. 2022. arXiv: 2212.10225 [quant-ph]
  6. S. Ahmed, F. Quijandría, and A. F. Kockum, "Gradient-Descent Quantum Process Tomography by Learning Kraus Operators," Phys. Rev. Lett., vol. 130, no. 15, p. 150402, 2023. arXiv: 2208.00812 [quant-ph]
  7. L. Y. Visser, R. J. P. T. de Keijzer, O. Tse, and S. J. J. M. F. Kokkelmans, "Variational method for learning Quantum Channels via Stinespring Dilation on neutral atom systems," Sep. 2023. arXiv: 2309.10593 [quant-ph]
  8. T. Kobori and S. Todo, "Bayesian inference of general noise-model parameters from the syndrome statistics of surface codes," Phys. Rev. A, vol. 112, no. 5, p. 052448. arXiv: 2406.08981 [quant-ph]
  9. H. Cao, D. Feng, C. Ye, and F. Pan, "Differentiable Maximum Likelihood Noise Estimation for Quantum Error Correction," Feb. 2026. arXiv: 2602.19722 [quant-ph]
  10. G. A. L. White, P. Jurcevic, C. D. Hill, and K. Modi, "Unifying Non-Markovian Characterization with an Efficient and Self-Consistent Framework," Phys. Rev. X, vol. 15, no. 2, p. 021047, 2025. arXiv: 2312.08454 [quant-ph]
  11. A. Jamadagni, G. Quiroz, and E. Dumitrescu, "Towards Predictive Quantum Algorithmic Performance: Modeling Time-Correlated Noise at Scale," Mar. 2026. arXiv: 2603.04524 [quant-ph]
  12. Y. Ji, M. Roth, D. A. Kreplin, I. Polian, and F. K. Wilhelm, "Data-Efficient Quantum Noise Modeling via Machine Learning," Sep. 2025. arXiv: 2509.12933 [quant-ph]
  13. Y. Guo and S. Yang, "Quantum state tomography with locally purified density operators and local measurements," Commun. Phys., vol. 7, no. 1, p. 322, 2024. arXiv: 2307.16381 [quant-ph]
  14. M. Votto et al., "Learning Mixed Quantum States in Large-Scale Experiments," Phys. Rev. Lett., vol. 136, no. 9, p. 090801, 2026. arXiv: 2507.12550 [quant-ph]
  15. Z. Liu and D. S. Wild, "Efficiently Learning Global Quantum Channels with Local Tomography," Mar. 2026. arXiv: 2603.07037 [quant-ph]
  16. F. Verstraete, J. J. García-Ripoll, and J. I. Cirac, "Matrix Product Density Operators: Simulation of Finite-Temperature and Dissipative Systems," Phys. Rev. Lett., vol. 93, no. 20, p. 207204, 2004. arXiv: cond-mat/0406426
  17. S. Cheng et al., "Simulating noisy quantum circuits with matrix product density operators," Phys. Rev. Res., vol. 3, no. 2, p. 023005, Apr. 2021. arXiv: 2004.02388 [quant-ph]
  18. Y. Guo and S. Yang, "Locally purified density operators for noisy quantum circuits," Chin. Phys. Lett., vol. 41, no. 12, p. 120302, 2024. arXiv: 2312.02854 [quant-ph]
  19. H.-J. Liao, J.-G. Liu, L. Wang, and T. Xiang, "Differentiable Programming Tensor Networks," Phys. Rev. X, vol. 9, no. 3, p. 031041, 2019. arXiv: 1903.09650 [cond-mat.str-el]
  20. J. Bradbury et al., JAX: Composable transformations of Python+NumPy programs, 2018
  21. M. Cuturi, "Sinkhorn distances: Lightspeed computation of optimal transport," Advances in Neural Information Processing Systems, vol. 26, 2013. arXiv: 1306.0895 [stat.ML]
  22. A. Li, S. Stein, S. Krishnamoorthy, and J. Ang, "QASMBench: A Low-Level Quantum Benchmark Suite for NISQ Evaluation and Simulation," ACM Trans. Quantum Comput., 2022. arXiv: 2005.13018 [quant-ph]
  23. S. A. Cuccaro, T. G. Draper, S. A. Kutin, and D. P. Moulton, "A new quantum ripple-carry addition circuit," Oct. 2004. arXiv: quant-ph/0410184
  24. R. Shaydulin et al., "Evidence of scaling advantage for the quantum approximate optimization algorithm on a classically intractable problem," Sci. Adv., vol. 10, no. 22, adm6761, 2024. arXiv: 2308.02342 [quant-ph]
  25. A. Boehmer, "Binary pulse compression codes," IEEE Transactions on Information Theory, vol. 13, no. 2, pp. 156–167, 1967
  26. M. Schroeder, "Synthesis of low-peak-factor signals and binary sequences with low autocorrelation (corresp.)," IEEE Transactions on Information Theory, vol. 16, no. 1, pp. 85–89, 1970
  27. K. Kraus, A. Böhm, J. Dollard, and W. Wootters, States, Effects, and Operations: Fundamental Notions of Quantum Theory (Lecture Notes in Physics). Springer Berlin Heidelberg, 1983
  28. W. F. Stinespring, "Positive functions on C*-algebras," Proc. Am. Math. Soc., vol. 6, no. 2, pp. 211–216, 1955
  29. M.-D. Choi, "Completely positive linear maps on complex matrices," Linear Algebra Its Appl., vol. 10, no. 3, pp. 285–290, 1975
  30. A. Pechen, D. Prokhorenko, R. Wu, and H. Rabitz, "Control landscapes for two-level open quantum systems," J. Phys. A, vol. 41, p. 045205, 2008. arXiv: 0710.0604 [quant-ph]
  31. A. Oza, A. Pechen, J. D. V. Beltrani, K. Moore, and H. Rabitz, "Optimization search effort over the control landscapes for open quantum systems with Kraus-map evolution," J. Phys. A, vol. 42, p. 205305, 2009. arXiv: 0905.1149 [quant-ph]
  32. I. A. Luchnikov, M. E. Krechetov, and S. N. Filippov, "Riemannian geometry and automatic differentiation for optimization problems of quantum physics and quantum technologies," New J. Phys., vol. 23, p. 073006, 2021. arXiv: 2007.01287 [quant-ph]
  33. Z. Ateeq and M. Faryad, "Geometric Parameterization of Kraus Operators with Applications to Quasi Inverse Channels for Multi Qubit Systems," Nov. 2025. arXiv: 2512.00577 [quant-ph]
  34. S.-u. Lee, S. Ghosh, C. Oh, K. Noh, B. Fefferman, and L. Jiang, "Classical simulation of noisy random circuits from exponential decay of correlation," Oct. 2025. arXiv: 2510.06328 [quant-ph]
  35. Z.-Y. Wei, J. Rajakumar, J. Nelson, D. Malz, M. J. Gullans, and A. V. Gorshkov, "Noise-induced contraction of MPO truncation errors in noisy random circuits and Lindbladian dynamics," Mar. 2026. arXiv: 2603.20400 [quant-ph]
  36. R. J. Williams, "Simple statistical gradient-following algorithms for connectionist reinforcement learning," Mach. Learn., vol. 8, no. 3, pp. 229–256, 1992
  37. I. Loshchilov and F. Hutter, "Decoupled weight decay regularization," in International Conference on Learning Representations, 2019. arXiv: 1711.05101 [cs.LG]
  38. DeepMind et al., The DeepMind JAX Ecosystem, 2020
  39. A. Javadi-Abhari et al., Quantum computing with Qiskit. arXiv: 2405.08810 [quant-ph]
  40. Y. Suzuki et al., "Qulacs: A fast and versatile quantum circuit simulator for research purpose," Quantum, vol. 5, p. 559, Oct. 2021. arXiv: 2011.13524 [quant-ph]
  41. A. Jamiołkowski, "Linear transformations which preserve trace and positive semidefiniteness of operators," Rep. Math. Phys., vol. 3, pp. 275–278, 1972
  42. A. Uhlmann, "The 'transition probability' in the state space of a *-algebra," Rep. Math. Phys., vol. 9, no. 2, pp. 273–279, 1976
  43. C. A. Fuchs and J. van de Graaf, "Cryptographic distinguishability measures for quantum-mechanical states," IEEE Trans. Inf. Theory, vol. 45, no. 4, pp. 1216–1227, 1999. arXiv: quant-ph/9712042
  44. IBM Quantum, https://quantum.cloud.ibm.com/, 2025
  45. E. Farhi, J. Goldstone, and S. Gutmann, "A Quantum Approximate Optimization Algorithm," Nov. 2014. arXiv: 1411.4028 [quant-ph]
  46. K. Blekos et al., "A Review on Quantum Approximate Optimization Algorithm and its Variants," Jun. 2023. arXiv: 2306.09198 [quant-ph]
  47. J. M. Pino et al., "Demonstration of the trapped-ion quantum CCD computer architecture," Nature, vol. 592, no. 7853, pp. 209–213, 2021. arXiv: 2003.01293 [quant-ph]
  48. S. A. Moses et al., "A Race-Track Trapped-Ion Quantum Processor," Phys. Rev. X, vol. 13, no. 4, p. 041052, 2023. arXiv: 2305.03828 [quant-ph]
  49. H. P. Patil, D. Baron, and H. Zhou, "Q-Cluster: Quantum Error Mitigation Through Noise-Aware Unsupervised Learning," in 2025 International Conference on Quantum Computing and Engineering, Apr. 2025. arXiv: 2504.10801 [quant-ph]
  50. K. Temme, S. Bravyi, and J. M. Gambetta, "Error Mitigation for Short-Depth Quantum Circuits," Phys. Rev. Lett., vol. 119, no. 18, p. 180509, 2017. arXiv: 1612.02058 [quant-ph]
  51. E. v. d. Berg, Z. K. Minev, A. Kandala, and K. Temme, "Probabilistic error cancellation with sparse Pauli–Lindblad models on noisy quantum processors," Nature Phys., vol. 19, no. 8, pp. 1116–1121, 2023. arXiv: 2201.09866 [quant-ph]