pith.

arxiv: 2509.08555 · v4 · submitted 2025-09-10 · 🪐 quant-ph

Benchmarking Optimization Algorithms for Automated Calibration of Quantum Devices

Pith reviewed 2026-05-18 17:57 UTC · model grok-4.3

classification 🪐 quant-ph
keywords quantum device calibration, optimization algorithms, CMA-ES, quantum control, automated tuning, Nelder-Mead, system identification, pulse optimization

The pith

CMA-ES outperforms other optimizers for quantum device calibration in tests mimicking real conditions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper benchmarks a range of optimization algorithms to automate the calibration of quantum devices. It runs the tests inside a simulation built to reflect the difficulties of actual lab environments, covering both simple low-parameter pulse shapes and complex high-parameter ones. CMA-ES shows better results than alternatives such as Nelder-Mead in every case examined. This result matters because reliable automatic tuning can reduce the time and expertise needed to prepare quantum hardware for experiments.

Core claim

The authors benchmark optimization algorithms for calibrating quantum devices inside a simulated setting that reproduces real experimental challenges. The comparison covers standard methods including Nelder-Mead and the CMA-ES algorithm, applied to low-dimensional cases that match current simple control pulses and to high-dimensional cases that match complex pulses with many parameters. The results indicate that CMA-ES delivers superior performance across all scenarios, which leads directly to the recommendation that it be used for automated bring-up, tune-up, and system identification.

What carries the argument

Benchmark comparison of optimization algorithms, with CMA-ES adapting its covariance matrix to search parameter spaces for quantum pulse calibration.
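The sketch below shows what "adapting its covariance matrix" means in practice. It is a compact, pure-Python (μ/μ_w, λ)-CMA-ES with cumulative step-size adaptation and rank-one plus rank-μ covariance updates. It uses the default constants from Hansen's tutorial (reference [38] below). This is a simplified variant written for illustration, not the authors' code:

- It draws samples through a Cholesky factor of C instead of the eigendecomposition used in reference implementations.
- It uses only positive recombination weights.
- The quadratic `toy_loss`, with its hypothetical `TARGET` and `SCALE`, stands in for the paper's simulator.

In practice, a maintained implementation such as Hansen's `cma` package or Nevergrad (reference [41]) is the safer choice.

```python
import math
import random


def cholesky(c):
    """Lower-triangular L with L L^T = c (c symmetric positive definite)."""
    n = len(c)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = c[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return low


def cma_es(loss, x0, sigma0, max_evals=2000, seed=0):
    """Simplified CMA-ES: minimise `loss` starting from mean x0 and step size sigma0."""
    rng = random.Random(seed)
    n = len(x0)
    lam = 4 + int(3 * math.log(n))                 # default population size
    mu = lam // 2
    raw = [math.log(mu + 0.5) - math.log(i + 1) for i in range(mu)]
    w = [r / sum(raw) for r in raw]                # positive recombination weights
    mueff = 1.0 / sum(wi * wi for wi in w)
    # Default learning rates and damping (Hansen, "The CMA Evolution Strategy: A Tutorial").
    cc = (4 + mueff / n) / (n + 4 + 2 * mueff / n)
    cs = (mueff + 2) / (n + mueff + 5)
    c1 = 2 / ((n + 1.3) ** 2 + mueff)
    cmu = min(1 - c1, 2 * (mueff - 2 + 1 / mueff) / ((n + 2) ** 2 + mueff))
    damps = 1 + 2 * max(0.0, math.sqrt((mueff - 1) / (n + 1)) - 1) + cs
    chi_n = math.sqrt(n) * (1 - 1 / (4 * n) + 1 / (21 * n * n))  # approx. E||N(0, I)||

    m, sigma = list(x0), sigma0
    c = [[float(i == j) for j in range(n)] for i in range(n)]
    pc, ps = [0.0] * n, [0.0] * n
    best_x, best_f, evals, gen = list(x0), loss(x0), 1, 0
    while evals + lam <= max_evals:
        gen += 1
        low = cholesky(c)
        pop = []
        for _ in range(lam):
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
            y = [sum(low[i][k] * z[k] for k in range(i + 1)) for i in range(n)]  # y ~ N(0, C)
            x = [m[i] + sigma * y[i] for i in range(n)]
            pop.append((loss(x), x, y, z))
        evals += lam
        pop.sort(key=lambda t: t[0])
        if pop[0][0] < best_f:
            best_f, best_x = pop[0][0], pop[0][1]
        sel = pop[:mu]                                 # the mu best candidates
        yw = [sum(w[k] * sel[k][2][i] for k in range(mu)) for i in range(n)]
        zw = [sum(w[k] * sel[k][3][i] for k in range(mu)) for i in range(n)]  # = L^-1 yw
        m = [m[i] + sigma * yw[i] for i in range(n)]
        # Evolution paths: ps drives the step size, pc the rank-one covariance update.
        ps = [(1 - cs) * ps[i] + math.sqrt(cs * (2 - cs) * mueff) * zw[i] for i in range(n)]
        ps_norm = math.sqrt(sum(v * v for v in ps))
        hsig = ps_norm / math.sqrt(1 - (1 - cs) ** (2 * gen)) < (1.4 + 2 / (n + 1)) * chi_n
        pc = [(1 - cc) * pc[i] + hsig * math.sqrt(cc * (2 - cc) * mueff) * yw[i]
              for i in range(n)]
        # Covariance: decay + rank-one (path pc) + rank-mu (selected steps y).
        c = [[(1 - c1 - cmu) * c[i][j]
              + c1 * (pc[i] * pc[j] + (1 - hsig) * cc * (2 - cc) * c[i][j])
              + cmu * sum(w[k] * sel[k][2][i] * sel[k][2][j] for k in range(mu))
              for j in range(n)] for i in range(n)]
        sigma *= math.exp((cs / damps) * (ps_norm / chi_n - 1))
    return best_x, best_f, evals


# Hypothetical stand-in for a calibration loss: an anisotropic quadratic bowl.
TARGET = [1.0, 0.5, 2.0, 0.1]
SCALE = [1.0, 10.0, 3.0, 30.0]


def toy_loss(x):
    return sum(s * (xi - ti) ** 2 for s, xi, ti in zip(SCALE, x, TARGET))


if __name__ == "__main__":
    x0 = [t * 1.05 for t in TARGET]  # every parameter 5% above its optimum
    best_x, best_f, evals = cma_es(toy_loss, x0, sigma0=0.05, max_evals=2000, seed=1)
    print(f"loss {toy_loss(x0):.2e} -> {best_f:.2e} after {evals} evaluations")
```

Each call to `loss` stands in for one calibration experiment. `evals` therefore counts the same resource as the function-evaluation axis of a benchmark plot. The covariance matrix `c` is where CMA-ES learns the anisotropy of the landscape; a Nelder-Mead simplex has no equivalent component.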

If this is right

  • Automated calibration procedures can adopt CMA-ES to reach target performance with fewer function evaluations in both simple and complex pulse designs.
  • System identification tasks become more practical when the optimizer handles high-dimensional parameter spaces reliably.
  • Bringing up new quantum devices requires less manual adjustment once CMA-ES is integrated into the tuning workflow.
  • Current optimal control protocols gain efficiency by switching to the recommended algorithm for parameter search.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • If the simulation matches hardware closely enough, direct tests on real devices would likely confirm the same performance ordering.
  • The same benchmarking method could be applied to other quantum tasks such as variational circuit optimization or error mitigation parameter tuning.
  • Extending the study to include additional noise models or hardware-specific constraints would strengthen the case for using CMA-ES in larger systems.

Load-bearing premise

The simulation accurately reproduces the noise sources, constraints, and failure modes that appear during real quantum device calibration.

What would settle it

Executing the same calibration tasks on physical quantum hardware and observing that CMA-ES no longer outperforms the other algorithms or that the ranking changes.
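A rank-correlation statistic can make "the ranking changes" quantitative. It compares the optimizer ordering found in simulation with the one measured on hardware. The sketch below uses Kendall's tau-a, a standard statistic chosen here for illustration; the paper does not prescribe one. The optimizer labels and rankings are hypothetical.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as {item: rank} dicts (no ties).

    Returns 1.0 for identical orderings and -1.0 for fully reversed ones.
    """
    items = list(rank_a)
    concordant = discordant = 0
    for x, y in combinations(items, 2):
        s = (rank_a[x] - rank_a[y]) * (rank_b[x] - rank_b[y])
        if s > 0:
            concordant += 1
        elif s < 0:
            discordant += 1
    n_pairs = len(items) * (len(items) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    simulated = {"CMA-ES": 1, "Nelder-Mead": 2, "optimizer C": 3, "optimizer D": 4}
    hardware = {"CMA-ES": 2, "Nelder-Mead": 1, "optimizer C": 3, "optimizer D": 4}
    print(kendall_tau(simulated, hardware))  # one swapped pair out of six
```

A tau near 1 would support transferring the simulated ranking. A single swap at the top, as in the example, is exactly the outcome that would undercut the CMA-ES recommendation even though tau stays fairly high. The position of the leading optimizer therefore matters as much as tau itself.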

Figures

Figures reproduced from arXiv: 2509.08555 by Frank K. Wilhelm, Kevin Pack, Shai Machnes.

Figure 1. Simulated landscape of the loss function specified in Eq. (…)
Figure 2. In-phase and quadrature components of the DRAG and PWC pulses for an …
Figure 3. Hyperparameter optimization using the CMA-ES algorithm. Panel …
Figure 4. Comparison of 120 CMA-ES optimization runs using optimized hyperparameters versus randomly detuned hyper…
Figure 5. Benchmarking results for the DRAG and PWC pulse simulations. The x-axis shows the number of function evaluations, …
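Figure 2 contrasts the two pulse parameterizations used in the benchmark. The sketch below follows Eq. (5) and the PWC description quoted in reference excerpts [1] and [2] further down. Two details are assumptions, because the excerpts do not specify them:

- the Gaussian envelope form, with peak time `t0`, width `sigma` and unit peak height;
- the midpoint-sampling rule used to discretize it into the initial PWC guess.

```python
import math


def gaussian_envelope(t, t0, sigma):
    """Assumed Gaussian envelope Omega_Gauss(t): unit peak at t0, width sigma."""
    return math.exp(-((t - t0) ** 2) / (2 * sigma ** 2))


def gaussian_envelope_dot(t, t0, sigma):
    """Analytic time derivative of gaussian_envelope."""
    return -(t - t0) / sigma ** 2 * gaussian_envelope(t, t0, sigma)


def drag_signal(t, amp, omega_d, delta, phi_xy, t0, sigma):
    """Eq. (5): in-phase Gaussian term plus (1/delta)-weighted derivative in quadrature."""
    phase = omega_d * t + phi_xy
    in_phase = amp * gaussian_envelope(t, t0, sigma) * math.cos(phase)
    quadrature = (amp / delta) * gaussian_envelope_dot(t, t0, sigma) * math.sin(phase)
    return in_phase + quadrature


def pwc_initial_envelope(n_steps, duration, t0, sigma):
    """Initial PWC guess: discretise Omega_Gauss into n_steps equal time bins.

    Assumption: each bin takes the envelope value at its midpoint. The optimizer then
    tunes the bin amplitudes individually, which makes the PWC search high-dimensional.
    """
    dt = duration / n_steps
    return [gaussian_envelope((k + 0.5) * dt, t0, sigma) for k in range(n_steps)]
```

With DRAG, the optimizer sees only a handful of scalars: A, ω_d, δ and φ_xy, plus any envelope parameters. This is the low-dimensional regime of the benchmark. With PWC, it sees at least one value per time bin, so a pulse with `n_steps` bins is an `n_steps`-dimensional search.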
Original abstract

We present the results of a comprehensive study of optimization algorithms for the calibration of quantum devices. As part of our ongoing efforts to automate bring-up, tune-up, and system identification procedures, we investigate a broad range of optimizers within a simulated environment designed to closely mimic the challenges of real-world experimental conditions. Our benchmark includes widely used algorithms such as Nelder-Mead and the state-of-the-art Covariance Matrix Adaptation Evolution Strategy (CMA-ES). We evaluate performance in both low-dimensional settings, representing simple pulse shapes used in current optimal control protocols with a limited number of parameters, and high-dimensional regimes, which reflect the demands of complex control pulses with many parameters. Based on our findings, we recommend the CMA-ES algorithm and provide empirical evidence for its superior performance across all tested scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this section is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript reports a benchmark of optimization algorithms for automated calibration of quantum devices. It evaluates a range of methods, prominently including Nelder-Mead and CMA-ES, inside a simulated environment constructed to reproduce experimental challenges. Performance is compared in low-dimensional regimes (simple pulse shapes with few parameters) and high-dimensional regimes (complex control pulses with many parameters). The central conclusion is that CMA-ES exhibits superior performance across all tested scenarios and is therefore recommended for practical use in quantum-device bring-up and tune-up.

Significance. If the simulated cost landscapes and noise models faithfully capture the dominant experimental difficulties, the empirical ranking supplies actionable guidance for selecting optimizers in automated quantum calibration workflows. The explicit comparison across dimensionality regimes is a useful contribution to the growing literature on quantum control automation.

major comments (2)
  1. [Abstract and §3] Abstract and §3 (simulation setup): the recommendation of CMA-ES rests on the premise that the simulated environment 'closely mimic[s] the challenges of real-world experimental conditions.' No quantitative validation—such as side-by-side comparison of optimizer rankings on hardware versus simulation, or explicit matching of decoherence spectra, readout noise statistics, or parameter drift—is presented. This assumption is load-bearing for the transferability of the empirical ranking to real devices.
  2. [§4] §4 (results and statistical analysis): the abstract asserts 'empirical evidence for its superior performance,' yet the manuscript provides neither the number of independent runs, confidence intervals on the reported metrics, nor statistical tests comparing CMA-ES against Nelder-Mead. Without these, it is impossible to judge whether observed differences are robust or could arise from random variation in the simulated landscapes.
minor comments (2)
  1. [Abstract] The abstract states that 'a broad range of optimizers' was investigated but names only Nelder-Mead and CMA-ES. Listing the complete set of algorithms and their hyper-parameter choices in a table would improve reproducibility and clarity.
  2. [Figures] Figure captions and axis labels should explicitly state the cost-function definition and the precise noise model used in each panel to allow readers to assess the simulation fidelity without returning to the main text.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive comments on our manuscript. We respond point-by-point to the major comments and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (simulation setup): the recommendation of CMA-ES rests on the premise that the simulated environment 'closely mimic[s] the challenges of real-world experimental conditions.' No quantitative validation—such as side-by-side comparison of optimizer rankings on hardware versus simulation, or explicit matching of decoherence spectra, readout noise statistics, or parameter drift—is presented. This assumption is load-bearing for the transferability of the empirical ranking to real devices.

    Authors: We agree that the absence of quantitative hardware validation limits strong claims about transferability. The simulation incorporates standard models of decoherence, readout noise, and control imperfections drawn from the quantum control literature, but we do not claim or demonstrate exact matching of experimental spectra or drift statistics. In revision we will add an explicit limitations paragraph in §3, qualify the recommendation as applying to the modeled environments, and remove any implication of direct real-device equivalence without further validation. revision: partial

  2. Referee: [§4] §4 (results and statistical analysis): the abstract asserts 'empirical evidence for its superior performance,' yet the manuscript provides neither the number of independent runs, confidence intervals on the reported metrics, nor statistical tests comparing CMA-ES against Nelder-Mead. Without these, it is impossible to judge whether observed differences are robust or could arise from random variation in the simulated landscapes.

    Authors: We accept this criticism. The revised §4 will report the exact number of independent runs per configuration, include confidence intervals or standard errors on all performance metrics, and apply non-parametric statistical tests (e.g., Wilcoxon signed-rank) to assess whether CMA-ES differences versus Nelder-Mead are significant. These additions will allow readers to evaluate robustness directly. revision: yes
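The rebuttal proposes a Wilcoxon signed-rank test on paired runs. The sketch below is a minimal pure-Python version of that test. It rests on three assumptions:

- Each CMA-ES run is paired with a Nelder-Mead run on the same simulated landscape and starting point; the pairing scheme is not given in the rebuttal.
- The two-sided p-value uses the large-sample normal approximation, without a tie correction to the variance.
- The run data in the test are made up.

`scipy.stats.wilcoxon` is the standard library routine for this test.

```python
import math


def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test on paired samples a[i], b[i].

    Zero differences are dropped and tied |differences| get average ranks.
    Returns (W_plus, p_value) from the normal approximation (no tie correction),
    which is only reasonable for roughly 20 or more non-zero pairs.
    """
    d = [x - y for x, y in zip(a, b) if x != y]
    n = len(d)
    if n == 0:
        return 0.0, 1.0
    order = sorted(range(n), key=lambda i: abs(d[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(d[order[j + 1]]) == abs(d[order[i]]):
            j += 1
        avg = (i + j) / 2 + 1          # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    w_plus = sum(r for r, di in zip(ranks, d) if di > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    return w_plus, math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value
```

Pass the final losses as `a` (CMA-ES) and `b` (Nelder-Mead), so that a small `W_plus` together with a small p-value indicates that CMA-ES reaches lower losses. Reporting the number of pairs alongside the p-value also answers the referee's question about how many independent runs were used.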

standing simulated objections not resolved
  • Direct quantitative validation of the simulation against real hardware, including side-by-side optimizer rankings and explicit matching of decoherence spectra or parameter drift, as this requires new experimental campaigns on physical quantum devices outside the scope of the present simulation study.

Circularity Check

0 steps flagged

No circularity: empirical benchmark of standard optimizers

full rationale

The paper is a straightforward empirical comparison of off-the-shelf optimization algorithms (Nelder-Mead, CMA-ES, etc.) inside a simulated calibration environment. No mathematical derivation chain exists that could reduce a claimed result to its own inputs by construction. Performance rankings are reported from direct runs on the simulator; the recommendation of CMA-ES follows from those observed outcomes rather than from any fitted parameter, self-definition, or self-citation that is load-bearing for the central claim. The simulation-fidelity assumption is an external-validity concern, not an internal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmarking study with no mathematical derivations, free parameters, or postulated entities; the only structural assumption is that the simulation faithfully represents experimental conditions.

pith-pipeline@v0.9.0 · 5657 in / 944 out tokens · 28778 ms · 2026-05-18T17:57:33.559117+00:00 · methodology

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 2 internal anchors

  1. [1]

    Here, $A$ is the amplitude of the pulse, $\Omega_{\mathrm{Gauss}}(t)$ is a Gaussian envelope, $\omega_d$ is the drive frequency, $\delta$ is the so-called DRAG parameter, and $\phi_{xy}$ is the phase of the pulse.

    Derivative Removal by Adiabatic Gate (DRAG) Pulse. The input signal of the DRAG pulse is given by the following equation, derived from the derivative removal by adiabatic gate (DRAG) method: $\varepsilon(t) = A\,\Omega_{\mathrm{Gauss}}(t)\cos(\omega_d t + \phi_{xy}) + \frac{1}{\delta} A\,\dot{\Omega}_{\mathrm{Gauss}}(t)\sin(\omega_d t + \phi_{xy})$ (5). Here, $A$ is the amplitude of the pulse, $\Omega_{\mathrm{Gauss}}(t)$ is a Gaussian envelope, $\omega_d$ is the drive frequen...

  2. [2]

    In this setup, to provide a working initial guess, the shape of the step function is chosen as a discretization of $\Omega_{\mathrm{Gauss}}$.

    Piecewise Constant (PWC) Pulse. For the Piecewise Constant Pulse (PWC), the previous envelope $\Omega_{\mathrm{Gauss}}$ is replaced by a piecewise constant envelope $\Omega_{\mathrm{PWC}}$. In this setup, to provide a working initial guess, the shape of the step function is chosen as a discretization of $\Omega_{\mathrm{Gauss}}$. The optimization task for the PWC pulse lies in the fine-tuning of each indiv...

  3. [3]

    We chose a worst-case scenario in which each parameter is initially detuned by 5% from its optimal value

    Realistic Starting Position. To make the simulation as realistic as possible, the initial detuning of the pulse parameters from their fine-tuned values must be defined. We chose a worst-case scenario in which each parameter is initially detuned by 5% from its optimal value. This detuning is intended to mimic the state of the system after a rough calibra...

  4. [4]

    F. Arute, K. Arya, R. Babbush, D. Bacon, J. C. Bardin, R. Barends, R. Biswas, S. Boixo, F. G. S. L. Brandao, D. A. Buell, et al., Nature 574, 505 (2019)

  5. [5]

    P. Jurcevic, A. Javadi-Abhari, L. S. Bishop, I. Lauer, D. F. Bogorin, M. Brink, L. Capelluto, O. Günlük, T. Itoko, N. Kanazawa, et al., Quantum Science and Technology 6, 025020 (2021)

  6. [6]

    S. Sheldon, E. Magesan, J. M. Chow, and J. M. Gambetta, Phys. Rev. A 93, 060302(R) (2016)

  7. [7]

    S. Sheldon, L. S. Bishop, E. Magesan, S. Filipp, J. M. Chow, and J. M. Gambetta, Phys. Rev. A 93, 012301 (2016)

  8. [8]

    T. Proctor, M. Revelle, E. Nielsen, K. Rudinger, D. Lobser, P. Maunz, R. Blume-Kohout, and K. Young, Nature Communications 11, 5396 (2020)

  9. [9]

    J. J. Burnett, A. Bengtsson, M. Scigliuzzo, D. Niepce, M. Kudra, P. Delsing, and J. Bylander, npj Quantum Information 5, 54 (2019)

  10. [10]

    M. Werninghaus, D. J. Egger, F. Roy, S. Machnes, F. K. Wilhelm, and S. Filipp, npj Quantum Information 7, 14 (2021)

  11. [12]

    M. A. Rol, C. C. Bultink, T. E. O’Brien, S. R. de Jong, L. S. Theis, X. Fu, F. Luthi, R. F. L. Vermeulen, J. C. de Sterke, A. Bruno, D. Deurloo, R. N. Schouten, F. K. Wilhelm, and L. DiCarlo, Phys. Rev. Appl. 7, 041001 (2017)

  12. [13]

    Physical qubit calibration on a directed acyclic graph

    J. Kelly, P. O’Malley, M. Neeley, H. Neven, and J. M. Martinis, arXiv:1803.03226 (2018)

  13. [14]

    N. Wittler, F. Roy, K. Pack, M. Werninghaus, A. S. Roy, D. J. Egger, S. Filipp, F. K. Wilhelm, and S. Machnes, Phys. Rev. Appl. 15, 034080 (2021)

  14. [15]

    A. S. Roy, K. Pack, N. Wittler, and S. Machnes, in 2025 17th International Conference on COMmunication Systems and NETworks (COMSNETS) (2025) pp. 1062–1067

  15. [16]

    S. J. Glaser, U. Boscain, T. Calarco, C. P. Koch, W. Köckenberger, R. Kosloff, I. Kuprov, B. Luy, S. Schirmer, T. Schulte-Herbrüggen, D. Sugny, and F. K. Wilhelm, Eur. Phys. J. D 69, 279 (2015)

  16. [17]

    S. Machnes, U. Sander, S. J. Glaser, P. de Fouquières, A. Gruslys, S. Schirmer, and T. Schulte-Herbrüggen, Phys. Rev. A 84, 022305 (2011)

  17. [18]

    D. J. Egger and F. K. Wilhelm, Phys. Rev. Lett. 112, 240503 (2014)

  18. [19]

    E. Magesan, J. M. Gambetta, and J. Emerson, Phys. Rev. Lett. 106, 180504 (2011)

  19. [20]

    E. Knill, D. Leibfried, R. Reichle, J. Britton, R. B. Blakestad, J. D. Jost, C. Langer, R. Ozeri, S. Seidelin, and D. J. Wineland, Phys. Rev. A 77, 012307 (2008)

  20. [21]

    J. M. Chow, J. M. Gambetta, L. Tornberg, J. Koch, L. S. Bishop, A. A. Houck, B. R. Johnson, L. Frunzio, S. M. Girvin, and R. J. Schoelkopf, Phys. Rev. Lett. 102, 090502 (2009)

  21. [22]

    E. Magesan, J. M. Gambetta, and J. Emerson, Phys. Rev. A 85, 042311 (2012)

  22. [23]

    A. D. Córcoles, J. M. Gambetta, J. M. Chow, J. A. Smolin, M. Ware, J. Strand, B. L. T. Plourde, and M. Steffen, Phys. Rev. A 87, 030301(R) (2013)

  23. [24]

    J. L. O’Brien, G. J. Pryde, A. Gilchrist, D. F. V. James, N. K. Langford, T. C. Ralph, and A. G. White, Phys. Rev. Lett. 93, 080502 (2004)

  24. [25]

    J. Kelly, R. Barends, B. Campbell, Y. Chen, Z. Chen, B. Chiaro, A. Dunsworth, A. G. Fowler, I.-C. Hoi, E. Jeffrey, A. Megrant, J. Mutus, C. Neill, P. J. J. O’Malley, C. Quintana, P. Roushan, D. Sank, A. Vainsencher, J. Wenner, T. C. White, A. N. Cleland, and J. M. Martinis, Phys. Rev. Lett. 112, 240504 (2014)

  25. [26]

    D. Zhu, X. Wu, and T. Yang, arXiv preprint arXiv:2203.14177 (2022)

  26. [27]

    P. Probst, A.-L. Boulesteix, and B. Bischl, Journal of Machine Learning Research 20, 1 (2019)

  27. [28]

    H. Weerts, A. Mueller, and J. Vanschoren, Importance of tuning hyperparameters of machine learning algorithms (2020)

  28. [29]

    Z. Li, X. Lin, Q. Zhang, and H. Liu, Swarm and Evolutionary Computation 56, 100694 (2020)

  29. [30]

    J. Kennedy and R. C. Eberhart, in Proceedings of the IEEE International Conference on Neural Networks (ICNN’95) (Perth, Australia, 1995) pp. 1942–1948

  30. [31]

    D. Bratton and J. Kennedy, in Proceedings of the IEEE Swarm Intelligence Symposium (Honolulu, HI, 2007) pp. 120–127

  31. [32]

    M. J. Kochenderfer and T. A. Wheeler, Algorithms for Optimization (MIT Press, 2019)

  32. [33]

    C. Wang, An overview of SPSA: recent development and applications (2020)

  33. [34]

    J. A. Nelder and R. Mead, The Computer Journal 7, 308 (1965)

  34. [35]

    L. Han and M. Neumann, Optim. Method. Softw. 21, 1 (2006)

  35. [36]

    B. Shahriari, K. Swersky, Z. Wang, R. P. Adams, and N. de Freitas, Proceedings of the IEEE 104, 148 (2016)

  36. [37]

    Reducing the

    N. Hansen, S. D. Müller, and P. Koumoutsakos, Evol. Comput. 11, 1 (2003), https://doi.org/10.1162/106365603321828970

  37. [38]

    The CMA Evolution Strategy: A Tutorial

    N. Hansen, The cma evolution strategy: A tutorial (2023), arXiv:1604.00772 [cs.LG]

  38. [39]

    M. J. D. Powell, The Computer Journal 7, 155 (1964), https://academic.oup.com/comjnl/article-pdf/7/2/155/959784/070155.pdf

  39. [40]

    P. Virtanen, R. Gommers, T. E. Oliphant, M. Haberland, T. Reddy, D. Cournapeau, E. Burovski, P. Peterson, W. Weckesser, J. Bright, S. J. van der Walt, M. Brett, J. Wilson, K. J. Millman, N. Mayorov, A. R. J. Nelson, E. Jones, R. Kern, E. Larson, C. J. Carey, İ. Polat, Y. Feng, E. W. Moore, J. VanderPlas, D. Laxalde, J. Perktold, R. Cimrman, I. Henr...

  40. [41]

    J. Rapin and O. Teytaud, Nevergrad - A gradient-free optimization platform, https://GitHub.com/FacebookResearch/Nevergrad (2018)

  41. [43]

    F. Motzoi, J. M. Gambetta, P. Rebentrost, and F. K. Wilhelm, Phys. Rev. Lett. 103, 110501 (2009)

  42. [44]

    Z. Chen, J. Kelly, C. Quintana, R. Barends, B. Campbell, Y. Chen, B. Chiaro, A. Dunsworth, A. Fowler, E. Lucero, E. Jeffrey, A. Megrant, J. Mutus, M. Neeley, C. Neill, P. O’Malley, P. Roushan, D. Sank, A. Vainsencher, J. Wenner, T. White, A. Korotkov, and J. M. Martinis, Phys. Rev. Lett. 116, 020501 (2016)

  43. [45]

    Z. Chen, J. Kelly, C. Quintana, R. Barends, B. Campbell, Y. Chen, B. Chiaro, A. Dunsworth, A. G. Fowler, E. Lucero, E. Jeffrey, A. Megrant, J. Mutus, M. Neeley, C. Neill, P. J. J. O’Malley, P. Roushan, D. Sank, A. Vainsencher, J. Wenner, T. C. White, A. N. Korotkov, and J. M. Martinis, Phys. Rev. Lett. 116, 020501 (2016)

  44. [46]

    J. Rapin, M. Gallagher, P. Kerschke, M. Preuss, and O. Teytaud, in Proceedings of the Genetic and Evolutionary Computation Conference Companion, GECCO ’19 (Association for Computing Machinery, New York, NY, USA, 2019) p. 1888–1896
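Excerpt [3] above fixes the starting point of each benchmark run: every pulse parameter is detuned by 5% from its optimal value. The sketch below generates such a start. It assumes the sign of each offset is drawn at random, since the excerpt specifies only the magnitude. The parameter values in the example are hypothetical.

```python
import random


def detuned_start(optimal, fraction=0.05, seed=None):
    """Offset every parameter by `fraction` of its optimal value.

    Assumption: the sign of each offset is random; the excerpt fixes only its size.
    """
    rng = random.Random(seed)
    return [p * (1 + rng.choice((-1, 1)) * fraction) for p in optimal]


if __name__ == "__main__":
    # Hypothetical optimum: amplitude, drive frequency (rad/ns), DRAG parameter, phase.
    print(detuned_start([0.8, 31.4, -0.3, 0.5], seed=7))
```

A purely relative detuning leaves any parameter whose optimum is exactly zero, such as a phase, unchanged. The excerpt does not say how such parameters are handled.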