
arxiv: 2606.11496 · v1 · pith:NFGKFUSNnew · submitted 2026-06-09 · 🪐 quant-ph

Logical error estimation from syndrome data of surface-code experiments

Pith reviewed 2026-06-27 12:39 UTC · model grok-4.3

classification 🪐 quant-ph
keywords surface code · detector error model · quantum error correction · syndrome data · logical error probability · decoder priors · quantum computing hardware

The pith

Estimating detector error model probabilities from surface-code syndrome data alone improves decoded logical error rates.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that one can estimate the probabilities of errors in a detector error model using only the observed syndromes from surface-code experiments. This approach avoids separate device calibrations and does not require knowledge of the logical outcomes. The resulting estimates serve as decoder priors that reduce logical error probabilities compared with models built from independent device data. The gains appear on both Google's Willow chip and IBM's ibm_miami processor, typically at the 5%-10% level. A reader would care because the method folds error-model learning into the experiment's own data stream.

Core claim

Estimating DEM event probabilities from experimental syndromes is feasible, avoids independent device benchmarking, and produces useful decoder priors for estimating and reducing decoded logical error probabilities. Evaluation on open-source surface-code memory data from Google's Willow chip and on analogous experiments run on IBM's ibm_miami processor shows that the estimated DEMs improve logical error probabilities relative to baseline device-informed DEMs, typically at the 5%-10% level and with larger gains in some IBM cases, without additional calibration circuits, decoder fine-tuning, or supervised fitting to logical outcomes.

What carries the argument

The detector error model (DEM) records, for each error, its probability together with the detectors and logical observables it flips. The estimation procedure fits these probabilities directly to the observed syndrome statistics.
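
As a rough illustration of the objects involved, the sketch below represents a DEM as a list of independent mechanisms and recovers the probability of one two-detector mechanism from sampled syndromes with the standard pairwise-correlation formula. The names `Mechanism`, `sample_syndromes`, and `estimate_edge_probability` are hypothetical, the toy DEM is invented for the example, and nothing here is claimed to be the paper's estimator; it assumes the two detectors share exactly one mechanism and that all mechanisms fire independently.

```python
import math
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Mechanism:
    """One DEM entry: a probability plus the detectors and observables it flips."""
    probability: float
    detectors: tuple
    observables: tuple = ()


def sample_syndromes(dem, num_detectors, shots, seed=0):
    """Draw syndromes only; logical observables are deliberately not recorded."""
    rng = random.Random(seed)
    samples = []
    for _ in range(shots):
        syndrome = [0] * num_detectors
        for mech in dem:
            if rng.random() < mech.probability:
                for d in mech.detectors:
                    syndrome[d] ^= 1
        samples.append(syndrome)
    return samples


def estimate_edge_probability(samples, i, j):
    """Pairwise-correlation estimate of the mechanism flipping detectors i and j."""
    n = len(samples)
    mean_i = sum(s[i] for s in samples) / n
    mean_j = sum(s[j] for s in samples) / n
    mean_ij = sum(s[i] & s[j] for s in samples) / n
    denom = 1 - 2 * mean_i - 2 * mean_j + 4 * mean_ij
    ratio = 1 - 4 * (mean_ij - mean_i * mean_j) / denom
    # Clamp at zero so sampling noise cannot produce a negative square root.
    return 0.5 - 0.5 * math.sqrt(max(ratio, 0.0))


# Toy DEM (assumed values): one shared edge and two single-detector mechanisms.
toy_dem = [
    Mechanism(0.05, (0, 1)),
    Mechanism(0.02, (0,), (0,)),  # also flips logical observable 0
    Mechanism(0.03, (1,)),
]
samples = sample_syndromes(toy_dem, num_detectors=2, shots=100_000, seed=1)
p_hat = estimate_edge_probability(samples, 0, 1)
print(f"estimated edge probability: {p_hat:.4f} (true value 0.05)")
```

The estimate uses only detector statistics, so the observable flipped by the second mechanism never enters the fit. That is the sense in which the procedure runs without logical outcome labels.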

If this is right

  • The estimated DEMs function as effective priors for decoding without access to logical outcome labels.
  • Logical error reductions occur on hardware with different physical error scales.
  • No extra calibration circuits or decoder adjustments are needed to obtain the gains.
  • The same estimation procedure applies to surface-code memory experiments on distinct processors.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could support continuous updating of error models while an experiment runs, rather than relying on one-time calibration.
  • Similar syndrome-based fitting might extend to other stabilizer codes if the detector structure remains comparable.
  • Dominant error mechanisms could be identified by inspecting which DEM probabilities the fit increases most.

Load-bearing premise

Syndrome statistics alone contain enough information to produce unbiased DEM probability estimates that improve decoding without post-hoc selection or knowledge of logical outcomes.

What would settle it

If decoders that use the syndrome-derived DEM probabilities produce equal or higher logical error rates than decoders that use independent device-characterization DEMs, the improvement claim would be falsified.
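
Judging whether one decoder's logical error rate is "equal or higher" requires an uncertainty on the difference. A minimal sketch of a paired bootstrap follows. It assumes per-shot failure flags are available from decoding the same shots with both priors. The function names and the synthetic flags are hypothetical and carry no experimental meaning.

```python
import random


def relative_lep_change(fail_baseline, fail_estimated, indices):
    """(LEP_estimated - LEP_baseline) / LEP_baseline over the given shot indices."""
    base = sum(fail_baseline[k] for k in indices)
    est = sum(fail_estimated[k] for k in indices)
    return (est - base) / base if base else float("nan")


def paired_bootstrap(fail_baseline, fail_estimated, resamples=500, seed=0):
    """Return the point estimate and a 95% percentile interval.

    Shots are resampled jointly so that both decoders always see the same shots.
    """
    n = len(fail_baseline)
    if n != len(fail_estimated):
        raise ValueError("both decoders must be evaluated on the same shots")
    rng = random.Random(seed)
    point = relative_lep_change(fail_baseline, fail_estimated, range(n))
    stats = []
    for _ in range(resamples):
        idx = rng.choices(range(n), k=n)
        value = relative_lep_change(fail_baseline, fail_estimated, idx)
        if value == value:  # skip NaN from resamples with no baseline failures
            stats.append(value)
    stats.sort()
    lower = stats[int(0.025 * len(stats))]
    upper = stats[int(0.975 * len(stats)) - 1]
    return point, lower, upper


# Synthetic flags (assumed): 200 baseline failures, 180 with the other prior.
fail_baseline = [1 if k % 10 == 0 else 0 for k in range(2000)]
fail_estimated = [1 if (k % 10 == 0 and k % 100 != 0) else 0 for k in range(2000)]
point, lower, upper = paired_bootstrap(fail_baseline, fail_estimated)
print(f"relative change: {point:+.3f}, 95% interval [{lower:+.3f}, {upper:+.3f}]")
```

An interval that excludes zero on the negative side would support an improvement; an interval that straddles zero would leave the question open.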

Figures

Figures reproduced from arXiv: 2606.11496 by Arian Vezvaee, Cesar Benito, Daniel A. Lidar, Evangelia Takou, Kenneth R. Brown.

Figure 1. Logical error probability comparison for the …
Figure 2. Logical error probability comparison for the …
Figure 3. The unrotated …
Figure 4. Logical performance on …
Figure 5. Detector correlations and detector-event structure for …
Figure 6. (a) Detector rates over time, for …
Figure 7. Fractional percent change in LEP relative …
Figure 8. Comparison of effective hyperedge-rate diagnostics between …
Figure 9. Crosstalk test on …
Figure 10. (a) Entanglement infidelity calculated using the IBM-like noise model and the DEM extracted from syndrome data, …
Figure 11. Entanglement infidelity for the Willow code-scaling experiments using the SI1000 noise model (top row) and the RL …
Original abstract

Decoders for quantum error correction (QEC) experiments rely on detector error models (DEMs), which encode, for each error, its probability and the detectors and logical observables it flips. Here we show that estimating DEM event probabilities from experimental syndromes is feasible, avoids independent device benchmarking, and produces useful decoder priors for estimating and reducing decoded logical error probabilities. We evaluate our methods using open-source data from surface-code memory experiments performed on Google's Willow chip, and we carry out analogous surface-code experiments on IBM's ibm_miami processor. Despite the different physical error scales of the Google and IBM devices, in both cases our estimated DEMs improve logical error probabilities relative to baseline device-informed DEMs, typically at the 5%-10% level and with larger gains in some IBM cases, without additional calibration circuits, decoder fine-tuning, or supervised fitting to logical outcomes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that DEM event probabilities can be estimated directly from experimental syndrome counts in surface-code memory experiments, yielding decoder priors that improve decoded logical error probabilities by typically 5-10% (with larger gains in some cases) relative to device-informed baselines. The method is evaluated on open data from Google's Willow chip and new experiments on IBM's ibm_miami processor, without additional calibration circuits, decoder tuning, or supervised fitting to logical outcomes.

Significance. If the estimates are unbiased and the improvements hold under cross-validation, the approach would allow decoder calibration from existing syndrome data alone, reducing the need for separate benchmarking. The use of data from two distinct hardware platforms with different error scales provides a basic reproducibility check. However, the absence of any derivation showing the inverse problem is well-posed or any reported validation against overfitting or error bars weakens the evidential support for the central claim.

major comments (3)
  1. [Abstract] The reported 5%-10% improvements in logical error probability are stated without error bars, bootstrap estimates, or cross-validation on held-out syndrome or logical data; this leaves open whether the gains reflect true DEM accuracy or simply better fitting to the training syndrome marginals.
  2. [Method (implied by abstract description of fitting to detector-flip frequencies)] The manuscript provides no analysis or theorem addressing whether syndrome flip frequencies uniquely determine individual DEM probabilities up to logical-flip rates; in surface-code DEMs, distinct error mechanisms can share identical detector marginals while differing in logical observables, so the maximum-likelihood or moment-matching fit may converge to a biased solution whose apparent improvement is an artifact of matching the training data.
  3. [Results (implied by the empirical evaluation on Willow and ibm_miami data)] No comparison is shown between the estimated DEMs and the true underlying error rates (where known from device characterization) or against a null model that simply rescales a baseline DEM to match the observed syndrome statistics; without such controls it is impossible to separate genuine information gain from parameter adjustment.
minor comments (1)
  1. [Abstract] The abstract states gains occur "typically at the 5%-10% level" but does not define the precise metric (e.g., relative reduction in logical error rate per round) or the number of experimental shots used for each platform.
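
The identifiability concern in major comment 2 can be made concrete with a two-mechanism toy case. The sketch below enumerates the exact joint distribution for two DEMs whose mechanisms flip the same pair of detectors but differ in how the probability is split between a logical-flipping and a non-flipping mechanism. The tuple layout `(probability, detectors, flips_logical)` and all numbers are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product


def joint_distribution(dem, num_detectors):
    """Exact P(syndrome, logical flip), enumerating every subset of mechanisms."""
    dist = {}
    for fired in product([0, 1], repeat=len(dem)):
        prob = 1.0
        syndrome = [0] * num_detectors
        logical = 0
        for on, (p, detectors, flips_logical) in zip(fired, dem):
            prob *= p if on else 1 - p
            if on:
                for d in detectors:
                    syndrome[d] ^= 1
                logical ^= flips_logical
        key = (tuple(syndrome), logical)
        dist[key] = dist.get(key, 0.0) + prob
    return dist


def syndrome_marginal(dist):
    """Sum out the logical bit, leaving what syndrome data alone can reveal."""
    marginal = {}
    for (syndrome, _), p in dist.items():
        marginal[syndrome] = marginal.get(syndrome, 0.0) + p
    return marginal


# Same detectors, opposite split between the two mechanisms (assumed values).
dem_a = [(0.04, (0, 1), 0), (0.01, (0, 1), 1)]
dem_b = [(0.01, (0, 1), 0), (0.04, (0, 1), 1)]

dist_a = joint_distribution(dem_a, 2)
dist_b = joint_distribution(dem_b, 2)
marg_a = syndrome_marginal(dist_a)
marg_b = syndrome_marginal(dist_b)

# Probability that the logical observable flipped, given both detectors fired.
cond_a = dist_a[((1, 1), 1)] / marg_a[(1, 1)]
cond_b = dist_b[((1, 1), 1)] / marg_b[(1, 1)]
print("syndrome marginals equal:",
      all(abs(marg_a[s] - marg_b[s]) < 1e-12 for s in marg_a))
print(f"P(logical flip | syndrome 11): A = {cond_a:.3f}, B = {cond_b:.3f}")
```

Both toy DEMs produce the same syndrome distribution, yet an ideal decoder would choose opposite corrections for the same syndrome. In this toy case, no fit to syndrome statistics alone can tell the two apart.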

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful review and constructive feedback. We address each major comment below and commit to revisions that strengthen the statistical support and discussion of limitations while preserving the empirical focus of the work.

  1. Referee: [Abstract] The reported 5%-10% improvements in logical error probability are stated without error bars, bootstrap estimates, or cross-validation on held-out syndrome or logical data; this leaves open whether the gains reflect true DEM accuracy or simply better fitting to the training syndrome marginals.

    Authors: We agree that the abstract and results would be strengthened by explicit uncertainty quantification. In the revision we will add bootstrap-derived error bars on the reported improvements and perform cross-validation by holding out a fraction of the syndrome shots for evaluation of the decoded logical error rates. revision: yes

  2. Referee: [Method (implied by abstract description of fitting to detector-flip frequencies)] The manuscript provides no analysis or theorem addressing whether syndrome flip frequencies uniquely determine individual DEM probabilities up to logical-flip rates; in surface-code DEMs, distinct error mechanisms can share identical detector marginals while differing in logical observables, so the maximum-likelihood or moment-matching fit may converge to a biased solution whose apparent improvement is an artifact of matching the training data.

    Authors: The manuscript does not contain a formal identifiability theorem because its contribution is the demonstration that a practical moment-matching procedure yields decoder priors that measurably reduce logical error on real hardware. We will add a concise discussion of the non-uniqueness issue, noting that any residual bias is mitigated by the fact that the logical-error improvement is measured on independent decoding runs and is reproducible across two distinct processors with different error scales. revision: partial

  3. Referee: [Results (implied by the empirical evaluation on Willow and ibm_miami data)] No comparison is shown between the estimated DEMs and the true underlying error rates (where known from device characterization) or against a null model that simply rescales a baseline DEM to match the observed syndrome statistics; without such controls it is impossible to separate genuine information gain from parameter adjustment.

    Authors: Ground-truth per-mechanism rates are not available for the full DEM on either device, which is precisely why a syndrome-only estimator is useful. We will add, in the revised results, an explicit comparison against a null model that uniformly rescales the baseline DEM probabilities to match the observed detector-flip marginals; this will isolate the benefit attributable to the per-event fitting procedure. revision: yes
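
One way the null model described in the third response might look is sketched here. It multiplies every baseline probability by a single scale factor, found by bisection so that the mean predicted detector rate matches the mean observed rate. Matching only the mean rate is an assumption of this sketch, since a single factor cannot match every marginal. The names `detector_rates` and `fit_uniform_scale` and the toy numbers are also assumptions. Mechanisms are treated as independent, so a detector fires with probability (1 - prod(1 - 2p)) / 2 over the mechanisms that touch it.

```python
def detector_rates(dem, num_detectors):
    """Predicted firing probability of each detector under independent mechanisms."""
    parity = [1.0] * num_detectors
    for p, detectors in dem:
        for d in detectors:
            parity[d] *= 1 - 2 * p
    return [(1 - x) / 2 for x in parity]


def fit_uniform_scale(baseline, observed_rates, iterations=60):
    """Bisection for the scale s such that s * baseline matches the mean rate."""
    num_detectors = len(observed_rates)
    target = sum(observed_rates) / num_detectors
    low = 0.0
    # Cap the scale so every scaled probability stays at or below 1/2,
    # which keeps the mean rate monotonically increasing in s.
    high = 0.5 / max(p for p, _ in baseline)

    def mean_rate(scale):
        scaled = [(scale * p, dets) for p, dets in baseline]
        return sum(detector_rates(scaled, num_detectors)) / num_detectors

    for _ in range(iterations):
        mid = (low + high) / 2
        if mean_rate(mid) < target:
            low = mid
        else:
            high = mid
    return (low + high) / 2


# Toy baseline DEM as (probability, detectors); values are illustrative only.
baseline = [(0.01, (0, 1)), (0.02, (1, 2)), (0.005, (0,)), (0.01, (2,))]

# Pretend the device is uniformly 1.3x noisier than the baseline assumes.
observed = detector_rates([(1.3 * p, dets) for p, dets in baseline], 3)
scale = fit_uniform_scale(baseline, observed)
print(f"fitted uniform scale: {scale:.6f}")
```

Decoding with the rescaled baseline and with the per-event fit on the same held-out shots would then show how much of the gain survives beyond a global noise-strength correction.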

Circularity Check

0 steps flagged

No significant circularity; derivation self-contained


The paper estimates DEM probabilities directly from raw experimental syndrome counts and applies the resulting priors to decoding. Logical-error improvements are reported relative to baseline device-informed DEMs without supervised fitting to logical outcomes or post-hoc selection on the same data. No load-bearing step reduces by the paper's equations to a fitted parameter defined from the target quantity itself; the central claim therefore rests on external experimental benchmarks rather than tautological re-expression of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only abstract available; no explicit free parameters, axioms, or invented entities are described.


