Logical error estimation from syndrome data of surface-code experiments
Pith reviewed 2026-06-27 12:39 UTC · model grok-4.3
The pith
Estimating detector error model probabilities from surface-code syndrome data alone improves decoded logical error rates.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Estimating DEM event probabilities from experimental syndromes is feasible, avoids independent device benchmarking, and produces useful decoder priors for estimating and reducing decoded logical error probabilities. Evaluation on open-source surface-code memory data from Google's Willow chip and on analogous experiments run on IBM's ibm_miami processor shows that the estimated DEMs improve logical error probabilities relative to baseline device-informed DEMs, typically at the 5%-10% level and with larger gains in some IBM cases, without additional calibration circuits, decoder fine-tuning, or supervised fitting to logical outcomes.
What carries the argument
The detector error model (DEM) records, for each error mechanism, its probability together with the detectors and logical observables it flips. The estimation procedure fits these probabilities directly to the observed syndrome statistics.
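To make the DEM description concrete, here is a minimal sketch of one way to hold such a model in memory and draw shots from it. The class name `ErrorMechanism`, the helper `sample_shot`, and the three-entry toy model are hypothetical illustrations, not the paper's code or any library's API; the sketch also assumes mechanisms fire independently.

```python
import random
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class ErrorMechanism:
    """One DEM entry: a probability plus what the error flips."""
    probability: float
    detectors: FrozenSet[int]    # indices of the detectors this error flips
    observables: FrozenSet[int]  # indices of the logical observables it flips


def sample_shot(dem: List[ErrorMechanism], num_detectors: int,
                num_observables: int,
                rng: random.Random) -> Tuple[List[int], List[int]]:
    """Draw one shot: mechanisms fire independently and flips add mod 2."""
    det = [0] * num_detectors
    obs = [0] * num_observables
    for mechanism in dem:
        if rng.random() < mechanism.probability:
            for d in mechanism.detectors:
                det[d] ^= 1
            for o in mechanism.observables:
                obs[o] ^= 1
    return det, obs


# Hypothetical toy model with illustrative probabilities (not from the paper).
toy_dem = [
    ErrorMechanism(0.02, frozenset({0}), frozenset({0})),
    ErrorMechanism(0.05, frozenset({0, 1}), frozenset()),
    ErrorMechanism(0.03, frozenset({1}), frozenset()),
]
detector_bits, observable_bits = sample_shot(toy_dem, 2, 1, random.Random(0))
```

A decoder sees only `detector_bits`; `observable_bits` is what it tries to predict, which is why the probabilities attached to each entry act as the decoder's prior.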
If this is right
- The estimated DEMs function as effective priors for decoding without access to logical outcome labels.
- Logical error reductions occur on hardware with different physical error scales.
- No extra calibration circuits or decoder adjustments are needed to obtain the gains.
- The same estimation procedure applies to surface-code memory experiments on distinct processors.
Where Pith is reading between the lines
- The method could support continuous updating of error models while an experiment runs, rather than relying on one-time calibration.
- Similar syndrome-based fitting might extend to other stabilizer codes if the detector structure remains comparable.
- Dominant error mechanisms could be identified by inspecting which DEM probabilities the fit increases most.
Load-bearing premise
Syndrome statistics alone contain enough information to produce unbiased DEM probability estimates that improve decoding without post-hoc selection or knowledge of logical outcomes.
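As an illustration of how syndrome statistics can carry information about DEM probabilities, the sketch below applies a commonly used pairwise moment estimator to simulated data. It is not necessarily the procedure used in the paper. The function names, the simulated rates, and the shot count are assumptions chosen for the example, and the estimator is exact only when mechanisms are independent and a single mechanism flips both detectors of the pair.

```python
import math
import random


def estimate_pair_probability(shots, i, j):
    """Estimate the probability of the mechanism that flips detectors i and j.

    Uses p = 1/2 - sqrt(1/4 - cov(x_i, x_j) / (1 - 2 <x_i XOR x_j>)).
    """
    n = len(shots)
    mean_i = sum(s[i] for s in shots) / n
    mean_j = sum(s[j] for s in shots) / n
    mean_ij = sum(s[i] & s[j] for s in shots) / n
    mean_xor = sum(s[i] ^ s[j] for s in shots) / n
    ratio = (mean_ij - mean_i * mean_j) / (1.0 - 2.0 * mean_xor)
    # Clamp: finite-sample noise can push the square-root argument below zero.
    return 0.5 - math.sqrt(max(0.0, 0.25 - ratio))


def simulate(p_pair, q_i, q_j, num_shots, seed):
    """Simulate two detectors: one shared mechanism plus one private flip each."""
    rng = random.Random(seed)
    shots = []
    for _ in range(num_shots):
        shared = rng.random() < p_pair
        only_i = rng.random() < q_i
        only_j = rng.random() < q_j
        shots.append((int(shared ^ only_i), int(shared ^ only_j)))
    return shots


# Illustrative rates only; they are not taken from either device.
shots = simulate(p_pair=0.05, q_i=0.02, q_j=0.03, num_shots=100_000, seed=1)
p_hat = estimate_pair_probability(shots, 0, 1)
```

With enough shots, `p_hat` should land close to the simulated `p_pair` even though no logical outcome is used. The premise at stake is whether this kind of recovery remains unbiased for the full hypergraph of a real surface-code experiment.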
What would settle it
If decoders that use the syndrome-derived DEM probabilities produce equal or higher logical error rates than decoders that use independent device-characterization DEMs, the improvement claim would be falsified.
Original abstract
Decoders for quantum error correction (QEC) experiments rely on detector error models (DEMs), which encode, for each error, its probability and the detectors and logical observables it flips. Here we show that estimating DEM event probabilities from experimental syndromes is feasible, avoids independent device benchmarking, and produces useful decoder priors for estimating and reducing decoded logical error probabilities. We evaluate our methods using open-source data from surface-code memory experiments performed on Google's Willow chip, and we carry out analogous surface-code experiments on IBM's ibm_miami processor. Despite the different physical error scales of the Google and IBM devices, in both cases our estimated DEMs improve logical error probabilities relative to baseline device-informed DEMs, typically at the 5%-10% level and with larger gains in some IBM cases, without additional calibration circuits, decoder fine-tuning, or supervised fitting to logical outcomes.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that DEM event probabilities can be estimated directly from experimental syndrome counts in surface-code memory experiments, yielding decoder priors that improve decoded logical error probabilities by typically 5-10% (with larger gains in some cases) relative to device-informed baselines. The method is evaluated on open data from Google's Willow chip and new experiments on IBM's ibm_miami processor, without additional calibration circuits, decoder tuning, or supervised fitting to logical outcomes.
Significance. If the estimates are unbiased and the improvements hold under cross-validation, the approach would allow decoder calibration from existing syndrome data alone, reducing the need for separate benchmarking. The use of data from two distinct hardware platforms with different error scales provides a basic reproducibility check. However, the absence of any derivation showing the inverse problem is well-posed or any reported validation against overfitting or error bars weakens the evidential support for the central claim.
major comments (3)
- [Abstract] Abstract: the reported 5%-10% improvements in logical error probability are stated without error bars, bootstrap estimates, or cross-validation on held-out syndrome or logical data; this leaves open whether the gains reflect true DEM accuracy or simply better fitting to the training syndrome marginals.
- [Method (implied by abstract description of fitting to detector-flip frequencies)] The manuscript provides no analysis or theorem addressing whether syndrome flip frequencies uniquely determine individual DEM probabilities up to logical-flip rates; in surface-code DEMs, distinct error mechanisms can share identical detector marginals while differing in logical observables, so the maximum-likelihood or moment-matching fit may converge to a biased solution whose apparent improvement is an artifact of matching the training data.
- [Results (implied by the empirical evaluation on Willow and ibm_miami data)] No comparison is shown between the estimated DEMs and the true underlying error rates (where known from device characterization) or against a null model that simply rescales a baseline DEM to match the observed syndrome statistics; without such controls it is impossible to separate genuine information gain from parameter adjustment.
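The identifiability concern in the second major comment can be illustrated with a toy calculation. The sketch assumes two independent mechanisms that flip the same detector pair, only one of which also flips a logical observable; all function names and probabilities are hypothetical. Detector data then fix only the combined firing rate of the pair, so different splits between the two mechanisms are indistinguishable from syndromes alone.

```python
def xor_combine(p, q):
    """Probability that exactly one of two independent events occurs."""
    return p * (1.0 - q) + q * (1.0 - p)


def detector_and_logical_rates(p_plain, p_logical):
    """Two mechanisms flip the same detector pair; only the second flips the observable.

    Returns (rate at which the detector pair fires, rate at which the observable flips).
    """
    return xor_combine(p_plain, p_logical), p_logical


def split_with_same_syndrome(target_pair_rate, p_logical):
    """Solve xor_combine(p_plain, p_logical) == target_pair_rate for p_plain."""
    return (target_pair_rate - p_logical) / (1.0 - 2.0 * p_logical)


# Illustrative numbers only.
model_a = (0.04, 0.01)
pair_rate, logical_rate_a = detector_and_logical_rates(*model_a)
model_b = (split_with_same_syndrome(pair_rate, 0.03), 0.03)
pair_rate_b, logical_rate_b = detector_and_logical_rates(*model_b)
```

Here `pair_rate` and `pair_rate_b` agree while the logical flip rates differ by a factor of three. A fit to detector statistics cannot choose between `model_a` and `model_b` without extra structure or a prior.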
minor comments (1)
- [Abstract] The abstract states gains occur "typically at the 5%-10% level" but does not define the precise metric (e.g., relative reduction in logical error rate per round) or the number of experimental shots used for each platform.
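One candidate definition of the metric the minor comment asks for is the relative reduction in logical error per round. The sketch below shows that convention; it is an assumption about what the metric could be, not the definition the paper uses. The conversion assumes the logical fidelity 1 - 2P decays geometrically with the number of rounds, and the input numbers are illustrative.

```python
def logical_error_per_round(p_logical, num_rounds):
    """Convert an end-of-experiment logical error probability to a per-round rate.

    Assumes the logical fidelity 1 - 2P decays as (1 - 2 * eps) ** num_rounds.
    """
    if not 0.0 <= p_logical < 0.5:
        raise ValueError("p_logical must lie in [0, 0.5)")
    if num_rounds < 1:
        raise ValueError("num_rounds must be at least 1")
    return 0.5 * (1.0 - (1.0 - 2.0 * p_logical) ** (1.0 / num_rounds))


def relative_reduction(baseline, estimated):
    """Fractional reduction of an error rate relative to the baseline."""
    return (baseline - estimated) / baseline


# Illustrative inputs: total logical error after 25 rounds for each decoder prior.
eps_baseline = logical_error_per_round(0.100, 25)
eps_estimated = logical_error_per_round(0.092, 25)
gain = relative_reduction(eps_baseline, eps_estimated)
```

Reporting `gain` together with the shot count per platform would make the "5%-10%" figure reproducible. A reduction in total logical error and a reduction in per-round error generally differ, so the choice needs to be stated.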
Simulated Author's Rebuttal
We thank the referee for their careful review and constructive feedback. We address each major comment below and commit to revisions that strengthen the statistical support and discussion of limitations while preserving the empirical focus of the work.
Point-by-point responses
- Referee: [Abstract] Abstract: the reported 5%-10% improvements in logical error probability are stated without error bars, bootstrap estimates, or cross-validation on held-out syndrome or logical data; this leaves open whether the gains reflect true DEM accuracy or simply better fitting to the training syndrome marginals.
  Authors: We agree that the abstract and results would be strengthened by explicit uncertainty quantification. In the revision we will add bootstrap-derived error bars on the reported improvements and perform cross-validation by holding out a fraction of the syndrome shots for evaluation of the decoded logical error rates.
  Revision: yes
- Referee: [Method (implied by abstract description of fitting to detector-flip frequencies)] The manuscript provides no analysis or theorem addressing whether syndrome flip frequencies uniquely determine individual DEM probabilities up to logical-flip rates; in surface-code DEMs, distinct error mechanisms can share identical detector marginals while differing in logical observables, so the maximum-likelihood or moment-matching fit may converge to a biased solution whose apparent improvement is an artifact of matching the training data.
  Authors: The manuscript does not contain a formal identifiability theorem because its contribution is the demonstration that a practical moment-matching procedure yields decoder priors that measurably reduce logical error on real hardware. We will add a concise discussion of the non-uniqueness issue, noting that any residual bias is mitigated by the fact that the logical-error improvement is measured on independent decoding runs and is reproducible across two distinct processors with different error scales.
  Revision: partial
- Referee: [Results (implied by the empirical evaluation on Willow and ibm_miami data)] No comparison is shown between the estimated DEMs and the true underlying error rates (where known from device characterization) or against a null model that simply rescales a baseline DEM to match the observed syndrome statistics; without such controls it is impossible to separate genuine information gain from parameter adjustment.
  Authors: Ground-truth per-mechanism rates are not available for the full DEM on either device, which is precisely why a syndrome-only estimator is useful. We will add, in the revised results, an explicit comparison against a null model that uniformly rescales the baseline DEM probabilities to match the observed detector-flip marginals; this will isolate the benefit attributable to the per-event fitting procedure.
  Revision: yes
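The bootstrap error bars promised in the first response could take roughly the following form. This is a sketch under stated assumptions, not the authors' analysis: it presumes per-shot failure flags are available for the same held-out shots decoded once with the baseline DEM and once with the estimated DEM, and the function name, resample count, and percentile interval are choices made for the example.

```python
import random


def bootstrap_relative_improvement(baseline_fail, estimated_fail,
                                   num_resamples=1000, seed=0):
    """Percentile-bootstrap interval for the relative reduction in logical failures.

    baseline_fail[k] and estimated_fail[k] are 0/1 flags for the same held-out
    shot k, so shots are resampled jointly to preserve the pairing.
    Returns (point_estimate, lower_bound, upper_bound) for a 95% interval.
    """
    if len(baseline_fail) != len(estimated_fail):
        raise ValueError("both decoders must be evaluated on the same shots")
    total_base = sum(baseline_fail)
    if total_base == 0:
        raise ValueError("no baseline failures; relative reduction is undefined")
    point = (total_base - sum(estimated_fail)) / total_base

    rng = random.Random(seed)
    n = len(baseline_fail)
    stats = []
    for _ in range(num_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        base = sum(baseline_fail[k] for k in idx)
        est = sum(estimated_fail[k] for k in idx)
        if base > 0:  # skip degenerate resamples with no baseline failures
            stats.append((base - est) / base)
    if not stats:
        raise ValueError("every resample was degenerate; collect more shots")
    stats.sort()
    lower = stats[int(0.025 * len(stats))]
    upper = stats[min(len(stats) - 1, int(0.975 * len(stats)))]
    return point, lower, upper
```

Resampling shots jointly matters because both decoders see the same syndromes, so their failures are strongly correlated. Treating the two failure counts as independent samples would overstate the uncertainty of the difference.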
Circularity Check
No significant circularity; derivation self-contained
The paper estimates DEM probabilities directly from raw experimental syndrome counts and applies the resulting priors to decoding. Logical-error improvements are reported relative to baseline device-informed DEMs without supervised fitting to logical outcomes or post-hoc selection on the same data. No load-bearing step reduces by the paper's equations to a fitted parameter defined from the target quantity itself; the central claim therefore rests on external experimental benchmarks rather than tautological re-expression of inputs.