A Scalable FPGA Architecture for Real-Time Decoding of Quantum LDPC Codes Using GARI
Pith reviewed 2026-05-09 18:53 UTC · model grok-4.3
The pith
A resource-reusing FPGA architecture decodes correlated errors in any quantum LDPC code at 596 ns average latency while using one-sixth the resources of prior designs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The architecture runs message-passing decoding directly on the detector-error graph produced by GARI. The graph is augmented once per code, and the same processing elements are then reused across rounds and across multiple cores. The result is a flexible, scalable decoder that maintains decoding accuracy for correlated errors, achieves an average latency of 596 ns per round on the [[144,12,12]] code, and reduces resource consumption by a factor of six relative to the previous GARI-based design, thereby enabling the first multi-core correlated-error decoder on a single FPGA.
What carries the argument
GARI-augmented graph for message-passing inference, which encodes detector correlations so that standard belief-propagation hardware can handle realistic error models while supporting resource reuse across cores.
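As a rough illustration of the message-passing step this argument rests on, the sketch below runs a plain syndrome-based min-sum decoder on a toy repetition-code check matrix. It is not the paper's hardware design and it does not implement GARI; under this reading, a GARI-augmented detector error model would simply be supplied as the matrix `H`. The function name, the toy matrix, the error probability, and the iteration limit are all assumptions made for illustration, and every check is assumed to touch at least two variables.

```python
import numpy as np

def min_sum_decode(H, syndrome, prior_llr, max_iter=20):
    """Syndrome-based min-sum belief propagation on a binary check matrix H.

    Illustrative only: assumes every check involves at least two variables.
    Returns (estimated_error, converged).
    """
    m, n = H.shape
    checks = [[int(j) for j in np.flatnonzero(row)] for row in H]
    # Variable-to-check messages start at the prior log-likelihood ratios.
    v2c = {(i, j): float(prior_llr[j]) for i in range(m) for j in checks[i]}
    c2v = {}
    error = np.zeros(n, dtype=np.uint8)
    for _ in range(max_iter):
        # Check-node update: sign product (flipped by the syndrome bit)
        # times the minimum magnitude over the other incoming messages.
        for i in range(m):
            for j in checks[i]:
                others = [v2c[(i, k)] for k in checks[i] if k != j]
                sign = -1.0 if syndrome[i] else 1.0
                for x in others:
                    if x < 0:
                        sign = -sign
                c2v[(i, j)] = sign * min(abs(x) for x in others)
        # Variable-node update: prior plus all incoming check messages.
        total = np.array(prior_llr, dtype=float)
        for (i, j), msg in c2v.items():
            total[j] += msg
        error = (total < 0).astype(np.uint8)
        # Stop as soon as the estimate reproduces the measured syndrome.
        if np.array_equal(H @ error % 2, syndrome):
            return error, True
        # Extrinsic messages: remove each check's own contribution.
        for (i, j) in v2c:
            v2c[(i, j)] = total[j] - c2v[(i, j)]
    return error, False

# Toy example (hypothetical): 5-bit repetition code, single error on bit 2.
H = np.array([[1, 1, 0, 0, 0],
              [0, 1, 1, 0, 0],
              [0, 0, 1, 1, 0],
              [0, 0, 0, 1, 1]])
true_error = np.array([0, 0, 1, 0, 0])
syndrome = H @ true_error % 2
prior = np.full(5, np.log((1 - 0.1) / 0.1))  # assumed error probability 0.1
error, converged = min_sum_decode(H, syndrome, prior)
```

In a hardware setting the two inner loops are what processing elements would execute; the dictionary-based messages here favour readability over speed.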
If this is right
- Multiple decoder cores can operate simultaneously on one FPGA device, raising throughput for larger quantum processors.
- The same hardware block can be retargeted to new LDPC codes by changing only the graph wiring, reducing redesign effort.
- Power and area of the classical control layer drop enough to allow denser integration with quantum hardware.
- Real-time decoding latency stays below one microsecond even when handling correlated errors.
- The architecture meets the scaling requirements for energy-conscious quantum error correction without accuracy trade-offs.
Where Pith is reading between the lines
- If the reuse pattern continues to scale, the decoder could be placed on the same die as the quantum control electronics, shortening the feedback loop.
- The method may apply to other sparse-graph codes that need correlation handling, such as certain surface-code variants.
- Testing the design at higher core counts would reveal whether routing congestion eventually limits the claimed flexibility.
- Lower classical power draw could extend the practical runtime of cryogenic quantum systems before heat-load limits are reached.
Load-bearing premise
The GARI graph structure preserves enough accuracy for message passing to work on correlated errors without extra hardware or loss of performance when resources are shared among cores.
What would settle it
Run the same three-core design on a second, structurally different quantum LDPC code, measure both its FPGA resource count and its logical error rate under a realistic correlated-noise model, and check whether resource savings and accuracy remain within the reported bounds.
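Judging whether accuracy stays "within the reported bounds" needs an uncertainty estimate on each measured logical error rate. A minimal sketch, assuming independent Monte Carlo shots and using the standard Wilson score interval; the failure and shot counts are made-up placeholders, not results from the paper, and the helper names are hypothetical.

```python
from math import sqrt

def wilson_interval(failures, shots, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    p = failures / shots
    denom = 1 + z * z / shots
    centre = (p + z * z / (2 * shots)) / denom
    half = z * sqrt(p * (1 - p) / shots + z * z / (4 * shots * shots)) / denom
    return centre - half, centre + half

def intervals_overlap(a, b):
    """True when two (low, high) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Placeholder counts for two decoders run on the same noise model.
reference = wilson_interval(failures=12, shots=100_000)
candidate = wilson_interval(failures=15, shots=100_000)
compatible = intervals_overlap(reference, candidate)
```

Overlapping intervals do not prove equal accuracy; they only indicate that the chosen number of shots cannot distinguish the two decoders.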
Original abstract
In this work, we introduce a new hardware architecture for decoding correlated errors in quantum LDPC codes. The decoder is based on message passing and exploits the structure of the detector error model obtained through the recently introduced Graph Augmentation and Rewiring for Inference (GARI) method. The proposed architecture enables flexible scaling and can, in principle, adapt to any quantum LDPC code using the GARI framework. It leverages resource reuse while maintaining a modest degree of parallelism, thereby reducing power consumption and area requirements, while preserving low decoding latency. As a case study, the architecture was implemented on a VCU19P FPGA as an ensemble of three decoder cores targeting the [[144,12,12]] bivariate bicycle code, achieving an average latency of 596 ns per decoding round. This implementation consumes six times fewer resources than the previous GARI-based proposal, being the first reported implementation of multiple decoder cores for correlated errors on a single FPGA device. This enables better energy-conscious scaling of the quantum error correction layer on the classical side, reducing overall power consumption while meeting real-time constraints without compromising decoding accuracy under correlated errors.
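The trade-off between resource reuse and latency described in the abstract can be pictured with a toy cycle-count model. This is a sketch under stated assumptions, not the paper's timing model: it assumes each processing element handles one graph node per clock cycle, that an iteration needs one pass over all nodes plus a fixed pipeline delay, and that iterations run back to back. The function name and every number below are hypothetical; the source does not give a clock frequency, node count, or iteration count.

```python
from math import ceil

def round_latency_ns(num_nodes, num_pes, iterations, pipeline_depth, clock_mhz):
    """Latency estimate when num_pes processing elements are time-shared
    by num_nodes graph nodes (toy model, see assumptions in the text)."""
    passes = ceil(num_nodes / num_pes)               # sequential passes per iteration
    cycles = iterations * (passes + pipeline_depth)  # total clock cycles
    return cycles * 1000.0 / clock_mhz               # one cycle lasts 1000/clock_mhz ns

# Hypothetical comparison: fully parallel versus 8-way shared hardware.
fully_parallel = round_latency_ns(num_nodes=64, num_pes=64, iterations=10,
                                  pipeline_depth=4, clock_mhz=250)
shared = round_latency_ns(num_nodes=64, num_pes=8, iterations=10,
                          pipeline_depth=4, clock_mhz=250)
```

In this toy setting, cutting the processing elements by a factor of eight raises latency by well under a factor of eight because the pipeline delay is paid either way, which is the kind of balance a "modest degree of parallelism" aims for.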
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a scalable FPGA architecture for real-time decoding of correlated errors in quantum LDPC codes, based on message-passing decoding of the detector error model produced by the GARI graph augmentation method. The design uses resource reuse with modest parallelism to reduce area and power while targeting low latency. As a case study, an ensemble of three decoder cores is implemented on a VCU19P FPGA for the [[144,12,12]] bivariate bicycle code, reporting an average latency of 596 ns per decoding round and six times fewer resources than the prior GARI-based proposal. The architecture is presented as adaptable in principle to arbitrary quantum LDPC codes via GARI, enabling energy-conscious scaling of the classical QEC layer without accuracy loss under correlated errors.
Significance. The concrete FPGA implementation with measured resource counts and latency provides practical data for hardware QEC, and the demonstration of multiple cores on a single device is a clear advance toward scalable classical control. If the accuracy-preservation claim holds, the work supports better power/area trade-offs for real-time decoding of correlated errors. The in-principle adaptability argument is a strength, though its significance depends on verification that GARI-augmented graphs remain efficiently decodable by message passing.
major comments (1)
- Abstract and case-study description: the central claim that the architecture operates 'without compromising decoding accuracy under correlated errors' is load-bearing yet unsupported by any error-rate curves, logical-error-rate comparisons against a non-GARI baseline, or details on how accuracy was verified for the [[144,12,12]] code. Without these data the performance claim cannot be assessed.
minor comments (2)
- The description of the resource-reuse strategy and three-core ensemble would benefit from an explicit diagram or table showing the mapping of GARI-augmented check nodes to FPGA resources.
- Clarify whether the reported 596 ns latency includes all pipeline stages or only the message-passing iterations; this affects direct comparison with other real-time decoders.
Simulated Author's Rebuttal
We thank the referee for their detailed review and constructive feedback. We address the major comment point by point below.
- Referee: Abstract and case-study description: the central claim that the architecture operates 'without compromising decoding accuracy under correlated errors' is load-bearing yet unsupported by any error-rate curves, logical-error-rate comparisons against a non-GARI baseline, or details on how accuracy was verified for the [[144,12,12]] code. Without these data the performance claim cannot be assessed.
  Authors: We thank the referee for highlighting this point. The architecture implements the exact message-passing procedure on the GARI-augmented detector error model with no algorithmic changes or approximations, so decoding accuracy is identical to the software GARI decoder. The original GARI publication already provides error-rate curves and logical-error-rate comparisons (including against non-augmented baselines) for the [[144,12,12]] bivariate bicycle code under correlated errors. During hardware development we performed bit-accurate and cycle-accurate simulations confirming that the FPGA output matches the software decoder on the same inputs. In the revised manuscript we will add a short paragraph in the case-study section that (i) explicitly references the relevant accuracy results from the GARI work and (ii) describes the equivalence verification steps performed for the FPGA implementation. We will also qualify the abstract claim to state that accuracy is preserved relative to the GARI software decoder.
  revision: yes
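The equivalence check described in the response can be sketched as a small harness that feeds identical inputs to a reference model and a candidate model and reports the first disagreement. This is an illustration only: the two parity functions stand in for the software decoder and the hardware simulation, and all names and test vectors are hypothetical rather than taken from the authors' verification flow.

```python
import random

def first_mismatch(reference, candidate, vectors):
    """Return the index of the first input on which the two models disagree,
    or None when they agree on every vector."""
    for idx, vec in enumerate(vectors):
        if reference(vec) != candidate(vec):
            return idx
    return None

def parity_reference(bits):
    """Stand-in for the software model: arithmetic parity."""
    return sum(bits) % 2

def parity_xor(bits):
    """Stand-in for a faithful hardware model: XOR reduction."""
    acc = 0
    for b in bits:
        acc ^= b
    return acc

def parity_buggy(bits):
    """Deliberately wrong model that ignores the last input bit."""
    return parity_xor(bits[:-1])

rng = random.Random(0)  # fixed seed so the run is reproducible
vectors = [[rng.randint(0, 1) for _ in range(8)] for _ in range(200)]
vectors.append([0] * 7 + [1])  # directed vector that exposes the buggy model

agree = first_mismatch(parity_reference, parity_xor, vectors)
disagree = first_mismatch(parity_reference, parity_buggy, vectors)
```

Random vectors alone give statistical confidence; adding directed vectors for known corner cases, as in the last appended entry, guarantees that specific failure modes are exercised.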
Circularity Check
No significant circularity detected
The paper reports an FPGA implementation of a message-passing decoder that exploits the GARI-augmented graph structure for a specific bivariate bicycle code. All quantitative claims (596 ns average latency, six-fold resource reduction versus prior GARI proposal, three-core ensemble on VCU19P) are direct measurements from synthesis and timing analysis on the target device. No equations, fitted parameters, or predictions are defined in terms of the reported outcomes; the architecture description treats GARI as an external input framework rather than re-deriving or fitting its properties. The scaling and adaptability statements are scoped as design goals supported by the concrete case study, not as self-referential derivations.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: GARI-augmented graphs admit efficient message-passing decoding without accuracy degradation for the bivariate bicycle code under the detector error model.