arxiv: 2605.01035 · v1 · submitted 2026-05-01 · 🪐 quant-ph

A Scalable FPGA Architecture for Real-Time Decoding of Quantum LDPC Codes Using GARI

Pith reviewed 2026-05-09 18:53 UTC · model grok-4.3

classification 🪐 quant-ph
keywords FPGA architecture · quantum LDPC codes · GARI method · message-passing decoding · correlated errors · real-time decoding · resource reuse · bivariate bicycle code

The pith

A resource-reusing FPGA architecture decodes correlated errors in quantum LDPC codes, reaching 596 ns average latency per round on the [[144,12,12]] code while using one-sixth the resources of the previous GARI-based design.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a hardware decoder that runs message-passing inference on graphs modified by the GARI method to capture error correlations in quantum LDPC codes. The design reuses computational resources across a modest number of parallel cores so that the same circuit block can serve different codes without a full redesign. A working implementation on a VCU19P device with three cores for the [[144,12,12]] bivariate bicycle code reaches 596 ns per round and consumes six times fewer FPGA resources than the earlier GARI proposal. If the approach holds, quantum error-correction layers can place multiple accurate decoders on one classical chip while staying inside real-time and power budgets.

Core claim

The architecture implements message-passing decoding directly on the detector-error graph produced by GARI. The graph is augmented once per code, and the same processing elements are then reused across rounds and across multiple cores. The result is a flexible, scalable decoder that maintains decoding accuracy for correlated errors, achieves an average latency of 596 ns per round on the [[144,12,12]] code, and reduces resource consumption by a factor of six relative to the previous GARI-based design. This enables the first reported multi-core correlated-error decoder on a single FPGA.
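The reuse idea in the core claim can be pictured with a toy model. The following is a minimal Python sketch, not the paper's hardware description: `build_wiring`, `schedule_checks`, the matrix `H`, and the unit count are hypothetical names and values chosen for illustration. It stores the graph as a lookup table (standing in for ROM contents) and time-multiplexes the checks over a small number of processing units.

```python
from math import ceil


def build_wiring(H):
    """For each check (row of H), list the variable indices it touches."""
    return [[j for j, bit in enumerate(row) if bit] for row in H]


def schedule_checks(num_checks, num_units):
    """Group checks into time steps with at most num_units checks per step."""
    steps = ceil(num_checks / num_units)
    return [
        list(range(t * num_units, min((t + 1) * num_units, num_checks)))
        for t in range(steps)
    ]


# Toy parity-check matrix (hypothetical; not a quantum LDPC code).
H = [
    [1, 1, 0, 1, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1],
    [0, 0, 0, 1, 1, 1],
]

wiring = build_wiring(H)                          # the per-code "wiring" table
plan = schedule_checks(len(wiring), num_units=3)  # which checks run in each step

for step, checks in enumerate(plan):
    print(f"step {step}: checks {checks}")
```

Under these assumptions, retargeting to another code would only change `H` (and hence `wiring`), while the number of processing units sets the trade-off between area and the number of steps per iteration.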

What carries the argument

GARI-augmented graph for message-passing inference, which encodes detector correlations so that standard belief-propagation hardware can handle realistic error models while supporting resource reuse across cores.
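For readers unfamiliar with the message-passing step, a syndrome-based normalized min-sum check-node update can be sketched as below. This is a generic floating-point illustration, not the paper's fixed-point CNU: the function name, the scaling factor `alpha`, and the example messages are assumptions made here.

```python
def cnu_normalized_min_sum(msgs_in, syndrome_bit, alpha=0.75):
    """Check-node update for one check.

    msgs_in: incoming variable-to-check messages (LLR-like floats).
    syndrome_bit: 0 or 1, the measured syndrome for this check.
    alpha: normalization factor (assumed value, typically tuned per code).
    Returns one outgoing message per incoming edge.
    """
    if len(msgs_in) < 2:
        raise ValueError("a check node needs at least two neighbours")

    mags = [abs(m) for m in msgs_in]
    # Track the smallest magnitude, its position, and the second smallest.
    min1_idx = min(range(len(mags)), key=mags.__getitem__)
    min1 = mags[min1_idx]
    min2 = min(m for i, m in enumerate(mags) if i != min1_idx)

    # Product of all input signs, flipped when the syndrome bit is set.
    total_sign = -1 if syndrome_bit else 1
    for m in msgs_in:
        if m < 0:
            total_sign = -total_sign

    out = []
    for i, m in enumerate(msgs_in):
        own_sign = -1 if m < 0 else 1
        # Exclude the edge's own contribution: divide out its sign and
        # use the second minimum if this edge holds the minimum.
        mag = min2 if i == min1_idx else min1
        out.append(alpha * total_sign * own_sign * mag)
    return out


example = cnu_normalized_min_sum([2.0, -1.0, 4.0], syndrome_bit=0, alpha=0.5)
print(example)
```

Keeping only the two smallest magnitudes and the overall sign is the usual reason min-sum maps well to hardware: each check needs a small, fixed amount of state regardless of its degree.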

If this is right

  • Multiple decoder cores can operate simultaneously on one FPGA device, raising throughput for larger quantum processors.
  • The same hardware block can be retargeted to new LDPC codes by changing only the graph wiring, reducing redesign effort.
  • Power and area of the classical control layer drop enough to allow denser integration with quantum hardware.
  • Average decoding latency stays below one microsecond (596 ns per round in the case study) even when handling correlated errors.
  • The architecture meets the scaling requirements for energy-conscious quantum error correction without accuracy trade-offs.
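The sub-microsecond figure in the list above can be related to clock cycles with simple arithmetic. The sketch below assumes a 300 MHz clock purely for illustration; the figure excerpts further down only state that frequencies above 300 MHz are reached, so the true cycle count per round is at least the value computed here. The function name is hypothetical.

```python
def cycles_for_latency(latency_ns, clock_mhz):
    """Number of clock cycles that fit in latency_ns at clock_mhz."""
    # ns * MHz = 1e-9 s * 1e6 1/s = 1e-3 cycles, hence the division by 1000.
    return latency_ns * clock_mhz / 1000


AVERAGE_LATENCY_NS = 596   # reported average latency per decoding round
ASSUMED_CLOCK_MHZ = 300    # assumption: lower bound quoted for the design

cycles = cycles_for_latency(AVERAGE_LATENCY_NS, ASSUMED_CLOCK_MHZ)
rounds_per_second = 1e9 / AVERAGE_LATENCY_NS

print(f"about {cycles:.1f} cycles per round at {ASSUMED_CLOCK_MHZ} MHz")
print(f"about {rounds_per_second:.3e} decoding rounds per second per core")
```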

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the reuse pattern continues to scale, the decoder could be placed on the same die as the quantum control electronics, shortening the feedback loop.
  • The method may apply to other sparse-graph codes that need correlation handling, such as certain surface-code variants.
  • Testing the design at higher core counts would reveal whether routing congestion eventually limits the claimed flexibility.
  • Lower classical power draw could extend the practical runtime of cryogenic quantum systems before heat-load limits are reached.

Load-bearing premise

The GARI graph structure preserves enough accuracy for message passing to work on correlated errors without extra hardware or loss of performance when resources are shared among cores.

What would settle it

Run the same three-core design on a second, structurally different quantum LDPC code, measure both its FPGA resource count and its logical error rate under a realistic correlated-noise model, and check whether resource savings and accuracy remain within the reported bounds.
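One way to make the accuracy half of this test concrete is to compare logical error rates with confidence intervals rather than point estimates. The sketch below uses the Wilson score interval; the function names and the failure and shot counts are hypothetical placeholders, not data from the paper.

```python
from math import sqrt


def wilson_interval(failures, shots, z=1.96):
    """Wilson score interval for a failure probability (z=1.96 is about 95%)."""
    if shots <= 0:
        raise ValueError("shots must be positive")
    p = failures / shots
    denom = 1 + z * z / shots
    centre = (p + z * z / (2 * shots)) / denom
    half = z * sqrt(p * (1 - p) / shots + z * z / (4 * shots * shots)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def intervals_overlap(a, b):
    """True if two closed intervals (lo, hi) share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Hypothetical counts for two decoders evaluated on the same noise model.
software = wilson_interval(failures=10, shots=1000)
hardware = wilson_interval(failures=12, shots=1000)

print("software:", software)
print("hardware:", hardware)
print("statistically compatible:", intervals_overlap(software, hardware))
```

Overlapping intervals do not prove equivalence; they only indicate that the sample size cannot distinguish the two decoders, so the number of shots needs to be chosen with the target error rate in mind.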

Figures

Figures reproduced from arXiv: 2605.01035 by Arshpreet Singh Maan, Daniel Báscones, Francisco Garcia-Herrero, Valentin Savin.

Figure 1: Overview of the GARI architecture from this work.
Figure 2: The architecture for the DX, DZ decoder. Memory elements shown in blue (light for RAM, dark for ROM). I/O in purple. Sub-modules in gray. Output filtering is not required, since downstream processing is protected by the input stage, which blocks invalid values. Within the CNU, message updates are performed according to the normalized min-sum rules [28], exchanging information between all connected variable…
Figure 3: Architecture for the U, V decoder. Memory elements shown in blue (light for RAM, dark for ROM), I/O in purple. Sub-modules in gray. … to simplify the hardware, since their messages are equal to the initial LLR value plus the incoming message from the opposite matrix update. • A single variable node from either ē_X or ē_Z, whose message must be brought in from the DX, DZ unit. Note that any processing order w…
Figure 4: The crossbar interconnect. Note that the inputs and outputs are…
Figure 5: Architecture for the full decoder. Note the straightforward mapping between…
Figure 7: Timing for the [[144, 12, 12]] gross code. … plus a small overhead to fill the pipeline. In our experiments, eight pipeline stages are sufficient to achieve clock frequencies above 300 MHz. It is important to note that the separation between checks that share a variable must be greater than the number of pipeline stages to avoid using outdated (stale) values. In practice, for the [[144, 12, 12]] gross code, …
Figure 8: Placement of a single core of the proposed decoder for correlated…
Figure 9: Placement of a three-decoder ensemble for correlated errors within the…
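The Figure 7 excerpt states that checks sharing a variable must be separated by more than the number of pipeline stages, otherwise a check would read a stale value. A minimal sketch of such a schedule check follows. It assumes one check is issued per cycle in the given order, which is a simplification made here and not necessarily the paper's exact schedule; `find_hazards` and the toy wiring are hypothetical.

```python
def find_hazards(order, wiring, stages):
    """Return (earlier_check, later_check, variable) triples that violate
    the rule 'separation between checks sharing a variable > stages'.

    order:  list of check indices in issue order (one per cycle, assumed).
    wiring: wiring[c] is the list of variables touched by check c.
    stages: number of pipeline stages.
    """
    last_use = {}  # variable -> (issue position, check) of its latest user
    hazards = []
    for pos, check in enumerate(order):
        for var in wiring[check]:
            prev = last_use.get(var)
            # Comparing against the most recent user is enough: any older
            # user of the same variable is even further away.
            if prev is not None and pos - prev[0] <= stages:
                hazards.append((prev[1], check, var))
            last_use[var] = (pos, check)
    return hazards


# Toy wiring: checks 0 and 1 share variable 1, checks 2 and 3 share variable 4.
toy_wiring = [[0, 1], [1, 2], [3, 4], [4, 5]]

print(find_hazards([0, 1, 2, 3], toy_wiring, stages=1))  # adjacent sharers
print(find_hazards([0, 2, 1, 3], toy_wiring, stages=1))  # interleaved order
```

Interleaving checks that do not share variables is the usual way to satisfy the constraint without inserting idle cycles.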
Original abstract

In this work, we introduce a new hardware architecture for decoding correlated errors in quantum LDPC codes. The decoder is based on message passing and exploits the structure of the detector error model obtained through the recently introduced Graph Augmentation and Rewiring for Inference (GARI) method. The proposed architecture enables flexible scaling and can, in principle, adapt to any quantum LDPC codes using the GARI framework. It leverages resource reuse while maintaining a modest degree of parallelism, thereby reducing power consumption and area requirements, while preserving low decoding latency. As a case study, the architecture was implemented on a VCU19P FPGA as an ensemble of three decoder cores targeting the [[144,12,12]] bivariate bicycle code, achieving an average latency of 596 ns per decoding round. This implementation consumes six times fewer resources than the previous GARI-based proposal, being the first reported implementation of multiple decoder cores for correlated errors on a single FPGA device. This enables better energy-conscious scaling of the quantum error correction layer on the classical side, reducing overall power consumption while meeting real-time constraints without compromising decoding accuracy under correlated errors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper introduces a scalable FPGA architecture for real-time decoding of correlated errors in quantum LDPC codes, based on message-passing decoding of the detector error model produced by the GARI graph augmentation method. The design uses resource reuse with modest parallelism to reduce area and power while targeting low latency. As a case study, an ensemble of three decoder cores is implemented on a VCU19P FPGA for the [[144,12,12]] bivariate bicycle code, reporting an average latency of 596 ns per decoding round and six times fewer resources than the prior GARI-based proposal. The architecture is presented as adaptable in principle to arbitrary quantum LDPC codes via GARI, enabling energy-conscious scaling of the classical QEC layer without accuracy loss under correlated errors.

Significance. The concrete FPGA implementation with measured resource counts and latency provides practical data for hardware QEC, and the demonstration of multiple cores on a single device is a clear advance toward scalable classical control. If the accuracy-preservation claim holds, the work supports better power/area trade-offs for real-time decoding of correlated errors. The in-principle adaptability argument is a strength, though its significance depends on verification that GARI-augmented graphs remain efficiently decodable by message passing.

major comments (1)
  1. Abstract and case-study description: the central claim that the architecture operates 'without compromising decoding accuracy under correlated errors' is load-bearing yet unsupported by any error-rate curves, logical-error-rate comparisons against a non-GARI baseline, or details on how accuracy was verified for the [[144,12,12]] code. Without these data the performance claim cannot be assessed.
minor comments (2)
  1. The description of the resource-reuse strategy and three-core ensemble would benefit from an explicit diagram or table showing the mapping of GARI-augmented check nodes to FPGA resources.
  2. Clarify whether the reported 596 ns latency includes all pipeline stages or only the message-passing iterations; this affects direct comparison with other real-time decoders.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their detailed review and constructive feedback. We address the major comment point by point below.

Point-by-point responses
  1. Referee: Abstract and case-study description: the central claim that the architecture operates 'without compromising decoding accuracy under correlated errors' is load-bearing yet unsupported by any error-rate curves, logical-error-rate comparisons against a non-GARI baseline, or details on how accuracy was verified for the [[144,12,12]] code. Without these data the performance claim cannot be assessed.

    Authors: We thank the referee for highlighting this point. The architecture implements the exact message-passing procedure on the GARI-augmented detector error model with no algorithmic changes or approximations, so decoding accuracy is identical to the software GARI decoder. The original GARI publication already provides error-rate curves and logical-error-rate comparisons (including against non-augmented baselines) for the [[144,12,12]] bivariate bicycle code under correlated errors. During hardware development we performed bit-accurate and cycle-accurate simulations confirming that the FPGA output matches the software decoder on the same inputs. In the revised manuscript we will add a short paragraph in the case-study section that (i) explicitly references the relevant accuracy results from the GARI work and (ii) describes the equivalence verification steps performed for the FPGA implementation. We will also qualify the abstract claim to state that accuracy is preserved relative to the GARI software decoder.

    revision: yes
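The equivalence check described in the simulated rebuttal (hardware output matching the software decoder on the same inputs) can be pictured with a small harness. Everything below is a hypothetical stand-in: `software_model` and `faulty_model` are toy functions, not the GARI decoder or its FPGA simulation, and the syndrome length and count are arbitrary.

```python
import random


def compare_decoders(reference, candidate, syndromes):
    """Run both decoders on every syndrome and collect disagreements."""
    mismatches = []
    for index, syndrome in enumerate(syndromes):
        expected = reference(syndrome)
        actual = candidate(syndrome)
        if expected != actual:
            mismatches.append((index, expected, actual))
    return mismatches


def software_model(syndrome):
    """Toy 'golden model': running parity of the syndrome bits."""
    out, acc = [], 0
    for bit in syndrome:
        acc ^= bit
        out.append(acc)
    return tuple(out)


def faulty_model(syndrome):
    """Toy 'device under test' with an injected fault in the last bit."""
    out = list(software_model(syndrome))
    out[-1] ^= 1
    return tuple(out)


rng = random.Random(0)  # fixed seed so that a failing case can be replayed
syndromes = [tuple(rng.randint(0, 1) for _ in range(8)) for _ in range(100)]

clean = compare_decoders(software_model, software_model, syndromes)
broken = compare_decoders(software_model, faulty_model, syndromes)
print(len(clean), "mismatches for identical models")
print(len(broken), "mismatches with the injected fault")
```

In a real flow the candidate would wrap an RTL simulation or a hardware readout, and the comparison would cover intermediate messages as well as final outputs if bit-accuracy at every iteration is the goal.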

Circularity Check

0 steps flagged

No significant circularity detected

The paper reports an FPGA implementation of a message-passing decoder that exploits the GARI-augmented graph structure for a specific bivariate bicycle code. All quantitative claims (596 ns average latency, six-fold resource reduction versus prior GARI proposal, three-core ensemble on VCU19P) are direct measurements from synthesis and timing analysis on the target device. No equations, fitted parameters, or predictions are defined in terms of the reported outcomes; the architecture description treats GARI as an external input framework rather than re-deriving or fitting its properties. The scaling and adaptability statements are scoped as design goals supported by the concrete case study, not as self-referential derivations.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on the assumption that the GARI rewiring produces a graph whose message-passing decoder remains accurate for the target correlated error model; no independent verification of that accuracy is supplied in the abstract. No free parameters or new invented entities are introduced in the abstract.

axioms (1)
  • domain assumption: GARI-augmented graphs admit efficient message-passing decoding without accuracy degradation for the bivariate bicycle code under the detector error model.
    Invoked implicitly when the architecture is claimed to preserve decoding accuracy while exploiting the GARI structure.

Reference graph

Works this paper leans on

32 extracted references · 19 canonical work pages · 2 internal anchors

  1. [1]

    How to Build a Quantum Supercomputer: Scaling from Hundreds to Millions of Qubits

    M. Mohseni, A. Scherer, K. G. Johnson, O. Wertheim, M. Otten, N. A. Aadit, Y. Alexeev, K. M. Bresniker, K. Y. Camsari, B. Chapman, S. Chatterjee, G. A. Dagnew, A. Esposito, F. Fahim, M. Fiorentino, A. Gajjar, A. Khalid, X. Kong, B. Kulchytskyy, E. Kyoseva, R. Li, P. A. Lott, I. L. Markov, R. F. McDermott, G. Pedretti, P. Rao, E. Rieffel, A. Silva, J. So...

  2. [2]

    Computer Science Challenges in Quantum Computing: Early Fault-Tolerance and Beyond

    J. Palsberg, J. Cong, Y. Ding, B. Fefferman, M. Qureshi, G. S. Ravi, K. N. Smith, H. Wang, X. Wu, and H. Yuen, “Computer Science Challenges in Quantum Computing: Early Fault-Tolerance and Beyond,” 2026. [Online]. Available: https://arxiv.org/abs/2601.20247

  3. [3]

    Tesseract: A Search-Based Decoder for Quantum Error Correction

    L. A. Beni, O. Higgott, and N. Shutty, “Tesseract: A Search-Based Decoder for Quantum Error Correction,” 2025. [Online]. Available: https://arxiv.org/abs/2503.10988

  4. [4]

    Optimizing Resource Efficiencies for Scalable Full-Stack Quantum Computers

    M. Fellous-Asiani, J. H. Chai, Y. Thonnart, H. K. Ng, R. S. Whitney, and A. Auffèves, “Optimizing Resource Efficiencies for Scalable Full-Stack Quantum Computers,” PRX Quantum, vol. 4, p. 040319, Oct 2023. [Online]. Available: https://link.aps.org/doi/10.1103/PRXQuantum.4.040319

  5. [5]

    Better Than Worst-Case Decoding for Quantum Error Correction

    G. S. Ravi, J. M. Baker, A. Fayyazi, S. F. Lin, A. Javadi-Abhari, M. Pedram, and F. T. Chong, “Better Than Worst-Case Decoding for Quantum Error Correction,” 2022. [Online]. Available: https://arxiv.org/abs/2208.08547

  6. [6]

    Scalable Quantum Error Correction for Surface Codes using FPGA

    N. Liyanage, Y. Wu, A. Deters, and L. Zhong, “Scalable Quantum Error Correction for Surface Codes using FPGA,” 2023. [Online]. Available: https://arxiv.org/abs/2301.08419

  7. [7]

    Astrea: Accurate Quantum Error-Decoding via Practical Minimum-Weight Perfect-Matching

    S. Vittal, P. Das, and M. Qureshi, “Astrea: Accurate Quantum Error-Decoding via Practical Minimum-Weight Perfect-Matching,” in Proceedings of the 50th Annual International Symposium on Computer Architecture, ser. ISCA ’23. New York, NY, USA: Association for Computing Machinery, 2023. [Online]. Available: https://doi.org/10.1145/3579371.3589037

  8. [8]

    A real-time, scalable, fast and resource-efficient decoder for a quantum computer

    B. Barber, K. M. Barnes, T. Bialas, O. Buğdaycı, E. T. Campbell, N. I. Gillespie, K. Johar, R. Rajan, A. W. Richardson, L. Skoric, C. Topal, M. L. Turner, and A. B. Ziad, “A real-time, scalable, fast and resource-efficient decoder for a quantum computer,” Nature Electronics, vol. 8, no. 1, pp. 84–91, Jan. 2025. [Online]. Available: http://dx.doi.org/10.10...

  9. [9]

    Micro Blossom: Accelerated Minimum-Weight Perfect Matching Decoding for Quantum Error Correction

    Y. Wu, N. Liyanage, and L. Zhong, “Micro Blossom: Accelerated Minimum-Weight Perfect Matching Decoding for Quantum Error Correction,” in Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2, ser. ASPLOS ’25. ACM, Mar. 2025, pp. 639–654. [Online]. Available: http://dx.doi.or...

  10. [10]

    High-threshold and low-overhead fault-tolerant quantum memory

    S. Bravyi, A. W. Cross, J. M. Gambetta, D. Maslov, P. Rall, and T. J. Yoder, “High-threshold and low-overhead fault-tolerant quantum memory,” Nature, vol. 627, no. 8005, pp. 778–782, Mar. 2024. [Online]. Available: http://dx.doi.org/10.1038/s41586-024-07107-7

  11. [11]

    Real-time decoding of the gross code memory with FPGAs

    T. Maurer, M. Bühler, M. Kröner, F. Haverkamp, T. Müller, D. Vandeth, and B. R. Johnson, “Real-time decoding of the gross code memory with FPGAs,” 2025. [Online]. Available: https://arxiv.org/abs/2510.21600

  12. [12]

    Exploring the FPGA and ASIC design space of belief propagation and ordered statistics decoders for quantum error correction codes

    D. Báscones, F. Garcia-Herrero, and J. Valls, “Exploring the FPGA and ASIC design space of belief propagation and ordered statistics decoders for quantum error correction codes,” EPJ Quantum Technology, vol. 12, no. 1, p. 140, 2025

  13. [13]

    FPGA-tailored algorithms for real-time decoding of quantum LDPC codes

    S. Maurya, T. Maurer, M. Bühler, D. Vandeth, and M. E. Beverland, “FPGA-tailored algorithms for real-time decoding of quantum LDPC codes,” 2026. [Online]. Available: https://arxiv.org/abs/2511.21660

  14. [14]

    Versal Prime Series VMK180 Evaluation Kit

    Advanced Micro Devices, “Versal Prime Series VMK180 Evaluation Kit,” https://www.amd.com/en/products/adaptive-socs-and-fpgas/evaluation-boards/vmk180.html, 2026, accessed: 2026-03-05

  15. [15]

    Toward a union-find decoder for quantum ldpc codes

    N. Delfosse, V. Londe, and M. E. Beverland, “Toward a union-find decoder for quantum ldpc codes,” IEEE Transactions on Information Theory, vol. 68, no. 5, pp. 3187–3199, 2022

  16. [16]

    Virtex UltraScale+ VU19P FPGA

    Advanced Micro Devices, “Virtex UltraScale+ VU19P FPGA,” https://www.amd.com/en/products/adaptive-socs-and-fpgas/fpga/virtex-ultrascale-plus-vu19p.html, 2026, accessed: 2026-03-05

  17. [17]

    Diversity Methods for Improving Convergence and Accuracy of Quantum Error Correction Decoders Through Hardware Emulation

    F. Garcia-Herrero, J. Valls, L. Vergara-Picazo, and V. Torres, “Diversity Methods for Improving Convergence and Accuracy of Quantum Error Correction Decoders Through Hardware Emulation,” 2025. [Online]. Available: https://arxiv.org/abs/2504.01164

  18. [18]

    Automorphism Ensemble Decoding of Quantum LDPC Codes

    S. Koutsioumpas, H. Sayginel, M. Webster, and D. E. Browne, “Automorphism Ensemble Decoding of Quantum LDPC Codes,” 2025. [Online]. Available: https://arxiv.org/abs/2503.01738

  19. [19]

    Colour Codes Reach Surface Code Performance using Vibe Decoding

    S. Koutsioumpas, T. Noszko, H. Sayginel, M. Webster, and J. Roffe, “Colour Codes Reach Surface Code Performance using Vibe Decoding,” [Online]. Available: https://arxiv.org/abs/2508.15743

  21. [21]

    Decoding correlated errors in quantum LDPC codes

    A. S. Maan, F. M. Garcia Herrero, A. Paler, and V. Savin, “Decoding correlated errors in quantum LDPC codes,” Nature Communications, Mar. 2026. [Online]. Available: http://dx.doi.org/10.1038/s41467-026-70556-3

  22. [22]

    Energy-consumption advantage of quantum computation

    F. Meier and H. Yamasaki, “Energy-consumption advantage of quantum computation,” PRX Energy, vol. 4, no. 2, p. 023008, 2025

  23. [23]

    Quantum Technologies Need a Quantum Energy Initiative

    A. Auffèves, “Quantum Technologies Need a Quantum Energy Initiative,” PRX Quantum, vol. 3, no. 2, Jun. 2022. [Online]. Available: http://dx.doi.org/10.1103/PRXQuantum.3.020101

  24. [24]

    Syndrome-Based Min-Sum vs OSD-0 Decoders: FPGA Implementation and Analysis for Quantum LDPC Codes

    J. Valls, F. Garcia-Herrero, N. Raveendran, and B. Vasić, “Syndrome-Based Min-Sum vs OSD-0 Decoders: FPGA Implementation and Analysis for Quantum LDPC Codes,” IEEE Access, vol. 9, pp. 138734–138743, 2021

  25. [25]

    Degenerate Quantum LDPC Codes With Good Finite Length Performance

    P. Panteleev and G. Kalachev, “Degenerate Quantum LDPC Codes With Good Finite Length Performance,” Quantum, vol. 5, p. 585, Nov. 2021. [Online]. Available: http://dx.doi.org/10.22331/q-2021-11-22-585

  27. [27]

    Layered Decoding of Quantum LDPC Codes

    J. Du Crest, F. Garcia-Herrero, M. Mhalla, V. Savin, and J. Valls, “Layered Decoding of Quantum LDPC Codes,” in 2023 12th International Symposium on Topics in Coding (ISTC), 2023, pp. 1–5

  28. [28]

    Area, throughput, and energy-efficiency trade-offs in the VLSI implementation of LDPC decoders

    C. Roth, A. Cevrero, C. Studer, Y. Leblebici, and A. Burg, “Area, throughput, and energy-efficiency trade-offs in the VLSI implementation of LDPC decoders,” in 2011 IEEE International Symposium of Circuits and Systems (ISCAS), 2011, pp. 1772–1775

  29. [29]

    Detecting and tracking drift in quantum information processors

    T. Proctor, M. Revelle, E. Nielsen, K. Rudinger, D. Lobser, P. Maunz, R. Blume-Kohout, and K. Young, “Detecting and tracking drift in quantum information processors,” Nature Communications, vol. 11, no. 1, Oct. 2020. [Online]. Available: http://dx.doi.org/10.1038/s41467-020-19074-4

  30. [30]

    LDPC decoders

    V. Savin, “LDPC decoders,” in Channel coding: Theory, algorithms, and applications, D. Declercq, M. Fossorier, and E. Biglieri, Eds. Elsevier, 2014, pp. 211–260

  31. [31]

    The Crosspoint-Queued Switch

    Y. Kanizo, D. Hay, and I. Keslassy, “The Crosspoint-Queued Switch,” in IEEE INFOCOM 2009, 2009, pp. 729–737

  32. [32]

    T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to algorithms. MIT press, 2022