pith. machine review for the scientific record.

arxiv: 2603.22149 · v2 · submitted 2026-03-23 · 🪐 quant-ph · cs.AR

Recognition: no theorem link

Low Latency GNN Accelerator for Quantum Error Correction

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 00:43 UTC · model grok-4.3

classification 🪐 quant-ph cs.AR
keywords quantum error correction · surface code · graph neural network · FPGA accelerator · decoder latency · superconducting qubits · logical error rate · real-time decoding

The pith

An FPGA accelerator for a graph neural network decoder performs quantum error correction in under one microsecond with lower error rates than prior methods for surface codes up to distance 7.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that a high-accuracy graph neural network decoder for surface-code quantum error correction can be adapted with hardware-aware optimizations and mapped to an FPGA to satisfy the strict one-microsecond decoding deadline set by superconducting qubit coherence times. This yields both the required speed and a lower logical error rate than existing decoders for code distances up to seven. A reader would care because real-time decoding is the principal bottleneck preventing fault-tolerant operation of near-term quantum hardware; meeting the latency bound without accuracy loss removes one concrete barrier to scaling logical qubits. The work shows that neural-network decoders need not trade accuracy for speed when the implementation is co-designed with the target hardware.
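For scale, the syndrome volume the decoder must process each round follows from standard rotated-surface-code bookkeeping: a distance-d code uses d² data qubits and d² − 1 stabilizer ancillas. This is an editorial illustration, not a computation from the paper:

```python
# Rotated surface code bookkeeping (editorial sketch, not from the paper):
# a distance-d code has d*d data qubits and d*d - 1 stabilizer ancillas,
# so each measurement round produces d*d - 1 syndrome bits to decode.

def surface_code_counts(d: int) -> dict:
    """Return qubit and syndrome-bit counts for a rotated surface code."""
    if d < 3 or d % 2 == 0:
        raise ValueError("code distance must be an odd integer >= 3")
    return {"distance": d, "data_qubits": d * d, "syndrome_bits": d * d - 1}

for d in (3, 5, 7):
    print(surface_code_counts(d))
```

At d = 7 the decoder must digest 48 syndrome bits per round, every round, inside the same microsecond budget.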

Core claim

By applying hardware-aware optimizations to a high-accuracy GNN-based decoder and implementing several accelerator-level improvements on an FPGA, the system reaches a decoding latency smaller than one microsecond while producing a lower logical error rate than the state-of-the-art for surface codes of distance up to d=7.

What carries the argument

Hardware-aware GNN decoder mapped to an FPGA accelerator that enforces the one-microsecond latency bound while preserving decoding accuracy.
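The carrying mechanism can be illustrated with a toy message-passing layer over a syndrome graph. The function, shapes, and weights below are hypothetical; the paper's actual architecture (its Figure 3 shows three pipeline stages) differs in detail:

```python
import numpy as np

# Hypothetical single GNN message-passing layer over a syndrome graph.
# Illustrative only: the paper's architecture differs in detail.
def gnn_layer(node_feats, edges, w_self, w_msg):
    """node_feats: (N, F); edges: list of (src, dst) pairs; weights: (F, F)."""
    agg = np.zeros_like(node_feats)
    for src, dst in edges:            # sum incoming messages from neighbours
        agg[dst] += node_feats[src]
    # combine self and aggregated features, then apply a ReLU nonlinearity
    return np.maximum(0.0, node_feats @ w_self + agg @ w_msg)

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))           # 8 syndrome-defect nodes, 4 features each
edges = [(0, 1), (1, 0), (2, 3), (3, 2)]
w1 = rng.normal(size=(4, 4))
w2 = rng.normal(size=(4, 4))
out = gnn_layer(x, edges, w1, w2)
print(out.shape)                      # (8, 4)
```

The FPGA mapping matters precisely because this per-edge aggregation is irregular and data-dependent, which general-purpose hardware handles poorly at sub-microsecond scale.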

If this is right

  • Decoding finishes inside the coherence window of current superconducting qubits, allowing error correction to keep pace with physical operations.
  • The same optimized GNN model delivers lower logical error rates than lookup-table or minimum-weight perfect-matching decoders for distances up to seven.
  • FPGA resource usage remains compatible with integration alongside qubit control electronics on the same board.
  • The approach removes the accuracy-latency trade-off that previously forced designers to accept higher logical error rates to meet the one-microsecond deadline.
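The first bullet's "keep pace" condition reduces to simple throughput arithmetic: if decoding takes longer than a QEC cycle, undecoded rounds accumulate without bound. The cycle and decode times below are assumed for illustration, not measurements from the paper:

```python
# Toy throughput check (assumed timings, not the paper's measurements):
# if decode latency exceeds the QEC cycle time, backlog grows linearly
# and real-time correction becomes impossible.

def backlog_after(rounds: int, cycle_ns: float, decode_ns: float) -> float:
    """Pending decode work, in rounds, after `rounds` QEC cycles."""
    growth_per_round = max(0.0, decode_ns / cycle_ns - 1.0)
    return rounds * growth_per_round

print(backlog_after(10_000, cycle_ns=1000, decode_ns=900))   # 0.0: keeps pace
print(backlog_after(10_000, cycle_ns=1000, decode_ns=1500))  # 5000.0 rounds behind
```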

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the same optimization pattern extends to distance nine or eleven, the latency margin could accommodate more complex decoding graphs without additional hardware.
  • Embedding the accelerator directly in the cryogenic control stack could eliminate the round-trip communication delay that currently adds to total correction time.
  • The technique may transfer to other neural-network decoders for color codes or heavy-hexagon codes once equivalent hardware-aware pruning rules are derived.

Load-bearing premise

The hardware-aware optimizations applied to the GNN decoder preserve its accuracy sufficiently to outperform prior decoders while meeting the one-microsecond timing constraint.
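One way to probe this premise is to bound the weight perturbation that fixed-point conversion introduces, since that perturbation is what ultimately must not degrade the decoder. The int8 scheme below is a generic sketch, not necessarily the paper's quantization format:

```python
import numpy as np

# Generic symmetric int8 quantization sketch (assumed scheme, not
# necessarily the paper's): round-to-nearest error is bounded by half
# a quantization step.
def quantize_int8(w: np.ndarray):
    scale = np.max(np.abs(w)) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(1)
w = rng.normal(size=(64, 64)).astype(np.float32)
q, scale = quantize_int8(w)
err = np.max(np.abs(w - q.astype(np.float32) * scale))
print(err <= 0.5 * scale + 1e-6)      # per-weight error within half a step
```

A bounded per-weight error does not by itself guarantee a preserved logical error rate, which is why the premise is load-bearing and needs empirical support.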

What would settle it

Direct measurement on the target FPGA showing, for code distance seven, whether the logical error rate stays below that of the best competing decoder once latency is forced under one microsecond; a rise above the competing decoder would refute the central claim.

Figures

Figures reproduced from arXiv: 2603.22149 by Alessio Cicero, Luigi Altamura, Mats Granath, Moritz Lange, Pedro Trancoso.

Figure 1. The host computer sends a quantum program to the controller, which …
Figure 2. Surface code of distance 3, with qubits highlighted according to their …
Figure 3. Three pipeline stages architecture of the GNN.
Figure 5. Tail probabilities of the GNN input graph node count for code distance …
Figure 6. Effect of quantizing only the weights, the layer output features, or the biases …
Figure 7. Latency as a function of the number of input graph nodes.
Figure 8. Comparison of logical error rate and latency across different …
Original abstract

Quantum computers have the potential to solve certain complex problems in a much more efficient way than classical computers. Nevertheless, current quantum computer implementations are limited by high physical error rates. This issue is addressed by Quantum Error Correction (QEC) codes, which use multiple physical qubits to form a logical qubit to achieve a lower logical error rate, with the surface code being one of the most commonly used. The most time-critical step in this process is interpreting the measurements of the physical qubits to determine which errors have most likely occurred - a task called decoding. Consequently, the main challenge for QEC is to achieve error correction with high accuracy within the tight $1\mu s$ decoding time budget imposed by superconducting qubits. State-of-the-art QEC approaches trade accuracy for latency. In this work, we propose an FPGA accelerator for a Neural Network based decoder as a way to achieve a lower logical error rate than current methods within the tight time constraint, for code distance up to d=7. We achieved this goal by applying different hardware-aware optimizations to a high-accuracy GNN-based decoder. In addition, we propose several accelerator optimizations leading to the FPGA-based decoder achieving a latency smaller than $1\mu s$, with a lower error rate compared to the state-of-the-art.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an FPGA accelerator for a Graph Neural Network (GNN)-based decoder for surface-code quantum error correction. Through hardware-aware optimizations including quantization and pruning, it claims to deliver end-to-end decoding latency below 1 μs while achieving lower logical error rates than state-of-the-art methods (MWPM and prior NN decoders) for code distances up to d=7.

Significance. If the accuracy-preservation and latency claims hold under identical noise models, the work would be significant for practical QEC: it directly targets the sub-1 μs coherence-time constraint of superconducting qubits and supplies a concrete, synthesizable FPGA implementation rather than an abstract algorithm. Reproducible hardware results and explicit baseline comparisons would strengthen its utility for near-term fault-tolerant experiments.
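A minimal sketch of the magnitude-based structured pruning alluded to in the summary, assuming a simple keep-ratio criterion the paper may not actually use:

```python
import numpy as np

# Hedged sketch of magnitude-based structured pruning (assumed criterion;
# the paper's pruning rule may differ): zero out the output channels
# whose weight rows have the smallest L2 norm.
def prune_rows(w: np.ndarray, keep_ratio: float) -> np.ndarray:
    norms = np.linalg.norm(w, axis=1)
    k = int(round(keep_ratio * w.shape[0]))
    keep = np.argsort(norms)[-k:]              # indices of the k largest rows
    mask = np.zeros(w.shape[0], dtype=bool)
    mask[keep] = True
    return w * mask[:, None]

rng = np.random.default_rng(2)
w = rng.normal(size=(16, 8))
pruned = prune_rows(w, keep_ratio=0.5)
print(int((np.linalg.norm(pruned, axis=1) > 0).sum()))  # 8 rows survive
```

Structured (row- or channel-level) sparsity is the FPGA-friendly variant: entire multiply-accumulate units can be removed from the datapath rather than scattered individual weights.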

major comments (2)
  1. [Results section] The post-optimization logical error rates for d=7 are stated to be lower than SOTA, yet no side-by-side table compares the optimized GNN against MWPM and the exact prior NN baselines under the same noise model, code distances, and measurement protocol; without this, the central outperformance claim cannot be verified.
  2. [Hardware-Aware Optimizations section] The manuscript does not report the logical error rate of the unoptimized GNN versus the quantized/pruned version for d=7, nor does it supply error bars or statistical details on how accuracy was measured after fixed-point conversion; this leaves the accuracy-preservation assumption untested and load-bearing for the latency-accuracy tradeoff claim.
minor comments (2)
  1. [Abstract] Quantitative latency and error-rate numbers are asserted but not supplied, reducing clarity for readers.
  2. [Figures] Several figure captions lack explicit axis units or legend definitions for the noise model parameters.
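Major comment 2 asks for error bars on the measured logical error rates; the binomial statistics involved can be sketched as follows, with illustrative counts rather than the paper's data:

```python
import math

# Normal-approximation confidence interval on a logical error rate
# estimated from Monte Carlo shots (illustrative numbers, not the
# paper's data).
def error_rate_ci(failures: int, shots: int, z: float = 1.96):
    """Point estimate and ~95% CI half-width for p = failures / shots."""
    p = failures / shots
    half = z * math.sqrt(p * (1.0 - p) / shots)
    return p, half

p, half = error_rate_ci(failures=230, shots=1_000_000)
print(f"p = {p:.2e} +/- {half:.2e}")
```

At error rates near 1e-4, distinguishing two decoders reliably requires on the order of millions of shots, which is why the referee's request for statistical detail is not pedantry.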

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help strengthen the clarity and verifiability of our claims. We address each major point below and have revised the manuscript to incorporate the requested comparisons and details.

Point-by-point responses
  1. Referee: [Results section] The post-optimization logical error rates for d=7 are stated to be lower than SOTA, yet no side-by-side table compares the optimized GNN against MWPM and the exact prior NN baselines under the same noise model, code distances, and measurement protocol; without this, the central outperformance claim cannot be verified.

    Authors: We agree that a direct side-by-side comparison table is necessary to substantiate the outperformance claim. In the revised manuscript, we have added a new table in the Results section that explicitly compares the logical error rates of the optimized GNN decoder against MWPM and the prior NN baselines. All entries use identical noise models, code distances up to d=7, and the same measurement protocol, confirming the lower error rates achieved by our approach. revision: yes

  2. Referee: [Hardware-Aware Optimizations section] The manuscript does not report the logical error rate of the unoptimized GNN versus the quantized/pruned version for d=7, nor does it supply error bars or statistical details on how accuracy was measured after fixed-point conversion; this leaves the accuracy-preservation assumption untested and load-bearing for the latency-accuracy tradeoff claim.

    Authors: We acknowledge the need for these details to validate accuracy preservation. The revised Hardware-Aware Optimizations section now reports the logical error rates for the unoptimized GNN versus the quantized/pruned version at d=7. We have also added error bars derived from multiple independent simulation runs and included a description of the statistical methodology and fixed-point conversion protocol used to measure post-optimization accuracy. revision: yes

Circularity Check

0 steps flagged

No circularity: engineering implementation of GNN decoder accelerator

full rationale

The paper presents an FPGA-based hardware accelerator for a pre-existing GNN decoder, applying standard optimizations such as quantization and pruning to meet latency constraints. No mathematical derivation chain, equations, or predictions are shown that reduce claimed performance metrics to parameters fitted from the same data or to self-citations. Claims rest on empirical benchmarking and hardware measurements rather than any self-definitional or fitted-input structure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not enumerate free parameters or axioms; the central claim implicitly rests on the assumption that a pre-existing GNN decoder architecture can be ported to FPGA with accuracy-preserving optimizations, but no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.0 · 5528 in / 1107 out tokens · 29278 ms · 2026-05-15T00:43:07.669982+00:00 · methodology

discussion (0)

