JEDI-linear: Fast and Efficient Graph Neural Networks for Jet Tagging on FPGAs
Pith reviewed 2026-05-18 21:54 UTC · model grok-4.3
The pith
JEDI-linear replaces explicit pairwise interactions in GNNs with shared transformations and global aggregation to reach linear complexity for jet tagging on FPGAs.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
JEDI-linear approximates full Interaction Networks for jet tagging by replacing explicit pairwise interactions with shared transformations and global aggregation, yielding linear computational complexity. When combined with per-parameter quantization-aware training and distributed arithmetic for multiplier-free multiply-accumulate operations, the model maps efficiently onto FPGAs. This produces an implementation with 3.7 to 11.5 times lower latency, up to 150 times lower initiation interval, up to 6.2 times lower LUT usage, higher accuracy, and zero DSP blocks compared with state-of-the-art GNNs, satisfying the requirements for the HL-LHC CMS Level-1 trigger.
What carries the argument
JEDI-linear architecture, which substitutes explicit pairwise interactions with shared transformations and global aggregation to achieve linear complexity while preserving jet-tagging accuracy under FPGA constraints.
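The complexity argument can be checked with a small NumPy sketch. It assumes that the interaction function f_R is affine, f_R(x_i, x_j) = A x_i + B x_j + b, as the paper passage quoted further down states. The names A, B, b and the random shapes are illustrative and not taken from the paper. Under that assumption, the sum of f_R over all senders j ≠ i collapses into a per-particle term plus one shared global sum. A single pass over the N particles therefore replaces the N(N-1) pairwise evaluations.

```python
import numpy as np

def pairwise_aggregate(x, A, B, b):
    """O(N^2): evaluate the affine interaction f_R(x_i, x_j) = A x_i + B x_j + b
    for every ordered pair i != j and sum over senders j for each receiver i."""
    n = x.shape[0]
    out = np.zeros((n, A.shape[0]))
    for i in range(n):
        for j in range(n):
            if i != j:
                out[i] += A @ x[i] + B @ x[j] + b
    return out

def linear_aggregate(x, A, B, b):
    """O(N): the same result from one global sum shared by all particles.
    sum_{j != i} (A x_i + B x_j + b) = (N - 1)(A x_i + b) + B (S - x_i),
    where S = sum_j x_j is the global context vector."""
    n = x.shape[0]
    s = x.sum(axis=0)                      # global aggregation over all particles
    return (n - 1) * (x @ A.T + b) + (s - x) @ B.T

# Illustrative sizes: 8 particles, 3 input features, 4 hidden units.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 3))
A = rng.normal(size=(4, 3))
B = rng.normal(size=(4, 3))
b = rng.normal(size=4)

# Both routes agree up to floating-point rounding.
assert np.allclose(pairwise_aggregate(x, A, B, b), linear_aggregate(x, A, B, b))
```

If f_R is instead a general nonlinear MLP, as in the original JEDI-net, the identity no longer holds exactly. That gap is what an ablation against a full Interaction Network would measure.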
If this is right
- The FPGA implementation meets HL-LHC CMS Level-1 trigger latency requirements while delivering higher model accuracy than prior GNN designs.
- Latency is reduced by a factor of 3.7 to 11.5 relative to state-of-the-art GNNs on FPGAs.
- Initiation interval is lowered by up to 150 times and LUT usage by up to 6.2 times.
- No DSP blocks are required, freeing resources for other trigger logic.
- Open-sourced templates support reproducibility for similar real-time scientific inference tasks.
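On FPGAs, latency and initiation interval (II) are both counted in clock cycles. Latency is how long one jet takes to pass through the pipeline. II is how many cycles elapse before the next jet can enter. The minimal sketch below converts both quantities. The 250 MHz clock and the cycle counts are hypothetical values chosen only for illustration and are not figures from the paper.

```python
def cycles_to_ns(cycles, clock_mhz):
    """Convert a cycle count to nanoseconds: one cycle lasts 1000 / clock_mhz ns."""
    return cycles * 1000.0 / clock_mhz

def throughput_per_s(initiation_interval, clock_mhz):
    """A full pipeline that accepts a new input every `initiation_interval`
    cycles completes clock_mhz * 1e6 / II inputs per second."""
    return clock_mhz * 1e6 / initiation_interval

# Hypothetical design point, for illustration only.
clock_mhz = 250.0
latency_cycles = 10
ii_cycles = 1
print(cycles_to_ns(latency_cycles, clock_mhz))   # 40.0
print(throughput_per_s(ii_cycles, clock_mhz))    # 250000000.0
```

A lower II raises throughput even when latency is unchanged. This is why the two metrics are reported separately.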
Where Pith is reading between the lines
- The shared-transformation approach may transfer to other real-time GNN tasks in particle physics or similar domains where quadratic pairwise costs are prohibitive.
- Multiplier-free distributed arithmetic could be combined with the same quantization strategy to reduce power in additional embedded inference settings.
- Per-parameter bitwidth optimization offers a general lever for trading accuracy against hardware cost that could be tested on non-physics edge-AI models.
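Distributed arithmetic is the ingredient that makes the multiply-accumulate operations multiplier-free. The sketch below shows the textbook bit-serial form for a dot product with fixed integer weights. All 2^K partial sums of the K weights are precomputed into a lookup table. Each bit plane of the inputs then selects one entry, which is shifted and added. The sketch assumes unsigned integer inputs of a known width. It illustrates the general technique and is not necessarily how the da4ml library used in the paper organizes its computation.

```python
def build_da_lut(weights):
    """Precompute, for every K-bit pattern idx, the sum of the weights whose
    bit is set in idx (uses additions only)."""
    k = len(weights)
    return [sum(w for bit, w in enumerate(weights) if (idx >> bit) & 1)
            for idx in range(2 ** k)]

def da_dot(weights, inputs, input_bits):
    """Multiplier-free dot product sum_k w_k * x_k for unsigned integers x_k
    that fit in `input_bits` bits. Bit b of every input forms a K-bit LUT
    address; the LUT output is shifted left by b and accumulated."""
    lut = build_da_lut(weights)
    acc = 0
    for b in range(input_bits):
        addr = 0
        for k, x in enumerate(inputs):
            addr |= ((x >> b) & 1) << k    # gather bit b of each input
        acc += lut[addr] << b              # shift-and-add, no multiplier
    return acc

weights = [3, -2, 5]
inputs = [7, 1, 4]                         # unsigned, each fits in 3 bits
result = da_dot(weights, inputs, input_bits=3)
assert result == sum(w * x for w, x in zip(weights, inputs))   # 39
```

The LUT has 2^K entries. Practical DA designs therefore split long dot products into small groups of weights and add the group results. That refinement is omitted here.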
Load-bearing premise
The linear-complexity approximation that replaces explicit pairwise interactions with shared transformations and global aggregation preserves jet-tagging accuracy under the quantization and hardware constraints of the FPGA implementation.
What would settle it
A side-by-side evaluation of the full Interaction Network and JEDI-linear on identical jet-tagging data, using the same quantization scheme and training procedure, that shows a substantial accuracy drop for JEDI-linear would falsify the claim that the approximation preserves performance.
Original abstract
Graph Neural Networks (GNNs), particularly Interaction Networks (INs), have shown exceptional performance for jet tagging at the CERN High-Luminosity Large Hadron Collider (HL-LHC). However, their computational complexity and irregular memory access patterns pose significant challenges for deployment on FPGAs in hardware trigger systems, where strict latency and resource constraints apply. In this work, we propose JEDI-linear, a novel GNN architecture with linear computational complexity that eliminates explicit pairwise interactions by leveraging shared transformations and global aggregation. To further enhance hardware efficiency, we introduce fine-grained quantization-aware training with per-parameter bitwidth optimization and employ multiplier-free multiply-accumulate operations via distributed arithmetic. Evaluation results show that our FPGA-based JEDI-linear achieves 3.7 to 11.5 times lower latency, up to 150 times lower initiation interval, and up to 6.2 times lower LUT usage compared to state-of-the-art GNN designs while also delivering higher model accuracy and eliminating the need for DSP blocks entirely. This is the first interaction-based GNN to achieve less than 60 ns latency and currently meets the requirements for use in the HL-LHC CMS Level-1 trigger system. This work advances the next-generation trigger systems by enabling accurate, scalable, and resource-efficient GNN inference in real-time environments. Our open-sourced templates will further support reproducibility and broader adoption across scientific applications.
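"Per-parameter bitwidth optimization" means that every weight can carry its own fixed-point format. The minimal NumPy sketch below shows the forward quantizer. It assumes signed fixed-point numbers with a per-weight count of integer bits and fractional bits, plus a simple total-bit cost as a resource proxy. The function names and the cost model are illustrative and are not the paper's exact parameterization. In quantization-aware training, the non-differentiable rounding is commonly bypassed in the backward pass with a straight-through estimator. That part is omitted here.

```python
import numpy as np

def quantize_fixed(w, int_bits, frac_bits):
    """Round each weight to the nearest point of a signed fixed-point grid with
    its own bit allocation (int_bits excludes the sign bit). int_bits and
    frac_bits broadcast against w, so every parameter may differ.
    Representable range: [-2**int_bits, 2**int_bits - 2**-frac_bits]."""
    step = 2.0 ** (-frac_bits)
    lo = -(2.0 ** int_bits)
    hi = 2.0 ** int_bits - step
    return np.clip(np.round(w / step) * step, lo, hi)

def bit_cost(int_bits, frac_bits):
    """Illustrative resource proxy: total bits over all parameters,
    counting one sign bit per parameter."""
    return int(np.sum(1 + int_bits + frac_bits))

w = np.array([0.30, -1.70, 0.05])
int_bits = np.array([0, 1, 0])
frac_bits = np.array([2, 3, 4])
q = quantize_fixed(w, int_bits, frac_bits)   # q -> 0.25, -1.75, 0.0625
cost = bit_cost(int_bits, frac_bits)         # 3 + 5 + 5 = 13 bits
```

Giving each weight its own grid lets an optimizer spend bits only where accuracy needs them. This is the accuracy-versus-hardware lever described above.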
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces JEDI-linear, a novel GNN architecture for jet tagging that achieves linear computational complexity by replacing the explicit pairwise interactions of Interaction Networks with shared transformations and global aggregation. It further applies fine-grained quantization-aware training with per-parameter bitwidth optimization and distributed arithmetic to enable multiplier-free operations on FPGAs. The central claims are threefold. First, the resulting FPGA implementation delivers 3.7–11.5× lower latency, up to 150× lower initiation interval, up to 6.2× lower LUT usage, and higher accuracy than prior GNN designs. Second, it eliminates DSP blocks entirely. Third, it achieves <60 ns latency suitable for the HL-LHC CMS Level-1 trigger.
Significance. If the reported performance and accuracy numbers hold under rigorous validation, the work would be significant for high-energy physics by demonstrating the first interaction-based GNN that meets the strict latency and resource constraints of HL-LHC trigger systems while improving accuracy. The open-sourcing of implementation templates would further aid reproducibility and adoption of hardware-efficient GNNs in scientific computing.
major comments (2)
- [§4] §4 (Evaluation) and associated tables: The headline claims of higher model accuracy together with 3.7–11.5× latency reduction and no DSP usage rest on the untested assumption that the linear approximation (shared transforms + global aggregation) preserves the pairwise interaction features needed for jet tagging after per-parameter quantization. No ablation isolating the accuracy impact of this substitution versus a full Interaction Network on the identical dataset and task is reported, which is load-bearing for the central claim.
- [Table 2] Table 2 (or equivalent results table): The latency, initiation-interval, and resource comparisons lack explicit specification of the target FPGA device, synthesis tool chain, clock frequency, and exact baseline implementations (including their quantization schemes), preventing independent verification of the reported speed-ups and resource savings.
minor comments (2)
- [Abstract] The abstract states quantitative improvements but supplies no dataset name, jet-tagging task definition, or number of particles per graph; adding these would improve clarity without altering the technical content.
- [§3] Notation for the global aggregation operation and the per-parameter bitwidth assignment should be defined once in §3 before being used in the hardware mapping description.
Simulated Author's Rebuttal
We thank the referee for their thorough review and constructive feedback. We address the major comments point by point below, agreeing where revisions are warranted to strengthen the manuscript.
- Referee: [§4] §4 (Evaluation) and associated tables: The headline claims of higher model accuracy together with 3.7–11.5× latency reduction and no DSP usage rest on the untested assumption that the linear approximation (shared transforms + global aggregation) preserves the pairwise interaction features needed for jet tagging after per-parameter quantization. No ablation isolating the accuracy impact of this substitution versus a full Interaction Network on the identical dataset and task is reported, which is load-bearing for the central claim.
  Authors: We agree that an explicit ablation would provide stronger support for the central claim. In the revised manuscript we will add a dedicated comparison in §4 (with an accompanying table) that evaluates JEDI-linear against a full Interaction Network on the identical jet-tagging dataset and task, reporting accuracy both before and after the per-parameter quantization step. This will quantify the accuracy impact of the shared-transform plus global-aggregation substitution and confirm that essential pairwise features are retained while linear complexity is achieved.
  Revision: yes.
- Referee: [Table 2] Table 2 (or equivalent results table): The latency, initiation-interval, and resource comparisons lack explicit specification of the target FPGA device, synthesis tool chain, clock frequency, and exact baseline implementations (including their quantization schemes), preventing independent verification of the reported speed-ups and resource savings.
  Authors: We acknowledge the need for full reproducibility details. The revised manuscript will explicitly state in the caption and body of Table 2 (and in §4) the target device (Xilinx Virtex UltraScale+), synthesis tool chain and version, target clock frequency for timing closure, and the precise quantization schemes used for each baseline implementation. These clarifications will enable independent verification of the reported latency, initiation-interval, and resource figures.
  Revision: yes.
Circularity Check
No significant circularity; claims rest on empirical hardware evaluation
The paper introduces JEDI-linear as an architectural design choice that replaces explicit pairwise interactions with shared transformations and global aggregation to achieve linear complexity. It then validates this choice through FPGA implementation, quantization-aware training, and direct comparisons with prior GNN designs on latency, resource usage, and accuracy. No equations, derivations, or predictions are shown that reduce reported results to quantities fitted or defined internally by the paper itself. The central performance claims (sub-60 ns latency, higher accuracy, no DSP blocks) are supported by external hardware synthesis results and benchmarks against state-of-the-art methods. The evaluation therefore rests on independent implementation data rather than on self-referential quantities.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "we require f_R to be an affine transformation... rewrite the interaction embedding... linear complexity with respect to N_O... global context vector summarizing the features of all particles"
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: alpha_pin_under_high_calibration (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "fine-grained quantization-aware training with per-parameter bitwidth optimization and employ multiplier-free multiply-accumulate operations via distributed arithmetic"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging
  PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.
- HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference
  HGQ-LUT delivers a practical LUT-aware training framework with new tensor-based layers, heterogeneous quantization, and a resource surrogate that automates accuracy-efficiency trade-offs for FPGA DNN inference.
- E-PCN: Jet Tagging with Explainable Particle Chebyshev Networks Using Kinematic Features
  E-PCN reaches 94.67% macro-accuracy on 10-class jet tagging by weighting graphs with angular separation, transverse momentum, momentum fraction, and invariant mass, with Grad-CAM showing the first two account for 76% ...