pith.

arXiv: 2508.15468 · v2 · submitted 2025-08-21 · ✦ hep-ex · cs.AR · cs.LG

JEDI-linear: Fast and Efficient Graph Neural Networks for Jet Tagging on FPGAs

Pith reviewed 2026-05-18 21:54 UTC · model grok-4.3

classification ✦ hep-ex · cs.AR · cs.LG
keywords: graph neural networks · jet tagging · FPGA · high-luminosity LHC · trigger systems · quantization-aware training · linear complexity · interaction networks

The pith

JEDI-linear replaces explicit pairwise interactions in GNNs with shared transformations and global aggregation to reach linear complexity for jet tagging on FPGAs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces JEDI-linear, a graph neural network architecture designed for jet tagging in high-energy physics experiments at the HL-LHC. It approximates the performance of full Interaction Networks by using shared transformations and global aggregation instead of computing every pairwise interaction, which reduces the complexity from quadratic to linear in the number of particles. Fine-grained quantization-aware training with per-parameter bitwidth choices and multiplier-free operations via distributed arithmetic further adapt the model to FPGA hardware. A sympathetic reader would care because the CMS Level-1 trigger must select events in real time under strict latency and resource limits that conventional GNNs exceed. The resulting implementation achieves latency below 60 ns while using fewer resources and reaching higher accuracy than prior designs.
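
To make "per-parameter bitwidth" concrete, here is a minimal Python sketch of the forward pass of a fixed-point quantizer in which every weight carries its own integer and fractional bit counts. The function name `quantize`, the saturating (clipping) behaviour, and the example bit allocations are illustrative assumptions, not the paper's code. In the paper the bitwidths are optimized during training (a gradient-based approach of this kind is described in [16]), and quantization-aware training typically passes gradients through the rounding step with a straight-through estimator; neither of those steps is shown here.

```python
def quantize(w: float, int_bits: int, frac_bits: int) -> float:
    """Round w onto a signed fixed-point grid with its own bit allocation.

    Width is 1 sign bit + int_bits + frac_bits. Values outside the
    representable range saturate (clip) instead of wrapping around.
    """
    step = 2.0 ** -frac_bits           # spacing of the fixed-point grid
    lo = -(2.0 ** int_bits)            # most negative representable value
    hi = 2.0 ** int_bits - step        # most positive representable value
    q = round(w / step) * step         # round to the nearest grid point
    return min(max(q, lo), hi)

# Hypothetical per-parameter allocations: (int_bits, frac_bits) for each weight.
weights = [0.8125, -1.3, 0.02, 3.7]
bits = [(0, 4), (1, 2), (0, 0), (1, 3)]

quantized = [quantize(w, i, f) for w, (i, f) in zip(weights, bits)]
# Crude hardware-cost proxy: total number of bits summed over all parameters.
total_bits = sum(1 + i + f for i, f in bits)
```

With these assumed allocations `quantized` is `[0.8125, -1.25, 0.0, 1.875]` and `total_bits` is 15. A weight given zero integer and zero fractional bits can only take the values -1 or 0 here, so small weights collapse to zero; this is one way a per-parameter scheme can trade accuracy against hardware cost.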

Core claim

JEDI-linear approximates full Interaction Networks for jet tagging by replacing explicit pairwise interactions with shared transformations and global aggregation, yielding linear computational complexity. When combined with per-parameter quantization-aware training and distributed arithmetic for multiplier-free multiply-accumulate operations, the model maps efficiently onto FPGAs. This produces an implementation with 3.7 to 11.5 times lower latency, up to 150 times lower initiation interval, up to 6.2 times lower LUT usage, higher accuracy, and zero DSP blocks compared with state-of-the-art GNNs, satisfying the requirements for the HL-LHC CMS Level-1 trigger.
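
Distributed arithmetic removes multipliers from a dot product with constant weights by precomputing sums of weights and then stepping over the bits of the inputs. The sketch below shows the classic bit-serial textbook form in Python, assuming integer weights and signed two's-complement inputs of a fixed width; the names `build_da_lut` and `da_dot` are hypothetical. The paper's da4ml tool [17] targets fully parallel FPGA logic, so its actual algorithm may differ from this formulation; treat this only as an illustration of why no DSP multipliers are needed.

```python
def build_da_lut(weights):
    """LUT[m] = sum of weights[k] for every bit k set in mask m (additions only)."""
    n = len(weights)
    lut = [0] * (1 << n)
    for m in range(1, 1 << n):
        low = m & -m                  # lowest set bit of m
        k = low.bit_length() - 1      # index of that bit
        lut[m] = lut[m ^ low] + weights[k]
    return lut

def da_dot(weights, xs, width):
    """Dot product of constant integer weights with signed `width`-bit inputs,
    computed bit-serially from a LUT using only shifts and adds."""
    lut = build_da_lut(weights)
    acc = 0
    for b in range(width):
        # Gather bit b of every input into one LUT address.
        mask = 0
        for k, x in enumerate(xs):
            mask |= ((x >> b) & 1) << k
        term = lut[mask] << b         # shift replaces multiplication by 2**b
        # In two's complement the most significant bit carries negative weight.
        acc += -term if b == width - 1 else term
    return acc
```

For example, `da_dot([3, -2, 5], [1, -3, 4], 4)` returns 29, the same as 3·1 + (-2)·(-3) + 5·4. In hardware the loop over bit positions would be unrolled or pipelined, and the LUT contents are fixed at synthesis time because the weights are constants.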

What carries the argument

JEDI-linear architecture, which substitutes explicit pairwise interactions with shared transformations and global aggregation to achieve linear complexity while preserving jet-tagging accuracy under FPGA constraints.
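
One way to see how shared transformations plus a global aggregation can stand in for explicit pairwise terms: if the edge function is linear in the concatenated pair, W[x_i; x_j] = W_r x_i + W_s x_j, then the sum over all senders j ≠ i equals (N-1) W_r x_i + Σ_j W_s x_j - W_s x_i, which needs only per-particle transforms and one global sum. The Python sketch below checks this identity. The matrix names and the purely linear edge function are assumptions for illustration, not JEDI-linear's exact layer definitions. Once a nonlinearity sits inside the edge function the identity no longer holds exactly, which is consistent with the summary's wording that JEDI-linear approximates, rather than reproduces, a full Interaction Network.

```python
def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def vadd(a, b):
    return [p + q for p, q in zip(a, b)]

def pairwise_messages(xs, W_recv, W_send):
    """IN-style: evaluate the linear edge function on every ordered pair (i, j),
    j != i, and sum messages at each receiver. Cost: N*(N-1) edge evaluations."""
    n = len(xs)
    out = []
    for i in range(n):
        acc = [0.0] * len(W_recv)
        for j in range(n):
            if j != i:
                acc = vadd(acc, vadd(matvec(W_recv, xs[i]), matvec(W_send, xs[j])))
        out.append(acc)
    return out

def linear_messages(xs, W_recv, W_send):
    """Shared transforms + global aggregation: transform each particle once,
    sum the sender transforms once, then remove each particle's own term.
    Cost: 2N transforms plus one global sum."""
    n = len(xs)
    recv = [matvec(W_recv, x) for x in xs]
    send = [matvec(W_send, x) for x in xs]
    total = [sum(col) for col in zip(*send)]  # global aggregation over particles
    return [[(n - 1) * r + t - s for r, t, s in zip(recv[i], total, send[i])]
            for i in range(n)]

# Hypothetical 3-particle jet with 2 features per particle.
xs = [[1, 2], [3, -1], [0, 4]]
W_recv = [[1, 0], [2, 1]]
W_send = [[0, 1], [1, -1]]
```

Both functions return the same messages for this input. For N particles the pairwise version evaluates N(N-1) edge terms (16,256 for N = 128), while the linear version needs only 2N transforms and one sum.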

If this is right

  • The FPGA implementation meets HL-LHC CMS Level-1 trigger latency requirements while delivering higher model accuracy than prior GNN designs.
  • Latency is reduced by a factor of 3.7 to 11.5 relative to state-of-the-art GNNs on FPGAs.
  • Initiation interval is lowered by up to 150 times and LUT usage by up to 6.2 times.
  • No DSP blocks are required, freeing resources for other trigger logic.
  • Open-sourced templates support reproducibility for similar real-time scientific inference tasks.
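
These bullets report both latency and initiation interval (II). Latency is the time from an input arriving to its result leaving the pipeline; II is the number of clock cycles between successive inputs the design can accept, so it sets throughput. A small Python helper makes the arithmetic explicit. The function name is hypothetical, and the clock frequency and cycle counts in the usage lines are invented for illustration; they are not the paper's figures, which depend on the target device and synthesis settings.

```python
def pipeline_timing(latency_cycles: int, ii_cycles: int, clock_mhz: float):
    """Convert cycle counts of a pipelined FPGA design into wall-clock figures.

    Returns (latency in ns, inputs accepted per microsecond).
    """
    period_ns = 1000.0 / clock_mhz           # one clock period in nanoseconds
    latency_ns = latency_cycles * period_ns  # time for one input to traverse the pipeline
    inputs_per_us = clock_mhz / ii_cycles    # a new input enters every ii_cycles cycles
    return latency_ns, inputs_per_us

# Hypothetical design: 10-cycle latency, II = 1, 200 MHz clock.
latency_ns, rate = pipeline_timing(10, 1, 200.0)
```

With these assumed numbers the design finishes each jet in 50 ns and accepts a new jet every 5 ns (200 per microsecond). Raising II to 4 would keep the 50 ns latency but cut throughput fourfold, which is why a lower II is reported separately from a lower latency.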

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The shared-transformation approach may transfer to other real-time GNN tasks in particle physics or similar domains where quadratic pairwise costs are prohibitive.
  • Multiplier-free distributed arithmetic could be combined with the same quantization strategy to reduce power in additional embedded inference settings.
  • Per-parameter bitwidth optimization offers a general lever for trading accuracy against hardware cost that could be tested on non-physics edge-AI models.

Load-bearing premise

The linear-complexity approximation that replaces explicit pairwise interactions with shared transformations and global aggregation preserves jet-tagging accuracy under the quantization and hardware constraints of the FPGA implementation.

What would settle it

A side-by-side evaluation of the full Interaction Network and JEDI-linear on identical jet-tagging data, using the same quantization scheme and training procedure, that shows a substantial accuracy drop for JEDI-linear would falsify the claim that the approximation preserves performance.

Figures

Figures reproduced from arXiv: 2508.15468 by Alexander Tapper, Arianna Cox, Chang Sun, Christopher Brown, Emyr Clement, Katerina Karakoulaki, Lauri Laatu, Maria Spiropulu, Sudarshan Paramesvaran, Wayne Luk, Zhiqiang Que.

Figure 1: Schematic representation of various jet types in particle physics [9].
Figure 2: Conventional interaction information gathering in JEDI-net.
Figure 3: The proposed interaction information gathering for JEDI-net.
Figure 5: A sketch of the CMS Level-1 Trigger system in the HL-LHC, adapted …
Figure 6: Model accuracy across different numbers of particles per jet. (a) …
Figure 7: Bitwidth of trained JEDI-linear models with datasets of 64/128 …
Figure 8: Latency vs. number of particles per jet. Our JEDI-linear designs …
Figure 9: Comparison of the JEDI-linear model with MLP-Mixer on the 16 …
Original abstract

Graph Neural Networks (GNNs), particularly Interaction Networks (INs), have shown exceptional performance for jet tagging at the CERN High-Luminosity Large Hadron Collider (HL-LHC). However, their computational complexity and irregular memory access patterns pose significant challenges for deployment on FPGAs in hardware trigger systems, where strict latency and resource constraints apply. In this work, we propose JEDI-linear, a novel GNN architecture with linear computational complexity that eliminates explicit pairwise interactions by leveraging shared transformations and global aggregation. To further enhance hardware efficiency, we introduce fine-grained quantization-aware training with per-parameter bitwidth optimization and employ multiplier-free multiply-accumulate operations via distributed arithmetic. Evaluation results show that our FPGA-based JEDI-linear achieves 3.7 to 11.5 times lower latency, up to 150 times lower initiation interval, and up to 6.2 times lower LUT usage compared to state-of-the-art GNN designs while also delivering higher model accuracy and eliminating the need for DSP blocks entirely. This is the first interaction-based GNN to achieve less than 60 ns latency and currently meets the requirements for use in the HL-LHC CMS Level-1 trigger system. This work advances the next-generation trigger systems by enabling accurate, scalable, and resource-efficient GNN inference in real-time environments. Our open-sourced templates will further support reproducibility and broader adoption across scientific applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces JEDI-linear, a novel GNN architecture for jet tagging that achieves linear computational complexity by replacing explicit pairwise interactions from Interaction Networks with shared transformations and global aggregation. It further applies fine-grained quantization-aware training with per-parameter bitwidth optimization and distributed arithmetic to enable multiplier-free operations on FPGAs. The central claims are that the resulting FPGA implementation delivers 3.7–11.5× lower latency, up to 150× lower initiation interval, up to 6.2× lower LUT usage, higher accuracy than prior GNN designs, eliminates DSP blocks entirely, and achieves <60 ns latency suitable for the HL-LHC CMS Level-1 trigger.

Significance. If the reported performance and accuracy numbers hold under rigorous validation, the work would be significant for high-energy physics by demonstrating the first interaction-based GNN that meets the strict latency and resource constraints of HL-LHC trigger systems while improving accuracy. The open-sourcing of implementation templates would further aid reproducibility and adoption of hardware-efficient GNNs in scientific computing.

major comments (2)
  1. [§4] §4 (Evaluation) and associated tables: The headline claims of higher model accuracy together with 3.7–11.5× latency reduction and no DSP usage rest on the untested assumption that the linear approximation (shared transforms + global aggregation) preserves the pairwise interaction features needed for jet tagging after per-parameter quantization. No ablation isolating the accuracy impact of this substitution versus a full Interaction Network on the identical dataset and task is reported, which is load-bearing for the central claim.
  2. [Table 2] Table 2 (or equivalent results table): The latency, initiation-interval, and resource comparisons lack explicit specification of the target FPGA device, synthesis tool chain, clock frequency, and exact baseline implementations (including their quantization schemes), preventing independent verification of the reported speed-ups and resource savings.
minor comments (2)
  1. [Abstract] The abstract states quantitative improvements but supplies no dataset name, jet-tagging task definition, or number of particles per graph; adding these would improve clarity without altering the technical content.
  2. [§3] Notation for the global aggregation operation and the per-parameter bitwidth assignment should be defined once in §3 before being used in the hardware mapping description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback. We address the major comments point by point below, agreeing where revisions are warranted to strengthen the manuscript.

Point-by-point responses
  1. Referee: [§4] §4 (Evaluation) and associated tables: The headline claims of higher model accuracy together with 3.7–11.5× latency reduction and no DSP usage rest on the untested assumption that the linear approximation (shared transforms + global aggregation) preserves the pairwise interaction features needed for jet tagging after per-parameter quantization. No ablation isolating the accuracy impact of this substitution versus a full Interaction Network on the identical dataset and task is reported, which is load-bearing for the central claim.

    Authors: We agree that an explicit ablation would provide stronger support for the central claim. In the revised manuscript we will add a dedicated comparison in §4 (with an accompanying table) that evaluates JEDI-linear against a full Interaction Network on the identical jet-tagging dataset and task, reporting accuracy both before and after the per-parameter quantization step. This will quantify the accuracy impact of the shared-transform plus global-aggregation substitution and confirm that essential pairwise features are retained while linear complexity is achieved. revision: yes

  2. Referee: [Table 2] Table 2 (or equivalent results table): The latency, initiation-interval, and resource comparisons lack explicit specification of the target FPGA device, synthesis tool chain, clock frequency, and exact baseline implementations (including their quantization schemes), preventing independent verification of the reported speed-ups and resource savings.

    Authors: We acknowledge the need for full reproducibility details. The revised manuscript will explicitly state in the caption and body of Table 2 (and in §4) the target device (Xilinx Virtex UltraScale+), synthesis tool chain and version, target clock frequency for timing closure, and the precise quantization schemes used for each baseline implementation. These clarifications will enable independent verification of the reported latency, initiation-interval, and resource figures. revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical hardware evaluation

full rationale

The paper introduces JEDI-linear as an architectural design choice that replaces explicit pairwise interactions with shared transformations and global aggregation to achieve linear complexity, then validates this via FPGA implementation, quantization-aware training, and direct comparisons to prior GNN designs on latency, resource usage, and accuracy. No equations, derivations, or predictions are shown that reduce reported results to quantities fitted or defined internally by the paper itself. The central performance claims (sub-60 ns latency, higher accuracy, no DSP blocks) are supported by external hardware synthesis results and benchmarks against state-of-the-art methods, making the evaluation self-contained against independent implementation data rather than self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. The work relies on standard GNN modeling assumptions for jet tagging and conventional FPGA design practices.

pith-pipeline@v0.9.0 · 5828 in / 1236 out tokens · 44716 ms · 2026-05-18T21:54:10.204173+00:00 · methodology

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging

    hep-ex 2026-05 unverdicted novelty 7.0

    PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.

  2. HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference

    cs.AR 2026-04 unverdicted novelty 6.0

    HGQ-LUT delivers a practical LUT-aware training framework with new tensor-based layers, heterogeneous quantization, and a resource surrogate that automates accuracy-efficiency trade-offs for FPGA DNN inference.

  3. E-PCN: Jet Tagging with Explainable Particle Chebyshev Networks Using Kinematic Features

    hep-ph 2025-12 conditional novelty 5.0

    E-PCN reaches 94.67% macro-accuracy on 10-class jet tagging by weighting graphs with angular separation, transverse momentum, momentum fraction, and invariant mass, with Grad-CAM showing the first two account for 76% ...

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · cited by 3 Pith papers · 1 internal anchor

  1. [1]

    Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors

    C. N. Coelho, A. Kuusela, S. Li, H. Zhuang, J. Ngadiuba, T. K. Aarrestad, V. Loncar, M. Pierini, A. A. Pol, and S. Summers, “Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors,” Nature Machine Intelligence, vol. 3, no. 8, pp. 675–686, 2021

  2. [2]

    Fast inference of deep neural networks in FPGAs for particle physics

    J. Duarte, S. Han, P. Harris, S. Jindariani, E. Kreinar, B. Kreis, J. Ngadiuba, M. Pierini, R. Rivera, N. Tran et al., “Fast inference of deep neural networks in FPGAs for particle physics,” Journal of Instrumentation, vol. 13, no. 07, p. P07027, 2018

  3. [3]

    JEDI-net: a jet identification algorithm based on interaction networks

    E. A. Moreno, O. Cerri, J. M. Duarte, H. B. Newman, T. Q. Nguyen, A. Periwal, M. Pierini, A. Serikova, M. Spiropulu, and J.-R. Vlimant, “JEDI-net: a jet identification algorithm based on interaction networks,” The European Physical Journal C, vol. 80, no. 1, pp. 1–15, 2020

  4. [4]

    LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics

    Z. Que, H. Fan, M. Loo, H. Li, M. Blott, M. Pierini, A. Tapper, and W. Luk, “LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics,” ACM Transactions on Embedded Computing Systems, vol. 23, no. 2, pp. 1–28, 2024

  5. [5]

    System Design and Prototyping for the CMS Level-1 Trigger at the High-Luminosity LHC

    S. Summers and the CMS Collaboration, “System Design and Prototyping for the CMS Level-1 Trigger at the High-Luminosity LHC,” Presented at TWEPP 2024, Glasgow, UK, 2024, for the CMS Collaboration

  6. [6]

    Reconstructing jets in the phase-2 upgrade of the CMS level-1 trigger with a seeded cone algorithm

    S. Summers, I. Bestintzanos, and G. Petrucciani, “Reconstructing jets in the phase-2 upgrade of the CMS level-1 trigger with a seeded cone algorithm,” 2023. [Online]. Available: https://arxiv.org/abs/2310.08062

  7. [7]

    Ultrafast jet classification at the HL-LHC

    P. Odagiu, Z. Que, J. Duarte, J. Haller, G. Kasieczka, A. Lobanov, V. Loncar, W. Luk, J. Ngadiuba, M. Pierini et al., “Ultrafast jet classification at the HL-LHC,” Machine Learning: Science and Technology, vol. 5, no. 3, p. 035017, 2024

  8. [8]

    Deep sets

    M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, “Deep sets,” Advances in Neural Information Processing Systems, vol. 30, 2017

  9. [9]

    Interaction networks for the identification of boosted H → bb̄ decays

    E. A. Moreno, T. Q. Nguyen, J.-R. Vlimant, O. Cerri, H. B. Newman, A. Periwal, M. Spiropulu, J. M. Duarte, and M. Pierini, “Interaction networks for the identification of boosted H → bb̄ decays,” Physical Review D, vol. 102, no. 1, p. 012010, 2020

  10. [10]

    Jet tagging via particle clouds

    H. Qu and L. Gouskos, “Jet tagging via particle clouds,” Physical Review D, vol. 101, no. 5, p. 056019, 2020

  11. [11]

    Particle transformer for jet tagging

    H. Qu, C. Li, and S. Qian, “Particle transformer for jet tagging,” in International Conference on Machine Learning. PMLR, 2022, pp. 18281–18292

  12. [12]

    Interpreting Transformers for Jet Tagging

    A. Wang, A. Gandrakota, J. Ngadiuba, V. Sahu, P. Bhatnagar, E. E. Khoda, and J. Duarte, “Interpreting Transformers for Jet Tagging,” arXiv preprint arXiv:2412.03673, 2024

  13. [13]

    Reconfigurable acceleration of graph neural networks for jet identification in particle physics

    Z. Que, M. Loo, and W. Luk, “Reconfigurable acceleration of graph neural networks for jet identification in particle physics,” in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS). IEEE, 2022, pp. 202–205

  14. [14]

    Optimizing graph neural networks for jet tagging in particle physics on FPGAs

    Z. Que, M. Loo, H. Fan, M. Pierini, A. Tapper, and W. Luk, “Optimizing graph neural networks for jet tagging in particle physics on FPGAs,” in 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 2022, pp. 327–333

  15. [15]

    ParticleNet for Jet Tagging in Particle Physics on FPGA

    Y. Zhang, Y. Cheng, and Y. Gao, “ParticleNet for Jet Tagging in Particle Physics on FPGA,” in BenchCouncil International Symposium on Intelligent Computers, Algorithms, and Applications. Springer, 2023, pp. 244–253

  16. [16]

    Gradient-based automatic mixed precision quantization for neural networks on-chip

    C. Sun, T. K. Årrestad, V. Loncar, J. Ngadiuba, and M. Spiropulu, “Gradient-based automatic mixed precision quantization for neural networks on-chip,” arXiv preprint arXiv:2405.00645, 2024

  17. [17]

    da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

    C. Sun, Z. Que, V. Loncar, W. Luk, and M. Spiropulu, “da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs,” arXiv preprint arXiv:2507.04535, 2025

  18. [18]

    The Phase-2 Upgrade of the CMS Level-1 Trigger

    The CMS Collaboration, “The Phase-2 Upgrade of the CMS Level-1 Trigger,” CERN, Geneva, Tech. Rep., 2020, final version. [Online]. Available: https://cds.cern.ch/record/2714892

  19. [19]

    Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml

    J. Ngadiuba, V. Loncar, M. Pierini, S. Summers, G. Di Guglielmo, J. Duarte, P. Harris, D. Rankin, S. Jindariani, M. Liu et al., “Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml,” Machine Learning: Science and Technology, vol. 2, no. 1, p. 015001, 2020

  20. [20]

    FINN: A framework for fast, scalable binarized neural network inference

    Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, “FINN: A framework for fast, scalable binarized neural network inference,” in Proceedings of the 2017 ACM/SIGDA international symposium on field-programmable gate arrays, 2017, pp. 65–74

  21. [21]

    A real-time object detection accelerator with compressed SSDLite on FPGA

    H. Fan, S. Liu, M. Ferianc, H.-C. Ng, Z. Que, S. Liu, X. Niu, and W. Luk, “A real-time object detection accelerator with compressed SSDLite on FPGA,” in 2018 International conference on field-programmable technology (FPT). IEEE, 2018, pp. 14–21

  22. [22]

    Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs

    Z. Que, H. Nakahara, E. Nurvitadhi, A. Boutros, H. Fan, C. Zeng, J. Meng, K. H. Tsoi, X. Niu, and W. Luk, “Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 30, no. 2, pp. 227–237, 2021

  23. [23]

    Verilator

    W. Snyder, P. Wasson, D. Galbi, et al., “Verilator,” Veripool, repository: https://github.com/verilator/verilator. [Online]. Available: https://verilator.org

  24. [24]

    HLS4ML LHC jet dataset (150 particles)

    M. Pierini, J. M. Duarte, N. Tran, and M. Freytsis, “Hls4ml lhc jet dataset (150 particles),” Jan. 2020. [Online]. Available: https://doi.org/10.5281/zenodo.3602260

  25. [25]

    The importance of calorimetry for highly-boosted jet substructure

    E. Coleman, M. Freytsis, A. Hinzmann, M. Narain, J. Thaler, N. Tran, and C. Vernieri, “The importance of calorimetry for highly-boosted jet substructure,” Journal of Instrumentation, vol. 13, no. 01, p. T01003, Jan. 2018. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/13/01/T01003

  26. [26]

    Fast jet tagging with MLP-Mixers on FPGAs

    C. Sun, J. Ngadiuba, M. Pierini, and M. Spiropulu, “Fast jet tagging with MLP-Mixers on FPGAs,” Machine Learning: Science and Technology, 2025. [Online]. Available: http://iopscience.iop.org/article/10.1088/2632-2153/adf596

  27. [27]

    Neural architecture codesign for fast physics applications

    J. Weitz, D. Demler, L. McDermott, N. Tran, and J. Duarte, “Neural architecture codesign for fast physics applications,” Machine Learning: Science and Technology, vol. 6, no. 3, p. 035009, Jul. 2025. [Online]. Available: https://dx.doi.org/10.1088/2632-2153/adede1

  28. [28]

    Ultrafast jet classification on FPGAs for HL-LHC

    P. Odagiu, Z. Que, J. Duarte, J. Haller, G. Kasieczka, A. Lobanov, V. Loncar, W. Luk, J. Ngadiuba, M. Pierini, P. Rincke, A. Seksaria, S. Summers, A. Sznajder, A. Tapper, and T. K. Årrestad, “Ultrafast jet classification at the HL-LHC,” Machine Learning: Science and Technology, vol. 5, no. 3, p. 035017, Jul. 2024. [Online]. Available: http://dx.doi.org/...