JEDI-linear: Fast and Efficient Graph Neural Networks for Jet Tagging on FPGAs

Alexander Tapper; Arianna Cox; Chang Sun; Christopher Brown; Emyr Clement; Katerina Karakoulaki; Lauri Laatu; Maria Spiropulu; Sudarshan Paramesvaran; Wayne Luk

arxiv: 2508.15468 · v2 · submitted 2025-08-21 · ✦ hep-ex · cs.AR · cs.LG

Zhiqiang Que, Chang Sun, Sudarshan Paramesvaran, Emyr Clement, Katerina Karakoulaki, Christopher Brown, Lauri Laatu, Arianna Cox, Alexander Tapper, Wayne Luk, Maria Spiropulu

Pith reviewed 2026-05-18 21:54 UTC · model grok-4.3

classification ✦ hep-ex · cs.AR · cs.LG

keywords graph neural networks · jet tagging · FPGA · high-luminosity LHC · trigger systems · quantization-aware training · linear complexity · interaction networks

The pith

JEDI-linear replaces explicit pairwise interactions in GNNs with shared transformations and global aggregation to reach linear complexity for jet tagging on FPGAs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces JEDI-linear, a graph neural network architecture designed for jet tagging in high-energy physics experiments at the HL-LHC. It approximates the performance of full Interaction Networks by using shared transformations and global aggregation instead of computing every pairwise interaction, which reduces the complexity from quadratic to linear. Fine-grained quantization-aware training with per-parameter bitwidth choices and multiplier-free operations via distributed arithmetic further adapt the model for FPGA hardware. A sympathetic reader would care because the CMS Level-1 trigger must select events in real time under strict latency and resource limits that conventional GNNs exceed. The resulting implementation runs below 60 ns while using fewer resources and achieving higher accuracy than prior designs.

Core claim

JEDI-linear approximates full Interaction Networks for jet tagging by replacing explicit pairwise interactions with shared transformations and global aggregation, yielding linear computational complexity. When combined with per-parameter quantization-aware training and distributed arithmetic for multiplier-free multiply-accumulate operations, the model maps efficiently onto FPGAs. This produces an implementation with 3.7 to 11.5 times lower latency, up to 150 times lower initiation interval, up to 6.2 times lower LUT usage, higher accuracy, and zero DSP blocks compared with state-of-the-art GNNs, satisfying the requirements for the HL-LHC CMS Level-1 trigger.

What carries the argument

JEDI-linear architecture, which substitutes explicit pairwise interactions with shared transformations and global aggregation to achieve linear complexity while preserving jet-tagging accuracy under FPGA constraints.
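
The step from quadratic to linear cost can be illustrated with a small numerical check. The sketch below assumes an affine edge function of the form W_r x_i + W_s x_j + b summed over all j ≠ i; the names `W_r`, `W_s`, `b` and the choice of sum aggregation are illustrative assumptions, not the paper's exact formulation. Under that assumption, the sum over pairs collapses into one shared transform per particle plus a single global sum, so the two routines agree up to rounding.

```python
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def vadd(*vectors):
    """Element-wise sum of equally sized vectors."""
    return [sum(vals) for vals in zip(*vectors)]

def scale(c, x):
    """Multiply every element of x by the scalar c."""
    return [c * v for v in x]

def pairwise_messages(X, W_r, W_s, b):
    """O(N^2): apply the affine edge function to every ordered pair (i, j), j != i."""
    out = []
    for i, x_i in enumerate(X):
        acc = [0.0] * len(b)
        for j, x_j in enumerate(X):
            if j != i:
                acc = vadd(acc, matvec(W_r, x_i), matvec(W_s, x_j), b)
        out.append(acc)
    return out

def linear_messages(X, W_r, W_s, b):
    """O(N): one shared transform per particle plus one global context vector."""
    n = len(X)
    s_ctx = matvec(W_s, vadd(*X))  # W_s applied to the sum over all particles
    out = []
    for x_i in X:
        r = matvec(W_r, x_i)
        s = matvec(W_s, x_i)
        # (n - 1) * (W_r x_i + b) + W_s * (sum_j x_j) - W_s x_i
        out.append(vadd(scale(n - 1, vadd(r, b)), s_ctx, scale(-1.0, s)))
    return out

def max_abs_diff(A, B):
    """Largest element-wise difference between two lists of vectors."""
    return max(abs(p - q) for ra, rb in zip(A, B) for p, q in zip(ra, rb))

# Hypothetical sizes: 6 particles, 3 input features, 4 output features.
random.seed(0)
N, F, D = 6, 3, 4
X = [[random.uniform(-1, 1) for _ in range(F)] for _ in range(N)]
W_r = [[random.uniform(-1, 1) for _ in range(F)] for _ in range(D)]
W_s = [[random.uniform(-1, 1) for _ in range(F)] for _ in range(D)]
b = [random.uniform(-1, 1) for _ in range(D)]

diff = max_abs_diff(pairwise_messages(X, W_r, W_s, b),
                    linear_messages(X, W_r, W_s, b))
print(diff < 1e-9)
```

The identity holds only while the edge function stays affine. Once a nonlinearity is applied per edge before aggregation, the pairwise sum no longer factorizes, which is why the accuracy cost of the substitution is an empirical question.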

If this is right

The FPGA implementation meets HL-LHC CMS Level-1 trigger latency requirements while delivering higher model accuracy than prior GNN designs.
Latency is reduced by a factor of 3.7 to 11.5 relative to state-of-the-art GNNs on FPGAs.
Initiation interval is lowered by up to 150 times and LUT usage by up to 6.2 times.
No DSP blocks are required, freeing resources for other trigger logic.
Open-sourced templates support reproducibility for similar real-time scientific inference tasks.
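
Latency and initiation interval measure different things, and a short sketch may help keep them apart. The helper names and every number below (clock frequency, cycle counts) are hypothetical values chosen for illustration; none of them are taken from the paper.

```python
def latency_ns(latency_cycles, clock_mhz):
    """Time from input to output for one inference, in nanoseconds."""
    return latency_cycles * 1000.0 / clock_mhz

def inferences_per_us(ii_cycles, clock_mhz):
    """Throughput when a new input is accepted every ii_cycles clock cycles."""
    return clock_mhz / ii_cycles

# Hypothetical design point: 200 MHz clock, 10-cycle pipeline, new input every cycle.
clock_mhz = 200.0
print(latency_ns(10, clock_mhz))        # 10 cycles at 5 ns per cycle
print(inferences_per_us(1, clock_mhz))  # one inference started per cycle

# At a fixed clock, cutting the initiation interval by a factor k raises
# throughput by the same factor, independently of the pipeline latency.
ratio = inferences_per_us(1, clock_mhz) / inferences_per_us(150, clock_mhz)
```

A design can therefore improve its initiation interval far more than its latency, which is consistent with the two figures being reported separately.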

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The shared-transformation approach may transfer to other real-time GNN tasks in particle physics or similar domains where quadratic pairwise costs are prohibitive.
Multiplier-free distributed arithmetic could be combined with the same quantization strategy to reduce power in additional embedded inference settings.
Per-parameter bitwidth optimization offers a general lever for trading accuracy against hardware cost that could be tested on non-physics edge-AI models.
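
To make the per-parameter bitwidth lever concrete, here is a minimal sketch of fixed-point rounding in which each weight carries its own bit allocation. The function `quantize`, the `(bits, frac_bits)` encoding, and the example values are assumptions made for illustration. The sketch shows only the forward rounding with bitwidths already chosen; it does not show how a quantization-aware training procedure would learn them.

```python
def quantize(value, bits, frac_bits):
    """Round value to a signed fixed-point grid with `bits` total bits,
    `frac_bits` of them fractional; saturate on overflow."""
    if bits <= 0:
        return 0.0  # zero bits: the parameter is effectively pruned
    scale = 2.0 ** frac_bits
    code = round(value * scale)  # Python rounds exact halves to the nearest even integer
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    code = max(lo, min(hi, code))
    return code / scale

# Hypothetical weights, each with its own (total bits, fractional bits).
weights = [0.3, -1.7, 0.05, 0.9]
bitwidths = [(4, 3), (6, 3), (0, 0), (3, 2)]

quantized = [quantize(w, bits, frac) for w, (bits, frac) in zip(weights, bitwidths)]
total_bits = sum(bits for bits, _ in bitwidths)
print(quantized, total_bits)
```

The sum of bitwidths is a crude stand-in for hardware cost: lowering it for weights that tolerate coarse rounding is the trade-off the bullet above refers to.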

Load-bearing premise

The linear-complexity approximation that replaces explicit pairwise interactions with shared transformations and global aggregation preserves jet-tagging accuracy under the quantization and hardware constraints of the FPGA implementation.

What would settle it

A side-by-side evaluation of the full Interaction Network and JEDI-linear on identical jet-tagging data, using the same quantization scheme and training procedure, that shows a substantial accuracy drop for JEDI-linear would falsify the claim that the approximation preserves performance.
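
One way such a side-by-side comparison could be scored is a paired test on per-jet correctness, since both models would see the same jets. The sketch below uses a McNemar-style z statistic; the choice of test and the function name `mcnemar_z` are assumptions for illustration, not something the paper or the review specifies.

```python
import math

def mcnemar_z(correct_full, correct_linear):
    """Paired comparison of two classifiers evaluated on the same jets.

    Each argument holds one truthy/falsy entry per jet, marking whether
    that model classified the jet correctly. A positive z favours the
    full model, a negative z favours the linear one.
    """
    only_full = sum(1 for a, b in zip(correct_full, correct_linear) if a and not b)
    only_linear = sum(1 for a, b in zip(correct_full, correct_linear) if b and not a)
    discordant = only_full + only_linear
    if discordant == 0:
        return 0.0  # the two models never disagree
    return (only_full - only_linear) / math.sqrt(discordant)

# Hypothetical toy outcome: ten jets, the linear model misses four that the full model gets.
full_ok = [1] * 10
linear_ok = [1] * 6 + [0] * 4
print(mcnemar_z(full_ok, linear_ok))
```

Only jets on which the two models disagree enter the statistic, so a large shared test set with few disagreements yields a small z even if both accuracies are far from perfect.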

Figures

Figures reproduced from arXiv: 2508.15468 by Alexander Tapper, Arianna Cox, Chang Sun, Christopher Brown, Emyr Clement, Katerina Karakoulaki, Lauri Laatu, Maria Spiropulu, Sudarshan Paramesvaran, Wayne Luk, Zhiqiang Que.

**Figure 2.** Conventional interaction information gathering in JEDI-net.

**Figure 3.** The proposed interaction information gathering for JEDI-net.

**Figure 5.** A sketch of the CMS Level-1 Trigger system in the HL-LHC, adapted …

**Figure 6.** Model accuracy across different numbers of particles per jet. (a) …

**Figure 7.** Bitwidth of trained JEDI-linear models with datasets of 64/128 …

**Figure 8.** Latency vs. number of particles per jet. Our JEDI-linear designs …

**Figure 9.** Comparison of the JEDI-linear model with MLP-Mixer on the 16 …

Original abstract

Graph Neural Networks (GNNs), particularly Interaction Networks (INs), have shown exceptional performance for jet tagging at the CERN High-Luminosity Large Hadron Collider (HL-LHC). However, their computational complexity and irregular memory access patterns pose significant challenges for deployment on FPGAs in hardware trigger systems, where strict latency and resource constraints apply. In this work, we propose JEDI-linear, a novel GNN architecture with linear computational complexity that eliminates explicit pairwise interactions by leveraging shared transformations and global aggregation. To further enhance hardware efficiency, we introduce fine-grained quantization-aware training with per-parameter bitwidth optimization and employ multiplier-free multiply-accumulate operations via distributed arithmetic. Evaluation results show that our FPGA-based JEDI-linear achieves 3.7 to 11.5 times lower latency, up to 150 times lower initiation interval, and up to 6.2 times lower LUT usage compared to state-of-the-art GNN designs while also delivering higher model accuracy and eliminating the need for DSP blocks entirely. This is the first interaction-based GNN to achieve less than 60 ns latency and currently meets the requirements for use in the HL-LHC CMS Level-1 trigger system. This work advances the next-generation trigger systems by enabling accurate, scalable, and resource-efficient GNN inference in real-time environments. Our open-sourced templates will further support reproducibility and broader adoption across scientific applications.
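
The phrase "multiplier-free multiply-accumulate" can be made concrete with a basic shift-and-add sketch: when weights are fixed integers known at design time, each product reduces to a few shifted copies of the input that are added or subtracted. This is only the underlying principle, shown with a plain binary expansion; it is not the algorithm used by the paper's tooling, and the function names are hypothetical.

```python
def shift_add_terms(weight_code):
    """Decompose a constant integer weight into signed powers of two.

    Returns a list of (is_negative, shift) pairs from the plain binary expansion.
    """
    negative = weight_code < 0
    magnitude = abs(weight_code)
    terms = []
    shift = 0
    while magnitude:
        if magnitude & 1:
            terms.append((negative, shift))
        magnitude >>= 1
        shift += 1
    return terms

def mac_without_multiplier(inputs, weight_codes):
    """Dot product of integer inputs with constant integer weights,
    using only shifts, additions, and subtractions."""
    acc = 0
    for x, w in zip(inputs, weight_codes):
        for negative, shift in shift_add_terms(w):
            shifted = x << shift
            acc = acc - shifted if negative else acc + shifted
    return acc

# Hypothetical integer inputs and weights; a zero weight contributes no hardware at all.
inputs = [3, -5, 7]
weight_codes = [6, -3, 0]
print(mac_without_multiplier(inputs, weight_codes))
```

Each nonzero bit of a weight costs one adder input, so narrower or sparser weights translate directly into less logic. That is how this style of arithmetic interacts with per-parameter bitwidth optimization.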

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

JEDI-linear linearizes Interaction Networks for jet tagging to hit under 60 ns on FPGA with claimed accuracy gains and no DSPs, but the accuracy cost of dropping explicit pairwise terms is not isolated.

The one thing to know is that this work gets a linearized interaction-style GNN running under 60 ns on FPGA for jet tagging, with reported gains in speed and accuracy over earlier designs and no DSP usage at all. They achieve the linear complexity by dropping explicit pairwise interactions in favor of shared transformations and global aggregation. Fine-grained quantization with per-parameter bit widths and distributed arithmetic for MACs handle the hardware side. The evaluation claims 3.7-11.5x lower latency, much better initiation interval, lower LUT count, and better accuracy than state-of-the-art GNNs on FPGAs. It is positioned as the first such model to fit the HL-LHC CMS trigger timing.

This is practical progress on deploying more expressive ML in real-time collider triggers. The open-sourced templates add to its usefulness for others in the field.

The soft spots are around the evidence for the approximation. There is no reported test isolating how much tagging performance changes when moving from full pairwise interactions to the linear version, particularly once quantization is applied. Without that, it is difficult to separate the contribution of the new architecture from other optimizations. The abstract also skips details on dataset splits, baselines, and statistical errors, which makes the quantitative claims harder to judge right away.

Readers working on FPGA implementations for particle physics triggers will get the most from this. It is relevant for anyone looking at efficient GNNs under strict latency and resource limits. The paper deserves a serious referee because the hardware results are specific and the problem it tackles is important for upcoming LHC runs. I recommend putting it through peer review.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces JEDI-linear, a novel GNN architecture for jet tagging that achieves linear computational complexity by replacing explicit pairwise interactions from Interaction Networks with shared transformations and global aggregation. It further applies fine-grained quantization-aware training with per-parameter bitwidth optimization and distributed arithmetic to enable multiplier-free operations on FPGAs. The central claims are that the resulting FPGA implementation delivers 3.7–11.5× lower latency, up to 150× lower initiation interval, up to 6.2× lower LUT usage, higher accuracy than prior GNN designs, eliminates DSP blocks entirely, and achieves <60 ns latency suitable for the HL-LHC CMS Level-1 trigger.

Significance. If the reported performance and accuracy numbers hold under rigorous validation, the work would be significant for high-energy physics by demonstrating the first interaction-based GNN that meets the strict latency and resource constraints of HL-LHC trigger systems while improving accuracy. The open-sourcing of implementation templates would further aid reproducibility and adoption of hardware-efficient GNNs in scientific computing.

major comments (2)

[§4] §4 (Evaluation) and associated tables: The headline claims of higher model accuracy together with 3.7–11.5× latency reduction and no DSP usage rest on the untested assumption that the linear approximation (shared transforms + global aggregation) preserves the pairwise interaction features needed for jet tagging after per-parameter quantization. No ablation isolating the accuracy impact of this substitution versus a full Interaction Network on the identical dataset and task is reported, which is load-bearing for the central claim.
[Table 2] Table 2 (or equivalent results table): The latency, initiation-interval, and resource comparisons lack explicit specification of the target FPGA device, synthesis tool chain, clock frequency, and exact baseline implementations (including their quantization schemes), preventing independent verification of the reported speed-ups and resource savings.

minor comments (2)

[Abstract] The abstract states quantitative improvements but supplies no dataset name, jet-tagging task definition, or number of particles per graph; adding these would improve clarity without altering the technical content.
[§3] Notation for the global aggregation operation and the per-parameter bitwidth assignment should be defined once in §3 before being used in the hardware mapping description.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough review and constructive feedback. We address the major comments point by point below, agreeing where revisions are warranted to strengthen the manuscript.

Referee: [§4] §4 (Evaluation) and associated tables: The headline claims of higher model accuracy together with 3.7–11.5× latency reduction and no DSP usage rest on the untested assumption that the linear approximation (shared transforms + global aggregation) preserves the pairwise interaction features needed for jet tagging after per-parameter quantization. No ablation isolating the accuracy impact of this substitution versus a full Interaction Network on the identical dataset and task is reported, which is load-bearing for the central claim.

Authors: We agree that an explicit ablation would provide stronger support for the central claim. In the revised manuscript we will add a dedicated comparison in §4 (with an accompanying table) that evaluates JEDI-linear against a full Interaction Network on the identical jet-tagging dataset and task, reporting accuracy both before and after the per-parameter quantization step. This will quantify the accuracy impact of the shared-transform plus global-aggregation substitution and confirm that essential pairwise features are retained while linear complexity is achieved.

revision: yes

Referee: [Table 2] Table 2 (or equivalent results table): The latency, initiation-interval, and resource comparisons lack explicit specification of the target FPGA device, synthesis tool chain, clock frequency, and exact baseline implementations (including their quantization schemes), preventing independent verification of the reported speed-ups and resource savings.

Authors: We acknowledge the need for full reproducibility details. The revised manuscript will explicitly state in the caption and body of Table 2 (and in §4) the target device (Xilinx Virtex UltraScale+), synthesis tool chain and version, target clock frequency for timing closure, and the precise quantization schemes used for each baseline implementation. These clarifications will enable independent verification of the reported latency, initiation-interval, and resource figures.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; claims rest on empirical hardware evaluation

The paper introduces JEDI-linear as an architectural design choice that replaces explicit pairwise interactions with shared transformations and global aggregation to achieve linear complexity, then validates this via FPGA implementation, quantization-aware training, and direct comparisons to prior GNN designs on latency, resource usage, and accuracy. No equations, derivations, or predictions are shown that reduce reported results to quantities fitted or defined internally by the paper itself. The central performance claims (sub-60 ns latency, higher accuracy, no DSP blocks) are supported by external hardware synthesis results and benchmarks against state-of-the-art methods, making the evaluation self-contained against independent implementation data rather than self-referential.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities are stated. The work relies on standard GNN modeling assumptions for jet tagging and conventional FPGA design practices.

pith-pipeline@v0.9.0 · 5828 in / 1236 out tokens · 44716 ms · 2026-05-18T21:54:10.204173+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

"we require f_R to be an affine transformation... rewrite the interaction embedding... linear complexity with respect to N_O... global context vector summarizing the features of all particles"

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear

"fine-grained quantization-aware training with per-parameter bitwidth optimization and employ multiplier-free multiply-accumulate operations via distributed arithmetic"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Patch Hierarchical Attention Transformer for Efficient Particle Jet Tagging
hep-ex · 2026-05 · unverdicted · novelty 7.0

PHAT-JeT combines geometric message-passing with hierarchical patch attention to reach state-of-the-art accuracy and background rejection among resource-constrained jet tagging models on four benchmarks.

HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference
cs.AR · 2026-04 · unverdicted · novelty 6.0

HGQ-LUT delivers a practical LUT-aware training framework with new tensor-based layers, heterogeneous quantization, and a resource surrogate that automates accuracy-efficiency trade-offs for FPGA DNN inference.

E-PCN: Jet Tagging with Explainable Particle Chebyshev Networks Using Kinematic Features
hep-ph · 2025-12 · conditional · novelty 5.0

E-PCN reaches 94.67% macro-accuracy on 10-class jet tagging by weighting graphs with angular separation, transverse momentum, momentum fraction, and invariant mass, with Grad-CAM showing the first two account for 76% ...

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · cited by 3 Pith papers · 1 internal anchor

[1] C. N. Coelho, A. Kuusela, S. Li, H. Zhuang, J. Ngadiuba, T. K. Aarrestad, V. Loncar, M. Pierini, A. A. Pol, and S. Summers, “Automatic heterogeneous quantization of deep neural networks for low-latency inference on the edge for particle detectors,” Nature Machine Intelligence, vol. 3, no. 8, pp. 675–686, 2021

[2] J. Duarte, S. Han, P. Harris, S. Jindariani, E. Kreinar, B. Kreis, J. Ngadiuba, M. Pierini, R. Rivera, N. Tran et al., “Fast inference of deep neural networks in FPGAs for particle physics,” Journal of Instrumentation, vol. 13, no. 07, p. P07027, 2018

[3] E. A. Moreno, O. Cerri, J. M. Duarte, H. B. Newman, T. Q. Nguyen, A. Periwal, M. Pierini, A. Serikova, M. Spiropulu, and J.-R. Vlimant, “JEDI-net: a jet identification algorithm based on interaction networks,” The European Physical Journal C, vol. 80, no. 1, pp. 1–15, 2020

[4] Z. Que, H. Fan, M. Loo, H. Li, M. Blott, M. Pierini, A. Tapper, and W. Luk, “LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics,” ACM Transactions on Embedded Computing Systems, vol. 23, no. 2, pp. 1–28, 2024

[5] S. Summers and the CMS Collaboration, “System Design and Prototyping for the CMS Level-1 Trigger at the High-Luminosity LHC,” Presented at TWEPP 2024, Glasgow, UK, 2024, for the CMS Collaboration

[6] S. Summers, I. Bestintzanos, and G. Petrucciani, “Reconstructing jets in the phase-2 upgrade of the cms level-1 trigger with a seeded cone algorithm,” 2023. [Online]. Available: https://arxiv.org/abs/2310.08062

[7] P. Odagiu, Z. Que, J. Duarte, J. Haller, G. Kasieczka, A. Lobanov, V. Loncar, W. Luk, J. Ngadiuba, M. Pierini et al., “Ultrafast jet classification at the HL-LHC,” Machine Learning: Science and Technology, vol. 5, no. 3, p. 035017, 2024

[8] M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, “Deep sets,” Advances in neural information processing systems, vol. 30, 2017

[9] E. A. Moreno, T. Q. Nguyen, J.-R. Vlimant, O. Cerri, H. B. Newman, A. Periwal, M. Spiropulu, J. M. Duarte, and M. Pierini, “Interaction networks for the identification of boosted H → bb decays,” Physical Review D, vol. 102, no. 1, p. 012010, 2020

[10] H. Qu and L. Gouskos, “Jet tagging via particle clouds,” Physical Review D, vol. 101, no. 5, p. 056019, 2020

[11] H. Qu, C. Li, and S. Qian, “Particle transformer for jet tagging,” in International Conference on Machine Learning. PMLR, 2022, pp. 18 281–18 292

[12] A. Wang, A. Gandrakota, J. Ngadiuba, V. Sahu, P. Bhatnagar, E. E. Khoda, and J. Duarte, “Interpreting Transformers for Jet Tagging,” arXiv preprint arXiv:2412.03673, 2024

[13] Z. Que, M. Loo, and W. Luk, “Reconfigurable acceleration of graph neural networks for jet identification in particle physics,” in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS). IEEE, 2022, pp. 202–205

[14] Z. Que, M. Loo, H. Fan, M. Pierini, A. Tapper, and W. Luk, “Optimizing graph neural networks for jet tagging in particle physics on FPGAs,” in 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 2022, pp. 327–333

[15] Y. Zhang, Y. Cheng, and Y. Gao, “ParticleNet for Jet Tagging in Particle Physics on FPGA,” in BenchCouncil International Symposium on Intelligent Computers, Algorithms, and Applications. Springer, 2023, pp. 244–253

[16] C. Sun, T. K. Årrestad, V. Loncar, J. Ngadiuba, and M. Spiropulu, “Gradient-based automatic mixed precision quantization for neural networks on-chip,” arXiv preprint arXiv:2405.00645, 2024

[17] C. Sun, Z. Que, V. Loncar, W. Luk, and M. Spiropulu, “da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs,” arXiv preprint arXiv:2507.04535, 2025

[18] The CMS Collaboration, “The Phase-2 Upgrade of the CMS Level-1 Trigger,” CERN, Geneva, Tech. Rep., 2020, final version. [Online]. Available: https://cds.cern.ch/record/2714892

[19] J. Ngadiuba, V. Loncar, M. Pierini, S. Summers, G. Di Guglielmo, J. Duarte, P. Harris, D. Rankin, S. Jindariani, M. Liu et al., “Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml,” Machine Learning: Science and Technology, vol. 2, no. 1, p. 015001, 2020

[20] Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, “FINN: A framework for fast, scalable binarized neural network inference,” in Proceedings of the 2017 ACM/SIGDA international symposium on field-programmable gate arrays, 2017, pp. 65–74

[21] H. Fan, S. Liu, M. Ferianc, H.-C. Ng, Z. Que, S. Liu, X. Niu, and W. Luk, “A real-time object detection accelerator with compressed SSDLite on FPGA,” in 2018 International conference on field-programmable technology (FPT). IEEE, 2018, pp. 14–21

[22] Z. Que, H. Nakahara, E. Nurvitadhi, A. Boutros, H. Fan, C. Zeng, J. Meng, K. H. Tsoi, X. Niu, and W. Luk, “Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 30, no. 2, pp. 227–237, 2021

[23] W. Snyder, P. Wasson, D. Galbi, et al., “Verilator,” Veripool, repository: https://github.com/verilator/verilator. [Online]. Available: https://verilator.org

[24] M. Pierini, J. M. Duarte, N. Tran, and M. Freytsis, “HLS4ML LHC jet dataset (150 particles),” Jan. 2020. [Online]. Available: https://doi.org/10.5281/zenodo.3602260

[25] E. Coleman, M. Freytsis, A. Hinzmann, M. Narain, J. Thaler, N. Tran, and C. Vernieri, “The importance of calorimetry for highly-boosted jet substructure,” Journal of Instrumentation, vol. 13, no. 01, p. T01003, Jan. 2018. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/13/01/T01003

[26] C. Sun, J. Ngadiuba, M. Pierini, and M. Spiropulu, “Fast jet tagging with MLP-Mixers on FPGAs,” Machine Learning: Science and Technology, 2025. [Online]. Available: http://iopscience.iop.org/article/10.1088/2632-2153/adf596

[27] J. Weitz, D. Demler, L. McDermott, N. Tran, and J. Duarte, “Neural architecture codesign for fast physics applications,” Machine Learning: Science and Technology, vol. 6, no. 3, p. 035009, Jul. 2025. [Online]. Available: https://dx.doi.org/10.1088/2632-2153/adede1

[28] P. Odagiu, Z. Que, J. Duarte, J. Haller, G. Kasieczka, A. Lobanov, V. Loncar, W. Luk, J. Ngadiuba, M. Pierini, P. Rincke, A. Seksaria, S. Summers, A. Sznajder, A. Tapper, and T. K. Årrestad, “Ultrafast jet classification at the HL-LHC,” Machine Learning: Science and Technology, vol. 5, no. 3, p. 035017, Jul. 2024. [Online]. Available: http://dx.doi.org/... (doi:10.1088/2632-2153/ad5f10)

[1] [1]

Automatic heterogeneous quantization of deep neural networks for low-latency in- ference on the edge for particle detectors,

C. N. Coelho, A. Kuusela, S. Li, H. Zhuang, J. Ngadiuba, T. K. Aarrestad, V . Loncar, M. Pierini, A. A. Pol, and S. Summers, “Automatic heterogeneous quantization of deep neural networks for low-latency in- ference on the edge for particle detectors,” Nature Machine Intelligence, vol. 3, no. 8, pp. 675–686, 2021

work page 2021

[2] [2]

Fast inference of deep neural networks in FPGAs for particle physics,

J. Duarte, S. Han, P. Harris, S. Jindariani, E. Kreinar, B. Kreis, J. Ngadiuba, M. Pierini, R. Rivera, N. Tran et al. , “Fast inference of deep neural networks in FPGAs for particle physics,” Journal of Instrumentation, vol. 13, no. 07, p. P07027, 2018

work page 2018

[3] [3]

JEDI-net: a jet identification algorithm based on interaction networks,

E. A. Moreno, O. Cerri, J. M. Duarte, H. B. Newman, T. Q. Nguyen, A. Periwal, M. Pierini, A. Serikova, M. Spiropulu, and J.-R. Vlimant, “JEDI-net: a jet identification algorithm based on interaction networks,” The European Physical Journal C , vol. 80, no. 1, pp. 1–15, 2020

work page 2020

[4] [4]

LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics,

Z. Que, H. Fan, M. Loo, H. Li, M. Blott, M. Pierini, A. Tapper, and W. Luk, “LL-GNN: Low Latency Graph Neural Networks on FPGAs for High Energy Physics,” ACM Transactions on Embedded Computing Systems, vol. 23, no. 2, pp. 1–28, 2024

work page 2024

[5] [5]

System Design and Prototyp- ing for the CMS Level-1 Trigger at the High-Luminosity LHC,

S. Summers and the CMS Collaboration, “System Design and Prototyp- ing for the CMS Level-1 Trigger at the High-Luminosity LHC,” Pre- sented at TWEPP 2024, Glasgow, UK, 2024, for the CMS Collaboration

work page 2024

[6] [6]

Reconstructing jets in the phase-2 upgrade of the cms level-1 trigger with a seeded cone algorithm,

S. Summers, I. Bestintzanos, and G. Petrucciani, “Reconstructing jets in the phase-2 upgrade of the cms level-1 trigger with a seeded cone algorithm,” 2023. [Online]. Available: https://arxiv.org/abs/2310.08062

work page arXiv 2023

[7] [7]

Ultrafast jet classi- fication at the HL-LHC,

P. Odagiu, Z. Que, J. Duarte, J. Haller, G. Kasieczka, A. Lobanov, V . Loncar, W. Luk, J. Ngadiuba, M. Pierini et al., “Ultrafast jet classi- fication at the HL-LHC,” Machine Learning: Science and Technology , vol. 5, no. 3, p. 035017, 2024

work page 2024

[8] [8]

Deep sets,

M. Zaheer, S. Kottur, S. Ravanbakhsh, B. Poczos, R. R. Salakhutdinov, and A. J. Smola, “Deep sets,”Advances in neural information processing systems, vol. 30, 2017

work page 2017

[9] [9]

Interaction networks for the identification of boosted H → bb decays,

E. A. Moreno, T. Q. Nguyen, J.-R. Vlimant, O. Cerri, H. B. Newman, A. Periwal, M. Spiropulu, J. M. Duarte, and M. Pierini, “Interaction networks for the identification of boosted H → bb decays,” Physical Review D, vol. 102, no. 1, p. 012010, 2020

work page 2020

[10] [10]

Jet tagging via particle clouds,

H. Qu and L. Gouskos, “Jet tagging via particle clouds,” Physical Review D, vol. 101, no. 5, p. 056019, 2020

work page 2020

[11] [11]

Particle transformer for jet tagging,

H. Qu, C. Li, and S. Qian, “Particle transformer for jet tagging,” in International Conference on Machine Learning . PMLR, 2022, pp. 18 281–18 292

work page 2022

[12] [12]

Interpreting Transformers for Jet Tagging,

A. Wang, A. Gandrakota, J. Ngadiuba, V . Sahu, P. Bhatnagar, E. E. Khoda, and J. Duarte, “Interpreting Transformers for Jet Tagging,”arXiv preprint arXiv:2412.03673, 2024

work page arXiv 2024

[13] [13]

Reconfigurable acceleration of graph neural networks for jet identification in particle physics,

Z. Que, M. Loo, and W. Luk, “Reconfigurable acceleration of graph neural networks for jet identification in particle physics,” in 2022 IEEE 4th International Conference on Artificial Intelligence Circuits and Systems (AICAS). IEEE, 2022, pp. 202–205

work page 2022

[14] [14]

Optimizing graph neural networks for jet tagging in particle physics on FPGAs,

Z. Que, M. Loo, H. Fan, M. Pierini, A. Tapper, and W. Luk, “Optimizing graph neural networks for jet tagging in particle physics on FPGAs,” in 2022 32nd International Conference on Field-Programmable Logic and Applications (FPL). IEEE, 2022, pp. 327–333

work page 2022

[15]

ParticleNet for Jet Tagging in Particle Physics on FPGA,

Y. Zhang, Y. Cheng, and Y. Gao, “ParticleNet for Jet Tagging in Particle Physics on FPGA,” in BenchCouncil International Symposium on Intelligent Computers, Algorithms, and Applications. Springer, 2023, pp. 244–253

work page 2023

[16]

Gradient-based automatic mixed precision quantization for neural networks on-chip,

C. Sun, T. K. Årrestad, V. Loncar, J. Ngadiuba, and M. Spiropulu, “Gradient-based automatic mixed precision quantization for neural networks on-chip,” arXiv preprint arXiv:2405.00645, 2024

work page arXiv 2024

[17]

da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs

C. Sun, Z. Que, V. Loncar, W. Luk, and M. Spiropulu, “da4ml: Distributed Arithmetic for Real-time Neural Networks on FPGAs,” arXiv preprint arXiv:2507.04535, 2025

work page arXiv 2025

[18]

The Phase-2 Upgrade of the CMS Level-1 Trigger,

The CMS Collaboration, “The Phase-2 Upgrade of the CMS Level-1 Trigger,” CERN, Geneva, Tech. Rep., 2020, final version. [Online]. Available: https://cds.cern.ch/record/2714892

work page arXiv 2020

[19]

Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml,

J. Ngadiuba, V. Loncar, M. Pierini, S. Summers, G. Di Guglielmo, J. Duarte, P. Harris, D. Rankin, S. Jindariani, M. Liu et al., “Compressing deep neural networks on FPGAs to binary and ternary precision with hls4ml,” Machine Learning: Science and Technology, vol. 2, no. 1, p. 015001, 2020

work page 2020

[20]

FINN: A framework for fast, scalable binarized neural network inference,

Y. Umuroglu, N. J. Fraser, G. Gambardella, M. Blott, P. Leong, M. Jahre, and K. Vissers, “FINN: A framework for fast, scalable binarized neural network inference,” in Proceedings of the 2017 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, 2017, pp. 65–74

work page 2017

[21]

A real-time object detection accelerator with compressed SSDLite on FPGA,

H. Fan, S. Liu, M. Ferianc, H.-C. Ng, Z. Que, S. Liu, X. Niu, and W. Luk, “A real-time object detection accelerator with compressed SSDLite on FPGA,” in 2018 International Conference on Field-Programmable Technology (FPT). IEEE, 2018, pp. 14–21

work page 2018

[22]

Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs,

Z. Que, H. Nakahara, E. Nurvitadhi, A. Boutros, H. Fan, C. Zeng, J. Meng, K. H. Tsoi, X. Niu, and W. Luk, “Recurrent neural networks with column-wise matrix–vector multiplication on FPGAs,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 30, no. 2, pp. 227–237, 2021

work page 2021

[23]

Verilator,

W. Snyder, P. Wasson, D. Galbi et al., “Verilator,” Veripool, repository: https://github.com/verilator/verilator. [Online]. Available: https://verilator.org

work page

[24]

HLS4ML LHC jet dataset (150 particles)

M. Pierini, J. M. Duarte, N. Tran, and M. Freytsis, “HLS4ML LHC jet dataset (150 particles),” Jan. 2020. [Online]. Available: https://doi.org/10.5281/zenodo.3602260

work page doi:10.5281/zenodo.3602260 2020

[25]

The importance of calorimetry for highly-boosted jet substructure,

E. Coleman, M. Freytsis, A. Hinzmann, M. Narain, J. Thaler, N. Tran, and C. Vernieri, “The importance of calorimetry for highly-boosted jet substructure,” Journal of Instrumentation, vol. 13, no. 01, p. T01003, Jan. 2018. [Online]. Available: https://dx.doi.org/10.1088/1748-0221/13/01/T01003

work page doi:10.1088/1748-0221/13/01/T01003 2018

[26]

Fast jet tagging with MLP-Mixers on FPGAs,

C. Sun, J. Ngadiuba, M. Pierini, and M. Spiropulu, “Fast jet tagging with MLP-Mixers on FPGAs,” Machine Learning: Science and Technology, 2025. [Online]. Available: http://iopscience.iop.org/article/10.1088/2632-2153/adf596

work page doi:10.1088/2632-2153/adf596 2025

[27]

Neural architecture codesign for fast physics applications,

J. Weitz, D. Demler, L. McDermott, N. Tran, and J. Duarte, “Neural architecture codesign for fast physics applications,” Machine Learning: Science and Technology, vol. 6, no. 3, p. 035009, Jul. 2025. [Online]. Available: https://dx.doi.org/10.1088/2632-2153/adede1

work page doi:10.1088/2632-2153/adede1 2025

[28]

Ultrafast jet classification on FPGAs for HL-LHC

P. Odagiu, Z. Que, J. Duarte, J. Haller, G. Kasieczka, A. Lobanov, V. Loncar, W. Luk, J. Ngadiuba, M. Pierini, P. Rincke, A. Seksaria, S. Summers, A. Sznajder, A. Tapper, and T. K. Årrestad, “Ultrafast jet classification at the HL-LHC,” Machine Learning: Science and Technology, vol. 5, no. 3, p. 035017, Jul. 2024. [Online]. Available: http://dx.doi.org/...

work page doi:10.1088/2632-2153/ad5f10 2024