Recognition: unknown
HGQ-LUT: Fast LUT-Aware Training and Efficient Architectures for DNN Inference
Pith reviewed 2026-05-08 09:30 UTC · model grok-4.3
The pith
HGQ-LUT trains lookup-table neural networks over 100 times faster on GPUs while delivering state-of-the-art FPGA hardware efficiency.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
HGQ-LUT introduces LUT-Dense and LUT-Conv layers that execute as standard tensor operations during training yet compile to logic LUTs for hardware. Paired with element-wise heterogeneous quantization (including zero-bit pruning) and a LUT-aware resource surrogate, these layers enable more than 100 times faster training on modern GPUs, automatic exploration of accuracy-resource trade-offs, and unified design of hybrid LUT-plus-arithmetic architectures with bit-exact verification.
What carries the argument
LUT-Dense and LUT-Conv layers implemented via regular tensor operations that later compile to FPGA logic LUTs, combined with fine-grained heterogeneous quantization and a LUT-aware resource surrogate that guides automatic design-space exploration.
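To make the training-time trick concrete, here is a minimal sketch of how a small lookup table can be evaluated with ordinary tensor operations: a gather, or equivalently a one-hot matrix multiply. This is not the HGQ-LUT implementation; the 2-input, 2-bit configuration, the quantize helper, and all names are illustrative assumptions.

```python
# Minimal sketch of a LUT-style layer realized with plain tensor operations.
# NOT the HGQ2 API; the configuration below is an illustrative assumption.
import numpy as np

rng = np.random.default_rng(0)

N_IN, BITS = 2, 2                      # 2 LUT inputs, 2 bits each
TABLE_SIZE = 2 ** (N_IN * BITS)        # 16 entries for a 4-bit address
table = rng.normal(size=TABLE_SIZE)    # trainable LUT contents

def quantize(x, bits):
    """Uniform quantization to unsigned integer codes. np.round is not
    differentiable; training would use a straight-through estimator here."""
    return np.clip(np.round(x * (2**bits - 1)), 0, 2**bits - 1).astype(int)

x = rng.random((8, N_IN))              # a batch of 8 examples
q = quantize(x, BITS)                  # integer codes per input

# Pack the N_IN integer codes into a single LUT address per example.
addr = q[:, 0] * (2**BITS) + q[:, 1]

# A gather is a regular, accelerator-efficient tensor op; equivalently,
# one-hot @ table is a dense matrix multiply.
out_gather = table[addr]
one_hot = np.eye(TABLE_SIZE)[addr]
out_matmul = one_hot @ table

assert np.allclose(out_gather, out_matmul)
# In hardware, `table` compiles directly to the truth table of a logic LUT,
# so the training-time tensor op and the synthesized LUT can agree
# bit-exactly once the table entries are themselves quantized.
```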
If this is right
- Training time for LUT-based networks drops from hours or days to minutes, making repeated hardware-aware design iterations feasible on standard GPU hardware.
- Designers no longer need to select bit widths by hand; the surrogate and quantization jointly search the accuracy-resource space automatically (a toy version of this joint objective is sketched after this list).
- Hybrid networks that combine LUT blocks with conventional multiply-add blocks can be designed, compiled, and verified in a single open-source flow.
- The same model can be deployed at the edge with ultra-low latency while retaining the accuracy achieved during fast GPU training.
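A hedged sketch of what such a joint objective could look like: a task loss plus a beta-weighted resource term computed from per-element bit widths. The cost model below (fabric-LUT count growing exponentially with address width, zero-bit elements pruned) and the names lut_cost_surrogate, total_loss, and beta are assumptions chosen for illustration, not the surrogate defined in the paper.

```python
# Toy surrogate-regularized objective, assuming differentiable (relaxed)
# per-element bit widths as in heterogeneous-quantization training.
import numpy as np

def lut_cost_surrogate(bits):
    """Assumed cost model: a LUT whose active inputs total W address bits
    costs roughly 2**max(W - 6, 0) 6-input fabric LUTs; zero-bit inputs
    are pruned and contribute nothing."""
    active = bits[bits > 0]
    width = active.sum()
    return 0.0 if width == 0 else 2.0 ** max(width - 6, 0)

def total_loss(task_loss, bit_tensors, beta=1e-4):
    """Joint objective: accuracy term plus beta-weighted resource term."""
    resource = sum(lut_cost_surrogate(b) for b in bit_tensors)
    return task_loss + beta * resource

bits = [np.array([2., 2., 0., 3.])]    # element-wise bit widths, one pruned
print(total_loss(task_loss=0.31, bit_tensors=bits))
```

Sweeping beta traces out the accuracy-resource Pareto front, which is the automatic trade-off exploration the pith describes.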
Where Pith is reading between the lines
- The approach could be adapted to other reconfigurable fabrics or even ASICs if a corresponding resource surrogate is built for those targets.
- The tensor-based training trick may reduce the cost of other hardware-aware training methods that currently rely on slow simulation loops.
- Integration with existing neural-architecture-search tools could turn the resource surrogate into an automatic hardware-aware NAS objective for FPGAs.
Load-bearing premise
The tensor-operation versions of the LUT layers used in training produce exactly the same numerical results as the final compiled LUT hardware on the FPGA, and the resource surrogate's predicted counts match post-synthesis utilization.
What would settle it
Measure actual FPGA latency, power, resource utilization, and inference accuracy for a trained HGQ-LUT model and compare them directly against the values predicted by the resource surrogate and the bit-exact simulation from the training phase.
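At the resource-count level, the proposed check reduces to a simple comparison harness like the sketch below; the design names and counts are placeholders, not values from the paper.

```python
# Minimal harness for the experiment proposed above: compare the surrogate's
# predicted resource counts against post-synthesis reports. The record
# layout and all numbers are dummy placeholders.
designs = [
    # (name, surrogate-predicted LUTs, post-synthesis LUTs) -- dummy data
    ("hetero_a", 1200, 1260),
    ("hetero_b", 830, 805),
]

for name, predicted, actual in designs:
    rel_err = (predicted - actual) / actual
    print(f"{name}: predicted={predicted} actual={actual} rel.err={rel_err:+.1%}")
# A consistent sign in rel.err across many heterogeneous designs would
# indicate the surrogate systematically under- or over-estimates resources.
```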
Original abstract
Lookup-table (LUT) based neural networks can deliver ultra-low latency and excellent hardware efficiency on FPGAs by mapping arithmetic operations directly onto the logic primitives. However, state-of-the-art LUT-aware training (LAT) approaches remain difficult to use in practice: they are often orders of magnitude slower to train than conventional networks, require non-trivial manual tuning for hardware efficiency, and lack an end-to-end workflow. This work presents HGQ-LUT, integrated in https://github.com/calad0i/HGQ2, a new LAT approach that achieves state-of-the-art hardware efficiency while accelerating training by over 100 times on modern GPUs. HGQ-LUT introduces LUT-Dense and LUT-Conv layers that are implemented with regular, accelerator-efficient tensor operations during training, which are then compiled into logic LUTs for hardware. By combining these layers with fine-grained, element-wise heterogeneous quantization (including zero-bit pruning) and a LUT-aware resource surrogate, HGQ-LUT enables the automatic exploration of accuracy-resource trade-offs without manual bit-width tuning. We further integrate HGQ-LUT into open-source toolchains, enabling unified design, compilation, and bit-exact verification of hybrid architectures that mix LUT-based with conventional arithmetic blocks. These features make LAT-based DNNs practical for real-world deployment, such as at the CERN Large Hadron Collider's experiments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces HGQ-LUT, a LUT-aware training (LAT) framework for DNN inference on FPGAs. It defines LUT-Dense and LUT-Conv layers that are realized via standard tensor operations (matrix multiplies and convolutions) during GPU training to achieve >100x speedup over prior LAT methods, then compiled to logic LUTs. These layers are paired with element-wise heterogeneous quantization (including zero-bit pruning) and a LUT-aware resource surrogate that enables automatic accuracy-resource trade-off search without manual bit-width tuning. The work integrates the approach into open-source toolchains for unified design, compilation, and bit-exact verification of hybrid LUT-plus-conventional arithmetic architectures, with example use at CERN LHC experiments.
Significance. If the central claims hold, HGQ-LUT would make LUT-based DNNs substantially more practical for ultra-low-latency FPGA deployment by removing the dominant training-time and tuning barriers that have limited prior LAT work. The explicit provision of the GitHub repository (https://github.com/calad0i/HGQ2) for reproducible code, the emphasis on bit-exact verification, and the hybrid-architecture support are concrete strengths that increase the result's immediate utility for high-energy-physics and other real-time inference settings.
Major comments (2)
- [§3.1–3.2] LUT-Dense/LUT-Conv definitions: The central claim that training-time tensor implementations produce models whose accuracy and functionality are preserved on FPGA LUT hardware under heterogeneous quantization and zero-bit pruning is load-bearing. No explicit equivalence proof, exhaustive edge-case enumeration (e.g., pruning semantics when a weight is quantized to zero bits, or input-encoding differences for multi-input LUTs), or side-by-side numerical comparison of tensor vs. post-synthesis behavior is supplied for the heterogeneous case; only homogeneous examples appear to be validated.
- [§4.3, Table 3] Resource surrogate validation: The automatic trade-off exploration and reported SOTA efficiency numbers rest on the LUT-aware surrogate accurately predicting post-synthesis utilization. Direct quantitative comparison (e.g., surrogate-predicted vs. actual LUT/FF/BRAM counts after Vivado synthesis) is shown only for a subset of homogeneous designs; extension to the fine-grained heterogeneous configurations that drive the claimed gains is required to confirm the surrogate does not systematically under- or over-estimate resources.
Minor comments (2)
- [Figure 4] Axis labels and legend entries for the heterogeneous vs. homogeneous curves are difficult to distinguish at the printed resolution; adding explicit bit-width annotations on the data points would improve readability.
- [§5.1] The statement that the approach is “parameter-free” should be qualified, as the surrogate still contains a small number of tunable coefficients whose sensitivity is not reported.
Simulated Author's Rebuttal
We thank the referee for the positive assessment of HGQ-LUT's practical contributions and for the constructive major comments. We address each point below and will revise the manuscript accordingly to strengthen the validation of the core claims.
Point-by-point responses
Referee: [§3.1–3.2] LUT-Dense/LUT-Conv definitions: The central claim that training-time tensor implementations produce models whose accuracy and functionality are preserved on FPGA LUT hardware under heterogeneous quantization and zero-bit pruning is load-bearing. No explicit equivalence proof, exhaustive edge-case enumeration (e.g., pruning semantics when a weight is quantized to zero bits, or input-encoding differences for multi-input LUTs), or side-by-side numerical comparison of tensor vs. post-synthesis behavior is supplied for the heterogeneous case; only homogeneous examples appear to be validated.
Authors: The LUT-Dense and LUT-Conv layers are defined so that the tensor operations (matrix multiplies and convolutions) during training perform exactly the same arithmetic as the compiled LUT logic on hardware, with element-wise heterogeneous quantization applied identically in both domains. For zero-bit pruning, a weight assigned zero bits is removed from the computation graph in the tensor implementation (by masking or zeroing the corresponding slice), which matches the hardware behavior of omitting that LUT input. We acknowledge that the manuscript currently provides only homogeneous validation examples. In the revision we will add an appendix containing (i) a concise equivalence argument derived directly from the layer definitions in §3.1–3.2, (ii) side-by-side numerical comparisons of tensor versus post-synthesis outputs for representative heterogeneous quantization masks (including zero-bit cases), and (iii) explicit clarification of the input-encoding convention used for multi-input LUTs. These additions will directly address the referee's concern without altering the reported results.
Revision: yes
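The masking semantics described in this response can be sketched in a few lines; the shapes and names below are illustrative assumptions, not the HGQ2 API.

```python
# Sketch of zero-bit pruning: an element assigned zero bits is masked out
# of the tensor computation, mirroring the hardware behavior of omitting
# that input from the compiled LUT entirely.
import numpy as np

bits = np.array([3, 0, 2, 0])          # element-wise bit widths; two pruned
w = np.array([0.7, -1.2, 0.4, 0.9])    # dense weights before pruning
x = np.array([1.0, 2.0, 3.0, 4.0])

mask = (bits > 0).astype(w.dtype)      # zero-bit elements contribute nothing
y_tensor = (w * mask) @ x              # training-time tensor path

# Hardware view: pruned elements never enter the computation at all.
keep = bits > 0
y_hw = w[keep] @ x[keep]

assert np.isclose(y_tensor, y_hw)      # both paths agree
```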
Referee: [§4.3, Table 3] Resource surrogate validation: The automatic trade-off exploration and reported SOTA efficiency numbers rest on the LUT-aware surrogate accurately predicting post-synthesis utilization. Direct quantitative comparison (e.g., surrogate-predicted vs. actual LUT/FF/BRAM counts after Vivado synthesis) is shown only for a subset of homogeneous designs; extension to the fine-grained heterogeneous configurations that drive the claimed gains is required to confirm the surrogate does not systematically under- or over-estimate resources.
Authors: We agree that the surrogate's accuracy must be demonstrated for the heterogeneous configurations that underpin the automatic trade-off search. The current Table 3 and §4.3 focus on homogeneous designs for brevity. In the revised manuscript we will extend the validation by adding a new table (or expanded subsection) that reports surrogate-predicted versus post-Vivado-synthesis LUT/FF/BRAM counts for multiple heterogeneous bit-width assignments drawn from the Pareto-front experiments. This will confirm that the surrogate remains reliable in the fine-grained regime and will support the SOTA efficiency claims.
Revision: yes
Circularity Check
No circularity in derivation chain
Full rationale
The paper defines LUT-Dense and LUT-Conv layers via independent tensor-operation implementations for training, then separately compiles them to hardware LUTs. The LUT-aware resource surrogate is introduced as an additional modeling component for trade-off search. No equations or claims reduce by construction to fitted inputs, self-definitions, or load-bearing self-citations; the workflow is presented as an end-to-end empirical pipeline with external verification steps. This matches the default expectation of a non-circular paper.
Axiom & Free-Parameter Ledger
Axioms (1)
- Domain assumption: tensor operations used in training accurately simulate the behavior of compiled LUT logic on the FPGA.