
arxiv: 2605.23785 · v1 · pith:2YEBIIHFnew · submitted 2026-05-22 · ✦ hep-ex

FPGA Acceleration of Matrix-Element Calculations for Monte Carlo Event Generation

Pith reviewed 2026-05-25 02:37 UTC · model grok-4.3

classification ✦ hep-ex
keywords FPGA acceleration · matrix-element calculations · Monte Carlo event generation · color algebra · MadGraph5_aMC@NLO · energy efficiency · High-Level Synthesis

The pith

FPGA implementations of matrix-element calculations deliver substantial speedups and better energy efficiency than CPU or GPU versions in the MG5aMC framework.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests FPGA acceleration of matrix-element calculations inside the MadGraph5_aMC@NLO (MG5aMC) Monte Carlo framework. For the simple process e+e− → μ+μ− it maps the entire matrix-element workflow onto an AMD Alveo U250 device. For the more complex gg → tt̄ + jets processes it accelerates only the color-algebra reduction step, which operates on already-computed amplitudes. In both cases the FPGA versions produce results numerically close to the CPU reference while reporting substantial speedups and significantly improved energy efficiency relative to the framework's existing CPU and GPU code. The work therefore positions FPGAs as a viable architecture for selected pieces of high-energy physics event generation.

Core claim

FPGA-based accelerators implemented via High-Level Synthesis can perform matrix-element calculations for Monte Carlo event generation with substantial speedups and significantly improved energy efficiency compared with CPU and GPU implementations available within the MG5aMC framework, while keeping numerical results in close agreement with the CPU reference.

What carries the argument

High-Level Synthesis implementations of the full matrix-element workflow on an AMD Alveo U250 for simple processes and of the isolated color-reduction kernel for complex processes.
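
To make the "color-reduction kernel" concrete, the sketch below shows the standard quadratic form that a color reduction evaluates: the color-ordered amplitudes of one event are contracted with a fixed real color matrix and divided by a common denominator. It is a plain-Python illustration, not the paper's HLS code; the function name `color_reduce` and the 2×2 matrix values are assumptions chosen only for the example.

```python
def color_reduce(jamps, color_matrix, denom):
    """Return sum_ij conj(J_i) * C_ij * J_j / denom for one event.

    jamps        : list of complex color-ordered amplitudes (assumed precomputed)
    color_matrix : n x n list of real numbers (toy values in the example below)
    denom        : common normalization factor
    """
    n = len(jamps)
    if len(color_matrix) != n or any(len(row) != n for row in color_matrix):
        raise ValueError("color_matrix must be n_color x n_color")
    total = 0.0 + 0.0j
    for i in range(n):
        for j in range(n):
            total += jamps[i].conjugate() * color_matrix[i][j] * jamps[j]
    # For a real symmetric color matrix the imaginary parts cancel.
    return total.real / denom


# Toy inputs (hypothetical, for illustration only).
TOY_COLOR_MATRIX = [[16.0, -2.0], [-2.0, 16.0]]
TOY_DENOM = 3.0
me2 = color_reduce([1j, 1 + 0j], TOY_COLOR_MATRIX, TOY_DENOM)
```

The double loop is independent of how the amplitudes were produced, which is why this step can be isolated and fed with precomputed inputs.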

If this is right

  • Numerical output from the FPGA kernels remains in close agreement with the corresponding CPU reference calculations.
  • Resource utilization and scalability on the FPGA depend strongly on the choice of numerical representation.
  • FPGAs constitute a competitive architecture for selected Monte Carlo event-generation workloads in high-energy physics.
  • The color-algebra kernel approach provides a structured and scalable entry point for selective acceleration of more complex processes.
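
The point about numerical representation can be illustrated with a small fixed-point model. The sketch below quantizes a few values to a signed fixed-point grid and reports the worst relative deviation from the double-precision input. The bit widths, the round-to-nearest behavior, and the saturation on overflow are all assumptions made for the example; the formats actually used in the paper are not stated on this page.

```python
def to_fixed(x, total_bits, frac_bits):
    """Quantize x to a signed fixed-point grid (round to nearest, saturate)."""
    scale = 1 << frac_bits
    lo = -(1 << (total_bits - 1))        # most negative representable integer
    hi = (1 << (total_bits - 1)) - 1     # most positive representable integer
    q = max(lo, min(hi, round(x * scale)))
    return q / scale


def max_rel_error(reference, total_bits, frac_bits):
    """Worst-case relative error of the quantized values (reference must be nonzero)."""
    return max(abs(to_fixed(v, total_bits, frac_bits) - v) / abs(v)
               for v in reference)


values = [0.1, 0.5, 1.25]                 # hypothetical test inputs
coarse = max_rel_error(values, 16, 8)     # 16-bit word, 8 fractional bits
fine = max_rel_error(values, 32, 24)      # 32-bit word, 24 fractional bits
```

Narrower words cost fewer FPGA resources but lose accuracy and dynamic range, which is the trade-off behind the scalability remark above.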

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Extending the acceleration from the isolated color kernel to the surrounding amplitude computation and event-generation overhead could alter the net performance gain.
  • The same High-Level Synthesis methodology could be applied to other matrix-element generators or to different physics processes not tested here.
  • Energy-efficiency improvements could translate into lower power draw for large-scale computing farms used by collider experiments.

Load-bearing premise

The reported speedups for complex processes apply only to the isolated color-reduction kernel running on precomputed amplitudes, not to the complete matrix-element evaluation or the full event-generation workflow.

What would settle it

A timing and energy measurement of the complete event-generation pipeline, including all non-accelerated steps, executed end-to-end on the same FPGA hardware versus the CPU and GPU baselines would show whether the claimed advantages survive outside the isolated kernel.
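
Amdahl's law gives a quick way to see why the end-to-end measurement matters: if only a fraction of the runtime is accelerated, the overall speedup is bounded by the part left untouched. The sketch below is a generic calculation; the fraction and kernel speedup in the example are hypothetical numbers, not values reported in the paper.

```python
def end_to_end_speedup(kernel_fraction, kernel_speedup):
    """Amdahl's law: overall speedup when only part of the runtime is accelerated.

    kernel_fraction : share of the original runtime spent in the accelerated kernel (0..1)
    kernel_speedup  : speedup factor of that kernel alone (> 0)
    """
    if not 0.0 <= kernel_fraction <= 1.0:
        raise ValueError("kernel_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    return 1.0 / ((1.0 - kernel_fraction) + kernel_fraction / kernel_speedup)


# Hypothetical example: a kernel taking 10% of the runtime, accelerated 50x.
overall = end_to_end_speedup(0.10, 50.0)
```

With these illustrative inputs the overall gain stays close to 1.1x, so a large kernel-level speedup does not by itself imply a large pipeline-level one.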

Figures

Figures reproduced from arXiv: 2605.23785 by A. Oyanguren, A. Valero, C. Vico Villalba, F. Carrió, F. Hervàs Álvarez, H. Gutiérrez Arance, L. Fiorini, P. Leguina López, S. Folgueras.

Figure 1. General MG5aMC workflow for Monte Carlo event generation, from process definition …
Figure 2. Shows the execution-time breakdown for the representative gg → tt̄ + 2 jets process. As expected, matrix-element evaluation dominates the total runtime, while the internal breakdown of the matrix-element kernel shows that the color-related stages constitute only a relatively small fraction of the full computation for this benchmark. Nevertheless, unlike wavefunction and amplitude evaluation, the color-alge…
Figure 3. Overview of the heterogeneous host–FPGA architecture adopted in this work. The …
Figure 4. Overview of the selective acceleration strategy used for the color-algebra kernels. The …
Figure 5. Execution time as a function of the number of processed events for CPU, GPU, and …
Figure 6. Measured FPGA power trace for the full matrix-element implementation during the …
Original abstract

We present an FPGA-based study of matrix-element acceleration for Monte Carlo event generation, using MadGraph5_aMC@NLO as a benchmark framework. Two complementary scenarios are considered. First, we implement the full matrix-element workflow on an AMD Alveo U250 accelerator for the benchmark process $e^+e^- \to \mu^+\mu^-$, enabling an end-to-end evaluation of FPGA acceleration for a simple process. Second, for the more complex $gg \to t\bar{t}+X$ processes with increasing jet multiplicity, we investigate FPGA acceleration of the color-algebra kernels as a structured and scalable entry point for selective acceleration. In this second case, the reported speedups correspond to the isolated color-reduction kernel operating on precomputed amplitudes, rather than to the full matrix-element evaluation or the complete event-generation workflow. The proposed implementations are developed using High-Level Synthesis and are evaluated in terms of numerical accuracy, performance, energy efficiency, resource utilization, and scalability. Compared with CPU and GPU implementations available within the MG5aMC framework, the FPGA solutions achieve substantial speedups and significantly improved energy efficiency. For the considered benchmarks, the numerical results remain in close agreement with the corresponding CPU reference calculations, while the resource analysis highlights the importance of numerical representation in determining scalability on FPGA devices. These results support the use of FPGAs as a competitive architecture for selected Monte Carlo event-generation workloads in high-energy physics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript reports on the use of FPGAs to accelerate matrix-element calculations for Monte Carlo event generation in the context of MadGraph5_aMC@NLO. It describes an end-to-end implementation for the process e+e− → μ+μ− on an AMD Alveo U250 device and selective acceleration of the color-algebra (color-reduction) kernel for gg → tt̄ + X processes with varying jet multiplicities, where the latter operates on precomputed amplitudes. The study evaluates numerical accuracy, performance, energy efficiency, resource utilization, and scalability using High-Level Synthesis, claiming close agreement with CPU results and substantial speedups with improved energy efficiency compared to CPU and GPU baselines in the MG5aMC framework.

Significance. Should the reported benchmarks prove robust, this study contributes to the exploration of heterogeneous computing architectures for high-energy physics simulations. The focus on energy efficiency is particularly relevant for scaling to future collider data volumes. The explicit qualification of results for the complex processes as applying only to isolated kernels strengthens the credibility of the claims by avoiding overstatement. The analysis of numerical representation effects on FPGA scalability offers practical guidance for similar efforts.

minor comments (2)
  1. [Abstract] The scope limitation for the gg → ttbar+X results is already stated clearly in the abstract; the results and discussion sections should repeat this qualification wherever speedup numbers are presented, to prevent reader misinterpretation.
  2. [Resource analysis] The resource-utilization discussion (likely §4 or §5) should tabulate the exact bit-width choices (fixed-point vs. floating-point) alongside the achieved speedups and energy figures for each benchmark so that the scalability claim can be assessed quantitatively.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the constructive review and the recommendation of minor revision. The positive assessment of the work's significance, particularly regarding energy efficiency and the careful qualification of results for isolated kernels, is appreciated. No specific major comments were enumerated in the report, so we provide no point-by-point rebuttals below. The manuscript already incorporates the qualifications noted in the referee summary.

Circularity Check

0 steps flagged

Empirical implementation study with no derivation chain or self-referential steps

full rationale

This is a hardware-acceleration and benchmarking paper. It reports direct performance measurements (speedup, energy efficiency, resource use) on FPGA implementations of selected kernels, with explicit scope limitations stated for the gg→ttbar+X case. No equations, fitted parameters, uniqueness theorems, or ansatzes are invoked; the central claims rest on measured wall-clock times and power draw against CPU/GPU baselines inside MG5aMC. No load-bearing step reduces to its own inputs by construction.
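
Since the comparison rests on wall-clock time and power draw, a common way to turn a sampled power trace into an energy-per-event figure is to integrate power over time and divide by the number of events processed. The sketch below does this with the trapezoidal rule; the function names, the sampling scheme, and the sample values are assumptions for illustration, and the paper's actual measurement procedure is not described on this page.

```python
def energy_joules(times_s, power_w):
    """Trapezoidal integral of a sampled power trace (seconds, watts) in joules."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two matching time/power samples")
    total = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        total += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return total


def energy_per_event(times_s, power_w, n_events):
    """Average energy spent per processed event, in joules."""
    if n_events <= 0:
        raise ValueError("n_events must be positive")
    return energy_joules(times_s, power_w) / n_events


# Hypothetical trace: three samples one second apart, 1000 events processed.
trace_energy = energy_joules([0.0, 1.0, 2.0], [10.0, 20.0, 10.0])
per_event = energy_per_event([0.0, 1.0, 2.0], [10.0, 20.0, 10.0], 1000)
```

Comparing devices on this metric only makes sense if the traces cover the same work and the same measurement boundary (card only versus whole host), which is an additional assumption to check.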

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities; this is an engineering performance study of hardware implementations rather than a theoretical derivation.

pith-pipeline@v0.9.0 · 5839 in / 1065 out tokens · 26697 ms · 2026-05-25T02:37:57.102965+00:00 · methodology

