Learning to Trigger: Reinforcement Learning at the Large Hadron Collider

Abhijith Gandrakota; Cecilia Tosciri; Christian Herwig; David W. Miller; Giovanna Salvi; Jennifer Ngadiuba; Nhan Tran; Shaghayegh Emami; Yuxin Chen; Zixin Ding

arxiv: 2606.23993 · v3 · pith:S3ELCLAInew · submitted 2026-06-22 · 💻 cs.LG · cs.AI· hep-ex

Learning to Trigger: Reinforcement Learning at the Large Hadron Collider

Zixin Ding , Shaghayegh Emami , Giovanna Salvi , Cecilia Tosciri , Abhijith Gandrakota , Jennifer Ngadiuba , Nhan Tran , Christian Herwig

show 2 more authors

David W. Miller Yuxin Chen

This is my paper

Pith reviewed 2026-06-30 10:18 UTC · model grok-4.3

classification 💻 cs.LG cs.AIhep-ex

keywords reinforcement learningtrigger systemsLarge Hadron Collideronline controlanomaly detectionthreshold tuningCMS experimentMonte Carlo simulation

0 comments

The pith

A reinforcement learning agent adjusts LHC trigger thresholds in real time and transfers from simulation to real collision data without retraining.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper frames online tuning of trigger thresholds as a sequential decision problem in which an agent reads streaming summaries of background rates and signal features and adjusts thresholds to maximize signal efficiency while keeping the background rate inside a target tolerance band. It adapts Group-Filtered Policy Optimization with feasibility constraints and tests the resulting agents on two triggers: a total transverse energy trigger sensitive to pileup and an anomaly-detection trigger based on reconstruction loss. On Monte Carlo streams the agent raises the fraction of in-tolerance intervals by 48 percent for the energy trigger and 28 percent for the anomaly trigger, with up to 2 percent additional signal efficiency on those intervals. The identical agent, applied unchanged to real CMS Run 283408 data, produces 56 percent and 28 percent gains in in-tolerance time together with further efficiency improvements. A reader would care because current triggers are static and hand-tuned, so any automatic method that maintains bandwidth limits while capturing more signal events directly increases the physics output of the collider.

Core claim

By casting threshold tuning as a streaming control task and adapting Group-Filtered Policy Optimization with two feasibility-enforcing variants, the authors show that the learned agent increases the fraction of time background rates remain inside tolerance by 48 percent (HT) and 28 percent (AD) on Monte Carlo streams and by 56 percent (HT) and 28 percent (AD) on real CMS Run 283408 data, while also raising signal efficiency on the in-tolerance intervals, constituting the first reported demonstration of reinforcement-learning trigger control on actual Large Hadron Collider collision data.

What carries the argument

Group-Filtered Policy Optimization (GFPO) agent adapted for streaming control that ingests recent rate and feature summaries to update trigger thresholds while enforcing background-rate feasibility.

If this is right

Signal efficiency rises by up to 2 percent on the intervals where background rate stays inside tolerance.
The same policy transfers directly to real data without any fine-tuning step.
The approach works for both a pileup-sensitive total-energy trigger and an anomaly-detection trigger.
Feasibility constraints introduced in the GFPO-F and GFPO-FR variants keep the background rate inside the allowed band during training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Continuous online adaptation could reduce the frequency of manual retuning campaigns during long LHC running periods.
Extending the same streaming formulation to additional trigger types could produce coordinated optimization across an entire trigger menu.
If the observed sim-to-real transfer holds across multiple runs and years, the method may tolerate the gradual changes in detector response and beam conditions that occur in practice.
Analogous streaming reinforcement-learning controllers could be examined in other high-rate scientific instruments that must filter data under strict bandwidth and latency limits.

Load-bearing premise

Monte Carlo streams used for training accurately capture the statistical properties and drift patterns of the real collision data the agent will meet at runtime.

What would settle it

Applying the trained agent to a new real collision run and measuring that the in-tolerance time fraction falls to or below the level achieved by the static baselines.

Figures

Figures reproduced from arXiv: 2606.23993 by Abhijith Gandrakota, Cecilia Tosciri, Christian Herwig, David W. Miller, Giovanna Salvi, Jennifer Ngadiuba, Nhan Tran, Shaghayegh Emami, Yuxin Chen, Zixin Ding.

**Figure 1.** Figure 1: Sensitivity Analysis of Reward Components for HT trigger (20% MC). Each point represents a (λ1, λ2) configuration from Equation 2, with concave hulls connecting the upper envelope per method. The x-axis measures the fraction of chunks whose background rate falls within the tolerance band, and the y-axis measures overall signal efficiency. Our methods (GFPO-F and GFPOFR) collapse to a tighter cluster in th… view at source ↗

**Figure 2.** Figure 2: GRPO’s group-feasibility failure on HT (latter 20% of MC, G = 16) averaged across 3 seeds. (a) Candidate background rates; green/red denote feasible/infeasible w.r.t. the tolerance band. (b) Per-step feasible fraction ft = nfeas/G (run mean ⟨ft⟩ = 0.58). ft=0 on 30.8% of steps, so the group has no in-band sample and GRPO reinforces the least-infeasible (still out-of-band) action. AD trigger is in [PITH_FU… view at source ↗

**Figure 3.** Figure 3: Background rates for HT (evaluated on 20% MC). Our methods (GFPO-F and GFPOFR) concentrate the background rate inside the [90, 110] kHz tolerance band around the 100 kHz target, while the baselines spread well outside it. GFPO-F sits on target (µ = 100.0 kHz) with the tighter distribution (σ = 2.0 kHz). GFPO-FR runs at a slightly higher mean (µ = 106.1 kHz) with a marginally wider spread (σ = 3.4 kHz). Di… view at source ↗

**Figure 4.** Figure 4: Average step composition for HT (K=16, G=64 for GFPO, G=K=16 for GRPO). Each step is classified by |Ft| relative to K: Pure feasible (|Ft| ≥ K, kept set Kt ⊆ Ft), Padded (0 < |Ft| < K, kept set mixes feasible and out-of-band candidates), Zero feasibility (Ft = ∅). AD trigger is in [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: Anomaly detection results on UNSW-NB15 (left) and NAB (right). [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗

**Figure 6.** Figure 6: Background scores drift over time for both triggers for MC. Running mean (solid), median [PITH_FULL_IMAGE:figures/full_fig_p028_6.png] view at source ↗

**Figure 7.** Figure 7: Background scores drift over time for both triggers for CMS Run 283408. Running mean [PITH_FULL_IMAGE:figures/full_fig_p028_7.png] view at source ↗

**Figure 8.** Figure 8: Signal-background discrimination for the [PITH_FULL_IMAGE:figures/full_fig_p030_8.png] view at source ↗

**Figure 9.** Figure 9: Signal-background discrimination for the [PITH_FULL_IMAGE:figures/full_fig_p031_9.png] view at source ↗

**Figure 10.** Figure 10: (a) GFPO-F for AD trigger (CMS Run 283408) (b) GFPO-FR for AD trigger (CMS Run [PITH_FULL_IMAGE:figures/full_fig_p032_10.png] view at source ↗

**Figure 11.** Figure 11: Sensitivity Analysis of Reward Components (MC). Each point represents a (λ1, λ2) configuration from Equation 2, with concave hulls connecting the upper envelope per method. The x-axis measures the fraction of chunks whose background rate falls within the tolerance band, and the y-axis measures overall signal efficiency. Across all four trigger, signal combinations (HT /AD × tt¯/h → 4b), baseline methods (… view at source ↗

**Figure 12.** Figure 12: DSPOT calibration-deployment mismatch: the GPD is fitted on 50 calibration chunks [PITH_FULL_IMAGE:figures/full_fig_p039_12.png] view at source ↗

**Figure 13.** Figure 13: Zero-feasible step fraction for GRPO and L-GRPO (AD trigger). (a) Mean zerofeasible fraction across all chunks. On MC data, approximately 20% of micro-steps yield no rate-feasible candidate from the sampled group of G actions for both GRPO and L-GRPO. On CMS real data this rises to ∼61% for both methods. (b) Per-chunk trajectory over time. The two curves are nearly indistinguishable in both datasets, con… view at source ↗

**Figure 14.** Figure 14: Dual variable λt and in-band rate over chunks for L-GRPO (AD trigger). (a) On MC data, λt decreases monotonically from its initial value of 0.25 as the policy stays in-band (∼96% of chunks). (b) On CMS real collision data, λt fluctuates but remains roughly constant (≈ 0.25) while the in-band rate oscillates around 54%. Shaded regions mark chunks where the in-band rate is below 50%. Despite λt responding t… view at source ↗

**Figure 15.** Figure 15: Background rate trajectory on CMS real data (AD trigger, MC-trained frozen policy). Grey band: ±τ tolerance around the target r ∗ B. GFPO-F and GFPO-FR track within the band for most of the 73-chunk deployment window. GRPO and L-GRPO sit near or below the lower band edge throughout, despite L-GRPO’s dual variable adapting concurrently ( [PITH_FULL_IMAGE:figures/full_fig_p041_15.png] view at source ↗

**Figure 16.** Figure 16: Zero-feasibility fraction of GRPO vs. group size G. Fraction of training steps on which every rollout violates the rate budget, for AD and HT triggers. Even at G = 256, 24–34% of steps yield no feasible sample, so the group-relative baseline collapses to an uninformative signal on a constant fraction of updates. Scaling G alone cannot recover constraint satisfaction, motivating the constraint-aware varian… view at source ↗

**Figure 17.** Figure 17: GRPO group-feasibility failure (AD trigger, MC, G=16); appendix complement to [PITH_FULL_IMAGE:figures/full_fig_p043_17.png] view at source ↗

**Figure 18.** Figure 18: Step composition for the AD trigger; Complement to [PITH_FULL_IMAGE:figures/full_fig_p044_18.png] view at source ↗

**Figure 19.** Figure 19: Look-ahead rates (p1, p2) and survival ratios (r1, r2) over time, evaluated at each window’s natural operating point θ = P99.75 so that p0 ≈ r ∗ B by construction. The four scalars give the agent a discrete forward-rate map of the tail at θ + ∆ and θ + 2∆ (with ∆ = 1 GeV on HT and ∆ = 0.5 on AD), complementing the local sensitivity probe ∂r/∂θ. Their drift over the run motivates including (p1, p2, r1, r2)… view at source ↗

**Figure 20.** Figure 20: Non-stationarity of trigger thresholds under pileup drift (MC). [PITH_FULL_IMAGE:figures/full_fig_p047_20.png] view at source ↗

**Figure 21.** Figure 21: Npv (Number of Primary Vertices) distribution over time (MC). 47 [PITH_FULL_IMAGE:figures/full_fig_p047_21.png] view at source ↗

**Figure 22.** Figure 22: Exponential Moving Average (EMA) of the fractional rate error at different smoothing [PITH_FULL_IMAGE:figures/full_fig_p048_22.png] view at source ↗

**Figure 23.** Figure 23: Sensitivity probe over time (MC) [PITH_FULL_IMAGE:figures/full_fig_p048_23.png] view at source ↗

**Figure 24.** Figure 24: Near-threshold occupancy over time (MC). [PITH_FULL_IMAGE:figures/full_fig_p048_24.png] view at source ↗

**Figure 25.** Figure 25: Background trigger rates under DQN for (a) [PITH_FULL_IMAGE:figures/full_fig_p049_25.png] view at source ↗

**Figure 26.** Figure 26: Background rates distribution for (a) HT trigger, (b) AD trigger. Our methods (GFPO-F and GFPO-FR) have the highest in-band rate of all established baselines. GFPO-FR (106.1 kHz) operates nearer the band edge for signal compared to GFPO (100.0 kHz). Signal efficiency over time on MC. Emami et al. [5] showcases that adaptive thresholding recovers signal efficiency as a run progresses, while a static menu s… view at source ↗

**Figure 27.** Figure 27: Background rates under Constant Menu, PID loop, GRPO, GFPO-F and GFPO-FR (MC) [PITH_FULL_IMAGE:figures/full_fig_p050_27.png] view at source ↗

**Figure 28.** Figure 28: Signal efficiency over time under Constant Menu, PID loop, GRPO, GFPO-F and GFPO [PITH_FULL_IMAGE:figures/full_fig_p051_28.png] view at source ↗

**Figure 29.** Figure 29: Background trigger rates under MC→CMS transfer ( [PITH_FULL_IMAGE:figures/full_fig_p052_29.png] view at source ↗

**Figure 30.** Figure 30: Background trigger rates under MC→CMS transfer with test-time training (Table 12). Per-step rates on CMS Run 283408 with policies trained on MC and updated online during the run on real collision data (one gradient step per chunk), for the (a) HT trigger and (b) AD trigger. G Ablation study on noisy anomaly scores In HEP anomaly detection, a long line of work has explicitly highlighted the sensitivity of… view at source ↗

**Figure 31.** Figure 31: UNSW-NB15 TPR by attack difficulty and adaptation regime. For each controller we report TPR on two attack classes: Exploits (frequent, high-signal; easy) and Backdoors (rare, lowsignal; hard). Bars span {Deployment, Test-time training} × {Exploits, Backdoors}: solid = frozen policy, hatched = online adaptation; high opacity = Exploits, low opacity = Backdoors. GFPO-F and GFPO-FR (ours) maintain the highe… view at source ↗

**Figure 32.** Figure 32: NAB precision and recall per method (mean over 24 test files). Solid bars: precision; hatched bars: recall. Methods sorted left-to-right by ascending F1. All controllers operate on the same per-chunk anomaly scores under the same FAR budget. GFPO-F and GFPO-FR (ours) dominate the Pareto frontier: precision ≈ 28% and recall ≈ 42%. Classical controllers (Constant, PID, DSPOT) reach 11–20% precision but reca… view at source ↗

read the original abstract

High-throughput scientific facilities such as the Large Hadron Collider depend on real-time event filtering (\textit{triggering}) under tight constraints on bandwidth, latency, and storage. In practice, trigger menus are largely static and hand-tuned and can become suboptimal as detector conditions, pileup, and background composition drift over time. We cast online threshold tuning as a sequential decision-making problem: a reinforcement learning agent ingests streaming summaries of recent rates and signal-sensitive features and updates trigger thresholds to maximize signal efficiency while tracking a target background rate within a tolerance band. We adapt Group-Filtered Policy Optimization (GFPO) to streaming control and introduce two variants (GFPO-F, GFPO-FR) that enforce background rate feasibility during training. On a benchmark that emulates realistic collider operation, we study two representative triggers: a total transverse energy ($H_{T}$) trigger sensitive to pileup variation, and an anomaly-detection (AD) trigger based on reconstruction loss for rare or non-standard signatures. On Monte Carlo streams, our agent increases the fraction of in-tolerance time intervals by 48\% ($H_T$) and 28\% (AD), with a cumulative gain of up to 2\% in signal efficiency on those in-tolerance intervals. Transferring from simulation to \emph{real} collision data (CMS Run 283408), the same agent, without fine-tuning, achieves a 56\% ($H_T$) and 28\% (AD) in-tolerance improvement over baselines, with further signal-efficiency gain on both triggers. To our knowledge, this is the \emph{first} demonstration of RL-based trigger control on real Large Hadron Collider collision data. Code is available at https://github.com/Zixind/GFPO_LHC (see repo for details).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives the first reported RL trigger tuning on real CMS collision data with zero-shot transfer gains, but the sim-to-real result rests on an unverified match between MC and the test run.

read the letter

The two things to know are that this is the first claimed demonstration of RL-based online threshold tuning on actual LHC data, and that the same agent trained only on Monte Carlo transfers to CMS Run 283408 without fine-tuning, raising the fraction of in-tolerance intervals by 56% on the HT trigger and 28% on the anomaly-detection trigger.

The setup is sensible. Framing threshold adjustment as a streaming control task with rate summaries and signal features as state makes sense for drifting collider conditions. Adapting GFPO with the two feasibility variants to keep background rates inside the band during training is a direct way to handle the hard constraint. Running the same agent on both emulated streams and real data, plus the code release, gives readers something concrete to examine.

The main weakness is the transfer claim. The abstract supplies no alignment numbers between the Monte Carlo used for training and the specific real run—no rate histograms, pileup spectra, or feature-distribution distances. Without those, the observed gains could come from partial overlap rather than policy robustness, exactly as the stress-test note flags. The performance figures also lack error bars, baseline definitions, or any statistical comparison, so the size of the improvement is hard to judge from the given text.

This is for the HEP trigger community and for people working on constrained online RL in high-throughput settings. A reader who needs practical methods for maintaining efficiency under drift will find the benchmark and the feasibility handling worth looking at.

Send it to peer review. The real-data result is new enough to merit referee time even though the transfer evidence needs more support.

Referee Report

2 major / 1 minor

Summary. The paper casts LHC trigger threshold tuning as a streaming RL control problem and adapts Group-Filtered Policy Optimization (GFPO) with feasibility constraints (GFPO-F, GFPO-FR). On Monte Carlo streams the agent raises the fraction of in-tolerance intervals by 48% (HT) and 28% (AD) while adding up to 2% signal efficiency. The same policy, without fine-tuning, is reported to deliver 56% (HT) and 28% (AD) in-tolerance gains on real CMS Run 283408 collision data, presented as the first RL-based trigger demonstration on actual LHC data. Code is released.

Significance. If the sim-to-real transfer result is robust, the work would show that RL can maintain trigger performance under realistic drift without manual retuning, a practical advance for high-throughput scientific facilities. The open-source release and the explicit zero-shot transfer experiment are concrete strengths that aid reproducibility and allow independent verification.

major comments (2)

[Abstract] Abstract (transfer paragraph): the central claim of 56% (HT) and 28% (AD) in-tolerance improvement on CMS Run 283408 without fine-tuning rests on the unverified assumption that Monte Carlo training streams match the statistical properties of the real data. No distributional alignment metrics (rate histograms, pileup spectra, or feature-space distances) are supplied to support this match.
[Abstract] Abstract: the reported performance numbers are given without error bars, confidence intervals, or any statistical test against the baselines. This omission directly affects the ability to judge whether the claimed gains are distinguishable from run-to-run variability.

minor comments (1)

[Abstract] The abstract introduces GFPO-F and GFPO-FR but does not indicate where in the manuscript the precise feasibility constraints or the group-filtering mechanism are defined.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed review and constructive comments. We address each major point below and commit to revisions that strengthen the manuscript without altering its core claims.

read point-by-point responses

Referee: [Abstract] Abstract (transfer paragraph): the central claim of 56% (HT) and 28% (AD) in-tolerance improvement on CMS Run 283408 without fine-tuning rests on the unverified assumption that Monte Carlo training streams match the statistical properties of the real data. No distributional alignment metrics (rate histograms, pileup spectra, or feature-space distances) are supplied to support this match.

Authors: We agree that explicit distributional alignment metrics would strengthen the sim-to-real transfer claim. Although the Monte Carlo samples were generated with standard CMS tuning procedures to reproduce observed pileup and rate distributions, the current manuscript does not include side-by-side histograms or distance measures. In the revision we will add rate histograms, pileup spectra, and feature-space distances (e.g., Wasserstein or MMD) between the training streams and Run 283408, together with a short discussion of residual mismatches and their expected impact on policy transfer. revision: yes
Referee: [Abstract] Abstract: the reported performance numbers are given without error bars, confidence intervals, or any statistical test against the baselines. This omission directly affects the ability to judge whether the claimed gains are distinguishable from run-to-run variability.

Authors: The referee correctly notes the absence of uncertainty quantification. The reported percentages are point estimates obtained from single long streams. In the revised manuscript we will recompute all in-tolerance and efficiency figures with bootstrap confidence intervals derived from multiple independent rollouts (both in simulation and on the real run) and will include paired statistical tests against the baseline controllers to assess whether the observed gains exceed run-to-run variability. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical RL results on held-out MC and real data streams

full rationale

The paper reports an RL agent (GFPO variants) trained on Monte Carlo streams and evaluated zero-shot on held-out MC and real CMS Run 283408 data. Performance metrics (in-tolerance fraction, signal efficiency) are direct empirical comparisons against baselines. No equations, fitted parameters, or self-citation chains are shown that reduce the reported gains to inputs defined inside the paper. The derivation chain consists of standard RL adaptation plus experimental benchmarks; results remain falsifiable against external data streams.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that streaming rate summaries are informative enough for policy learning and that the GFPO variants enforce feasibility without introducing new fitted constants beyond standard RL hyperparameters.

free parameters (1)

GFPO learning-rate and feasibility parameters
Standard RL hyperparameters and the two feasibility variants (GFPO-F, GFPO-FR) are introduced but their specific values are not enumerated in the abstract.

axioms (1)

domain assumption Streaming summaries of recent rates and signal-sensitive features contain sufficient information to decide threshold updates that track a target background rate.
The agent architecture is defined to ingest exactly these summaries.

pith-pipeline@v0.9.1-grok · 5901 in / 1212 out tokens · 32607 ms · 2026-06-30T10:18:13.039251+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Reference graph

Works this paper leans on

83 extracted references · 14 canonical work pages · 5 internal anchors

[1]

Summary of the trigger systems of the large hadron collider experiments alice, atlas, cms and lhcb.Journal of Physics G: Nuclear and Particle Physics, 52(3):030501, 2025

Johannes Albrecht, Leon Bozianu, Lukas Calefice, Sofia Cella, CE Cocha Toapaxi, Caterina Doglioni, VV Gligorov, James Andrew Gooding, Kaare Endrup Iversen, Patin Inkaew, et al. Summary of the trigger systems of the large hadron collider experiments alice, atlas, cms and lhcb.Journal of Physics G: Nuclear and Particle Physics, 52(3):030501, 2025

2025
[2]

Towards an inter- pretable data-driven trigger system for high-throughput physics facilities.arXiv preprint arXiv:2104.06622, 2021

Chinmaya Mahesh, Kristin Dona, David W Miller, and Yuxin Chen. Towards an inter- pretable data-driven trigger system for high-throughput physics facilities.arXiv preprint arXiv:2104.06622, 2021

work page arXiv 2021
[3]

The cms high level trigger.The European Physical Journal C-Particles and Fields, 46(3):605–667, 2006

CMS collaboration. The cms high level trigger.The European Physical Journal C-Particles and Fields, 46(3):605–667, 2006

2006
[4]

Performance of the atlas level-1 topological trigger in run 2.The European Physical Journal C, 82(1):7, 2022

Georges Aad, Brad Abbott, Dale Charles Abbott, A Abed Abud, Kira Abeling, Deshan Kavishka Abhayasinghe, Syed Haider Abidi, OS AbouZeid, NL Abraham, Halina Abramowicz, et al. Performance of the atlas level-1 topological trigger in run 2.The European Physical Journal C, 82(1):7, 2022

2022
[5]

Miller, Jennifer Ngadiuba, and Nhan Tran

Shaghayegh Emami, Cecilia Tosciri, Giovanna Salvi, Zixin Ding, Yuxin Chen, Abhijith Gan- drakota, Christian Herwig, David W. Miller, Jennifer Ngadiuba, and Nhan Tran. Towards a self-driving trigger at the LHC: Adaptive response in real time.Machine Learning: Science and Technology, 2026. doi: 10.1088/2632-2153/ae631f. URL https://iopscience.iop.org/ artic...

work page doi:10.1088/2632-2153/ae631f 2026
[6]

An automated bandwidth division for the lhcb upgrade trigger.Computing and Software for Big Science, 9(1):7, 2025

Timothy Evans, Conor Fitzpatrick, and Joshua Horswill. An automated bandwidth division for the lhcb upgrade trigger.Computing and Software for Big Science, 9(1):7, 2025

2025
[7]

Description and performance of track and primary-vertex reconstruc- tion with the cms tracker.Journal of Instrumentation, 9(10):P10009–P10009, 2014

CMS collaboration et al. Description and performance of track and primary-vertex reconstruc- tion with the cms tracker.Journal of Instrumentation, 9(10):P10009–P10009, 2014

2014
[8]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998

1998
[9]

Deep recurrent q-learning for partially observable mdps

Matthew J Hausknecht and Peter Stone. Deep recurrent q-learning for partially observable mdps. InAAAI fall symposia, volume 45, page 141, 2015

2015
[10]

Dream to control: Learning behaviors by latent imagination

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. InInternational Conference on Learning Representa- tions, 2020. URLhttps://openreview.net/forum?id=S1lOTC4tDS

2020
[11]

Thickbrick: optimal event selection and categorization in high energy physics

Konstantin T Matchev and Prasanth Shyamsundar. Thickbrick: optimal event selection and categorization in high energy physics. part i. signal discovery.Journal of High Energy Physics, 2021(3):291, 2021

2021
[12]

Sequential anomaly detection using inverse reinforcement learning

Min-hwan Oh and Garud Iyengar. Sequential anomaly detection using inverse reinforcement learning. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & data mining, pages 1480–1490, 2019

2019
[13]

Adt: Time series anomaly detection for cyber-physical systems via deep reinforcement learning.Computers & Security, 141:103825, 2024

Xue Yang, Enda Howley, and Michael Schukat. Adt: Time series anomaly detection for cyber-physical systems via deep reinforcement learning.Computers & Security, 141:103825, 2024

2024
[14]

Variational autoencoders for new physics mining at the large hadron collider.Journal of High Energy Physics, 2019(5):1–29, 2019

Olmo Cerri, Thong Q Nguyen, Maurizio Pierini, Maria Spiropulu, and Jean-Roch Vlimant. Variational autoencoders for new physics mining at the large hadron collider.Journal of High Energy Physics, 2019(5):1–29, 2019

2019
[15]

The atlas run-3 trigger menu

Sofia Cella. The atlas run-3 trigger menu. Technical report, ATL-COM-DAQ-2024-077, 2024

2024
[16]

About CMS Open Data, 2024

CMS Collaboration. About CMS Open Data, 2024. URL https://opendata.cern.ch/ docs/about-cms. Accessed: March 9, 2025. 13

2024
[17]

Sample more to think less: Group filtered policy optimization for concise reasoning

Vaishnavi Shrivastava, Ahmed Hassan Awadallah, Vidhisha Balachandran, Shivam Garg, Harki- rat Behl, and Dimitris Papailiopoulos. Sample more to think less: Group filtered policy optimization for concise reasoning. InThe Fourteenth International Conference on Learning Representations, 2026. URLhttps://openreview.net/forum?id=UKOqoULbZS

2026
[18]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[19]

Alphaflow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning.Nature Communications, 14(1):1403, 2023

Amanda A V olk, Robert W Epps, Daniel T Yonemoto, Benjamin S Masters, Felix N Castellano, Kristofer G Reyes, and Milad Abolhasani. Alphaflow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning.Nature Communications, 14(1):1403, 2023

2023
[20]

Reinforcement learning-trained optimisers and bayesian optimisation for online particle accelerator tuning

Jan Kaiser, Chenran Xu, Annika Eichler, Andrea Santamaria Garcia, Oliver Stein, Erik Brün- dermann, Willi Kuropka, Hannes Dinter, Frank Mayet, Thomas Vinatier, et al. Reinforcement learning-trained optimisers and bayesian optimisation for online particle accelerator tuning. Scientific reports, 14(1):15733, 2024

2024
[21]

Magnetic control of tokamak plasmas through deep reinforcement learning.Nature, 602(7897): 414–419, 2022

Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de Las Casas, et al. Magnetic control of tokamak plasmas through deep reinforcement learning.Nature, 602(7897): 414–419, 2022

2022
[22]

Meta-aad: Active anomaly detection with deep reinforcement learning

Daochen Zha, Kwei-Herng Lai, Mingyang Wan, and Xia Hu. Meta-aad: Active anomaly detection with deep reinforcement learning. In2020 IEEE International Conference on Data Mining (ICDM), pages 771–780. IEEE, 2020

2020
[23]

Anomaly detection in streams with extreme value theory

Alban Siffer, Pierre-Alain Fouque, Alexandre Termier, and Christine Largouet. Anomaly detection in streams with extreme value theory. InProceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1067–1075, 2017

2017
[24]

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. Dapo: An open-source llm reinforcement learning system at scale.arXiv preprint arXiv:2503.14476, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[25]

Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model? InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, and Gao Huang. Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model? InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URLhttps://openreview.net/forum?id=4OsgYD7em5

2026
[26]

Ekaterina Govorkova, Ema Puljak, Thea Aarrestad, Thomas James, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Nicolo Ghielmetti, Maksymilian Graczyk, Sioni Summers, et al. Autoencoders on field-programmable gate arrays for real-time, unsupervised new physics detection at 40 mhz at the large hadron collider.Nature Machine Intelligence, 4(2):154–161, 2022

2022
[27]

A primer on reinforcement learning in medicine for clinicians.NPJ digital medicine, 7(1):337, 2024

Pushkala Jayaraman, Jacob Desman, Moein Sabounchi, Girish N Nadkarni, and Ankit Sakhuja. A primer on reinforcement learning in medicine for clinicians.NPJ digital medicine, 7(1):337, 2024

2024
[28]

Smooth imitation learning for online sequence prediction

Hoang Le, Andrew Kang, Yisong Yue, and Peter Carr. Smooth imitation learning for online sequence prediction. InInternational Conference on Machine Learning, pages 680–688. PMLR, 2016

2016
[29]

Human-level control through deep reinforcement learning.nature, 518(7540):529–533, 2015

V olodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning.nature, 518(7540):529–533, 2015

2015
[30]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[31]

Constrained policy optimization

Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. Constrained policy optimization. InInternational conference on machine learning, pages 22–31. Pmlr, 2017. 14

2017
[32]

Performance of the atlas trigger system in 2015.The European Physical Journal C, 77(5):1–53, 2017

Morad Aaboud, Georges Aad, Brad Abbott, Jalal Abdallah, Baptiste Abeloos, Rosemarie Aben, OS AbouZeid, NL Abraham, Halina Abramowicz, Henso Abreu, et al. Performance of the atlas trigger system in 2015.The European Physical Journal C, 77(5):1–53, 2017

2015
[33]

Operation of the atlas trigger system in run 2.Journal of Instrumenta- tion, 15(10):P10004–P10004, 2020

Atlas Collaboration et al. Operation of the atlas trigger system in run 2.Journal of Instrumenta- tion, 15(10):P10004–P10004, 2020

2020
[34]

Technical Design Report for the Phase-II Upgrade of the ATLAS Tile Calorimeter

ATLAS Collaboration. Technical Design Report for the Phase-II Upgrade of the ATLAS Tile Calorimeter. Technical report, CERN, Geneva, 2017. URL https://cds.cern.ch/record/ 2285583

2017
[35]

Safe reinforcement learning via shielding

Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, and Ufuk Topcu. Safe reinforcement learning via shielding. InProceedings of the AAAI conference on artificial intelligence, 2018

2018
[36]

A dynamic safety shield for safe and efficient reinforcement learning of navigation tasks

Murad Dawood, Ahmed Shokry, and Maren Bennewitz. A dynamic safety shield for safe and efficient reinforcement learning of navigation tasks. In7th Annual Learning for Dynamics & Control Conference, pages 686–697. PMLR, 2025

2025
[37]

What matters in on-policy reinforcement learning? a large-scale empirical study.arXiv preprint arXiv:2006.05990, 2020

Marcin Andrychowicz, Anton Raichuk, Piotr Sta´nczyk, Manu Orsini, Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, et al. What matters in on-policy reinforcement learning? a large-scale empirical study.arXiv preprint arXiv:2006.05990, 2020

work page arXiv 2006
[38]

Turn-ppo: Turn-level advantage estimation with ppo for improved multi-turn rl in agentic llms

Junbo Li, Peng Zhou, Rui Meng, Meet P Vadera, Lihong Li, and Yang Li. Turn-ppo: Turn-level advantage estimation with ppo for improved multi-turn rl in agentic llms. InFindings of the Association for Computational Linguistics: EACL 2026, pages 6227–6243, 2026

2026
[39]

Actor-critic algorithms.Advances in neural information processing systems, 12, 1999

Vijay Konda and John Tsitsiklis. Actor-critic algorithms.Advances in neural information processing systems, 12, 1999

1999
[40]

High- dimensional continuous control using generalized advantage estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High- dimensional continuous control using generalized advantage estimation. InProceedings of the International Conference on Learning Representations (ICLR), 2016

2016
[41]

Time limits in reinforcement learning

Fabio Pardo, Arash Tavakoli, Vitaly Levdik, and Petar Kormushev. Time limits in reinforcement learning. InInternational Conference on Machine Learning, pages 4045–4054. PMLR, 2018

2018
[42]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[43]

The dependence of effective planning horizon on model accuracy

Nan Jiang, Alex Kulesza, Satinder Singh, and Richard Lewis. The dependence of effective planning horizon on model accuracy. InProceedings of the 2015 international conference on autonomous agents and multiagent systems, pages 1181–1189, 2015

2015
[44]

Routledge, 2021

Eitan Altman.Constrained Markov decision processes. Routledge, 2021

2021
[45]

Mankowitz, and Shie Mannor

Chen Tessler, Daniel J. Mankowitz, and Shie Mannor. Reward constrained policy optimization. InInternational Conference on Learning Representations, 2019. URL https://openreview. net/forum?id=SkfrvsA9FX

2019
[46]

Responsive safety in reinforcement learning by pid lagrangian methods

Adam Stooke, Joshua Achiam, and Pieter Abbeel. Responsive safety in reinforcement learning by pid lagrangian methods. InInternational conference on machine learning, pages 9133–9143. PMLR, 2020

2020
[47]

Trust region policy optimization

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. InInternational conference on machine learning, pages 1889–1897. PMLR, 2015

2015
[48]

Lhc data storage: Preparing for the challenges of run-3

Maria Arsuaga-Rios, Vladimír Bahyl, Manuel Batalha, Cédric Caffy, Eric Cano, Niccolo Capitoni, Cristian Contescu, Michael Davis, David Fernandez Alvarez, Jaroslav Guenther, et al. Lhc data storage: Preparing for the challenges of run-3. InEPJ Web of Conferences, volume 251, page 02023. EDP Sciences, 2021. 15

2021
[49]

Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set)

Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015

2015
[50]

Evaluating real-time anomaly detection algorithms–the numenta anomaly benchmark

Alexander Lavin and Subutai Ahmad. Evaluating real-time anomaly detection algorithms–the numenta anomaly benchmark. In2015 IEEE 14th international conference on machine learning and applications (ICMLA), pages 38–44. IEEE, 2015

2015
[51]

The atlas trigger system for lhc run 3 and trigger performance in 2022.Journal of Instrumentation, 19(06): P06029, 2024

Georges Aad, Erlend Aakvaag, B Abbott, Kira Abeling, Nils Julius Abicht, SH Abidi, Asmaa Aboulhorma, Halina Abramowicz, Henso Abreu, Yiming Abulaiti, et al. The atlas trigger system for lhc run 3 and trigger performance in 2022.Journal of Instrumentation, 19(06): P06029, 2024

2022
[52]

A comparison of cpu and gpu implementations for the lhcb experiment run 3 trigger.Computing and Software for Big Science, 6(1):1, 2022

Roel Aaij, Marta Adinolfi, Salvatore Aiola, S Akar, Johannes Albrecht, M Alexander, S Amato, Yasmine Amhis, F Archilli, M Bala, et al. A comparison of cpu and gpu implementations for the lhcb experiment run 3 trigger.Computing and Software for Big Science, 6(1):1, 2022

2022
[53]

Review of machine learning for real-time analysis at the large hadron collider experiments alice, atlas, cms and lhcb.arXiv preprint arXiv:2506.14578, 2025

Laura Boggia, Carlos Cocha, Fotis Giasemis, Joachim Hansen, Patin Inkaew, Kaare Endrup Iversen, Pratik Jawahar, Henrique Pineiro Monteagudo, Micol Olocco, Sten Astrand, et al. Review of machine learning for real-time analysis at the large hadron collider experiments alice, atlas, cms and lhcb.arXiv preprint arXiv:2506.14578, 2025

work page arXiv 2025
[54]

Top-down design of protein architectures with reinforcement learning.Science, 380(6642):266–273, 2023

Isaac D Lutz, Shunzhi Wang, Christoffer Norn, Alexis Courbet, Andrew J Borst, Yan Ting Zhao, Annie Dosey, Longxing Cao, Jinwei Xu, Elizabeth M Leaf, et al. Top-down design of protein architectures with reinforcement learning.Science, 380(6642):266–273, 2023

2023
[55]

Ex- tending group relative policy optimization to continuous control: A theoretical framework for robotic reinforcement learning.arXiv preprint arXiv:2507.19555, 2025

Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, and Satyasaran Changdar. Ex- tending group relative policy optimization to continuous control: A theoretical framework for robotic reinforcement learning.arXiv preprint arXiv:2507.19555, 2025

work page arXiv 2025
[56]

Anomaly transformer: Time series anomaly detection with association discrepancy

Jiehui Xu, Haixu Wu, Jianmin Wang, and Mingsheng Long. Anomaly transformer: Time series anomaly detection with association discrepancy. InInternational Conference on Learning Representations, 2022. URLhttps://openreview.net/forum?id=LzQQ89U1qm_

2022
[57]

Lhc machine.Journal of instrumentation, 3(08):S08001– S08001, 2008

Lyndon Evans and Philip Bryant. Lhc machine.Journal of instrumentation, 3(08):S08001– S08001, 2008

2008
[58]

Physics guided rnns for modeling dynamical systems: A case study in simulating lake temperature profiles

Xiaowei Jia, Jared Willard, Anuj Karpatne, Jordan Read, Jacob Zwart, Michael Steinbach, and Vipin Kumar. Physics guided rnns for modeling dynamical systems: A case study in simulating lake temperature profiles. InProceedings of the 2019 SIAM international conference on data mining, pages 558–566. SIAM, 2019

2019
[59]

Physics- guided neural networks (pgnn): An application in lake temperature modeling

Arka Daw, Anuj Karpatne, William D Watkins, Jordan S Read, and Vipin Kumar. Physics- guided neural networks (pgnn): An application in lake temperature modeling. InKnowledge guided machine learning, pages 353–372. Chapman and Hall/CRC, 2022

2022
[60]

Physics-informed recurrent neural network for time dynamics in optical resonances.Nature computational science, 2(3):169–178, 2022

Yingheng Tang, Jichao Fan, Xinwei Li, Jianzhu Ma, Minghao Qi, Cunxi Yu, and Weilu Gao. Physics-informed recurrent neural network for time dynamics in optical resonances.Nature computational science, 2(3):169–178, 2022

2022
[61]

Near-optimal reinforcement learning in polynomial time

Michael Kearns and Satinder Singh. Near-optimal reinforcement learning in polynomial time. Machine learning, 49(2):209–232, 2002

2002
[62]

An analysis of model-based interval estimation for markov decision processes.Journal of Computer and System Sciences, 74(8):1309–1331, 2008

Alexander L Strehl and Michael L Littman. An analysis of model-based interval estimation for markov decision processes.Journal of Computer and System Sciences, 74(8):1309–1331, 2008

2008
[63]

The role of baselines in policy gradient optimization.Advances in Neural Information Processing Systems, 35:17818–17830, 2022

Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, and Dale Schuur- mans. The role of baselines in policy gradient optimization.Advances in Neural Information Processing Systems, 35:17818–17830, 2022

2022
[64]

Demystifying group relative policy optimization: Its policy gradient is a u-statistic.arXiv preprint arXiv:2603.01162,

Hongyi Zhou, Kai Ye, Erhan Xu, Jin Zhu, Ying Yang, Shijin Gong, and Chengchun Shi. Demystifying group relative policy optimization: Its policy gradient is a u-statistic.arXiv preprint arXiv:2603.01162, 2026. 16

work page arXiv 2026
[65]

Simulated dataset tttohadronic_tunecp5_13tev-powheg-pythia8 in miniaod- sim format for 2016 collision data

CMS Collaboration. Simulated dataset tttohadronic_tunecp5_13tev-powheg-pythia8 in miniaod- sim format for 2016 collision data. CERN Open Data Portal, 2024. URL https://opendata. cern.ch/record/67840. Data recorded in 2016 and published in 2024

2016
[66]

About cms

CERN Open Data Portal. About cms. https://opendata.cern.ch/docs/about-cms,
[67]

Accessed: 2026-01-03

2026
[68]

Deciphering the nature of the Higgs sector, volume 2

Daniel de Florian, Christophe Grojean, Fabio Maltoni, C Mariotti, A Nikitenko, M Pieri, P Savard, M Schumacher, R Aggleton, M Ahmad, et al.CERN Yellow Reports: Monographs, Vol 2 (2017): Handbook of LHC Higgs cross sections: 4. Deciphering the nature of the Higgs sector, volume 2. Cern, 2017

2017
[69]

Morad Aaboud, G Aad, B Abbott, B Abeloos, DK Abhayasinghe, SH Abidi, OS AbouZeid, NL Abraham, H Abramowicz, H Abreu, et al. Search for the higgs boson produced in association with a vector boson and decaying into two spin-zero particles in the h→aa→4b channel in pp collisions at √s= 13 tev with the atlas detector.Journal of High Energy Physics, 2018(10): ...

2018
[70]

Trigger throttling system for cms daq

A Racz. Trigger throttling system for cms daq. Technical report, CERN, 2000. URL https: //cds.cern.ch/record/479701

2000
[71]

Operation of the upgraded atlas central trigger processor during the lhc run 2.Journal of Instrumentation, 11(02):C02020–C02020, 2016

Henrik Bertelsen, G Carrillo Montoya, P-O Deviveiros, T Eifert, G Galster, J Glatzer, S Haas, A Marzin, MV Silva Oliveira, T Pauly, et al. Operation of the upgraded atlas central trigger processor during the lhc run 2.Journal of Instrumentation, 11(02):C02020–C02020, 2016

2016
[72]

Long short-term memory.Supervised sequence labelling with recurrent neural networks, pages 37–45, 2012

Alex Graves. Long short-term memory.Supervised sequence labelling with recurrent neural networks, pages 37–45, 2012

2012
[73]

Gate-variants of gated recurrent unit (gru) neural networks

Rahul Dey and Fathi M Salem. Gate-variants of gated recurrent unit (gru) neural networks. In 2017 IEEE 60th international midwest symposium on circuits and systems (MWSCAS), pages 1597–1600. IEEE, 2017

2017
[74]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017
[75]

Time2Vec: Learning a Vector Representation of Time

Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. Time2vec: Learning a vector representation of time.arXiv preprint arXiv:1907.05321, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907
[76]

Temporal fusion transformers for interpretable multi-horizon time series forecasting.International journal of forecasting, 37(4): 1748–1764, 2021

Bryan Lim, Sercan Ö Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting.International journal of forecasting, 37(4): 1748–1764, 2021

2021
[77]

Dinamo: Dynamic and inter- pretable anomaly monitoring for large-scale particle physics experiments.Machine Learning: Science and Technology, 6(3):035050, 2025

Arsenii Gavrikov, Julián García Pardiñas, and Alberto Garfagnini. Dinamo: Dynamic and inter- pretable anomaly monitoring for large-scale particle physics experiments.Machine Learning: Science and Technology, 6(3):035050, 2025

2025
[78]

Real-time anomaly detection at the l1 trigger of cms experiment.arXiv preprint arXiv:2411.19506, 2024

Abhijith Gandrakota. Real-time anomaly detection at the l1 trigger of cms experiment.arXiv preprint arXiv:2411.19506, 2024

work page arXiv 2024
[79]

2024 Data Collected with AXOL1TL Anomaly Detection at the CMS Level- 1 Trigger

CMS Collaboration. 2024 Data Collected with AXOL1TL Anomaly Detection at the CMS Level- 1 Trigger. Technical report, CERN, 2024. URLhttps://cds.cern.ch/record/2904695

work page arXiv 2024
[80]

Testing a neural network for anomaly detection in the cms global trigger test crate during run 3.Journal of Instrumentation, 19(03):C03029, 2024

Noah Zipper and CMS collaboration. Testing a neural network for anomaly detection in the cms global trigger test crate during run 3.Journal of Instrumentation, 19(03):C03029, 2024

2024

Showing first 80 references.

[1] [1]

Summary of the trigger systems of the large hadron collider experiments alice, atlas, cms and lhcb.Journal of Physics G: Nuclear and Particle Physics, 52(3):030501, 2025

Johannes Albrecht, Leon Bozianu, Lukas Calefice, Sofia Cella, CE Cocha Toapaxi, Caterina Doglioni, VV Gligorov, James Andrew Gooding, Kaare Endrup Iversen, Patin Inkaew, et al. Summary of the trigger systems of the large hadron collider experiments alice, atlas, cms and lhcb.Journal of Physics G: Nuclear and Particle Physics, 52(3):030501, 2025

2025

[2] [2]

Towards an inter- pretable data-driven trigger system for high-throughput physics facilities.arXiv preprint arXiv:2104.06622, 2021

Chinmaya Mahesh, Kristin Dona, David W Miller, and Yuxin Chen. Towards an inter- pretable data-driven trigger system for high-throughput physics facilities.arXiv preprint arXiv:2104.06622, 2021

work page arXiv 2021

[3] [3]

The cms high level trigger.The European Physical Journal C-Particles and Fields, 46(3):605–667, 2006

CMS collaboration. The cms high level trigger.The European Physical Journal C-Particles and Fields, 46(3):605–667, 2006

2006

[4] [4]

Performance of the atlas level-1 topological trigger in run 2.The European Physical Journal C, 82(1):7, 2022

Georges Aad, Brad Abbott, Dale Charles Abbott, A Abed Abud, Kira Abeling, Deshan Kavishka Abhayasinghe, Syed Haider Abidi, OS AbouZeid, NL Abraham, Halina Abramowicz, et al. Performance of the atlas level-1 topological trigger in run 2.The European Physical Journal C, 82(1):7, 2022

2022

[5] [5]

Miller, Jennifer Ngadiuba, and Nhan Tran

Shaghayegh Emami, Cecilia Tosciri, Giovanna Salvi, Zixin Ding, Yuxin Chen, Abhijith Gan- drakota, Christian Herwig, David W. Miller, Jennifer Ngadiuba, and Nhan Tran. Towards a self-driving trigger at the LHC: Adaptive response in real time.Machine Learning: Science and Technology, 2026. doi: 10.1088/2632-2153/ae631f. URL https://iopscience.iop.org/ artic...

work page doi:10.1088/2632-2153/ae631f 2026

[6] [6]

An automated bandwidth division for the lhcb upgrade trigger.Computing and Software for Big Science, 9(1):7, 2025

Timothy Evans, Conor Fitzpatrick, and Joshua Horswill. An automated bandwidth division for the lhcb upgrade trigger.Computing and Software for Big Science, 9(1):7, 2025

2025

[7] [7]

Description and performance of track and primary-vertex reconstruc- tion with the cms tracker.Journal of Instrumentation, 9(10):P10009–P10009, 2014

CMS collaboration et al. Description and performance of track and primary-vertex reconstruc- tion with the cms tracker.Journal of Instrumentation, 9(10):P10009–P10009, 2014

2014

[8] [8]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. MIT Press, Cambridge, MA, 1998

1998

[9] [9]

Deep recurrent q-learning for partially observable mdps

Matthew J Hausknecht and Peter Stone. Deep recurrent q-learning for partially observable mdps. InAAAI fall symposia, volume 45, page 141, 2015

2015

[10] [10]

Dream to control: Learning behaviors by latent imagination

Danijar Hafner, Timothy Lillicrap, Jimmy Ba, and Mohammad Norouzi. Dream to control: Learning behaviors by latent imagination. InInternational Conference on Learning Representa- tions, 2020. URLhttps://openreview.net/forum?id=S1lOTC4tDS

2020

[11] [11]

Thickbrick: optimal event selection and categorization in high energy physics

Konstantin T Matchev and Prasanth Shyamsundar. Thickbrick: optimal event selection and categorization in high energy physics. part i. signal discovery.Journal of High Energy Physics, 2021(3):291, 2021

2021

[12] [12]

Sequential anomaly detection using inverse reinforcement learning

Min-hwan Oh and Garud Iyengar. Sequential anomaly detection using inverse reinforcement learning. InProceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & data mining, pages 1480–1490, 2019

2019

[13] [13]

Adt: Time series anomaly detection for cyber-physical systems via deep reinforcement learning.Computers & Security, 141:103825, 2024

Xue Yang, Enda Howley, and Michael Schukat. Adt: Time series anomaly detection for cyber-physical systems via deep reinforcement learning.Computers & Security, 141:103825, 2024

2024

[14] [14]

Variational autoencoders for new physics mining at the large hadron collider.Journal of High Energy Physics, 2019(5):1–29, 2019

Olmo Cerri, Thong Q Nguyen, Maurizio Pierini, Maria Spiropulu, and Jean-Roch Vlimant. Variational autoencoders for new physics mining at the large hadron collider.Journal of High Energy Physics, 2019(5):1–29, 2019

2019

[15] [15]

The atlas run-3 trigger menu

Sofia Cella. The atlas run-3 trigger menu. Technical report, ATL-COM-DAQ-2024-077, 2024

2024

[16] [16]

About CMS Open Data, 2024

CMS Collaboration. About CMS Open Data, 2024. URL https://opendata.cern.ch/ docs/about-cms. Accessed: March 9, 2025. 13

2024

[17] [17]

Sample more to think less: Group filtered policy optimization for concise reasoning

Vaishnavi Shrivastava, Ahmed Hassan Awadallah, Vidhisha Balachandran, Shivam Garg, Harki- rat Behl, and Dimitris Papailiopoulos. Sample more to think less: Group filtered policy optimization for concise reasoning. InThe Fourteenth International Conference on Learning Representations, 2026. URLhttps://openreview.net/forum?id=UKOqoULbZS

2026

[18] [18]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[19] [19]

Alphaflow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning.Nature Communications, 14(1):1403, 2023

Amanda A V olk, Robert W Epps, Daniel T Yonemoto, Benjamin S Masters, Felix N Castellano, Kristofer G Reyes, and Milad Abolhasani. Alphaflow: autonomous discovery and optimization of multi-step chemistry using a self-driven fluidic lab guided by reinforcement learning.Nature Communications, 14(1):1403, 2023

2023

[20] [20]

Reinforcement learning-trained optimisers and bayesian optimisation for online particle accelerator tuning

Jan Kaiser, Chenran Xu, Annika Eichler, Andrea Santamaria Garcia, Oliver Stein, Erik Brün- dermann, Willi Kuropka, Hannes Dinter, Frank Mayet, Thomas Vinatier, et al. Reinforcement learning-trained optimisers and bayesian optimisation for online particle accelerator tuning. Scientific reports, 14(1):15733, 2024

2024

[21] [21]

Magnetic control of tokamak plasmas through deep reinforcement learning.Nature, 602(7897): 414–419, 2022

Jonas Degrave, Federico Felici, Jonas Buchli, Michael Neunert, Brendan Tracey, Francesco Carpanese, Timo Ewalds, Roland Hafner, Abbas Abdolmaleki, Diego de Las Casas, et al. Magnetic control of tokamak plasmas through deep reinforcement learning.Nature, 602(7897): 414–419, 2022

2022

[22] [22]

Meta-aad: Active anomaly detection with deep reinforcement learning

Daochen Zha, Kwei-Herng Lai, Mingyang Wan, and Xia Hu. Meta-aad: Active anomaly detection with deep reinforcement learning. In2020 IEEE International Conference on Data Mining (ICDM), pages 771–780. IEEE, 2020

2020

[23] [23]

Anomaly detection in streams with extreme value theory

Alban Siffer, Pierre-Alain Fouque, Alexandre Termier, and Christine Largouet. Anomaly detection in streams with extreme value theory. InProceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining, pages 1067–1075, 2017

2017

[24] [24]

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

Qiying Yu, Zheng Zhang, Ruofei Zhu, Yufeng Yuan, Xiaochen Zuo, Yu Yue, Weinan Dai, Tiantian Fan, Gaohong Liu, Lingjun Liu, et al. Dapo: An open-source llm reinforcement learning system at scale.arXiv preprint arXiv:2503.14476, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[25] [25]

Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model? InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026

Yang Yue, Zhiqi Chen, Rui Lu, Andrew Zhao, Zhaokai Wang, Yang Yue, Shiji Song, and Gao Huang. Does reinforcement learning really incentivize reasoning capacity in LLMs beyond the base model? InThe Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026. URLhttps://openreview.net/forum?id=4OsgYD7em5

2026

[26] [26]

Ekaterina Govorkova, Ema Puljak, Thea Aarrestad, Thomas James, Vladimir Loncar, Maurizio Pierini, Adrian Alan Pol, Nicolo Ghielmetti, Maksymilian Graczyk, Sioni Summers, et al. Autoencoders on field-programmable gate arrays for real-time, unsupervised new physics detection at 40 mhz at the large hadron collider.Nature Machine Intelligence, 4(2):154–161, 2022

2022

[27] [27]

A primer on reinforcement learning in medicine for clinicians.NPJ digital medicine, 7(1):337, 2024

Pushkala Jayaraman, Jacob Desman, Moein Sabounchi, Girish N Nadkarni, and Ankit Sakhuja. A primer on reinforcement learning in medicine for clinicians.NPJ digital medicine, 7(1):337, 2024

2024

[28] [28]

Smooth imitation learning for online sequence prediction

Hoang Le, Andrew Kang, Yisong Yue, and Peter Carr. Smooth imitation learning for online sequence prediction. InInternational Conference on Machine Learning, pages 680–688. PMLR, 2016

2016

[29] [29]

Human-level control through deep reinforcement learning.nature, 518(7540):529–533, 2015

V olodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A Rusu, Joel Veness, Marc G Bellemare, Alex Graves, Martin Riedmiller, Andreas K Fidjeland, Georg Ostrovski, et al. Human-level control through deep reinforcement learning.nature, 518(7540):529–533, 2015

2015

[30] [30]

Proximal Policy Optimization Algorithms

John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017

[31] [31]

Constrained policy optimization

Joshua Achiam, David Held, Aviv Tamar, and Pieter Abbeel. Constrained policy optimization. InInternational conference on machine learning, pages 22–31. Pmlr, 2017. 14

2017

[32] [32]

Performance of the atlas trigger system in 2015.The European Physical Journal C, 77(5):1–53, 2017

Morad Aaboud, Georges Aad, Brad Abbott, Jalal Abdallah, Baptiste Abeloos, Rosemarie Aben, OS AbouZeid, NL Abraham, Halina Abramowicz, Henso Abreu, et al. Performance of the atlas trigger system in 2015.The European Physical Journal C, 77(5):1–53, 2017

2015

[33] [33]

Operation of the atlas trigger system in run 2.Journal of Instrumenta- tion, 15(10):P10004–P10004, 2020

Atlas Collaboration et al. Operation of the atlas trigger system in run 2.Journal of Instrumenta- tion, 15(10):P10004–P10004, 2020

2020

[34] [34]

Technical Design Report for the Phase-II Upgrade of the ATLAS Tile Calorimeter

ATLAS Collaboration. Technical Design Report for the Phase-II Upgrade of the ATLAS Tile Calorimeter. Technical report, CERN, Geneva, 2017. URL https://cds.cern.ch/record/ 2285583

2017

[35] [35]

Safe reinforcement learning via shielding

Mohammed Alshiekh, Roderick Bloem, Rüdiger Ehlers, Bettina Könighofer, Scott Niekum, and Ufuk Topcu. Safe reinforcement learning via shielding. InProceedings of the AAAI conference on artificial intelligence, 2018

2018

[36] [36]

A dynamic safety shield for safe and efficient reinforcement learning of navigation tasks

Murad Dawood, Ahmed Shokry, and Maren Bennewitz. A dynamic safety shield for safe and efficient reinforcement learning of navigation tasks. In7th Annual Learning for Dynamics & Control Conference, pages 686–697. PMLR, 2025

2025

[37] [37]

What matters in on-policy reinforcement learning? a large-scale empirical study.arXiv preprint arXiv:2006.05990, 2020

Marcin Andrychowicz, Anton Raichuk, Piotr Sta´nczyk, Manu Orsini, Sertan Girgin, Raphael Marinier, Léonard Hussenot, Matthieu Geist, Olivier Pietquin, Marcin Michalski, et al. What matters in on-policy reinforcement learning? a large-scale empirical study.arXiv preprint arXiv:2006.05990, 2020

work page arXiv 2006

[38] [38]

Turn-ppo: Turn-level advantage estimation with ppo for improved multi-turn rl in agentic llms

Junbo Li, Peng Zhou, Rui Meng, Meet P Vadera, Lihong Li, and Yang Li. Turn-ppo: Turn-level advantage estimation with ppo for improved multi-turn rl in agentic llms. InFindings of the Association for Computational Linguistics: EACL 2026, pages 6227–6243, 2026

2026

[39] [39]

Actor-critic algorithms.Advances in neural information processing systems, 12, 1999

Vijay Konda and John Tsitsiklis. Actor-critic algorithms.Advances in neural information processing systems, 12, 1999

1999

[40] [40]

High- dimensional continuous control using generalized advantage estimation

John Schulman, Philipp Moritz, Sergey Levine, Michael Jordan, and Pieter Abbeel. High- dimensional continuous control using generalized advantage estimation. InProceedings of the International Conference on Learning Representations (ICLR), 2016

2016

[41] [41]

Time limits in reinforcement learning

Fabio Pardo, Arash Tavakoli, Vitaly Levdik, and Petar Kormushev. Time limits in reinforcement learning. InInternational Conference on Machine Learning, pages 4045–4054. PMLR, 2018

2018

[42] [42]

DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025

[43] [43]

The dependence of effective planning horizon on model accuracy

Nan Jiang, Alex Kulesza, Satinder Singh, and Richard Lewis. The dependence of effective planning horizon on model accuracy. InProceedings of the 2015 international conference on autonomous agents and multiagent systems, pages 1181–1189, 2015

2015

[44] [44]

Routledge, 2021

Eitan Altman.Constrained Markov decision processes. Routledge, 2021

2021

[45] [45]

Mankowitz, and Shie Mannor

Chen Tessler, Daniel J. Mankowitz, and Shie Mannor. Reward constrained policy optimization. InInternational Conference on Learning Representations, 2019. URL https://openreview. net/forum?id=SkfrvsA9FX

2019

[46] [46]

Responsive safety in reinforcement learning by pid lagrangian methods

Adam Stooke, Joshua Achiam, and Pieter Abbeel. Responsive safety in reinforcement learning by pid lagrangian methods. InInternational conference on machine learning, pages 9133–9143. PMLR, 2020

2020

[47] [47]

Trust region policy optimization

John Schulman, Sergey Levine, Pieter Abbeel, Michael Jordan, and Philipp Moritz. Trust region policy optimization. InInternational conference on machine learning, pages 1889–1897. PMLR, 2015

2015

[48] [48]

Lhc data storage: Preparing for the challenges of run-3

Maria Arsuaga-Rios, Vladimír Bahyl, Manuel Batalha, Cédric Caffy, Eric Cano, Niccolo Capitoni, Cristian Contescu, Michael Davis, David Fernandez Alvarez, Jaroslav Guenther, et al. Lhc data storage: Preparing for the challenges of run-3. InEPJ Web of Conferences, volume 251, page 02023. EDP Sciences, 2021. 15

2021

[49] [49]

Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set)

Nour Moustafa and Jill Slay. Unsw-nb15: a comprehensive data set for network intrusion detection systems (unsw-nb15 network data set). In2015 Military Communications and Information Systems Conference (MilCIS), pages 1–6. IEEE, 2015

2015

[50] [50]

Evaluating real-time anomaly detection algorithms–the numenta anomaly benchmark

Alexander Lavin and Subutai Ahmad. Evaluating real-time anomaly detection algorithms–the numenta anomaly benchmark. In2015 IEEE 14th international conference on machine learning and applications (ICMLA), pages 38–44. IEEE, 2015

2015

[51] [51]

The atlas trigger system for lhc run 3 and trigger performance in 2022.Journal of Instrumentation, 19(06): P06029, 2024

Georges Aad, Erlend Aakvaag, B Abbott, Kira Abeling, Nils Julius Abicht, SH Abidi, Asmaa Aboulhorma, Halina Abramowicz, Henso Abreu, Yiming Abulaiti, et al. The atlas trigger system for lhc run 3 and trigger performance in 2022.Journal of Instrumentation, 19(06): P06029, 2024

2022

[52] [52]

A comparison of cpu and gpu implementations for the lhcb experiment run 3 trigger.Computing and Software for Big Science, 6(1):1, 2022

Roel Aaij, Marta Adinolfi, Salvatore Aiola, S Akar, Johannes Albrecht, M Alexander, S Amato, Yasmine Amhis, F Archilli, M Bala, et al. A comparison of cpu and gpu implementations for the lhcb experiment run 3 trigger.Computing and Software for Big Science, 6(1):1, 2022

2022

[53] [53]

Review of machine learning for real-time analysis at the large hadron collider experiments alice, atlas, cms and lhcb.arXiv preprint arXiv:2506.14578, 2025

Laura Boggia, Carlos Cocha, Fotis Giasemis, Joachim Hansen, Patin Inkaew, Kaare Endrup Iversen, Pratik Jawahar, Henrique Pineiro Monteagudo, Micol Olocco, Sten Astrand, et al. Review of machine learning for real-time analysis at the large hadron collider experiments alice, atlas, cms and lhcb.arXiv preprint arXiv:2506.14578, 2025

work page arXiv 2025

[54] [54]

Top-down design of protein architectures with reinforcement learning.Science, 380(6642):266–273, 2023

Isaac D Lutz, Shunzhi Wang, Christoffer Norn, Alexis Courbet, Andrew J Borst, Yan Ting Zhao, Annie Dosey, Longxing Cao, Jinwei Xu, Elizabeth M Leaf, et al. Top-down design of protein architectures with reinforcement learning.Science, 380(6642):266–273, 2023

2023

[55] [55]

Ex- tending group relative policy optimization to continuous control: A theoretical framework for robotic reinforcement learning.arXiv preprint arXiv:2507.19555, 2025

Rajat Khanda, Mohammad Baqar, Sambuddha Chakrabarti, and Satyasaran Changdar. Ex- tending group relative policy optimization to continuous control: A theoretical framework for robotic reinforcement learning.arXiv preprint arXiv:2507.19555, 2025

work page arXiv 2025

[56] [56]

Anomaly transformer: Time series anomaly detection with association discrepancy

Jiehui Xu, Haixu Wu, Jianmin Wang, and Mingsheng Long. Anomaly transformer: Time series anomaly detection with association discrepancy. InInternational Conference on Learning Representations, 2022. URLhttps://openreview.net/forum?id=LzQQ89U1qm_

2022

[57] [57]

Lhc machine.Journal of instrumentation, 3(08):S08001– S08001, 2008

Lyndon Evans and Philip Bryant. Lhc machine.Journal of instrumentation, 3(08):S08001– S08001, 2008

2008

[58] [58]

Physics guided rnns for modeling dynamical systems: A case study in simulating lake temperature profiles

Xiaowei Jia, Jared Willard, Anuj Karpatne, Jordan Read, Jacob Zwart, Michael Steinbach, and Vipin Kumar. Physics guided rnns for modeling dynamical systems: A case study in simulating lake temperature profiles. InProceedings of the 2019 SIAM international conference on data mining, pages 558–566. SIAM, 2019

2019

[59] [59]

Physics- guided neural networks (pgnn): An application in lake temperature modeling

Arka Daw, Anuj Karpatne, William D Watkins, Jordan S Read, and Vipin Kumar. Physics- guided neural networks (pgnn): An application in lake temperature modeling. InKnowledge guided machine learning, pages 353–372. Chapman and Hall/CRC, 2022

2022

[60] [60]

Physics-informed recurrent neural network for time dynamics in optical resonances.Nature computational science, 2(3):169–178, 2022

Yingheng Tang, Jichao Fan, Xinwei Li, Jianzhu Ma, Minghao Qi, Cunxi Yu, and Weilu Gao. Physics-informed recurrent neural network for time dynamics in optical resonances.Nature computational science, 2(3):169–178, 2022

2022

[61] [61]

Near-optimal reinforcement learning in polynomial time

Michael Kearns and Satinder Singh. Near-optimal reinforcement learning in polynomial time. Machine learning, 49(2):209–232, 2002

2002

[62] [62]

An analysis of model-based interval estimation for markov decision processes.Journal of Computer and System Sciences, 74(8):1309–1331, 2008

Alexander L Strehl and Michael L Littman. An analysis of model-based interval estimation for markov decision processes.Journal of Computer and System Sciences, 74(8):1309–1331, 2008

2008

[63] [63]

The role of baselines in policy gradient optimization.Advances in Neural Information Processing Systems, 35:17818–17830, 2022

Jincheng Mei, Wesley Chung, Valentin Thomas, Bo Dai, Csaba Szepesvari, and Dale Schuur- mans. The role of baselines in policy gradient optimization.Advances in Neural Information Processing Systems, 35:17818–17830, 2022

2022

[64] [64]

Demystifying group relative policy optimization: Its policy gradient is a u-statistic.arXiv preprint arXiv:2603.01162,

Hongyi Zhou, Kai Ye, Erhan Xu, Jin Zhu, Ying Yang, Shijin Gong, and Chengchun Shi. Demystifying group relative policy optimization: Its policy gradient is a u-statistic.arXiv preprint arXiv:2603.01162, 2026. 16

work page arXiv 2026

[65] [65]

Simulated dataset tttohadronic_tunecp5_13tev-powheg-pythia8 in miniaod- sim format for 2016 collision data

CMS Collaboration. Simulated dataset tttohadronic_tunecp5_13tev-powheg-pythia8 in miniaod- sim format for 2016 collision data. CERN Open Data Portal, 2024. URL https://opendata. cern.ch/record/67840. Data recorded in 2016 and published in 2024

2016

[66] [66]

About cms

CERN Open Data Portal. About cms. https://opendata.cern.ch/docs/about-cms,

[67] [67]

Accessed: 2026-01-03

2026

[68] [68]

Deciphering the nature of the Higgs sector, volume 2

Daniel de Florian, Christophe Grojean, Fabio Maltoni, C Mariotti, A Nikitenko, M Pieri, P Savard, M Schumacher, R Aggleton, M Ahmad, et al.CERN Yellow Reports: Monographs, Vol 2 (2017): Handbook of LHC Higgs cross sections: 4. Deciphering the nature of the Higgs sector, volume 2. Cern, 2017

2017

[69] [69]

Morad Aaboud, G Aad, B Abbott, B Abeloos, DK Abhayasinghe, SH Abidi, OS AbouZeid, NL Abraham, H Abramowicz, H Abreu, et al. Search for the higgs boson produced in association with a vector boson and decaying into two spin-zero particles in the h→aa→4b channel in pp collisions at √s= 13 tev with the atlas detector.Journal of High Energy Physics, 2018(10): ...

2018

[70] [70]

Trigger throttling system for cms daq

A Racz. Trigger throttling system for cms daq. Technical report, CERN, 2000. URL https: //cds.cern.ch/record/479701

2000

[71] [71]

Operation of the upgraded atlas central trigger processor during the lhc run 2.Journal of Instrumentation, 11(02):C02020–C02020, 2016

Henrik Bertelsen, G Carrillo Montoya, P-O Deviveiros, T Eifert, G Galster, J Glatzer, S Haas, A Marzin, MV Silva Oliveira, T Pauly, et al. Operation of the upgraded atlas central trigger processor during the lhc run 2.Journal of Instrumentation, 11(02):C02020–C02020, 2016

2016

[72] [72]

Long short-term memory.Supervised sequence labelling with recurrent neural networks, pages 37–45, 2012

Alex Graves. Long short-term memory.Supervised sequence labelling with recurrent neural networks, pages 37–45, 2012

2012

[73] [73]

Gate-variants of gated recurrent unit (gru) neural networks

Rahul Dey and Fathi M Salem. Gate-variants of gated recurrent unit (gru) neural networks. In 2017 IEEE 60th international midwest symposium on circuits and systems (MWSCAS), pages 1597–1600. IEEE, 2017

2017

[74] [74]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

2017

[75] [75]

Time2Vec: Learning a Vector Representation of Time

Seyed Mehran Kazemi, Rishab Goel, Sepehr Eghbali, Janahan Ramanan, Jaspreet Sahota, Sanjay Thakur, Stella Wu, Cathal Smyth, Pascal Poupart, and Marcus Brubaker. Time2vec: Learning a vector representation of time.arXiv preprint arXiv:1907.05321, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1907

[76] [76]

Temporal fusion transformers for interpretable multi-horizon time series forecasting.International journal of forecasting, 37(4): 1748–1764, 2021

Bryan Lim, Sercan Ö Arık, Nicolas Loeff, and Tomas Pfister. Temporal fusion transformers for interpretable multi-horizon time series forecasting.International journal of forecasting, 37(4): 1748–1764, 2021

2021

[77] [77]

Dinamo: Dynamic and inter- pretable anomaly monitoring for large-scale particle physics experiments.Machine Learning: Science and Technology, 6(3):035050, 2025

Arsenii Gavrikov, Julián García Pardiñas, and Alberto Garfagnini. Dinamo: Dynamic and inter- pretable anomaly monitoring for large-scale particle physics experiments.Machine Learning: Science and Technology, 6(3):035050, 2025

2025

[78] [78]

Real-time anomaly detection at the l1 trigger of cms experiment.arXiv preprint arXiv:2411.19506, 2024

Abhijith Gandrakota. Real-time anomaly detection at the l1 trigger of cms experiment.arXiv preprint arXiv:2411.19506, 2024

work page arXiv 2024

[79] [79]

2024 Data Collected with AXOL1TL Anomaly Detection at the CMS Level- 1 Trigger

CMS Collaboration. 2024 Data Collected with AXOL1TL Anomaly Detection at the CMS Level- 1 Trigger. Technical report, CERN, 2024. URLhttps://cds.cern.ch/record/2904695

work page arXiv 2024

[80] [80]

Testing a neural network for anomaly detection in the cms global trigger test crate during run 3.Journal of Instrumentation, 19(03):C03029, 2024

Noah Zipper and CMS collaboration. Testing a neural network for anomaly detection in the cms global trigger test crate during run 3.Journal of Instrumentation, 19(03):C03029, 2024

2024