AgentRivet: an automated system for producing Rivet routines from journal publications

Andrew D. Pilkington; Antonio J. Costa; Caterina Doglioni; Christian Gütschow; Sukanya Sinha

arxiv: 2606.13535 · v3 · pith:HEFH5CPCnew · submitted 2026-06-11 · ✦ hep-ex · cs.AI · hep-ph


Pith reviewed 2026-06-27 04:48 UTC · model grok-4.3

classification ✦ hep-ex · cs.AI · hep-ph

keywords: Rivet routines, analysis preservation, Large Language Models, collider measurements, automated code generation, fiducial observables, Monte Carlo comparison


The pith

Large language models can extract analysis details from papers and generate working Rivet routines with reasonable fidelity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an automated workflow that reads published collider measurements and produces the code libraries needed to compare new theoretical predictions against those measurements. Only 39 percent of such measurements currently have public Rivet routines, leaving many results inaccessible for model testing. The workflow proceeds through extraction of observables and cuts, code writing, and separate reviews of syntax and physics content. When applied to two recent measurements, the generated routines showed few syntax errors and generally reproduced the physics described in the source papers, although implementation problems still occurred in some cases.

Core claim

AgentRivet is a multi-step workflow based on Large Language Models that extracts the physics analysis information from published papers and writes the missing Rivet routines, with intermediate code- and physics-reviews as part of an autonomous quality control. Tests on two measurements show that the system produces competent Rivet routines with few syntax errors. The physics fidelity of the routines is reasonable and follows the explanations given in the relevant publications, although physics-implementation issues arise mainly from subtle ambiguities in the source papers.

What carries the argument

The multi-step LLM workflow that extracts observables, cuts and fiducial regions from papers, writes Rivet code, and performs autonomous code and physics reviews before output.
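
The loop described above (extract, write, review, feed the review back into the next coding pass) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the names `ReviewIssue`, `Memory`, `run_workflow`, the stop-when-no-issues rule, and the iteration cap are all assumptions, and the `fake_*` functions are deterministic stand-ins for LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class ReviewIssue:
    """One structured review finding (hypothetical schema)."""
    kind: str          # "code" or "physics"
    description: str
    suggestion: str


@dataclass
class Memory:
    """Keeps every intermediate artefact so a run can be inspected later."""
    records: List[Tuple[str, object]] = field(default_factory=list)

    def store(self, step: str, payload: object) -> None:
        self.records.append((step, payload))


def run_workflow(
    paper_text: str,
    extract: Callable[[str], dict],
    write_code: Callable[[dict, List[ReviewIssue]], str],
    reviewers: List[Callable[[str, dict], List[ReviewIssue]]],
    max_iterations: int = 3,  # assumed cap, not taken from the paper
) -> Tuple[str, Memory]:
    memory = Memory()
    analysis = extract(paper_text)
    memory.store("extract", analysis)

    feedback: List[ReviewIssue] = []
    code = ""
    for iteration in range(max_iterations):
        # The coder sees the extracted analysis plus the previous review feedback.
        code = write_code(analysis, feedback)
        memory.store(f"code_{iteration}", code)
        # Each reviewer returns a list of structured issues (possibly empty).
        feedback = [issue for review in reviewers for issue in review(code, analysis)]
        memory.store(f"review_{iteration}", feedback)
        if not feedback:  # assumed stopping rule: no outstanding issues
            break
    return code, memory


# Deterministic stand-ins for the LLM-backed steps (illustrative only).
def fake_extract(text: str) -> dict:
    return {"observables": ["thrust"], "cuts": []}


def fake_coder(analysis: dict, feedback: List[ReviewIssue]) -> str:
    return "draft-v2" if feedback else "draft-v1"


def fake_code_review(code: str, analysis: dict) -> List[ReviewIssue]:
    if code == "draft-v1":
        return [ReviewIssue("code", "histogram not normalised", "normalise at the end")]
    return []


def fake_physics_review(code: str, analysis: dict) -> List[ReviewIssue]:
    return []


final_code, memory = run_workflow(
    "paper text", fake_extract, fake_coder, [fake_code_review, fake_physics_review]
)
print(final_code, [step for step, _ in memory.records])
```

Passing the steps in as callables keeps the control flow separate from any particular model provider, which is one way to swap between models when comparing them.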

If this is right

Rivet coverage can rise above the current 39 percent of measurements.
New Monte Carlo models can be tested against a larger set of preserved analyses without manual coding.
Searches for physics beyond the Standard Model gain access to additional fiducial measurements.
Analysis preservation becomes less dependent on individual authors writing routines after publication.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method may allow routine generation to keep pace with the growing volume of published measurements.
Ambiguous wording in papers will continue to require targeted human checks even after automation.
The workflow could be extended to flag papers whose definitions are too vague for reliable implementation.
Integration into journal submission processes might encourage clearer reporting of analysis details.

Load-bearing premise

Published papers contain sufficiently unambiguous definitions of observables, cuts and fiducial regions that LLMs can extract and implement them correctly without human clarification.

What would settle it

Running the generated Rivet routines on Monte Carlo events from the same generator setups used in the source papers, and comparing the resulting distributions directly to the published predictions and measurement results, would show whether the physics implementation is correct.
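
One simple way to put a number on such a comparison is a chi-square per bin between the distribution a routine fills and the published values. The sketch below is an assumption of this note rather than a procedure from the paper: the function name `chi2_per_bin` is hypothetical, the bin contents are made-up placeholders, and bin-to-bin correlations are ignored for brevity.

```python
from typing import Sequence


def chi2_per_bin(
    predicted: Sequence[float],
    measured: Sequence[float],
    uncertainty: Sequence[float],
) -> float:
    """Uncorrelated chi-square divided by the number of bins."""
    if not (len(predicted) == len(measured) == len(uncertainty)):
        raise ValueError("all inputs must have the same number of bins")
    if len(measured) == 0:
        raise ValueError("need at least one bin")
    chi2 = 0.0
    for p, m, s in zip(predicted, measured, uncertainty):
        if s <= 0:
            raise ValueError("uncertainties must be positive")
        chi2 += ((p - m) / s) ** 2
    return chi2 / len(measured)


# Placeholder numbers for illustration only; they are not from any measurement.
routine_output = [1.0, 2.0, 3.0]
published = [1.0, 2.5, 2.0]
published_unc = [0.5, 0.5, 1.0]

score = chi2_per_bin(routine_output, published, published_unc)
print(f"chi2 per bin = {score:.3f}")
```

A value far above one would point to a mismatch worth inspecting, although it cannot by itself separate a faulty routine from a genuine disagreement between the prediction and the measurement.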

Figures

Figures reproduced from arXiv: 2606.13535 by Andrew D. Pilkington, Antonio J. Costa, Caterina Doglioni, Christian Gütschow, Sukanya Sinha.

**Figure 1.** Summary of the AGENTRIVET workflow. The black arrows represent the transfer of information between different steps in the pipeline. The dotted lines represent the storage to Memory.

… complementary perspectives and produce structured review objects describing issues and suggested corrections. The resulting review feedback is subsequently incorporated into the next iteration of the CODER prompt, allowing th…

**Figure 2.** Normalised differential cross-sections as a function of thrust (a,c,e) and …

**Figure 3.** Differential cross sections as a function of …

Original abstract

Particle physics collider experiments provide Rivet routines as part of the analysis preservation strategy for model-independent measurements. Rivet is a C++ toolkit that allows new theoretical models to be compared to the measurements, thus aiding the development and tuning of Monte Carlo event generators as well as searches for physics beyond the Standard Model. However, analysis coverage is known to be incomplete, with only 39% of measurements having documented and publicly available Rivet routines. In this article, we design and implement an automated workflow based on Large Language Models with the goal of providing the missing routines. This multi-step workflow, referred to as AgentRivet, extracts the physics analysis information from published papers and writes the missing Rivet routines, with intermediate code and physics reviews as part of an autonomous quality control. We report the results obtained using commercial Large Language Models, provided by OpenAI, Anthropic, and Google, for two recent measurements from the ATLAS and CMS experiments. We find that AgentRivet produces competent Rivet routines with few syntax errors. The physics fidelity of the routines is reasonable and follows the explanations given in the relevant publications. Nevertheless, physics-implementation issues do arise and are investigated using the artefacts produced by AgentRivet. The majority of physics-implementation issues arise from subtle-but-ambiguous definitions in the given publication, although some models struggle to implement complex observables even when clear definitions are given.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AgentRivet shows an LLM workflow can produce Rivet code from two papers with few syntax issues, but the test is too narrow and qualitative to support broader claims.


AgentRivet is a multi-step LLM system that pulls analysis details from papers and writes Rivet routines, with built-in code and physics review steps. On two recent ATLAS and CMS measurements it produced output that mostly matched the published descriptions and had only minor syntax problems.

The practical gap is real. Only 39 percent of measurements have public Rivet routines, so anything that lowers the cost of adding more could matter for Monte Carlo tuning and BSM work. The workflow itself is new in its combination of extraction, generation, and autonomous review loops using models from OpenAI, Anthropic, and Google. That part is clearly described and directly addresses the coverage problem.

The evidence stays thin. Everything rests on exactly two cases, judged by eye as “competent” and “reasonable” without error counts, success rates, or side-by-side comparison to human-written routines. The physics review step uses LLMs as well, so the same class of model both creates and checks the code. The paper notes that ambiguous definitions and complex observables cause problems, but supplies no frequency data even for the small sample. Those limits are stated plainly, yet they still leave the central claim dependent on an unquantified demonstration.

This is a tool paper for the hep-ex subgroup that already uses Rivet for preservation. People who maintain analysis repositories or tune generators would get a concrete starting point from the workflow description. The thinking is straightforward and the citations to the existing Rivet framework are appropriate.

I would send it to peer review. The idea targets a documented bottleneck, and referees can ask for larger tests and quantitative metrics without needing to reject the premise outright.

Referee Report

2 major / 0 minor

Summary. The paper introduces AgentRivet, an automated multi-step workflow using commercial LLMs (from OpenAI, Anthropic, and Google) to extract physics analysis details from journal publications and generate Rivet routines, incorporating intermediate code- and physics-review steps for quality control. It reports results from applying the system to two recent ATLAS and CMS measurements, claiming that the output routines are competent with few syntax errors and exhibit reasonable physics fidelity that follows the source publications, while noting that some implementation issues arise primarily from ambiguous definitions in the papers.

Significance. If the approach can be shown to scale reliably, it would address the known gap in Rivet coverage (currently only 39% of measurements) and thereby improve model-independent comparisons between theory and experiment, Monte Carlo tuning, and BSM searches. The multi-provider LLM strategy and autonomous review steps represent a constructive engineering effort toward automation; however, the present demonstration supplies no quantitative performance metrics and rests on a sample of two analyses, so the immediate significance remains limited.

major comments (2)

[Abstract] Abstract and reported results: the central claim that AgentRivet 'produces competent Rivet routines' rests on qualitative judgments ('few syntax errors', 'reasonable' fidelity) for exactly two test cases, with no tabulated error counts, success rates, inter-rater agreement, or side-by-side comparison against human-written Rivet code. This small-N, non-metric evaluation is load-bearing for the claim and leaves its generality unestablished.
[Workflow and results] Workflow description and results: the physics-review stage is performed by an LLM of the same class as the generation stage, yet the manuscript supplies neither frequency/severity statistics for the observed physics issues nor an external validation protocol. Without these, the assertion that issues 'arise from subtle-but-ambiguous definitions' cannot be quantified even within the two reported cases.
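
The tabulated counts requested in both comments could be produced with very little machinery. The sketch below is illustrative only: the `Issue` schema, the category and severity labels, and the records themselves are placeholders invented for this note, not results from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Issue:
    """One logged problem in a generated routine (hypothetical schema)."""
    analysis: str   # which measurement the routine targets
    category: str   # e.g. "syntax", "ambiguous-definition", "model-limitation"
    severity: str   # e.g. "minor", "major"


def tabulate(issues: Iterable[Issue]) -> Counter:
    """Count issues per (analysis, category, severity) combination."""
    return Counter((i.analysis, i.category, i.severity) for i in issues)


def as_rows(counts: Counter) -> List[Tuple[str, str, str, int]]:
    """Flatten the counter into sorted rows ready for a table."""
    return sorted((a, c, s, n) for (a, c, s), n in counts.items())


# Placeholder records for illustration; NOT taken from the paper.
placeholder_issues = [
    Issue("analysis-A", "syntax", "minor"),
    Issue("analysis-A", "syntax", "minor"),
    Issue("analysis-A", "ambiguous-definition", "major"),
    Issue("analysis-B", "model-limitation", "major"),
]

counts = tabulate(placeholder_issues)
rows = as_rows(counts)
for analysis, category, severity, n in rows:
    print(f"{analysis:12s} {category:22s} {severity:6s} {n}")
```

Keeping the category as an explicit field would let the split between ambiguous definitions and model limitations be reported as a frequency rather than as a qualitative statement.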

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. Our work presents AgentRivet as a proof-of-concept demonstration on two analyses, and we agree the evaluation can be strengthened with additional quantitative details from the existing cases. We respond point-by-point below, indicating planned revisions.


Referee: [Abstract] Abstract and reported results: the central claim that AgentRivet 'produces competent Rivet routines' rests on qualitative judgments ('few syntax errors', 'reasonable' fidelity) for exactly two test cases, with no tabulated error counts, success rates, inter-rater agreement, or side-by-side comparison against human-written Rivet code. This small-N, non-metric evaluation is load-bearing for the claim and leaves its generality unestablished.

Authors: We agree the evaluation is qualitative and limited to two cases, consistent with the manuscript's framing as an initial demonstration of workflow feasibility rather than a broad benchmark. We will revise the abstract and results to include tabulated counts of syntax errors (e.g., compilation issues) and physics discrepancies per case, with explicit examples. Inter-rater agreement metrics are not applicable to this author-reviewed assessment. Side-by-side human comparisons are not included, as they require separate expert effort, but all generated routines and artefacts will be released to support such evaluations. The revisions will emphasize the proof-of-concept scope while making the reported observations more precise.

Revision: yes

Referee: [Workflow and results] Workflow description and results: the physics-review stage is performed by an LLM of the same class as the generation stage, yet the manuscript supplies neither frequency/severity statistics for the observed physics issues nor an external validation protocol. Without these, the assertion that issues 'arise from subtle-but-ambiguous definitions' cannot be quantified even within the two reported cases.

Authors: The physics-review LLM provides an initial automated check, but issue classification (ambiguous definitions versus model limitations) derives from the authors' subsequent manual inspection of outputs against the source papers. We will revise the workflow and results sections to add frequency and severity statistics for physics issues in both test cases, with categorization and examples. This will quantify the assertion within the reported scope. A formal external validation protocol is not implemented here and lies beyond the current demonstration; we will explicitly note this as a limitation and potential future extension.

Revision: yes

Circularity Check

0 steps flagged

No circularity: workflow and evaluation rest on external publications and standard Rivet framework


The paper presents an LLM-driven workflow (AgentRivet) that extracts analysis details from published papers and generates Rivet code, with intermediate reviews. Its central claim is an empirical demonstration on two independent ATLAS/CMS measurements, asserting competent output and reasonable fidelity by direct comparison to the source publications' explanations. No equations, fitted parameters, or derivations are involved. No self-citations are invoked as load-bearing premises, and the evaluation does not reduce to any input by construction. The process is self-contained against external benchmarks (the original journal papers and the Rivet toolkit), satisfying the default expectation of no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The system depends on existing commercial LLM capabilities and the established Rivet framework; no new free parameters, axioms or invented entities are introduced.



