MadAgents
Pith reviewed 2026-05-16 10:16 UTC · model grok-4.3
The pith
AI agents install MadGraph, support its users, and run full simulation campaigns starting from a research paper PDF.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MadAgents deliver agentic installation, learning-by-doing training, user support, and autonomous simulation campaigns for MadGraph, beginning directly from a PDF file of a paper, using an updated Claude Code implementation that includes a self-improvement loop.
What carries the argument
The set of communicative AI agents that interact with MadGraph's command-line interface to interpret papers, generate simulation setups, run event generation, and analyze outputs.
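As a rough illustration of what "generating a simulation setup" for the command-line interface can mean, the sketch below assembles a MadGraph5_aMC@NLO command file as plain text. It is not the MadAgents implementation: the helper name `build_mg5_script` and all argument values are hypothetical, and the command lines follow the usual MG5_aMC scripting pattern, which should be checked against the installed MadGraph version.

```python
def build_mg5_script(process, output_dir, nevents, beam_energy_gev, model="sm"):
    """Return the text of a MadGraph5_aMC@NLO command file for one process.

    Hypothetical helper for illustration only; an agent would fill these
    arguments from whatever it extracted from the paper.
    """
    lines = [
        f"import model {model}",           # load the physics model
        f"generate {process}",             # define the hard process
        f"output {output_dir}",            # write the process directory
        "launch",                          # start event generation
        f"set nevents {nevents}",          # run_card setting: number of events
        f"set ebeam1 {beam_energy_gev}",   # run_card setting: beam 1 energy in GeV
        f"set ebeam2 {beam_energy_gev}",   # run_card setting: beam 2 energy in GeV
    ]
    return "\n".join(lines) + "\n"


# Illustrative values, not taken from the paper.
script = build_mg5_script("p p > t t~", "ttbar_run", 10000, 6500)
print(script)
```

The resulting text would typically be saved to a file and passed to the `mg5_aMC` executable. Keeping the setup as a plain command file makes every agent decision easy to inspect and rerun.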
If this is right
- Inexperienced users gain direct access to state-of-the-art MadGraph simulations through guided installation and training.
- Simulation campaigns can be launched and completed autonomously once a paper PDF is provided.
- Agents support a range of tasks including result analysis for both basic and advanced users.
- The self-improvement loop enables the system to refine its performance on repeated simulation requests.
Where Pith is reading between the lines
- The same agent pattern could be tested on other command-line physics packages to check transferability.
- Integration with version control or public repositories might allow agents to pull papers and reproduce results at scale.
- Error rates in autonomous runs could be measured across a set of recent LHC papers to quantify reliability gains over time.
Load-bearing premise
The underlying AI model can reliably interpret physics papers and interact with MadGraph's command-line interface to produce correct simulation setups without frequent errors or human fixes.
What would settle it
A controlled test in which the agents receive a specific paper PDF, generate and execute the full simulation campaign end-to-end, and produce output files whose key distributions match the paper's published results without manual intervention.
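One simple way to make "key distributions match" concrete is a per-bin chi-square between the agent-produced histogram and the published one. The sketch below assumes both histograms share the same binning and that the published values come with per-bin uncertainties. The function names, the acceptance threshold, and the numbers are illustrative assumptions, not results from the paper.

```python
def reduced_chi2(observed, reference, reference_errors):
    """Chi-square per bin between two histograms with identical binning."""
    if not (len(observed) == len(reference) == len(reference_errors)):
        raise ValueError("all inputs must have the same number of bins")
    if not observed:
        raise ValueError("at least one bin is required")
    chi2 = 0.0
    for obs, ref, err in zip(observed, reference, reference_errors):
        if err <= 0:
            raise ValueError("reference uncertainties must be positive")
        chi2 += ((obs - ref) / err) ** 2
    return chi2 / len(observed)


def distributions_match(observed, reference, reference_errors, threshold=2.0):
    """Accept the agent output if chi-square per bin is below the threshold.

    The threshold is an assumed, adjustable tolerance.
    """
    return reduced_chi2(observed, reference, reference_errors) < threshold


# Illustrative bin contents only.
agent_hist = [10.0, 20.0, 30.0]
paper_hist = [11.0, 19.0, 30.0]
paper_errs = [1.0, 1.0, 2.0]
print(distributions_match(agent_hist, paper_hist, paper_errs))
```

A stricter test would also account for the statistical uncertainty of the agent's own sample. This version treats only the published uncertainties as the yardstick.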
Original abstract
We uncover an effective and communicative set of agents working with MadGraph. Agentic installation, learning-by-doing training, and user support provide easy access to state-of-the-art simulations and accelerate LHC research. We show in detail how MadAgents interact with inexperienced and advanced users, support a range of simulation tasks, and analyze results. In a second step, we illustrate how MadAgents automatize event generation and run an autonomous simulation campaign, starting from a pdf file of a paper. The updated Claude Code implementation includes a self-improvement loop.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents MadAgents, a system of AI agents integrated with MadGraph that enable agentic installation, learning-by-doing training, user support for both novice and advanced users, and fully autonomous simulation campaigns that start from a PDF of a physics paper, with an added self-improvement loop in the updated Claude Code implementation.
Significance. If validated, the approach could meaningfully lower barriers to LHC phenomenology by automating routine MadGraph workflows and allowing non-experts to launch campaigns directly from the literature. The combination of interactive support and autonomous PDF-to-event-generation pipelines represents a novel application of agentic AI in high-energy physics tooling.
major comments (3)
- [Autonomous simulation campaign and self-improvement loop sections] The central claim of reliable autonomous simulation campaigns (starting from an arbitrary PDF) is unsupported by any quantitative metrics: no success rates, error frequencies, failure-mode analysis, or side-by-side comparison against expert human setups are provided anywhere in the manuscript.
- [Agentic installation, learning-by-doing training, and user support sections] The description of agent interactions with the MadGraph CLI (installation, process definition, parameter-card generation, and result analysis) remains purely qualitative; no test cases, edge-case handling, or validation against known MadGraph outputs are reported, leaving the reliability of the core workflow unverified.
- [Updated Claude Code implementation and self-improvement loop] The self-improvement loop is asserted to enhance performance, yet the manuscript supplies neither concrete examples of corrections made by the loop nor any before/after performance indicators, making it impossible to assess whether the loop actually reduces errors.
minor comments (2)
- Figure captions and workflow diagrams would benefit from explicit labels indicating which agent performs each step and which MadGraph commands are invoked.
- The manuscript should include a short table summarizing the distinct agent roles and their primary responsibilities for clarity.
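The validation requested in the second major comment could start from something as small as a card comparison. The sketch below compares an agent-generated parameter card with a reference card, ignoring comments, case, and whitespace, and comparing numeric fields by value. It assumes SLHA-style text in which `#` starts a comment. The function names and the card fragments are illustrative assumptions, not material from the manuscript.

```python
import math
from itertools import zip_longest


def normalize_card(text):
    """Drop comments and blank lines, lowercase, and split each row into tokens."""
    rows = []
    for raw in text.splitlines():
        body = raw.split("#", 1)[0].strip()
        if body:
            rows.append(tuple(body.lower().split()))
    return rows


def _same_token(a, b, rel_tol):
    """Compare numerically when both tokens parse as floats, else as strings."""
    try:
        return math.isclose(float(a), float(b), rel_tol=rel_tol)
    except ValueError:
        return a == b


def card_differences(agent_card, reference_card, rel_tol=1e-9):
    """Return (row_index, agent_row, reference_row) for every differing row."""
    diffs = []
    pairs = zip_longest(normalize_card(agent_card), normalize_card(reference_card))
    for index, (agent_row, ref_row) in enumerate(pairs):
        same = (
            agent_row is not None
            and ref_row is not None
            and len(agent_row) == len(ref_row)
            and all(_same_token(a, r, rel_tol) for a, r in zip(agent_row, ref_row))
        )
        if not same:
            diffs.append((index, agent_row, ref_row))
    return diffs


# Illustrative card fragments only.
agent = "Block mass\n    6 1.725000e+02 # MT\n   23 9.118760e+01 # MZ\n"
reference = "BLOCK MASS # masses\n    6 1.730000e+02 # MT\n   23 91.1876 # MZ\n"
for row in card_differences(agent, reference):
    print(row)
```

Comparing by value rather than by raw text avoids flagging harmless formatting differences, so the reported rows point at parameters that genuinely differ.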
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on the MadAgents manuscript. We agree that quantitative validation is necessary to support the claims of reliability and will revise the manuscript to incorporate the requested metrics, test cases, and examples.
- Referee: [Autonomous simulation campaign and self-improvement loop sections] The central claim of reliable autonomous simulation campaigns (starting from an arbitrary PDF) is unsupported by any quantitative metrics: no success rates, error frequencies, failure-mode analysis, or side-by-side comparison against expert human setups are provided anywhere in the manuscript.
  Authors: We acknowledge that the current manuscript presents the autonomous PDF-to-campaign workflow primarily through illustrative examples rather than statistical validation. In the revised version we will add a dedicated evaluation section reporting success rates across 20 sample PDFs drawn from recent LHC phenomenology papers, a breakdown of observed failure modes (PDF parsing ambiguities, command syntax errors, and parameter inconsistencies), and a side-by-side comparison of agent-generated setups versus expert manual configurations for a subset of processes, including wall-clock time and final cross-section agreement.
  revision: yes
- Referee: [Agentic installation, learning-by-doing training, and user support sections] The description of agent interactions with the MadGraph CLI (installation, process definition, parameter-card generation, and result analysis) remains purely qualitative; no test cases, edge-case handling, or validation against known MadGraph outputs are reported, leaving the reliability of the core workflow unverified.
  Authors: We agree that the core agent–MadGraph interactions require explicit validation. The revised manuscript will include a new appendix with concrete test cases: a complete installation trace, an example process-definition dialogue with the agent’s reasoning steps, generated parameter cards compared line-by-line to manually produced cards, and result-analysis outputs cross-checked against standard MadGraph reference outputs for benchmark processes such as pp → tt̄ and pp → WZ.
  revision: yes
- Referee: [Updated Claude Code implementation and self-improvement loop] The self-improvement loop is asserted to enhance performance, yet the manuscript supplies neither concrete examples of corrections made by the loop nor any before/after performance indicators, making it impossible to assess whether the loop actually reduces errors.
  Authors: The self-improvement loop is a recent addition to the Claude Code implementation. The revised text will supply two concrete examples of corrections performed by the loop (one involving recovery from an incorrect PDF-extracted parameter value and one fixing a mis-specified decay chain) together with before-and-after error counts and success-rate improvements measured over a fixed set of ten autonomous campaigns.
  revision: yes
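The success rates promised in the responses above rest on small samples (20 PDFs, ten campaigns), so an interval is more informative than a bare percentage. The sketch below computes a Wilson score interval for a binomial success rate. The success count in the example is a hypothetical placeholder, not a reported result, and the function name is an assumption.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    if trials <= 0 or not 0 <= successes <= trials:
        raise ValueError("need trials > 0 and 0 <= successes <= trials")
    p = successes / trials
    denom = 1.0 + z * z / trials
    center = (p + z * z / (2.0 * trials)) / denom
    half_width = (
        z * math.sqrt(p * (1.0 - p) / trials + z * z / (4.0 * trials * trials)) / denom
    )
    return max(0.0, center - half_width), min(1.0, center + half_width)


# Hypothetical outcome: 15 of the 20 sample PDFs run end-to-end without fixes.
low, high = wilson_interval(15, 20)
print(f"success rate 0.75, interval [{low:.2f}, {high:.2f}]")
```

With only ten or twenty runs the interval stays wide, which is worth keeping in mind when judging before-and-after claims for the self-improvement loop.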
Circularity Check
No circularity: descriptive software account with no derivations or fits
The paper is a purely descriptive account of an AI agent system for MadGraph, covering installation, training, user support, autonomous campaigns from PDF inputs, and a self-improvement loop. It contains no mathematical derivations, equations, fitted parameters, predictions, or uniqueness theorems. All content consists of workflow descriptions and qualitative examples with no load-bearing steps that could reduce outputs to inputs by construction or via self-citation chains. The central claims rest on demonstrated functionality rather than any self-referential logic.
Axiom & Free-Parameter Ledger
Forward citations
Cited by 2 Pith papers
- Collider-Bench: Benchmarking AI Agents with Particle Physics Analysis Reproduction
  Collider-Bench is a new benchmark showing that current LLM agents cannot reliably reproduce LHC analyses at the level of a physicist-in-the-loop.
- RooAgent: An LLM Agent for Root-Based High Energy Physics Analysis
  RooAgent provides an LLM agent interface that translates natural-language prompts into calls to PyROOT analysis functions for high energy physics tasks, with support for multiple AI backends and tested on ZH simulatio...