arxiv: 2606.11157 · v1 · pith:EEZVEVVHnew · submitted 2026-06-09 · ✦ hep-ph · astro-ph.CO · physics.comp-ph

DarkAgents

Pith reviewed 2026-06-27 12:32 UTC · model grok-4.3

classification ✦ hep-ph · astro-ph.CO · physics.comp-ph
keywords multi-agent systems · large language models · astroparticle physics · first-order phase transitions · gravitational waves · NANOGrav · cosmological models · parameter fitting

The pith

DarkAgents deploys multi-agent LLMs with deterministic code to build audited pipelines from scale-invariant models to NANOGrav gravitational-wave fits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a multi-agent framework that uses large language models together with verified human-written code to construct complete research pipelines in astroparticle physics. Starting from a classically scale-invariant particle model, the system computes first-order cosmological phase transitions, generates the associated gravitational-wave spectrum, performs a fit to the NANOGrav nanohertz data, and returns the best-fit parameters along with existing constraints. It additionally produces a report that audits every assumption and prior used in both the fit and the constraint collection. A sympathetic reader would care because the domain routinely involves layered theoretical calculations, multiple observational bounds, and hidden modeling choices that are difficult to track manually.

Core claim

DarkAgent-PT applies the multi-agent system to a classically scale-invariant particle-physics model, derives best-fit values for the parameters that reproduce the NANOGrav spectrum via a first-order phase transition, compiles the relevant experimental and observational bounds on those parameters, and supplies an explicit audit of all assumptions and priors that entered the calculation and the constraint list. The same runs expose inconsistencies in some previously published fits and generate new fits that employ the dissipative bulk-flow gravitational-wave template.
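As a rough illustration of the fitting step only, the sketch below runs a brute-force chi-square scan of a two-parameter spectrum against mock data. Everything in it is an assumption made for illustration: `toy_spectrum` is a generic broken power law (not the dissipative bulk-flow template or any template from the paper), the frequency grid and amplitudes are arbitrary numbers in the nanohertz range, and the mock data are noiseless points generated from the same function rather than NANOGrav measurements.

```python
import numpy as np


def toy_spectrum(freqs, amp, f_peak, low=3.0, high=-1.0):
    """Generic broken power law peaking at f_peak (illustrative only)."""
    x = freqs / f_peak
    return amp * np.where(x < 1.0, x**low, x**high)


def chi2(data, model, sigma):
    """Gaussian chi-square with independent per-bin uncertainties."""
    return float(np.sum(((data - model) / sigma) ** 2))


# Mock "observations": noiseless points drawn from the toy model itself.
freqs = np.linspace(2e-9, 3e-8, 14)      # Hz, arbitrary nanohertz grid
true_amp, true_peak = 1e-8, 1e-8         # hypothetical input values
data = toy_spectrum(freqs, true_amp, true_peak)
sigma = 0.1 * data                       # assumed 10% uncertainties

# Brute-force scan over log-spaced amplitude and peak-frequency grids.
amps = np.logspace(-9, -7, 41)
peaks = np.logspace(-9, -7, 41)
best_chi2, best_amp, best_peak = min(
    (chi2(data, toy_spectrum(freqs, a, p), sigma), a, p)
    for a in amps
    for p in peaks
)
print(f"amp={best_amp:.3e}, f_peak={best_peak:.3e}, chi2={best_chi2:.3g}")
```

A grid scan is used here only because it is easy to read; a Bayesian analysis of the kind the paper describes would replace it with posterior sampling over explicit priors.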

What carries the argument

The multi-agent orchestration layer that combines LLM reasoning and code generation with deterministic, tested human-written code to assemble and execute the full pipeline from model definition through phase-transition dynamics, gravitational-wave prediction, parameter fitting, constraint lookup, and assumption auditing.
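The stage sequence described above can be sketched as a small orchestrator that runs stages in order and records what each one produced. This is a minimal sketch, not the DarkAgents implementation: the names `Stage`, `Orchestrator`, and `audit_log` are hypothetical, the stage bodies are placeholders that do no physics, and the choice of which stages are marked deterministic is an assumption made for the example.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

State = Dict[str, Any]


@dataclass
class Stage:
    name: str
    run: Callable[[State], State]   # returns only the keys it adds
    deterministic: bool             # True for tested human-written code


@dataclass
class Orchestrator:
    stages: List[Stage]
    audit_log: List[Dict[str, Any]] = field(default_factory=list)

    def execute(self, state: State) -> State:
        for stage in self.stages:
            update = stage.run(dict(state))   # each stage sees a copy
            state = {**state, **update}
            self.audit_log.append(
                {
                    "stage": stage.name,
                    "deterministic": stage.deterministic,
                    "outputs": sorted(update),
                }
            )
        return state


# Placeholder stages: each returns a dummy value instead of doing physics.
pipeline = Orchestrator(
    stages=[
        Stage("model_definition", lambda s: {"model": "placeholder"}, False),
        Stage("phase_transition", lambda s: {"transition": "placeholder"}, True),
        Stage("gw_prediction", lambda s: {"spectrum": "placeholder"}, True),
        Stage("parameter_fit", lambda s: {"best_fit": "placeholder"}, True),
        Stage("constraint_lookup", lambda s: {"constraints": "placeholder"}, False),
        Stage("assumption_audit", lambda s: {"audit": "placeholder"}, False),
    ]
)
final_state = pipeline.execute({"prompt": "user request"})
```

Recording the `deterministic` flag per stage keeps the boundary between LLM-driven and human-written steps visible in the log, which is where a human audit would most likely focus.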

If this is right

  • The framework can flag inconsistencies between new and existing literature fits for the same class of models.
  • It can generate new fits that incorporate alternative gravitational-wave templates such as the dissipative bulk-flow spectrum.
  • Every run produces a machine-readable audit report that makes the choice of priors and approximations explicit for later scrutiny.
  • The same architecture supports different underlying language models, including local deployments, without changing the pipeline logic.
  • Public release of the code allows direct reuse and extension to other astroparticle calculations that require similar multi-step modeling.
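One way to picture the machine-readable audit report mentioned in the list above is a flat list of entries serialized to JSON. The schema below is purely hypothetical (the field names `stage`, `kind`, `statement`, and `origin` are not taken from the paper), and the two example entries are invented placeholders rather than output of the framework.

```python
import json
from dataclasses import asdict, dataclass
from typing import List


@dataclass
class AuditEntry:
    stage: str       # pipeline stage that introduced the item
    kind: str        # "prior" or "assumption"
    statement: str   # human-readable description
    origin: str      # "llm" or "human_code"


def build_report(entries: List[AuditEntry]) -> str:
    """Serialize audit entries to a JSON string with stable key order."""
    payload = {
        "n_entries": len(entries),
        "entries": [asdict(e) for e in entries],
    }
    return json.dumps(payload, indent=2, sort_keys=True)


# Placeholder entries, written by hand for this example.
entries = [
    AuditEntry("parameter_fit", "prior", "example prior statement", "llm"),
    AuditEntry("gw_prediction", "assumption", "example assumption", "human_code"),
]
report = build_report(entries)
parsed = json.loads(report)
priors_only = [e for e in parsed["entries"] if e["kind"] == "prior"]
```

Keeping each prior and assumption as its own record means a later reader can filter the report by stage or by kind without parsing free text.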

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The audit-report feature could be adopted as a minimal reproducibility standard for any future parameter fit in the field.
  • Once validated on additional benchmarks, the approach might shorten the time between proposing a new particle-physics model and obtaining its observational constraints.
  • The same orchestration pattern could be tested on collider-phenomenology pipelines that likewise combine model building, event generation, and experimental-limit comparisons.
  • Local-LLM variants would allow the entire workflow to run without external API calls, which may matter for computations involving proprietary or sensitive model details.
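The idea of swapping language-model back-ends without touching pipeline logic can be sketched with a structural interface. This is an illustrative pattern only: `LLMBackend`, `StubLocalBackend`, `StubRemoteBackend`, and `propose_model` are hypothetical names, and the stubs return canned strings rather than calling any real model or vendor API.

```python
from typing import Protocol


class LLMBackend(Protocol):
    """Anything with a complete(prompt) -> str method counts as a backend."""

    def complete(self, prompt: str) -> str: ...


class StubLocalBackend:
    """Stand-in for a locally hosted model; makes no network calls."""

    def complete(self, prompt: str) -> str:
        return f"[local stub] {prompt}"


class StubRemoteBackend:
    """Stand-in for a hosted model; also a stub, no network calls."""

    def complete(self, prompt: str) -> str:
        return f"[remote stub] {prompt}"


def propose_model(backend: LLMBackend, request: str) -> str:
    """Pipeline step that depends only on the interface, not the vendor."""
    return backend.complete(f"Propose a model for: {request}")


local_answer = propose_model(StubLocalBackend(), "toy request")
remote_answer = propose_model(StubRemoteBackend(), "toy request")
```

Because `propose_model` sees only the `complete` method, replacing the remote stub with the local one changes where the text comes from but not the calling code.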

Load-bearing premise

The reasoning and code-generation steps performed by the language models remain reliable enough, when paired with deterministic human code, that they introduce neither undetected calculation errors nor invalid modeling assumptions into the final pipelines and fits.

What would settle it

Application of the system to a benchmark first-order transition model whose correct best-fit values and assumption list are already established in the literature; mismatch between the system output and the known correct values, or failure to flag a known flawed assumption, would falsify the reliability claim.
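The test described above amounts to two comparisons: best-fit values against reference values, and flagged assumptions against known flawed ones. A minimal sketch follows, assuming a relative tolerance as the agreement criterion; the function names, parameter names, numbers, and assumption labels are all hypothetical placeholders, not values from the paper or the literature.

```python
import math
from typing import Dict, Iterable, List


def mismatched_parameters(
    system_fit: Dict[str, float],
    reference_fit: Dict[str, float],
    rel_tol: float = 0.05,
) -> List[str]:
    """Names of reference parameters the system missed or got wrong."""
    bad = []
    for name, ref in reference_fit.items():
        got = system_fit.get(name)
        if got is None or not math.isclose(got, ref, rel_tol=rel_tol):
            bad.append(name)
    return bad


def unflagged_assumptions(
    flagged: Iterable[str], known_flawed: Iterable[str]
) -> List[str]:
    """Known flawed assumptions that the audit failed to flag."""
    return sorted(set(known_flawed) - set(flagged))


# Placeholder numbers and labels, chosen only to exercise both checks.
system_fit = {"param_a": 1.00, "param_b": 2.50}
reference_fit = {"param_a": 1.02, "param_b": 2.00}
bad_params = mismatched_parameters(system_fit, reference_fit)
missed = unflagged_assumptions(
    flagged=["assumption_x"],
    known_flawed=["assumption_x", "assumption_y"],
)
reliability_claim_survives = not bad_params and not missed
```

In practice the tolerance would more naturally be tied to the posterior widths of the reference fit than to a fixed percentage; the fixed `rel_tol` here is a simplification.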

Figures

Figures reproduced from arXiv: 2606.11157 by Filippo Sala, Matteo Zandi, Michele Lucente, Silvia Pascoli.

Figure 1: Architecture of DarkAgents. An orchestrator organises and coordinates the workflow of the sub-agents (robot icon in the small rectangular boxes), based on the user's prompts. Humans can audit the progress after each task. The displayed implementation is for DarkAgent-PT, aimed at studying cosmological FOPT: between the proposal and astroparticle stages, it employs the dedicated FOPT-PTA pipeline. Addition…
Figure 2: Comparison of the Bayesian posterior distributions against the …
Original abstract

We present DarkAgents: a multi-agent system that leverages the reasoning and code-generation capabilities of large language models (LLMs), together with deterministic, tested human-written code, to build orchestrated pipelines for theoretical astroparticle physics research. While related approaches have been proposed in collider physics and cosmology, DarkAgents targets the specific challenges of this domain, such as model building, complex pipeline computations, multiple constraints and assumption auditing. The framework can be powered by different agentic command-line tools, including Mistral's, Anthropic's, OpenAI's and local LLMs via Ollama. As a first implementation, we apply DarkAgents to the study of cosmological first-order transitions, starting from a classically scale-invariant particle-physics model and ending with the fit to the NANOGrav nanohertz gravitational-wave spectrum. DarkAgent-PT provides as output i) the best-fit values of model parameters, ii) their existing experimental and observational constraints, iii) an audit report of the assumptions and priors entering both i) and ii), of particular relevance for astroparticle physics. Our test runs identify inconsistencies in some fits in the literature and produce novel ones based on the dissipative bulk-flow GW template. The code is publicly available at https://github.com/PhysicsZandi/DarkAgents.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces DarkAgents, a multi-agent system combining LLM reasoning and code-generation capabilities with deterministic human-written code to construct pipelines for theoretical astroparticle physics. As a first application, DarkAgent-PT is used to study a classically scale-invariant particle-physics model, fit it to the NANOGrav nanohertz gravitational-wave spectrum, and output best-fit parameter values, existing experimental/observational constraints, and an audit report of assumptions and priors. The work claims that test runs identify inconsistencies in some literature fits and produce novel fits based on the dissipative bulk-flow GW template. The code is publicly available.

Significance. If the outputs can be independently verified as correct, the framework could help manage the complexity of model building, multi-constraint analyses, and assumption auditing in astroparticle physics. The public code release is a clear strength that enables reproducibility checks. At present, however, the significance is limited because the central claims rest on unverified LLM-generated pipelines.

major comments (3)
  1. [Abstract] Abstract: the claim that test runs 'identify inconsistencies in some fits in the literature and produce novel ones' is presented without any specific examples, comparison tables, or quantitative differences from prior results, which is load-bearing for the asserted utility of the framework.
  2. [Abstract] Abstract and results description: no equations for the scale-invariant model, the GW template (including the dissipative bulk-flow case), the likelihood, or the fitting procedure are provided, nor is any validation against known analytic limits or manual calculations reported, leaving the correctness of the reported best-fit values and audits impossible to assess.
  3. [Abstract] Abstract: the manuscript does not describe any cross-checks, error propagation, or sensitivity tests confirming that LLM-orchestrated steps (model construction, prior specification, constraint application) introduce no undetected errors, which directly affects the reliability of the claimed outputs.
minor comments (2)
  1. The abstract lists support for multiple LLM back-ends (Mistral, Anthropic, OpenAI, Ollama) but provides no usage examples or performance notes for the NANOGrav application.
  2. The workflow diagram or pseudocode for agent orchestration is not described, which would aid clarity even if the technical details are expanded elsewhere.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their careful review and constructive feedback on the manuscript. We address each major comment below and indicate the revisions planned for the next version.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that test runs 'identify inconsistencies in some fits in the literature and produce novel ones' is presented without any specific examples, comparison tables, or quantitative differences from prior results, which is load-bearing for the asserted utility of the framework.

    Authors: We agree that the abstract would benefit from concrete examples to support the claim. In the revised manuscript we will add a brief description of one specific literature inconsistency (a cited fit whose priors were flagged as internally inconsistent by the audit module) together with the quantitative shifts in best-fit parameters obtained with the dissipative bulk-flow template. Full comparison tables already appear in the results section and will be referenced from the abstract. revision: yes

  2. Referee: [Abstract] Abstract and results description: no equations for the scale-invariant model, the GW template (including the dissipative bulk-flow case), the likelihood, or the fitting procedure are provided, nor is any validation against known analytic limits or manual calculations reported, leaving the correctness of the reported best-fit values and audits impossible to assess.

    Authors: The referee is correct that the abstract and the summarized results description omit the explicit equations and validation steps. We will insert the key equations (scale-invariant potential, GW spectrum expressions for both templates, likelihood, and fitting procedure) into a new concise subsection of the results. We will also add explicit statements of the analytic-limit checks and manual cross-verifications that were performed on the pipeline outputs. revision: yes

  3. Referee: [Abstract] Abstract: the manuscript does not describe any cross-checks, error propagation, or sensitivity tests confirming that LLM-orchestrated steps (model construction, prior specification, constraint application) introduce no undetected errors, which directly affects the reliability of the claimed outputs.

    Authors: We acknowledge that the current text does not detail cross-checks or sensitivity tests for the LLM-driven components. In the revision we will add a dedicated validation subsection describing (i) manual inspection of a random sample of LLM-generated code segments, (ii) sensitivity tests on prior choices, and (iii) comparison of LLM-orchestrated versus fully manual runs for a subset of the NANOGrav fits. These additions will directly address the reliability concern. revision: yes
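A prior-sensitivity test of the kind listed under (ii) can be illustrated on a one-parameter toy problem: evaluate the same likelihood under two priors and compare a posterior summary. The sketch below is an assumption-laden example, not the authors' validation procedure: the Gaussian likelihood, the parameter range, and the two priors (flat and log-uniform) are chosen only to show the mechanics.

```python
import numpy as np


def posterior_mean(grid, log_like, prior):
    """Posterior mean on a uniform grid, using simple Riemann-sum weights."""
    weights = np.exp(log_like - np.max(log_like)) * prior
    return float(np.sum(weights * grid) / np.sum(weights))


# Toy setup: one positive parameter, Gaussian likelihood centred at 3.
grid = np.linspace(0.1, 10.0, 2000)
log_like = -0.5 * ((grid - 3.0) / 1.0) ** 2

# Two prior choices over the same range (unnormalized is fine here,
# since the normalization cancels in the posterior mean).
priors = {
    "flat": np.ones_like(grid),
    "log_uniform": 1.0 / grid,
}
means = {name: posterior_mean(grid, log_like, p) for name, p in priors.items()}
shift = abs(means["flat"] - means["log_uniform"])
print(f"posterior means: {means}, shift: {shift:.3f}")
```

The log-uniform prior weights small values more heavily, so it pulls the posterior mean below the flat-prior result; the size of `shift` relative to the posterior width is the quantity a sensitivity test would report.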

Circularity Check

0 steps flagged

Framework introduction exhibits no load-bearing circularity

Full rationale

The paper presents DarkAgents as a new multi-agent LLM-plus-human-code framework for building astroparticle physics pipelines, with its primary output being best-fit parameters, constraints, and assumption audits for a scale-invariant model fitted to NANOGrav data. No derivation chain reduces a claimed prediction or first-principles result to its own inputs by construction, self-definition, or self-citation. The central value resides in the orchestration tool itself rather than in any fitted quantity or uniqueness theorem that would require external verification. Minor self-citation risk is present but not load-bearing for the framework claim.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper presents a software framework and its application rather than a theoretical derivation, so the central claim introduces no free parameters, axioms, or invented entities.

pith-pipeline@v0.9.1-grok · 5754 in / 1085 out tokens · 21890 ms · 2026-06-27T12:32:01.268239+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. LeWRON: Agentic Analysis of Electroweak Phase Transitions

    hep-ph 2026-06 unverdicted novelty 7.0

    LeWRON is a new agentic framework that automates construction, auditing, and exploration of finite-temperature effective potentials and gravitational-wave predictions for electroweak phase transitions starting from an...
