arxiv: 2606.21316 · v1 · pith:F3MRW5TBnew · submitted 2026-06-19 · ✦ hep-ph

Large Language Model-Assisted Framework for BSM Model Building

Pith reviewed 2026-06-26 13:58 UTC · model grok-4.3

classification ✦ hep-ph
keywords beyond the Standard Model · large language models · model building · Lagrangian construction · gauge anomalies · electroweak symmetry breaking · symbolic computation · Python framework

The pith

The bsm_agent framework builds BSM models automatically from natural-language descriptions of new fields, with an LLM handling only the interface and a Python backend executing all symbolic calculations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper describes a software package that starts from the Standard Model and adds user-specified scalars or fermions whose quantum numbers are given through conversation. The system then generates the full renormalizable Lagrangian, checks for gauge anomalies, expands operators into component fields, and computes the electroweak symmetry breaking stationary conditions plus tree-level mass matrices. The LLM interprets requests and calls tools but never performs the physics arithmetic itself. This separation keeps every derived quantity deterministic and reproducible while removing the need for users to write or debug model code by hand.

Core claim

Starting from the SM field content and a user-specified set of additional scalars and/or fermions, the package constructs renormalizable Lagrangian, performs gauge-anomaly checks, expands operators into component fields, and derives electroweak symmetry breaking stationary conditions and tree-level mass matrices. All of these tasks are performed automatically once the user specifies the quantum numbers of the new fields through a natural-language interface, eliminating the need for manual model construction. The symbolic calculations are performed entirely by the Python backend to ensure the correctness and reproducibility of the physics results; the LLM is used only as an orchestration layer that interprets natural-language requests, manages confirmation steps for ambiguous inputs, triggers backend tools, and formats report-ready summaries.
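To make the gauge-anomaly check concrete, here is a minimal sketch of the kind of exact arithmetic such a backend has to perform. It is not taken from bsm_agent: the `Weyl` class, the function names, and the convention Q = T3 + Y are assumptions chosen for illustration. The sketch covers only colour singlets/triplets and SU(2) singlets/doublets, and it omits the SU(3)^3 and global SU(2) anomalies.

```python
from dataclasses import dataclass
from fractions import Fraction as F


@dataclass(frozen=True)
class Weyl:
    """One Weyl fermion multiplet (illustrative structure, not bsm_agent's)."""
    name: str
    su3_dim: int    # 1 (singlet) or 3 (triplet)
    su2_dim: int    # 1 (singlet) or 2 (doublet)
    Y: F            # hypercharge, assumed convention Q = T3 + Y
    chirality: int  # +1 left-handed, -1 right-handed


# One Standard Model generation.
SM_GENERATION = [
    Weyl("Q_L", 3, 2, F(1, 6), +1),
    Weyl("u_R", 3, 1, F(2, 3), -1),
    Weyl("d_R", 3, 1, F(-1, 3), -1),
    Weyl("L_L", 1, 2, F(-1, 2), +1),
    Weyl("e_R", 1, 1, F(-1), -1),
]


def anomaly_coefficients(fields):
    """Hypercharge anomaly sums; common group-index factors are dropped."""
    return {
        "U(1)^3": sum(f.chirality * f.su3_dim * f.su2_dim * f.Y**3 for f in fields),
        "grav^2 U(1)": sum(f.chirality * f.su3_dim * f.su2_dim * f.Y for f in fields),
        "SU(2)^2 U(1)": sum(f.chirality * f.su3_dim * f.Y
                            for f in fields if f.su2_dim == 2),
        "SU(3)^2 U(1)": sum(f.chirality * f.su2_dim * f.Y
                            for f in fields if f.su3_dim == 3),
    }


def is_anomaly_free(fields):
    """True when every coefficient vanishes exactly."""
    return all(value == 0 for value in anomaly_coefficients(fields).values())


# A lone chiral singlet with Y = 1 spoils cancellation; its vector-like
# partner restores it.
with_chiral = SM_GENERATION + [Weyl("chi_L", 1, 1, F(1), +1)]
with_vectorlike = with_chiral + [Weyl("chi_R", 1, 1, F(1), -1)]
```

Exact rationals are used instead of floats so that "the sum vanishes" is a strict equality rather than a tolerance test, which is one way a backend can keep the check deterministic.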

What carries the argument

The LLM orchestration layer that interprets natural-language requests, manages confirmation steps for ambiguous inputs, triggers backend tools, and formats report-ready summaries, separated from the deterministic Python backend that executes all symbolic calculations.
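The separation described here can be sketched as a tool registry: the language model only emits a structured request, and a dispatcher maps it onto a deterministic function. Everything in this sketch (`TOOLS`, `tool`, `dispatch`, `su2_product`, and the JSON message layout) is hypothetical and is not bsm_agent's actual interface. The example tool decomposes a product of two SU(2) representations, labelled by dimension.

```python
import json

# Hypothetical registry of deterministic backend tools.
TOOLS = {}


def tool(name):
    """Decorator that registers a backend function under a tool name."""
    def register(func):
        TOOLS[name] = func
        return func
    return register


@tool("su2_product")
def su2_product(dim_a, dim_b):
    """Dimensions of the irreps in the SU(2) product dim_a x dim_b."""
    return list(range(abs(dim_a - dim_b) + 1, dim_a + dim_b, 2))


def dispatch(llm_message):
    """Run the tool named in a JSON message; the LLM never computes results."""
    call = json.loads(llm_message)
    name = call["tool"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name!r}")
    return TOOLS[name](**call["arguments"])


# Stand-in for a message produced by the orchestration layer.
message = '{"tool": "su2_product", "arguments": {"dim_a": 2, "dim_b": 3}}'
result = dispatch(message)  # doublet x triplet
```

In a layout like this, adding a new backend capability only means registering another function, and the numerical or symbolic result never passes through the language model.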

If this is right

  • Quantum numbers of new scalars and fermions are supplied conversationally rather than through manual code entry.
  • Renormalizable Lagrangian construction, gauge-anomaly checks, operator expansion, and mass-matrix derivation all occur without further user coding after the initial description.
  • The same backend can be driven by local Ollama models, self-hosted servers, or commercial APIs while producing identical physics output.
  • Stationary conditions and tree-level mass matrices are generated in report-ready form for immediate use in further calculations.
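As a reference point for what "stationary conditions and tree-level mass matrices" mean in the simplest case, the Standard Model Higgs sector alone gives the worked example below. It is a textbook illustration, not output from the paper; the normalisation of the potential and of the doublet is an assumed convention.

```latex
\begin{align}
V(H) &= -\mu^2 H^\dagger H + \lambda \,(H^\dagger H)^2,
\qquad
H = \begin{pmatrix} 0 \\ (v+h)/\sqrt{2} \end{pmatrix}, \\
V(h) &= -\frac{\mu^2}{2}\,(v+h)^2 + \frac{\lambda}{4}\,(v+h)^4, \\
\left.\frac{\partial V}{\partial h}\right|_{h=0}
  &= -\mu^2 v + \lambda v^3 = 0
  \;\Rightarrow\; \mu^2 = \lambda v^2, \\
m_h^2 &= \left.\frac{\partial^2 V}{\partial h^2}\right|_{h=0}
  = -\mu^2 + 3\lambda v^2 = 2\lambda v^2 .
\end{align}
```

With additional scalars the same two steps apply, except that there is one stationary condition per vacuum expectation value and the second derivatives form a matrix to be diagonalised.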

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same split between conversational control and deterministic computation could be reused in other symbolic physics packages to reduce manual setup time.
  • Confirmation steps already built into the LLM layer provide a practical way to catch interpretation mistakes before the backend runs.
  • Extending the backend with additional modules such as one-loop corrections would immediately make those capabilities available through the same natural-language interface.

Load-bearing premise

The LLM correctly and unambiguously translates the user's natural-language description of quantum numbers into the precise inputs required by the backend without introducing parsing or interpretation errors.
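One way to narrow this premise is to have the backend accept only a strictly validated structure, so that malformed LLM output is rejected rather than silently coerced. The sketch below is an assumption about how such a gate could look; `FieldSpec`, `parse_field_spec`, and the dictionary keys are hypothetical names, and the list of allowed colour representations is illustrative.

```python
import re
from dataclasses import dataclass
from fractions import Fraction

ALLOWED_SPINS = {"scalar", "fermion"}
ALLOWED_SU3_DIMS = {1, 3, 6, 8}  # illustrative subset of colour representations
EXACT_RATIONAL = re.compile(r"[+-]?\d+(/[1-9]\d*)?")


@dataclass(frozen=True)
class FieldSpec:
    """Validated description of one new field (hypothetical structure)."""
    name: str
    spin: str
    su3_dim: int
    su2_dim: int
    hypercharge: Fraction


def parse_field_spec(raw):
    """Convert an LLM-produced dict into a FieldSpec or raise ValueError."""
    name = raw.get("name")
    if not isinstance(name, str) or not name:
        raise ValueError("name must be a non-empty string")
    spin = raw.get("spin")
    if spin not in ALLOWED_SPINS:
        raise ValueError(f"spin must be one of {sorted(ALLOWED_SPINS)}")
    su3_dim = raw.get("su3_dim")
    if type(su3_dim) is not int or su3_dim not in ALLOWED_SU3_DIMS:
        raise ValueError("unsupported SU(3) dimension")
    su2_dim = raw.get("su2_dim")
    if type(su2_dim) is not int or su2_dim < 1:
        raise ValueError("SU(2) dimension must be a positive integer")
    y = raw.get("hypercharge")
    # Require an exact rational string such as "-1/3"; decimals are rejected
    # because "0.33" is not the same hypercharge as "1/3".
    if not isinstance(y, str) or not EXACT_RATIONAL.fullmatch(y):
        raise ValueError("hypercharge must be an exact rational like '-1/3'")
    return FieldSpec(name, spin, su3_dim, su2_dim, Fraction(y))


leptoquark = parse_field_spec(
    {"name": "S1", "spin": "scalar", "su3_dim": 3, "su2_dim": 1,
     "hypercharge": "-1/3"}
)
```

A gate of this kind catches malformed input only. A well-formed but wrong specification (a doublet where a singlet was meant) still passes, which is why the confirmation step remains load-bearing.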

What would settle it

An input phrase whose quantum numbers are misread by the LLM, producing a Lagrangian missing an interaction term or failing an anomaly check that the backend would otherwise catch.

Figures

Figures reproduced from arXiv: 2606.21316 by Shaikh Saad.

Figure 1: Architecture and workflow of bsm_agent. Natural-language user requests are handled by the agent layer, which performs intent matching, tool routing, and LLM-assisted interpretation only when needed. All physics outputs are produced by the deterministic symbolic backend: field and representation handling, renormalizable operator generation, component-field expansion, electroweak-symmetry-breaking analysis,…

Original abstract

Recent advances in artificial intelligence (AI), particularly large language models (LLMs), have created new opportunities for natural-language interaction with scientific software, but reliable theoretical model building still requires deterministic symbolic calculations. We present \texttt{bsm_agent}, an open-source symbolic framework for beyond the Standard Model (BSM) model building that combines a deterministic physics backend with an LLM chat interface. Starting from the SM field content and a user-specified set of additional scalars and/or fermions, the package constructs renormalizable Lagrangian, performs gauge-anomaly checks, expands operators into component fields, and derives electroweak symmetry breaking stationary conditions and tree-level mass matrices. The key novelty of the framework is that all of these tasks are performed automatically once the user specifies the quantum numbers of the new fields through a natural-language interface, eliminating the need for manual model construction. The symbolic calculations are performed entirely by the Python backend to ensure the correctness and reproducibility of the physics results; the LLM is used only as an orchestration layer that interprets natural-language requests, manages confirmation steps for ambiguous inputs, triggers backend tools, and formats report-ready summaries. The package supports three provider classes: local Ollama inference, remote self-hosted model servers accessed through the implemented remote provider interface, and commercial hosted APIs via OpenAI and Anthropic. This separation between conversational control and deterministic computation preserves reproducibility while making interactive BSM model construction substantially more convenient.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript describes bsm_agent, an open-source framework that pairs an LLM chat interface for natural-language specification of BSM field quantum numbers with a deterministic Python backend. The backend automatically constructs renormalizable Lagrangians, performs gauge-anomaly checks, expands operators to component fields, and derives EWSB conditions and tree-level mass matrices starting from the SM plus user-specified scalars/fermions. The LLM acts solely as an orchestration layer for input interpretation, ambiguity confirmation, tool triggering, and report formatting, while all symbolic physics is handled in Python for reproducibility; multiple LLM providers (local, self-hosted, commercial) are supported.

Significance. If the LLM-to-backend translation proves reliable, the framework would lower the barrier to exploratory BSM model construction while preserving reproducibility. The explicit separation of conversational control from deterministic computation is a clear design strength. However, the absence of any concrete examples, parsing-accuracy metrics, or validation against known models means the practical significance remains unestablished.

major comments (2)
  1. [Abstract] The central claim that 'all of these tasks are performed automatically once the user specifies the quantum numbers of the new fields through a natural-language interface' and that 'the LLM is used only as an orchestration layer' is unsupported by any validation data, test cases, or error metrics. Without quantitative assessment of parsing accuracy for quantum numbers (representations, hypercharges, etc.), the reproducibility guarantee cannot be evaluated.
  2. [Abstract and implied implementation description] The architecture assumes the LLM will not produce confident but incorrect mappings (e.g., doublet vs. singlet or fractional hypercharge errors) that the backend then executes deterministically. No mechanism beyond 'confirmation steps for ambiguous inputs' is described to catch such errors, and no accuracy benchmarks are referenced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their detailed review and for highlighting the importance of empirical validation for the LLM orchestration layer. The comments correctly identify that the submitted manuscript provides limited quantitative support for the reliability claims. We address each point below and will revise the manuscript to incorporate additional test cases, accuracy metrics, and workflow clarifications as described.

  1. Referee: [Abstract] The central claim that 'all of these tasks are performed automatically once the user specifies the quantum numbers of the new fields through a natural-language interface' and that 'the LLM is used only as an orchestration layer' is unsupported by any validation data, test cases, or error metrics. Without quantitative assessment of parsing accuracy for quantum numbers (representations, hypercharges, etc.), the reproducibility guarantee cannot be evaluated.

    Authors: We agree that the abstract and current manuscript lack explicit quantitative validation of the LLM parsing step. The full text describes the separation of concerns and provides illustrative usage, but does not report systematic accuracy metrics or error rates. In the revision we will add a new section presenting a benchmark suite of quantum-number specifications (including representations, hypercharges, and multiplicities), measured parsing success rates across the supported LLM providers, and direct comparisons of the resulting Lagrangians against manually constructed reference models. This will allow readers to evaluate the reproducibility claim quantitatively.

    Revision: yes

  2. Referee: [Abstract and implied implementation description] The architecture assumes the LLM will not produce confident but incorrect mappings (e.g., doublet vs. singlet or fractional hypercharge errors) that the backend then executes deterministically. No mechanism beyond 'confirmation steps for ambiguous inputs' is described to catch such errors, and no accuracy benchmarks are referenced.

    Authors: The current description emphasizes user confirmation of the parsed quantum numbers prior to backend execution, which is intended to intercept mis-mappings before any symbolic computation occurs. However, we acknowledge that the manuscript does not detail the exact confirmation interface, does not quantify how often such errors arise, and provides no benchmark data on their detection rate. In the revision we will expand the implementation section with a step-by-step description of the confirmation workflow, include the benchmark results mentioned above (which will report both raw parsing error rates and post-confirmation residual error rates), and clarify that the deterministic backend operates exclusively on the user-approved field content.

    Revision: yes
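The manuscript does not detail the confirmation interface, so the following is only a sketch of one possible design, with all function names hypothetical. It echoes the parsed quantum numbers together with the electric charges of each multiplet component (assuming the convention Q = T3 + Y), since a doublet-versus-singlet or hypercharge misreading shows up immediately in the charge list. The backend callable runs only after an explicit "yes".

```python
from fractions import Fraction


def component_charges(su2_dim, hypercharge):
    """Electric charges Q = T3 + Y of each component, highest T3 first."""
    isospin = Fraction(su2_dim - 1, 2)
    return [isospin - k + hypercharge for k in range(su2_dim)]


def confirmation_prompt(name, su3_dim, su2_dim, hypercharge):
    """Human-readable summary the user must approve before any computation."""
    charges = ", ".join(str(q) for q in component_charges(su2_dim, hypercharge))
    return (
        f"{name}: SU(3) dim {su3_dim}, SU(2) dim {su2_dim}, Y = {hypercharge}; "
        f"component charges Q = [{charges}]. Proceed? (yes/no)"
    )


def run_if_approved(user_reply, backend, *args):
    """Call the deterministic backend only on an explicit 'yes'."""
    if user_reply.strip().lower() != "yes":
        return None
    return backend(*args)


# A Higgs-like doublet with Y = 1/2 has one charged and one neutral component.
prompt = confirmation_prompt("H", 1, 2, Fraction(1, 2))
```

Showing derived charges rather than only the raw inputs gives the user a physically meaningful cross-check: a field intended to contain a neutral component but displayed without one was probably misparsed.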

Circularity Check

0 steps flagged

No circularity: framework description with deterministic backend

The paper describes a software package (bsm_agent) that uses an LLM solely as an orchestration layer for natural-language input while delegating all symbolic calculations (Lagrangian construction, anomaly checks, mass matrices) to a deterministic Python backend. No derivations, equations, fitted parameters, predictions, or self-citations appear in the text. The central claim reduces to a description of implemented functionality rather than any mathematical reduction to its own inputs. This is a standard non-finding for tool/framework papers.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the assumption that the backend correctly encodes Standard Model gauge groups and field representations; no new physical parameters or entities are introduced.

axioms (1)
  • Domain assumption: The Python backend correctly implements the Standard Model gauge group, field content, and rules for renormalizable operators and anomaly cancellation.
    The package starts from the SM and adds user-specified fields; correctness of the SM baseline is presupposed.

pith-pipeline@v0.9.1-grok · 5771 in / 1313 out tokens · 31870 ms · 2026-06-26T13:58:00.313466+00:00 · methodology
