A Constrained Natural-Language Interface for Variational Multi-Physics Finite Element Simulations in FEniCS

Nilay Upadhyay; Wesley F. Reinhart

arXiv: 2606.10928 · v1 · pith:PUH5U57Nnew · submitted 2026-06-09 · 💻 cs.CE · cs.AI · cs.LG · physics.comp-ph

Pith reviewed 2026-06-27 11:15 UTC · model grok-4.3

classification 💻 cs.CE · cs.AI · cs.LG · physics.comp-ph

keywords natural language interface · finite element method · FEniCS · large language models · constrained generation · multi-physics simulation · variational methods · Gmsh

The pith

Constraining LLMs to prompt parsing and geometry tasks while routing to human-written FEniCS templates yields 100 percent valid parses after retry and solutions that agree with benchmarks to within a few percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a system in which large language models are restricted to front-end tasks: converting natural-language prompts into structured JSON and generating Gmsh code only for custom geometries, with retry feedback loops at those stages. A deterministic dispatcher then maps the validated JSON to one of five fixed, human-written FEniCS/UFL templates covering linear elasticity, hyperelasticity, elastoplasticity, thermo-mechanical coupling, and phase-field fracture. The LLM never produces solver code, derives weak forms, or touches the numerical core. Benchmarks show a final 100 percent valid parse rate, 100 percent problem-class accuracy, 97.1 percent field-extraction accuracy, and 90 percent success on custom-geometry cases, with solution accuracy ranging from sub-percent on smooth problems to 2-5 percent on harder nonlinear ones. A sympathetic reader would care because the design keeps generated code off the critical path while still allowing natural-language access to variational multi-physics simulations.
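The validation stage between the parser and the dispatcher can be pictured with a minimal sketch. The field names (`problem_class`, `geometry`, `material`, `boundary_conditions`, `youngs_modulus`) and the single physical-consistency check are assumptions made for illustration; this page does not reproduce the paper's actual JSON schema.

```python
import json

# Hypothetical schema: the paper's real field names are not shown on this page.
PROBLEM_CLASSES = {
    "linear_elasticity",
    "hyperelasticity",
    "elastoplasticity",
    "thermo_mechanical",
    "phase_field_fracture",
}
REQUIRED_FIELDS = ("problem_class", "geometry", "material", "boundary_conditions")


def validate_spec(raw_text):
    """Return (spec, errors); spec is None when the text is not a valid specification."""
    try:
        spec = json.loads(raw_text)
    except json.JSONDecodeError as exc:
        return None, [f"invalid JSON: {exc.msg}"]
    if not isinstance(spec, dict):
        return None, ["top-level JSON value must be an object"]

    errors = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in spec]
    if spec.get("problem_class") not in PROBLEM_CLASSES:
        errors.append(f"unknown problem_class: {spec.get('problem_class')!r}")

    # Assumed example of a physical-consistency check: a positive Young's modulus.
    material = spec.get("material")
    if isinstance(material, dict):
        modulus = material.get("youngs_modulus")
        if modulus is not None and (not isinstance(modulus, (int, float)) or modulus <= 0):
            errors.append("material.youngs_modulus must be a positive number")

    return (spec if not errors else None), errors
```

Returning the error list rather than raising keeps the messages available as feedback text for a retry, which is the role the paper assigns to its retry loops.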

Core claim

By limiting the LLM to parsing prompts into validated JSON and optional Gmsh generation, then dispatching via a deterministic layer to five human-written templates, the interface reaches 100.0 percent final valid parse rate, 100.0 percent problem-class accuracy, 97.1 percent field-extraction accuracy, and 90.0 percent final success on custom-geometry cases while producing solutions that agree with analytical and published benchmarks to within a few percent.

What carries the argument

The deterministic dispatcher that maps validated JSON specifications to one of five human-written FEniCS/UFL templates for linear elasticity, hyperelasticity, elastoplasticity, thermo-mechanical coupling, and phase-field fracture.
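A deterministic dispatcher of this kind amounts to a fixed lookup table. The sketch below is illustrative only: the key names and the stub templates are hypothetical, and the stubs stand in for the human-written FEniCS/UFL templates, which are not reproduced here.

```python
from typing import Callable, Dict


def _stub_template(name: str) -> Callable[[dict], str]:
    """Placeholder for a human-written solver template; contains no solver code."""
    def run(spec: dict) -> str:
        return f"would run the {name} template"
    return run


# Fixed table of the five physics classes named in the paper (key names assumed).
TEMPLATES: Dict[str, Callable[[dict], str]] = {
    name: _stub_template(name)
    for name in (
        "linear_elasticity",
        "hyperelasticity",
        "elastoplasticity",
        "thermo_mechanical",
        "phase_field_fracture",
    )
}


def dispatch(spec: dict) -> str:
    """Route a validated specification to exactly one template, or refuse."""
    key = spec.get("problem_class")
    if key not in TEMPLATES:
        # Out-of-scope physics is rejected instead of being improvised.
        raise ValueError(f"no template for problem_class={key!r}")
    return TEMPLATES[key](spec)
```

Because the table is static, the same specification always reaches the same template, and a request outside the five classes fails loudly rather than producing generated solver code.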

Load-bearing premise

The five human-written FEniCS/UFL templates cover the physics cases users will request and contain no implementation errors that affect the reported benchmark agreement.

What would settle it

A prompt requesting physics outside the five templates, or a standard benchmark where the dispatched template produces results that deviate beyond the stated 2-5 percent range from the analytical or published solution.
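The second test can be phrased as a relative-error check against a reference value. The sketch below assumes the upper end of the stated 2-5 percent range as the tolerance; the function names are hypothetical and the numbers in the usage note are made up for illustration, not taken from the paper.

```python
def relative_error(computed: float, reference: float) -> float:
    """Relative deviation of a computed quantity from a nonzero reference value."""
    if reference == 0:
        raise ValueError("reference must be nonzero for a relative error")
    return abs(computed - reference) / abs(reference)


def within_band(computed: float, reference: float, tol: float = 0.05) -> bool:
    """True when the deviation stays inside the tolerance (5 percent assumed here)."""
    return relative_error(computed, reference) <= tol
```

For instance, an illustrative computed value of 10.3 against a reference of 10.0 deviates by about 3 percent and passes, while 10.8 deviates by about 8 percent and would count as evidence against the claim.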

Figures

Figures reproduced from arXiv: 2606.10928 by Nilay Upadhyay, Wesley F. Reinhart.

**Figure 1.** Architecture of the FEniCS agent. A natural-language input is parsed into a structured specification, validated for physical consistency,

**Figure 2.** Variational form generation for elastoplasticity. The structured specification is routed through a deterministic template that selects

**Figure 3.** Mesh convergence for two-dimensional linear elasticity. Left: cantilever tip deflection rises from 13.3 mm at 86 elements to 19.21

**Figure 4.** Three-dimensional cantilever beam with quadratic tetrahedral elements. Left: tip deflection rises from 0.3803 mm to 0.3833 mm

**Figure 5.** Lamé cylinder verification. Left: error in maximum von Mises stress against the analytical Lamé value, dropping from 16% at 106

**Figure 6.** Cook’s membrane convergence with Neo-Hookean material and Poisson ratio

**Figure 7.** Three-dimensional rubber block compression with frictionless boundary conditions. Left: FEM reaction force overlaid on the

**Figure 8.** Three-dimensional bar in uniaxial tension with linear isotropic hardening (

**Figure 9.** Three-dimensional thermo-mechanical beam with both

**Figure 10.** Two-dimensional SENT phase-field fracture. Left: reaction force per unit thickness against applied displacement, showing linear

**Figure 11.** Three-dimensional SENT phase-field fracture. A thin slab with plane-strain boundary conditions gives a raw peak force near 60.8 N,

**Figure 12.** Damage field in the two-dimensional SENT specimen

**Figure 13.** Damage field on the three-dimensional SENT mesh, shown in perspective on the left and as a top-down projection on the right. The

**Figure 14.** Heuristic local-refinement study across three deterministic

**Figure 15.** Pareto front for the plate-with-hole parametric study. Five

**Figure 16.** Maximum von Mises stress contour over the thickness–

**Figure 17.** End-to-end demonstration on a 3D L-bracket with elastoplasticity, generated from one natural-language prompt. Left: applied

**Figure 18.** Wall-clock time breakdown for one representative case

Original abstract

Large language models can reduce the manual effort required to set up finite element simulations, but they introduce reliability risks when generated solver code lies on the critical path. We present a constrained natural-language interface for multi-physics finite element analysis in which the LLM is limited to front-end tasks: parsing prompts into structured JSON, generating Gmsh code only for non-catalog geometries, and using retry feedback for those stages. It never writes FEniCS solver templates, derives weak forms, or writes the numerical solver core. A deterministic dispatcher maps the validated specification to five human-written FEniCS/UFL templates: linear elasticity, hyperelasticity, elastoplasticity, thermo-mechanical coupling, and phase-field fracture. We validate this deterministic template layer against analytical solutions and published 2D/3D benchmarks. Smooth cases reach sub-percent agreement on adequate meshes, while harder nonlinear cases reach the 2-5 percent range. We also evaluate the LLM-facing front end directly. In a 15-prompt parser benchmark, first-pass valid parses were obtained for 9 cases, and all remaining cases were repaired after retry, giving a final valid parse rate of 100.0 percent, 100.0 percent problem-class accuracy, and 97.1 percent field-extraction accuracy. In a 10-case custom-geometry benchmark routed through the real LLM-to-Gmsh path, first-pass and final success were both 90.0 percent, with one unrecovered invalid-geometry failure. These results show that the parser and constrained prompt/validation design are effective on these benchmarks. As an end-to-end demonstration, the system generates and analyzes a 3D elastoplastic L-bracket with a fillet and bolt hole from one natural-language prompt. The contribution is a measured architecture for natural-language-driven variational simulation, not open-ended autonomous code generation.
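The parser figures imply a first-pass valid parse rate of 9 out of 15, or 60 percent, with retry feedback recovering the other six prompts. A retry loop of that general shape might look like the sketch below. The `llm` callable, the feedback wording, the three-attempt limit, and the `problem_class` check are all assumptions for illustration, not the authors' implementation.

```python
import json
from typing import Callable, Optional, Tuple


def parse_with_retry(
    prompt: str,
    llm: Callable[[str], str],
    max_attempts: int = 3,
) -> Tuple[Optional[dict], int]:
    """Ask the model for JSON, feeding validation errors back on each retry.

    Returns (spec, attempts_used); spec is None if every attempt was rejected.
    """
    feedback = ""
    for attempt in range(1, max_attempts + 1):
        raw = llm(prompt + feedback)
        try:
            spec = json.loads(raw)
            if isinstance(spec, dict) and "problem_class" in spec:
                return spec, attempt
            error = "JSON must be an object with a 'problem_class' field"
        except json.JSONDecodeError as exc:
            error = f"invalid JSON: {exc.msg}"
        # The rejection reason is appended to the next request as retry feedback.
        feedback = f"\n\nPrevious output was rejected: {error}. Return corrected JSON only."
    return None, max_attempts


def make_flaky_llm() -> Callable[[str], str]:
    """Stand-in model for testing: fails once, then returns valid JSON."""
    calls = {"n": 0}

    def llm(text: str) -> str:
        calls["n"] += 1
        if calls["n"] == 1:
            return "Sure! Here is the JSON you asked for"
        return '{"problem_class": "elastoplasticity"}'

    return llm
```

With the stand-in model, `parse_with_retry("3D L-bracket, elastoplastic", make_flaky_llm())` succeeds on the second attempt, which corresponds to a case that counts against the first-pass rate but toward the final rate.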

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's real contribution is a measured split where the LLM only parses and makes meshes while fixed templates handle the physics, backed by 100% final parse success and 90% geometry success on their benchmarks.

The main point is that this keeps the LLM out of the solver entirely. It parses natural language into JSON, generates Gmsh only for custom shapes, then hands off to one of five pre-written FEniCS templates for linear elasticity, hyperelasticity, elastoplasticity, thermo-mechanical coupling, or phase-field fracture. That architecture is the new piece.

The numbers on the front end are the strongest part. The 15-prompt parser test reached 100% valid parses after retry, 100% problem-class accuracy, and 97.1% field extraction. The 10-geometry test hit 90% final success. Template outputs match analytical solutions to sub-percent on smooth cases and 2-5% on the nonlinear benchmarks. Those results are concrete and rest on external comparisons rather than self-referential fits.

The load-bearing assumption is that the five templates will cover what users actually ask for and that they are implemented correctly. The paper does not test how often requests fall outside those five, so generalization beyond the reported benchmarks stays unproven. Minor details on exact benchmark selection and mesh convergence are also thin in the write-up.

This is for people building or using FEniCS who want a safer natural-language front end. It will not reorganize AI-for-science, but the constrained design and the reported parser/geometry metrics are worth a referee's time. I would send it to peer review.

Referee Report

0 major / 2 minor

Summary. The manuscript describes a constrained natural-language interface for multi-physics finite element simulations using FEniCS. The LLM is restricted to parsing natural language prompts into structured JSON and generating Gmsh code for non-catalog geometries, with retry mechanisms for validation. A deterministic dispatcher then maps the validated JSON to one of five human-written FEniCS/UFL templates covering linear elasticity, hyperelasticity, elastoplasticity, thermo-mechanical coupling, and phase-field fracture. The paper reports validation results against analytical solutions and published benchmarks, achieving sub-percent agreement on smooth cases and 2-5% on nonlinear ones, as well as 100% final valid parse rate, 100% problem-class accuracy, 97.1% field-extraction accuracy in a 15-prompt parser benchmark, and 90% success in a 10-case custom-geometry benchmark.

Significance. If the reported results hold, the work demonstrates a reliable architecture for integrating LLMs with variational FEM by limiting the LLM to non-critical front-end tasks and relying on verified templates for the solver core. This approach mitigates risks associated with LLM-generated code while achieving high success rates on the specified benchmarks. The contribution lies in the measured performance of the constrained parser and validation design rather than in advancing open-ended code generation.

minor comments (2)

[Abstract and §4] The exact criteria for selecting the 15 prompts in the parser benchmark and the 10 custom-geometry cases are not specified; providing the prompt list or selection methodology would improve reproducibility.
[§5] Details on mesh convergence studies for the benchmark validations are limited; clarifying how mesh adequacy was determined for the reported agreement levels would strengthen the validation claims.
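One common way to make mesh adequacy explicit is to report the relative change in a quantity of interest between successive refinements. The sketch below assumes a 1 percent threshold, which is not a criterion stated in the paper, and applies it to the endpoint values quoted in the Figure 3 and Figure 4 captions; those endpoints may span more than one refinement step.

```python
from typing import Sequence


def relative_change(coarse: float, fine: float) -> float:
    """Relative change of a quantity of interest, measured against the finer result."""
    return abs(fine - coarse) / abs(fine)


def is_converged(values: Sequence[float], tol: float = 0.01) -> bool:
    """values are ordered coarse to fine; the tolerance of 1 percent is an assumption."""
    if len(values) < 2:
        raise ValueError("need at least two mesh levels")
    return relative_change(values[-2], values[-1]) < tol


# Tip deflections in mm, taken from the figure captions on this page.
fig4_3d_cantilever = [0.3803, 0.3833]
fig3_2d_cantilever = [13.3, 19.21]
```

Under this assumed criterion the Figure 4 pair changes by under 1 percent and passes, while the Figure 3 pair changes by roughly 31 percent, so the coarse end of that study would not count as an adequate mesh.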

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity identified

The paper's central claims concern measured effectiveness of a constrained LLM-to-JSON parser and deterministic template dispatcher on explicit benchmarks (100% valid parse rate, 100% problem-class accuracy, 97.1% field-extraction accuracy, 90% custom-geometry success, sub-5% agreement on validations). These rest on direct comparison of outputs to analytical solutions and published 2D/3D benchmarks, not on any equations, fitted parameters, or self-citations that reduce the reported quantities to the inputs by construction. The five human-written FEniCS/UFL templates are treated as fixed external artifacts whose coverage is scoped to the tested cases; no uniqueness theorem, ansatz smuggling, or renaming of known results is invoked as load-bearing for the benchmark numbers themselves.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the correctness of the five human-written FEniCS templates and the representativeness of the chosen analytical and published benchmarks; no free parameters, ad-hoc axioms, or invented entities are introduced.

axioms (2)

standard math · Standard finite-element weak-form derivations and numerical solvers in FEniCS/UFL are assumed correct when applied to the five physics classes.
Invoked when the deterministic dispatcher selects and runs one of the five templates; the abstract states validation against analytical solutions.
domain assumption · The selected 2D/3D benchmarks and analytical solutions are representative of the target use cases.
Used to interpret the reported sub-percent and 2-5 percent agreement ranges as evidence of template correctness.

[1] [1]

Thomas J. R. Hughes.The Finite Element Method: Lin- ear Static and Dynamic Finite Element Analysis. Dover Publications, Mineola, NY , 2012

2012

[2] [2]

O. C. Zienkiewicz, R. L. Taylor, and J. Z. Zhu.The Finite Element Method: Its Basis and Fundamentals. Butterworth-Heinemann, 7 edition, 2013

2013

[3] [3]

Pren- tice Hall, 1996

Klaus-J"urgen Bathe.Finite Element Procedures. Pren- tice Hall, 1996

1996

[4] [4]

John Wiley & Sons, 2014

Ted Belytschko, Wing Kam Liu, Brian Moran, and Khalil Elkhodary.Nonlinear Finite Elements for Continua and Structures. John Wiley & Sons, 2014

2014

[5] [5]

Springer, 2008

Peter Wriggers.Nonlinear Finite Element Methods. Springer, 2008

2008

[6] [6]

Gmsh: A 3-D finite element mesh generator with built-in pre- and post-processing facilities.International Journal for Numerical Methods in Engineering, 79(11):1309–1331,

Christophe Geuzaine and Jean-François Remacle. Gmsh: A 3-D finite element mesh generator with built-in pre- and post-processing facilities.International Journal for Numerical Methods in Engineering, 79(11):1309–1331,

[7] [7]

doi: 10.1002/nme.2579

work page doi:10.1002/nme.2579

[8] [8]

Adams, et al

Satish Balay, Shrirang Abhyankar, Mark F. Adams, et al. PETSc/TAO users manual. Technical Report ANL-21/39 - Revision 3.20, Argonne National Laboratory, 2023

2023

[9] [9]

Wells, editors.Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book, vol- ume 84 ofLecture Notes in Computational Science and Engineering

Anders Logg, Kent-Andre Mardal, and Garth N. Wells, editors.Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book, vol- ume 84 ofLecture Notes in Computational Science and Engineering. Springer, 2012. doi: 10.1007/978-3-642-2 3099-8

work page doi:10.1007/978-3-642-2 2012

[10] [10]

Alnæs, Anders Logg, Kristian B

Martin S. Alnæs, Anders Logg, Kristian B. Ølgaard, Marie E. Rognes, and Garth N. Wells. Unified form language: A domain-specific language for weak formula- tions of partial differential equations.ACM Transactions on Mathematical Software, 40(2):9:1–9:37, 2014. doi: 10.1145/2566630

work page doi:10.1145/2566630 2014

[11] [11]

Karni- adakis

Maziar Raissi, Paris Perdikaris, and George E. Karni- adakis. Physics-informed neural networks: A deep learn- ing framework for solving forward and inverse problems involving nonlinear partial differential equations.Jour- nal of Computational Physics, 378:686–707, 2019

2019

[12] [12]

Karniadakis

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George E. Karniadakis. Learning nonlinear opera- tors via DeepONet based on the universal approximation theorem of operators.Nature Machine Intelligence, 3(3): 218–229, 2021

2021

[13] [13]

Fourier neural operator for parametric partial differential equations

Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. InInternational Conference on Learning Representations, 2021

2021

[14] [14]

Battaglia

Tobias Pfaff, Meire Fortunato, Alvaro Sanchez-Gonzalez, and Peter W. Battaglia. Learning mesh-based simulation with graph networks. InInternational Conference on Learning Representations, 2021

2021

[15] [15]

Learning to simulate complex physics with graph net- works

Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter Battaglia. Learning to simulate complex physics with graph net- works. InInternational Conference on Machine Learn- ing, 2020

2020

[16] [16]

A judge agent closes the reliability gap in AI-generated scientific simulation

Chengshuai Yang. A judge agent closes the reliability gap in AI-generated scientific simulation. arXiv preprint arXiv:2603.25780, 2026

arXiv 2026

[17] [17]

AutoNumerics: An au- tonomous, PDE-agnostic multi-agent pipeline for scien- tific computing

Jianda Du, Youran Sun, et al. AutoNumerics: An au- tonomous, PDE-agnostic multi-agent pipeline for scien- tific computing. arXiv preprint arXiv:2602.17607, 2026

arXiv 2026

[18] [18]

Daniel N. Wilke. From perception to autonomous com- putational modeling: A multi-agent approach. arXiv preprint arXiv:2604.06788, 2026

Pith/arXiv arXiv 2026

[19] [19]

Bran, Sam Cox, Oliver Schilter, Carlo Baldas- sari, Andrew D

Andres M. Bran, Sam Cox, Oliver Schilter, Carlo Baldas- sari, Andrew D. White, and Philippe Schwaller. Chem- Crow: Augmenting large language models with chem- istry tools.Nature Machine Intelligence, 6:525–535,

[20] [20]

doi: 10.1038/s42256-024-00832-8

work page doi:10.1038/s42256-024-00832-8

[21] [21]

OpenFOAMGPT: A retrieval-augmented large language model (LLM) agent for OpenFOAM-based computa- tional fluid dynamics.Physics of Fluids, 37(3):037121,

Sandeep Pandey, Ran Xu, Wenkang Wang, and Xu Chu. OpenFOAMGPT: A retrieval-augmented large language model (LLM) agent for OpenFOAM-based computa- tional fluid dynamics.Physics of Fluids, 37(3):037121,

[22] [22]

doi: 10.1063/5.0257274

work page doi:10.1063/5.0257274

[23] [23]

MetaOpenFOAM: an LLM-based multi-agent frame- work for CFD

Yuxuan Chen, Xu Zhu, Hua Zhou, and Zhuyin Ren. MetaOpenFOAM: an LLM-based multi-agent frame- work for CFD. arXiv preprint arXiv:2407.21320, 2024. 19

arXiv 2024

[24] [24]

Foam- Agent: Towards automated intelligent CFD workflows

Ling Yue, Nithin Somasekharan, Tingwen Zhang, Yadi Cao, Zhangze Chen, Shimin Di, and Shaowu Pan. Foam- Agent: Towards automated intelligent CFD workflows. arXiv preprint arXiv:2505.04997, 2025

arXiv 2025

[25] [25]

Christian Miehe, Fabian Welschinger, and Martina Ho- facker. Thermodynamically consistent phase-field mod- els of fracture: Variational principles and multi-field FE implementations.International Journal for Numerical Methods in Engineering, 83(10):1273–1311, 2010. doi: 10.1002/nme.2861

work page doi:10.1002/nme.2861 2010

[26] [26]

Francfort, and Jean-Jacques Marigo

Blaise Bourdin, Gilles A. Francfort, and Jean-Jacques Marigo. The variational approach to fracture.Journal of Elasticity, 91(1–3):5–148, 2008. doi: 10.1007/s10659-0 07-9107-3

work page doi:10.1007/s10659-0 2008

[27] [27]

Tortorelli

Luigi Ambrosio and Vincenzo M. Tortorelli. Approx- imation of functionals depending on jumps by elliptic functionals via Γ-convergence.Communications on Pure and Applied Mathematics, 43(8):999–1036, 1990. doi: 10.1002/cpa.3160430805

work page doi:10.1002/cpa.3160430805 1990

[28]

Michael J. Borden, Clemens V. Verhoosel, Michael A. Scott, Thomas J. R. Hughes, and Chad M. Landis. A phase-field description of dynamic brittle fracture. Computer Methods in Applied Mechanics and Engineering, 217–220:77–95, 2012.

[29]

Charlotte Kuhn and Ralf Müller. A continuum phase field model for fracture. Engineering Fracture Mechanics, 77(18):3625–3634, 2010.

[30]

Juan C. Simo and Thomas J. R. Hughes. Computational Inelasticity, volume 7 of Interdisciplinary Applied Mathematics. Springer, 1998.

[31]

Eduardo A. de Souza Neto, Djordje Perić, and David R. J. Owen. Computational Methods for Plasticity: Theory and Applications. John Wiley & Sons, 2008.

[32]

Yupeng Qi, Ran Xu, and Xu Chu. FeaGPT: an end-to-end agentic-AI for finite element analysis. arXiv preprint arXiv:2510.21993, 2025.

[33]

Shaochen Hou, R. Johnson, R. Makhija, L. Chen, and Y. Ye. AutoFEA: Enhancing AI copilot by integrating finite element analysis using large language models with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 24078–24085, 2025. doi: 10.1609/aaai.v39i22.34582

[34]

Ziheng Geng, Jiachen Liu, Ran Cao, Lu Cheng, Haifeng Wang, and Minghui Cheng. A lightweight large language model-based multi-agent system for 2D frame structural analysis. arXiv preprint arXiv:2510.05414, 2025.

[35]

Haoran Liang, Mohammad Talebi Kalaleh, and Qipei Mei. Integrating large language models for automated structural analysis. arXiv preprint arXiv:2504.09754, 2025.

[36]

Nayantara Mudur, Hao Cui, Subhashini Venugopalan, Paul Raccuglia, Michael P. Brenner, and Peter Norgaard. FEABench: Evaluating language models on multiphysics reasoning ability. arXiv preprint arXiv:2504.06260, 2025.

[37]

Donggeun Park, Hyeonbin Moon, and Seunghwa Ryu. A self-correcting multi-agent LLM framework for language-based physics simulation and explanation. npj Artificial Intelligence, 2(1):10, 2026.

[38]

Rushikesh Deotale, Adithya Srinivasan, Mahmoud Golestanian, Yuan Tian, Tianyi Zhang, Pavlos Vlachos, and Hector Gomez. ALL-FEM: Agentic large language models fine-tuned for finite element methods. Computer Methods in Applied Mechanics and Engineering, 457:118985, 2026.

[39]

Chuan Tian and Yilei Zhang. Optimizing collaboration of LLM-based agents for finite element analysis. arXiv preprint arXiv:2408.13406, 2024.

[40]

Raymond W. Ogden. Non-Linear Elastic Deformations. Dover Publications, 1997.

[41]

Gerhard A. Holzapfel. Nonlinear Solid Mechanics: A Continuum Approach for Engineering. John Wiley & Sons, 2000.

[42]

Javier Bonet and Richard D. Wood. Nonlinear Continuum Mechanics for Finite Element Analysis. Cambridge University Press, 2008.

[43]

Pauli Virtanen, Ralf Gommers, Travis E. Oliphant, et al. SciPy 1.0: Fundamental algorithms for scientific computing in Python. Nature Methods, 17:261–272, 2020. doi: 10.1038/s41592-019-0686-2.

Supplementary Information

Parser benchmark prompt list

The supplementary parser prompt list for the paper-facing set contains the 15 parser benchmark prompts on...