ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods

Adithya Srinivasan; Hector Gomez; Pavlos Vlachos; Rushikesh Deotale; Tianyi Zhang; Yuan Tian

arXiv: 2603.21011 · v2 · submitted 2026-01-08 · 💻 cs.CE · cs.AI · cs.LG · cs.MS · cs.NA · math.NA

Rushikesh Deotale, Adithya Srinivasan, Yuan Tian, Tianyi Zhang, Pavlos Vlachos, Hector Gomez

Pith reviewed 2026-05-16 15:45 UTC · model grok-4.3

classification: 💻 cs.CE · cs.AI · cs.LG · cs.MS · cs.NA · math.NA

keywords: finite element methods · FEniCS · large language models · agentic AI · code generation · computational engineering · multi-agent systems · PDE simulation

The pith

Fine-tuned LLMs inside a multi-agent loop with runtime feedback reach 71.79 percent code-level success on 39 benchmark finite-element problems written in FEniCS.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that LLMs fine-tuned on verified FEniCS scripts produce working simulation code for solid mechanics, fluid flow, and multiphysics problems once they are placed inside an agentic workflow for problem formulation, code generation, debugging, and result visualization. Conventional LLMs often hallucinate or ignore variational structure. The added agents and feedback loop close that gap enough for the fine-tuned 120B-parameter model to exceed the success rate of a larger model, GPT 5 Thinking, deployed without the agentic workflow. A training corpus of more than one thousand expert-written and LLM-generated scripts supplies examples across linear and nonlinear elasticity, Newtonian and non-Newtonian fluids, fluid-structure interaction, and moving domains.

Core claim

An autonomous system called ALL-FEM orchestrates specialized agents powered by LLMs fine-tuned on a corpus of 1000+ verified FEniCS scripts; when the best such model (GPT OSS 120B) operates inside the multi-agent workflow with runtime feedback, it reaches 71.79 percent code-level success on 39 benchmarks that cover linear/nonlinear elasticity, plasticity, Newtonian/non-Newtonian flow, thermofluids, fluid-structure interaction, phase separation, and transport on moving domains, surpassing a non-agentic deployment of GPT 5 Thinking.

What carries the argument

The multi-agent workflow with runtime feedback that uses fine-tuned LLMs to translate problem statements into PDEs, generate and debug FEniCS code, and visualize results.
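The generate, execute, and repair cycle described here can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not ALL-FEM's code.

- `generate` is a hypothetical stand-in for a fine-tuned code-generation agent. It receives the problem statement and the previous error message (or None) and returns a script.
- Each candidate runs in a fresh interpreter via `subprocess.run`.
- On failure, the tail of stderr is fed back to the generator.

```python
import subprocess
import sys
from typing import Callable, Optional, Tuple

# Hypothetical agent interface: (problem statement, last error or None) -> script text.
Generator = Callable[[str, Optional[str]], str]


def run_script(code: str, timeout: float = 60.0) -> subprocess.CompletedProcess:
    """Execute a candidate script in a separate Python interpreter."""
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )


def generate_with_feedback(
    problem: str, generate: Generator, max_attempts: int = 5
) -> Tuple[str, int, bool]:
    """Regenerate code until it exits cleanly or the attempt budget runs out.

    Returns (last_script, attempts_used, succeeded).
    """
    error: Optional[str] = None
    code = ""
    for attempt in range(1, max_attempts + 1):
        code = generate(problem, error)
        try:
            result = run_script(code)
        except subprocess.TimeoutExpired:
            error = "TimeoutExpired: script exceeded the time limit"
            continue
        if result.returncode == 0:
            return code, attempt, True
        # Feed back only the tail of stderr, where the traceback's final line lives.
        error = result.stderr[-2000:]
    return code, max_attempts, False
```

A zero exit code is the only success signal in this sketch. A script can run cleanly and still encode the wrong weak form, so a real pipeline would add numerical checks after execution.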

If this is right

Engineers can obtain working simulation code from natural-language problem statements without writing or debugging the code themselves.
The same agentic pattern supplies a template for automating other computational-science workflows that require both code generation and runtime verification.
Relatively small, fine-tuned models can become competitive with larger general-purpose models once they are embedded in domain-specific agent loops.
Rapid iteration over geometries, material laws, and boundary conditions becomes feasible for design exploration in manufacturing and research.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The approach could be tested on industrial-scale meshes and nonlinear solvers to measure whether, and by how much, the success rate drops as problem size increases.
Integration with geometry kernels or CAD files would let the system accept drawings rather than text descriptions as input.
Similar fine-tuning plus agent orchestration might apply to other open-source simulation libraries beyond FEniCS.

Load-bearing premise

The 39 benchmarks and the automated verification process capture the full range of real-world finite-element problems without missing subtle numerical or physical errors that would appear only in production use.

What would settle it

Running the generated codes on a new set of problems outside the 39 benchmarks and checking whether their numerical solutions match independent reference results or experimental data to within engineering tolerances.
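Checking results against independent references within a tolerance can be made concrete as a relative L2 error. This is a minimal sketch, assuming the computed and reference solutions have already been sampled at the same points (mesh nodes or probe locations). The 1% default is an illustrative assumption, not an engineering standard or a figure from the paper. A point-wise sum also only approximates the integral norm that a FE code would compute by quadrature.

```python
import math
from typing import Sequence


def relative_l2_error(computed: Sequence[float], reference: Sequence[float]) -> float:
    """Discrete relative L2 error ||u_h - u_ref|| / ||u_ref|| over sampled points."""
    if len(computed) != len(reference):
        raise ValueError("computed and reference must be sampled at the same points")
    diff = math.sqrt(sum((c - r) ** 2 for c, r in zip(computed, reference)))
    norm = math.sqrt(sum(r ** 2 for r in reference))
    if norm == 0.0:
        raise ValueError("reference solution has zero norm")
    return diff / norm


def within_tolerance(
    computed: Sequence[float], reference: Sequence[float], rel_tol: float = 1e-2
) -> bool:
    """True if the relative L2 error is at most rel_tol (an assumed threshold)."""
    return relative_l2_error(computed, reference) <= rel_tol
```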

Original abstract

Finite element (FE) analysis guides the design and verification of nearly all manufactured objects. It is at the core of computational engineering, enabling simulation of complex physical systems, from fluids and solids to multiphysics systems. However, implementing FE codes and analyzing simulation results demands expertise across numerical analysis, continuum mechanics, and programming. Conventional Large Language Models (LLMs) can generate FE code, but they hallucinate, lack awareness of variational structures, and cannot close the loop from problem statement to a verified solution. Here, we propose ALL-FEM, an autonomous simulation system that integrates agentic AI with domain-specific, fine-tuned LLMs for FEniCS code generation across solid, fluid, and multiphysics applications. We construct a corpus of 1000+ verified FEniCS scripts by combining 500+ curated expert codes with a retrieval-augmented, multi-LLM pipeline that generates and filters codes for diverse PDEs, geometries, and boundary conditions. We use the corpus to fine-tune LLMs with 3B to 120B parameters. Our agentic framework orchestrates specialized agents, powered by fine-tuned LLMs, to formulate problems as PDEs, generate and debug code, and visualize the results. We evaluate the system on 39 benchmarks that include problems of linear/nonlinear elasticity, plasticity, Newtonian/non-Newtonian flow, thermofluids, fluid-structure interaction, phase separation, and transport on moving domains. Embedded in a multi-agent workflow with runtime feedback, the best fine-tuned model (GPT OSS 120B) achieves code-level success of 71.79%, outperforming a non-agentic deployment of GPT 5 Thinking. By showing that relatively small, fine-tuned LLMs, orchestrated through agentic frameworks, can automate FE workflows, ALL-FEM offers a blueprint for autonomous simulation systems in computational science and engineering.
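The "variational structure" mentioned above is the weak form that a FEniCS script has to encode. As an illustration, here is the standard textbook statement for small-strain linear elasticity (not reproduced from the paper). It uses:

- displacement $u$ and body force $f$;
- traction $t$ on the Neumann boundary $\Gamma_N$;
- Lamé parameters $\lambda, \mu$;
- a trial space $V$ that enforces the Dirichlet conditions, and a test space $\hat{V}$ that vanishes on the Dirichlet boundary.

```latex
\text{Find } u \in V \text{ such that } a(u, v) = L(v) \quad \forall\, v \in \hat{V}, \\
a(u, v) = \int_\Omega \sigma(u) : \varepsilon(v) \, dx, \qquad
L(v) = \int_\Omega f \cdot v \, dx + \int_{\Gamma_N} t \cdot v \, ds, \\
\varepsilon(v) = \tfrac{1}{2}\left(\nabla v + \nabla v^{\mathsf{T}}\right), \qquad
\sigma(u) = \lambda \, (\nabla \cdot u) \, I + 2 \mu \, \varepsilon(u).
```

Each line maps almost one-to-one onto FEniCS's UFL form language. That is why a generated script can run without error while, for example, dropping the traction term or using the wrong constitutive law.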

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

ALL-FEM shows that fine-tuned LLMs in an agentic loop can produce executable FEniCS code for 72% of 39 benchmarks, but the success metric may miss numerical or physical errors.

The main point is that the system fine-tunes LLMs on a 1000+ script FEniCS corpus and wraps them in agents that handle problem setup, code generation, debugging via runtime feedback, and result visualization. The 120B model reaches 71.79% code-level success on the 39 benchmarks, which cover elasticity, Newtonian and non-Newtonian flow, fluid-structure interaction, and phase separation, among other categories. That beats a plain GPT 5 Thinking run without the agent structure or fine-tuning.

Referee Report

3 major / 2 minor

Summary. The paper introduces ALL-FEM, an autonomous agentic system that integrates fine-tuned LLMs (3B–120B parameters) with specialized agents for FEniCS code generation in finite element analysis. A corpus of 1000+ verified scripts is built from 500+ expert codes plus a retrieval-augmented multi-LLM pipeline. The system is evaluated on 39 benchmarks spanning linear/nonlinear elasticity, plasticity, Newtonian/non-Newtonian flows, thermofluids, fluid-structure interaction, phase separation, and moving-domain transport. The best model (GPT OSS 120B) embedded in a multi-agent workflow with runtime feedback achieves 71.79% code-level success, outperforming a non-agentic GPT 5 Thinking baseline.

Significance. If the verification process ensures that generated codes produce numerically accurate and physically correct solutions (rather than merely executing without runtime errors), the work would offer a practical blueprint for automating FE workflows in computational engineering. The combination of domain-specific fine-tuning and agentic orchestration with feedback demonstrates measurable gains over direct LLM prompting on variational-form and solver tasks. The scale of the corpus and the breadth of benchmark categories (solids, fluids, multiphysics) would make the result relevant to researchers seeking to reduce manual coding effort in simulation-driven design.

major comments (3)

[Abstract / Evaluation] Abstract and evaluation description: the headline 71.79% code-level success rate for the 120B model is reported without any quantitative specification of the verification criteria. It is unclear whether success requires only runtime execution without errors or includes checks such as L2-norm agreement with analytical solutions, residual norms on the weak form, or observed mesh-convergence rates. This distinction is load-bearing because codes can execute while containing incorrect variational formulations, improper quadrature, or boundary-condition errors that only manifest under refinement or in coupled problems.
[Corpus construction] Corpus construction: the manuscript states that 1000+ scripts were obtained by combining expert codes with a multi-LLM retrieval-augmented pipeline, yet supplies no numbers on how many candidates were generated, how many were discarded by the filtering step, per-PDE-category error rates, or the exact verification procedure used to label a script “verified.” Without these statistics the quality and diversity of the fine-tuning data cannot be assessed, undermining claims about the robustness of the resulting models.
[Benchmarks] Benchmarks: the 39 problems are listed by category (elasticity, plasticity, flows, FSI, etc.) but no table or appendix enumerates the individual problems, their governing equations, mesh types, or whether analytical solutions exist for quantitative validation. Per-benchmark success rates are also absent, making it impossible to determine whether the aggregate 71.79% figure is driven by a few easy cases or holds across the full range of nonlinear and multiphysics problems.
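Even before a per-benchmark breakdown, the aggregate figure is coarse. Assuming one attempt per benchmark, 71.79% of 39 corresponds to 28 successes (28/39 ≈ 0.7179), so each benchmark moves the rate by about 2.6 percentage points. The sketch below computes a Wilson score interval to make that uncertainty explicit. This is a standard binomial confidence interval chosen here for illustration; it is not an analysis the paper reports.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


low, high = wilson_interval(28, 39)
print(f"rate = {28 / 39:.2%}, 95% Wilson interval = [{low:.1%}, {high:.1%}]")
```

For 28 of 39 this gives roughly 56% to 83%. A band that wide is one reason per-benchmark and per-category results matter for interpreting the headline number.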

minor comments (2)

Clarify the exact model names (GPT OSS 120B, GPT 5 Thinking) and provide references or parameter counts for the baseline models in the main text.
Consider adding a summary table of the 39 benchmarks that includes problem type, domain, boundary conditions, and whether an analytical solution is available for verification.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These have highlighted important areas for clarification and expansion. We address each major comment point by point below, indicating the revisions that will be incorporated in the next version of the manuscript.

Referee: [Abstract / Evaluation] Abstract and evaluation description: the headline 71.79% code-level success rate for the 120B model is reported without any quantitative specification of the verification criteria. It is unclear whether success requires only runtime execution without errors or includes checks such as L2-norm agreement with analytical solutions, residual norms on the weak form, or observed mesh-convergence rates. This distinction is load-bearing because codes can execute while containing incorrect variational formulations, improper quadrature, or boundary-condition errors that only manifest under refinement or in coupled problems.

Authors: We agree that the verification criteria require explicit definition. In the current manuscript, code-level success is defined as the generated FEniCS script executing without runtime errors while producing results consistent with expected physical behavior; for the subset of benchmarks possessing analytical or high-fidelity reference solutions, this includes quantitative checks such as L2-norm agreement and residual norms on the weak form. To remove any ambiguity, we will revise the abstract and add a dedicated paragraph in the Evaluation section that enumerates the precise success criteria, including runtime execution, residual and convergence checks where applicable, and the distinction between problems with and without analytical solutions. revision: yes
Referee: [Corpus construction] Corpus construction: the manuscript states that 1000+ scripts were obtained by combining expert codes with a multi-LLM retrieval-augmented pipeline, yet supplies no numbers on how many candidates were generated, how many were discarded by the filtering step, per-PDE-category error rates, or the exact verification procedure used to label a script “verified.” Without these statistics the quality and diversity of the fine-tuning data cannot be assessed, undermining claims about the robustness of the resulting models.

Authors: The referee is correct that these quantitative details are missing. The manuscript currently gives only aggregate figures. In the revision we will expand the Corpus Construction section with: (i) the total number of candidate scripts generated by the retrieval-augmented multi-LLM pipeline, (ii) the number and percentage discarded at each filtering stage together with the primary rejection reasons, (iii) per-PDE-category verification success rates, and (iv) a step-by-step description of the verification procedure, which combines automated syntax/execution tests with expert manual review for variational correctness and physical fidelity. revision: yes
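The statistics in items (i) to (iii) reduce to simple bookkeeping over candidate records. This is a minimal sketch, assuming each candidate is logged as a hypothetical (category, last_stage_passed) pair. The stage names in `STAGES` are placeholders, not the paper's actual filtering steps.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical filtering stages, in order; "verified" means a script passed every stage.
STAGES = ["generated", "syntax", "execution", "verified"]


def filter_report(records: Iterable[Tuple[str, str]]) -> Dict[str, object]:
    """Summarize survival through the filtering stages, overall and per PDE category.

    Each record is (category, last_stage_passed); an unknown stage raises ValueError.
    """
    reached: Counter = Counter()
    per_category: Dict[str, Counter] = defaultdict(Counter)
    for category, stage in records:
        last = STAGES.index(stage)
        # A candidate that reached stage k necessarily passed every earlier stage.
        for s in STAGES[: last + 1]:
            reached[s] += 1
            per_category[category][s] += 1
    total = reached["generated"]
    return {
        "survivors": {s: reached[s] for s in STAGES},
        # Candidates that reached stage i but not stage i + 1 were rejected at stage i + 1.
        "discarded_at": {
            STAGES[i + 1]: reached[STAGES[i]] - reached[STAGES[i + 1]]
            for i in range(len(STAGES) - 1)
        },
        "overall_pass_rate": reached["verified"] / total if total else 0.0,
        "pass_rate_by_category": {
            cat: counts["verified"] / counts["generated"]
            for cat, counts in per_category.items()
        },
    }
```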
Referee: [Benchmarks] Benchmarks: the 39 problems are listed by category (elasticity, plasticity, flows, FSI, etc.) but no table or appendix enumerates the individual problems, their governing equations, mesh types, or whether analytical solutions exist for quantitative validation. Per-benchmark success rates are also absent, making it impossible to determine whether the aggregate 71.79% figure is driven by a few easy cases or holds across the full range of nonlinear and multiphysics problems.

Authors: We accept that the benchmark documentation is insufficiently granular. We will add a new appendix (Appendix B) that provides a table enumerating all 39 benchmarks, including the governing PDEs, mesh types, boundary conditions, and the availability of analytical or reference solutions. In addition, the Evaluation section will be augmented with a table of per-benchmark success rates for the GPT OSS 120B model (and the non-agentic baseline) so that readers can assess performance variation across linear/nonlinear, single-physics, and multiphysics cases. revision: yes
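Per-benchmark results for both systems would also allow a paired comparison, since both are evaluated on the same 39 problems. The sketch below is an exact McNemar test, a standard choice for paired binary outcomes; it is an assumption on our part, not an analysis the paper or rebuttal commits to. Only benchmarks where exactly one system succeeds carry information. Under the null hypothesis, each of them is equally likely to favor either system.

```python
from math import comb
from typing import Sequence, Tuple


def mcnemar_exact(
    a_success: Sequence[bool], b_success: Sequence[bool]
) -> Tuple[int, int, float]:
    """Two-sided exact McNemar test on paired pass/fail outcomes.

    Returns (b, c, p_value): b counts benchmarks only system A solves,
    c counts benchmarks only system B solves.
    """
    if len(a_success) != len(b_success):
        raise ValueError("outcome lists must cover the same benchmarks")
    b = sum(1 for x, y in zip(a_success, b_success) if x and not y)
    c = sum(1 for x, y in zip(a_success, b_success) if y and not x)
    n = b + c
    if n == 0:
        return b, c, 1.0
    k = min(b, c)
    # P(X <= k) for X ~ Binomial(n, 1/2), doubled for a two-sided test and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return b, c, min(1.0, 2 * tail)
```

With only 39 benchmarks, an advantage of a few problems may not be distinguishable from chance.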

Circularity Check

0 steps flagged

No significant circularity in the ALL-FEM evaluation pipeline

The paper constructs a training corpus via a multi-LLM retrieval-augmented pipeline and fine-tunes models on it, then reports empirical success rates on a separate set of 39 held-out benchmarks using runtime verification. This performance metric is measured directly against external benchmark problems rather than being derived by construction from the corpus-generation or verification loop. No self-definitional reductions, fitted inputs renamed as predictions, load-bearing self-citations, or ansatzes appear in the described chain; the central claim retains independent empirical content.
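The claim that the 39 benchmarks are held out could be probed directly by checking for near-duplicates between benchmark reference scripts and the training corpus. The sketch below uses Jaccard similarity over token n-grams, a common decontamination heuristic. The tokenizer, n = 5, and the 0.5 threshold are illustrative assumptions. Nothing in this review states which check, if any, was applied.

```python
import re
from typing import Dict, Iterable, Set, Tuple


def ngrams(code: str, n: int = 5) -> Set[Tuple[str, ...]]:
    """Token n-grams of a script, using a crude identifier/number/symbol tokenizer."""
    tokens = re.findall(r"[A-Za-z_]\w*|\d+(?:\.\d*)?|\S", code)
    return {tuple(tokens[i : i + n]) for i in range(len(tokens) - n + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Jaccard similarity; two empty sets are treated as non-overlapping."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def flag_overlaps(
    benchmarks: Dict[str, str],
    corpus: Iterable[str],
    n: int = 5,
    threshold: float = 0.5,
) -> Dict[str, float]:
    """Map each benchmark whose best match in the corpus exceeds threshold to that score."""
    corpus_grams = [ngrams(script, n) for script in corpus]
    flagged: Dict[str, float] = {}
    for name, script in benchmarks.items():
        grams = ngrams(script, n)
        best = max((jaccard(grams, g) for g in corpus_grams), default=0.0)
        if best > threshold:
            flagged[name] = best
    return flagged
```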

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the empirical effectiveness of fine-tuning and agentic orchestration rather than new physical axioms or derived equations; the main unstated premises are that LLMs can internalize variational structures from code examples and that runtime feedback reliably catches numerical errors.
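Whether runtime feedback catches numerical errors can be tested with a mesh-refinement check, which execution alone does not provide. Given errors $e_1, e_2$ on meshes of size $h_1 > h_2$, the observed order of accuracy is $p = \log(e_1/e_2)/\log(h_1/h_2)$. This is a minimal sketch, assuming the errors against an exact or manufactured solution have already been computed (for example with legacy FEniCS's `errornorm`). The expected order and the 0.2 tolerance are illustrative choices; 2 is the usual L2 rate for P1 Lagrange elements on smooth problems.

```python
import math
from typing import List, Sequence


def observed_rates(h: Sequence[float], err: Sequence[float]) -> List[float]:
    """Observed convergence order between each pair of successive refinements."""
    if len(h) != len(err) or len(h) < 2:
        raise ValueError("need matching h and err sequences with at least two levels")
    return [
        math.log(err[i] / err[i + 1]) / math.log(h[i] / h[i + 1])
        for i in range(len(h) - 1)
    ]


def passes_convergence_check(
    h: Sequence[float], err: Sequence[float], expected: float, tol: float = 0.2
) -> bool:
    """Accept if the finest-level observed rate is within tol of the expected order."""
    return abs(observed_rates(h, err)[-1] - expected) <= tol
```

A script with an inconsistent weak form converges, if at all, to the wrong solution. Its error against the reference therefore stalls and the observed rate drops toward zero, even though the code runs.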

axioms (2)

domain assumption: Fine-tuned LLMs can generate syntactically and semantically correct FEniCS code for a wide range of PDEs when trained on verified examples.
Invoked in the construction of the training corpus and the reported success rates.
domain assumption: Multi-agent workflows with runtime execution feedback can close the loop from problem statement to verified solution without external human intervention.
Central to the agentic framework description.

pith-pipeline@v0.9.0 · 5678 in / 1619 out tokens · 87737 ms · 2026-05-16T15:45:08.504337+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem.

We construct a corpus of 1000+ verified FEniCS scripts... fine-tune LLMs with 3B to 120B parameters... multi-agent workflow... 71.79% code-level success
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

agentic AI with domain-specific, fine-tuned LLMs for FEniCS code generation across solid, fluid, and multiphysics applications

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Perception to Autonomous Computational Modeling: A Multi-Agent Approach
cs.CE · 2026-04 · unverdicted · novelty 5.0

A multi-agent LLM framework autonomously completes the full computational mechanics pipeline from a photograph to a code-compliant engineering report on a steel L-bracket example.

Reference graph

Works this paper leans on

106 extracted references · 106 canonical work pages · cited by 1 Pith paper · 17 internal anchors

[1]

Dover Publications, Mineola, NY (2012)

Hughes, T.J.R.: The Finite Element Method: Linear Static and Dynamic Finite Element Analysis. Dover Publications, Mineola, NY (2012)

work page 2012
[2]

Logg, K.-A

Logg, A., Mardal, K., Wells, G. (eds.): Automated Solution of Differential Equations by the Finite Element Method: The FEniCS Book. Lecture Notes in Computational Science and Engineering, vol. 84, p. 731. Springer, Berlin, Heidelberg (2012). https://doi.org/10.1007/978-3-642-23099-8

work page doi:10.1007/978-3-642-23099-8 2012
[3]

Butterworth-Heinemann, Oxford (2005)

Zienkiewicz, O.C., Taylor, R.L.: The Finite Element Method for Solid and Structural Mechanics. Butterworth-Heinemann, Oxford (2005). https://books.google.com/books?id=VvpU3zssDOwC

work page 2005
[4]

Hughes, J

Hughes, T.J.R., Cottrell, J.A., Bazilevs, Y.: Isogeometric analysis: Cad, finite elements, nurbs, exact geometry and mesh refinement. Computer Methods in Applied Mechanics and Engineering 194(39), 4135–4195 (2005) https://doi.org/10.1016/j.cma.2004.10.008

work page doi:10.1016/j.cma.2004.10.008 2005
[5]

The FEniCS Project Version 1.5

Alnæs, M., Blechta, J., Hake, J., Johansson, A., Kehlet, B., Logg, A., Richardson, C., Ring, J., Rognes, M., Wells, G.: The FEniCS project version 1.5. Archive of Numerical Software3(100), 9–23 (2015) https://doi.org/10.11588/ans.2015.100.20553

work page doi:10.11588/ans.2015.100.20553 2015
[6]

Imperial College London and University of Oxford and Baylor University and University of Washington, (2023)

Ham, D.A., Kelly, P.H.J., Mitchell, L., Cotter, C.J., Kirby, R.C., Sagiyama, K., Bouziani, N., Vorderwuelbecke, S., Gregory, T.J., Betteridge, J., Shapero, D.R., Nixon-Hill, R.W., Ward, C.J., Farrell, P.E., Brubeck, P.D., Marsden, I., Gibson, T.H., Homolya, M., Sun, T., McRae, A.T.T., Luporini, F., Gregory, A., Lange, M., Funke, S.W., Rathgeber, F., Berce...

work page doi:10.25561/104839 2023
[7]

Computer Methods in Applied Mechanics and Engineering 450, 118591 (2026)

Guo, J., Park, C., Qian, D., Hughes, T.J., Liu, W.K.: Large language model-empowered next- generation computer-aided engineering. Computer Methods in Applied Mechanics and Engineering 450, 118591 (2026)

work page 2026
[8]

MechAgents: Large lan- guage model multi-agent collaborations can solve mechan- ics problems, generate new data, and integrate knowledge

Ni, B., Buehler, M.J.: MechAgents: Large language model multi-agent collaborations can solve mechanics problems, generate new data, and integrate knowledge (2023). https://arxiv.org/abs/ 2311.08166

work page arXiv 2023
[9]

Siren's Song in the AI Ocean: A Survey on Hallucination in Large Language Models

Zhang, Y., Li, Y., Cui, L., Cai, D., Liu, L., Fu, T., Huang, X., Zhao, E., Zhang, Y., Xu, C., Chen, Y., Wang, L., Luu, A.T., Bi, W., Shi, F., Shi, S.: Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models (2025). https://arxiv.org/abs/2309.01219

work page internal anchor Pith review Pith/arXiv arXiv 2025
[10]

A survey on evaluation of large language models

Chang, Y., Wang, X., Wang, J., Wu, Y., Yang, L., Zhu, K., Chen, H., Yi, X., Wang, C., Wang, Y., Ye, W., Zhang, Y., Chang, Y., Yu, P.S., Yang, Q., Xie, X.: A Survey on Evaluation of Large Language Models (2023). https://arxiv.org/abs/2307.03109

work page arXiv 2023
[11]

CodeMirage : Hallucinations in Code Generated by Large Language Models , 2024

Agarwal, V., Pei, Y., Alamir, S., Liu, X.: CodeMirage: Hallucinations in Code Generated by Large Language Models (2025). https://arxiv.org/abs/2408.08333

work page arXiv 2025
[12]

More agents is all you need

Li, J., Zhang, Q., Yu, Y., Fu, Q., Ye, D.: More Agents Is All You Need (2024). https://arxiv.org/ abs/2402.05120

work page arXiv 2024
[13]

https://arxiv.org/abs/2408.13406

Tian, C., Zhang, Y.: Optimizing Collaboration of LLM based Agents for Finite Element Analysis (2024). https://arxiv.org/abs/2408.13406

work page arXiv 2024
[14]

Digital Discovery3(7), 1389–1409 (2024) https://doi.org/10.1039/d4dd00013g

Ghafarollahi, A., Buehler, M.J.: Protagents: protein discovery via large language model multi-agent collaborations combining physics and machine learning. Digital Discovery3(7), 1389–1409 (2024) https://doi.org/10.1039/d4dd00013g

work page doi:10.1039/d4dd00013g 2024
[15]

Honeycomb: A flexible llm-based agent system for materials science

Zhang, H., Song, Y., Hou, Z., Miret, S., Liu, B.: HoneyComb: A Flexible LLM-Based Agent System for Materials Science (2024). https://arxiv.org/abs/2409.00135 44

work page arXiv 2024
[16]

In: Advances in Neural Information Processing Systems (NeurIPS 2020) (2020)

Lewis, P., Perez, E., Piktus, A., Petroni, F., Karpukhin, V., Goyal, N., K¨ uttler, H., Lewis, M., Yih, W.-t., Rockt¨ aschel, T., Riedel, S., Kiela, D.: Retrieval-augmented generation for knowledge- intensive nlp tasks. In: Advances in Neural Information Processing Systems (NeurIPS 2020) (2020)

work page 2020
[17]

Physics of Fluids37(3), 035120 (2025) https://doi.org/10.1063/5.0257555

Pandey, S., Xu, R., Wang, W., Chu, X.: Openfoamgpt: A retrieval-augmented large language model (llm) agent for openfoam-based computational fluid dynamics. Physics of Fluids37(3), 035120 (2025) https://doi.org/10.1063/5.0257555

work page doi:10.1063/5.0257555 2025
[18]

Theoretical and Applied Mechanics Letters, 100623 (2025) https://doi.org/10.1016/ j.taml.2025.100623

Wang, W., Xu, R., Feng, J., Zhang, Q., Pandey, S., Chu, X.: A status quo investigation of large-language models for cost-effective computational fluid dynamics automation with Open- FOAMGPT. Theoretical and Applied Mechanics Letters, 100623 (2025) https://doi.org/10.1016/ j.taml.2025.100623

work page arXiv 2025
[19]

URL https://arxiv.org/abs/2504.19338

Feng, J., Xu, R., Chu, X.: Openfoamgpt 2.0: end-to-end, trustworthy automation for computational fluid dynamics. arXiv preprint arXiv:2504.19338 (2025)

work page arXiv 2025
[20]

arXiv preprint arXiv:2509.18178 (2025)

Yue, L., Somasekharan, N., Zhang, T., Cao, Y., Pan, S.: Foam-agent: An end-to-end composable multi-agent framework for automating cfd simulation in openfoam. arXiv preprint arXiv:2509.18178 (2025)

work page arXiv 2025
[21]

Theoretical and Applied Mechanics Letters15, 100594 (2025) https: //doi.org/10.1016/j.taml.2025.100594

Dong, Z., Lu, Z., Yang, Y.: Fine-tuning a large language model for automating computational fluid dynamics simulations. Theoretical and Applied Mechanics Letters15, 100594 (2025) https: //doi.org/10.1016/j.taml.2025.100594

work page doi:10.1016/j.taml.2025.100594 2025
[22]

Metaopenfoam: an llm-based multi-agent framework for cfd

Chen, Y., Zhu, X., Zhou, H., Ren, Z.: Metaopenfoam: an llm-based multi-agent framework for cfd. arXiv preprint arXiv:2407.21320 (2024)

work page arXiv 2024
[23]

arXiv preprint arXiv:2506.02019 (2025)

Fan, E., Hu, K., Wu, Z., Ge, J., Miao, J., Zhang, Y., Sun, H., Wang, W., Zhang, T.: Chatcfd: An llm-driven agent for end-to-end cfd automation with domain-specific structured reasoning. arXiv preprint arXiv:2506.02019 (2025)

work page arXiv 2025
[24]

In: Proceedings of the AAAI Conference on Artificial Intelligence, vol

Hou, S., Johnson, R., Makhija, R., Chen, L., Ye, Y.: Autofea: Enhancing ai copilot by integrating finite element analysis using large language models with graph neural networks. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, pp. 24078–24087 (2025)

work page 2025
[25]

FeaGPT: an End-to-End agentic-AI for Finite Element Analysis

Qi, Y., Xu, R., Chu, X.: Feagpt: an end-to-end agentic-ai for finite element analysis. arXiv preprint arXiv:2510.21993 (2025)

work page arXiv 2025
[26]

Masset, R

Feng, J., Qi, Y., Xu, R., Pandey, S., Chu, X.: turbulence.ai: an end-to-end ai scientist for fluid mechanics. Theoretical and Applied Mechanics Letters, 100620 (2025) https://doi.org/10.1016/j. taml.2025.100620

work page doi:10.1016/j 2025
[27]

Jiang, G

Jiang, Q., Karniadakis, G.: Agenticsciml: Collaborative multi-agent systems for emergent discovery in scientific machine learning. arXiv preprint arXiv:2511.07262 (2025)

work page arXiv 2025
[28]

ATHENA: Agentic Team for Hierarchical Evolutionary Numerical Algorithms

Toscano, J.D., Chen, D.T., Karniadakis, G.E.: Athena: Agentic team for hierarchical evolutionary numerical algorithms. arXiv preprint arXiv:2512.03476 (2025)

work page internal anchor Pith review Pith/arXiv arXiv 2025
[29]

SIAM Review57(4), 483–531 (2015) https://doi.org/10.1137/ 130932715

Benner, P., Gugercin, S., Willcox, K.: A survey of projection-based model reduction methods for parametric dynamical systems. SIAM Review57(4), 483–531 (2015) https://doi.org/10.1137/ 130932715

work page 2015
[30]

Accessed: 2026-02-22 (2025)

Guo, J., Domel, G., Park, C., Zhang, H., Gumus, O.C., Lu, Y., Wagner, G.J., Qian, D., Cao, J., Hughes, T.J.R., Liu, W.K.: Tensor-decomposition-based A Priori Surrogate (TAPS) modeling for ultra large-scale simulations. Accessed: 2026-02-22 (2025). https://arxiv.org/abs/2503.13933

work page arXiv 2026
[31]

PaLM 2 Technical Report

Anil, R., Dai, A.M., Firat, O., Johnson, M., Lepikhin, D., Passos, A., Shakeri, S., Taropa, E., Bailey, P., Chen, Z., Chu, E., Clark, J.H., Shafey, L.E., Huang, Y., Meier-Hellstern, K., Mishra, G., Moreira, E., Omernick, M., Robinson, K., Ruder, S., Tay, Y., Xiao, K., Xu, Y., Zhang, Y., Abrego, G.H., Ahn, J., Austin, J., Barham, P., Botha, J., Bradbury, J...

work page internal anchor Pith review Pith/arXiv arXiv 2023
[32]

Attention Is All You Need

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, L., Polosukhin, I.: Attention Is All You Need. arXiv (2017). https://doi.org/10.48550/ARXIV.1706.03762 . https: //arxiv.org/abs/1706.03762

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1706.03762 2017
[33]

https://www.anthropic.com/news/ 100k-context-windows (2023)

Anthropic: Introducing 100K Context Windows. https://www.anthropic.com/news/ 100k-context-windows (2023)

work page 2023
[34]

Evaluating Large Language Models Trained on Code

Chen, M., Tworek, J., Jun, H., Yuan, Q., Oliveira Pinto, H.P., Kaplan, J., Edwards, H., Burda, Y., Joseph, N., Brockman, G., Ray, A., Puri, R., Krueger, G., Petrov, M., Khlaaf, H., Sastry, G., Mishkin, P., Chan, B., Gray, S., Ryder, N., Pavlov, M., Power, A., Kaiser, L., Bavarian, M., Winter, C., Tillet, P., Such, F.P., Cummings, D., Plappert, M., Chantzi...

work page internal anchor Pith review Pith/arXiv arXiv 2021
[35]

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models

Wei, J., Wang, X., Schuurmans, D., Bosma, M., Ichter, B., Xia, F., Chi, E., Le, Q., Zhou, D.: Chain-of-Thought Prompting Elicits Reasoning in Large Language Models (2023). https://arxiv. org/abs/2201.11903

work page internal anchor Pith review Pith/arXiv arXiv 2023
[36]

In: The Eleventh International Conference on Learning Representations (2023)

Wang, X., Wei, J., Schuurmans, D., Le, Q.V., Chi, E.H., Narang, S., Chowdhery, A., Zhou, D.: Self- consistency improves chain of thought reasoning in language models. In: The Eleventh International Conference on Learning Representations (2023). https://openreview.net/forum?id=1PL1NIMMrw

work page 2023
[37]

Meta (2024)

AI, M.: Llama 3.2: Revolutionizing Edge AI and Vision with Open, Customizable Models. Meta (2024). https://huggingface.co/meta-llama/Llama-3.2-3B

work page 2024
[38]

https://huggingface.co/Qwen/Qwen3-32B

Qwen Team: Qwen3-32B. https://huggingface.co/Qwen/Qwen3-32B. Model card. Accessed: 18 November 2025 (2025)

work page 2025
[39]

https://huggingface.co/meta-llama/Llama-3

Meta (via Hugging Face): Llama-3.3-70B-Instruct. https://huggingface.co/meta-llama/Llama-3. 3-70B-Instruct (2024)

work page 2024
[41]

Technical report, OpenAI (aug 2025)

OpenAI: Gpt-5 system card. Technical report, OpenAI (aug 2025). Accessed: 2025-11-20. https: //cdn.openai.com/gpt-5-system-card.pdf

work page 2025
[42]

Accessed: 2025-11-20 (2024)

AI at Meta: Llama 3.2: Revolutionizing edge AI and vision with open, customizable models. Accessed: 2025-11-20 (2024). https://ai.meta.com/blog/ llama-3-2-connect-2024-vision-edge-mobile-devices/

work page 2025
[43]

https://aider.chat/docs/leaderboards/

Aider: Aider LLM Leaderboards. https://aider.chat/docs/leaderboards/. Accessed: 18 November 2025 (2025)

work page 2025
[44]

The Llama 3 Herd of Models

Llama Team, AI@Meta: The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024) 46

work page internal anchor Pith review Pith/arXiv arXiv 2024
[45]

https://www.vellum.ai/blog/llama-3-3-70b-vs-gpt-4o

Vellum AI: Llama 3.3 70B vs GPT-4o. https://www.vellum.ai/blog/llama-3-3-70b-vs-gpt-4o. Eval- uation shows GPT-4o leads on math (55% vs lower), reasoning (69% vs 44%), while Llama 3.3 70B is competitive on classification and strengths in coding, tool use and multilingual tasks; cost/latency analysis included. (2024)

work page 2024
[46]

https://openai.com/index/introducing-gpt-oss/

OpenAI: Introducing gpt-oss. https://openai.com/index/introducing-gpt-oss/. Accessed: 2025-11- 19 (2025)

work page 2025
[47]

https://github.com/ openai/gpt-oss

OpenAI: gpt-oss: gpt-oss-120b and gpt-oss-20b open-weight language models. https://github.com/ openai/gpt-oss. GitHub repository; accessed: 2025-11-19 (2025)

work page 2025
[48]

Measuring Massive Multitask Language Understanding

Hendrycks, D., Burns, C., Basart, S., Zou, A., Mazeika, M., Song, D., Steinhardt, J.: Measuring Massive Multitask Language Understanding (2021). https://arxiv.org/abs/2009.03300

work page internal anchor Pith review Pith/arXiv arXiv 2021
[49] Rein, D., Hou, B.L., Stickland, A.C., Petty, J., Pang, R.Y., Dirani, J., Michael, J., Bowman, S.R.: GPQA: A Graduate-Level Google-Proof Q&A Benchmark (2023). https://arxiv.org/abs/2311.12022
[50] Patel, B., Chakraborty, S., Suttle, W.A., Wang, M., Bedi, A.S., Manocha, D.: AIME: AI System Optimization via Multiple LLM Evaluators (2024). https://arxiv.org/abs/2410.03131
[51] OpenAI: gpt-oss-120b & gpt-oss-20b Model Card. Original PDF available from OpenAI at https://cdn.openai.com/pdf/419b6906-9da6-406c-a19d-1bb078ac7637/oai_gpt-oss_model_card.pdf (2025). https://arxiv.org/abs/2508.10925
[52] Hugging Face: MXFP4 Quantization in Transformers. https://huggingface.co/docs/transformers/en/quantization/mxfp4. Accessed: 2025-11-24 (2025)
[53] OpenAI: Introducing GPT-5. OpenAI. https://openai.com/index/introducing-gpt-5/. Accessed 2025-11-26
[54] Huang, S., Cole, J.M.: BatteryBERT: A pretrained language model for battery database enhancement. Journal of Chemical Information and Modeling 62(24), 6365–6377 (2022). https://doi.org/10.1021/acs.jcim.2c00035
[55] Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C.H., Kang, J.: BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics 36(4), 1234–1240 (2019). https://doi.org/10.1093/bioinformatics/btz682
[56] Madani, A., Krause, B., Greene, E.R., Subramanian, S., Mohr, B.P., Holton, J.M., Olmos, J.L., Xiong, C., Sun, Z.Z., Socher, R., Fraser, J.S., Naik, N.: Large language models generate functional protein sequences across diverse families. Nature Biotechnology 41(8), 1099–1106 (2023). https://doi.org/10.1038/s41587-022-01618-2
[57] Han, Z., Gao, C., Liu, J., Zhang, J., Zhang, S.Q.: Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey (2024). https://arxiv.org/abs/2403.14608
[58] Taori, R., Gulrajani, I., Zhang, T., Dubois, Y., Li, X., Guestrin, C., Liang, P., Hashimoto, T.B.: Stanford Alpaca: An Instruction-following LLaMA model. GitHub (2023). https://github.com/tatsu-lab/stanford_alpaca
[59] Langtangen, H.P., Logg, A.: Solving PDEs in Python: The FEniCS Tutorial I (Volume 3, Simula SpringerBriefs on Computing), p. 146. Springer, Cham, Switzerland (2017). https://doi.org/10.1007/978-3-319-52462-7. https://fenicsproject.org/tutorial/
[60] Kamensky, D.: MAE 207 – FEA for coupled problems: Code examples. https://github.com/david-kamensky/mae-207-fea-for-coupled-problems. Code examples for the class "MAE 207: FEA for coupled problems" at UC San Diego (Legacy FEniCS) (2022)
[61] The FEniCS Project: DOLFIN Python demos (legacy FEniCS documentation). https://olddocs.fenicsproject.org/dolfin/latest/python/demos.html. Accessed 2025-11-19 (2017)
[62] Burkardt, J.: FEniCS Examples. https://people.sc.fsu.edu/~jburkardt/fenics_src/fenics_src.html. Mirror at https://people.math.sc.edu/Burkardt/fenics_src/fenics_src.html. Accessed 2025-11-19 (2020)
[63] OpenAI: Introducing OpenAI o3 and o4-mini. https://openai.com/index/introducing-o3-and-o4-mini/. Accessed: 2 December 2025
[64] Google: Colaboratory (Google Colab). https://research.google.com/colaboratory/faq.html. Accessed 2025-11-19
[65] Google DeepMind: Gemini 2.5: Our Most Intelligent AI Model. https://blog.google/technology/google-deepmind/gemini-model-thinking-updates-march-2025/
[67] Hu, E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Wang, L., Chen, W.: LoRA: Low-rank adaptation of large language models. In: Proceedings of the 10th International Conference on Learning Representations (ICLR 2022) (2022). Paper and code available at https://arxiv.org/abs/2106.09685. https://openreview.net/forum?id=nZeVKeeFYf9
[68] Dettmers, T., Pagnoni, A., Holtzman, A., Zettlemoyer, L.: QLoRA: Efficient Finetuning of Quantized LLMs (2023). https://arxiv.org/abs/2305.14314
[69] Shuttleworth, R., Andreas, J., Torralba, A., Sharma, P.: LoRA vs Full Fine-tuning: An Illusion of Equivalence (2024). https://arxiv.org/abs/2410.21228
[70] Kalajdzievski, D.: A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA (2023). https://arxiv.org/abs/2312.03732
[71] Biderman, D., Portes, J., Ortiz, J.J.G., Paul, M., Greengard, P., Jennings, C., King, D., Havens, S., Chiley, V., Frankle, J., Blakeney, C., Cunningham, J.P.: LoRA learns less and forgets less. Transactions on Machine Learning Research (2024). Featured Certification
[72] Loshchilov, I., Hutter, F.: Decoupled Weight Decay Regularization (2019). https://arxiv.org/abs/1711.05101
[73] Rosen Center for Advanced Computing: AnvilGPT User Guide. https://www.rcac.purdue.edu/knowledge/anvil/anvilgpt. Purdue University. Accessed: 2025-11-22 (2025)
[74] Ollama: Ollama. https://ollama.com/. Accessed: 2025-11-22 (2025)
[75] Rosen Center for Advanced Computing: AnvilGPT API. https://www.rcac.purdue.edu/knowledge/anvil/anvilgpt/api. Purdue University. Accessed: 2025-11-22 (2025)
[76] Hugging Face: PEFT LoRA Developer Guide: Merge LoRA weights into the base model. https://huggingface.co/docs/peft/developer_guides/lora. Accessed 2025-11-22 (2024)
[77] Gerganov, G., contributors: llama.cpp: LLM inference in C/C++. https://github.com/ggml-org/llama.cpp. Accessed 2025-11-22 (2023)
[78] llama.cpp contributors: How to convert Hugging Face models to GGUF. https://github.com/ggml-org/llama.cpp/discussions/2948. Accessed 2025-11-22 (2024)
[79] Hugging Face: GGUF on the Hugging Face Hub. https://huggingface.co/docs/hub/gguf. Accessed 2025-11-22 (2024)
[80] Ollama: Importing a GGUF Model or Adapter. https://docs.ollama.com/import. Accessed 2025-11-22 (2025)
[81] Artefact2: GGUF quantizations overview and recommendations. https://gist.github.com/Artefact2/b5f810600771265fc1e39442288e8ec9. Accessed 2025-11-22 (2024)
[82] Ollama: Modelfile Reference. https://docs.ollama.com/modelfile. Accessed 2025-11-22 (2025)
