ALL-FEM: Agentic Large Language models Fine-tuned for Finite Element Methods
Pith reviewed 2026-05-16 15:45 UTC · model grok-4.3
The pith
Fine-tuned LLMs inside a multi-agent loop with runtime feedback generate correct FEniCS code for 71.79 percent of tested finite-element problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
An autonomous system called ALL-FEM orchestrates specialized agents powered by LLMs fine-tuned on a corpus of 1000+ verified FEniCS scripts. When the best such model (GPT OSS 120B) operates inside the multi-agent workflow with runtime feedback, it reaches 71.79 percent code-level success on 39 benchmarks covering linear/nonlinear elasticity, plasticity, Newtonian/non-Newtonian flow, thermofluids, fluid-structure interaction, phase separation, and transport on moving domains, surpassing a non-agentic deployment of GPT 5 Thinking.
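The headline percentage can be sanity-checked against the benchmark count. The sketch below assumes each of the 39 benchmarks is scored exactly once as pass or fail (an assumption; the scoring protocol is not spelled out on this page) and searches for the pass counts whose rate rounds to 71.79 percent. The function name and constants are illustrative only.

```python
# Assumption: one pass/fail outcome per benchmark, 39 benchmarks in total.
N_BENCHMARKS = 39
REPORTED_PERCENT = 71.79


def matching_pass_counts(n_total, reported_percent, decimals=2):
    """Return every pass count k whose rate k/n_total rounds to the reported percentage."""
    return [
        k
        for k in range(n_total + 1)
        if abs(round(100.0 * k / n_total, decimals) - reported_percent) < 1e-9
    ]


candidates = matching_pass_counts(N_BENCHMARKS, REPORTED_PERCENT)
print(candidates)
```

Under that single-trial assumption, only one pass count is consistent with the reported figure: 28 of 39. If benchmarks were instead run several times and averaged, the figure would decompose differently.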
What carries the argument
The multi-agent workflow with runtime feedback that uses fine-tuned LLMs to translate problem statements into PDEs, generate and debug FEniCS code, and visualize results.
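The generate, execute, and debug loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's implementation: `generate` is a hypothetical stand-in for a call to a fine-tuned model, `stub_generator` fakes one buggy draft followed by a fix, and the retry limit of three is arbitrary.

```python
import traceback


def run_script(source):
    """Execute generated source in a fresh namespace; return (ok, error_text)."""
    namespace = {}
    try:
        exec(compile(source, "<generated>", "exec"), namespace)
        return True, ""
    except Exception:
        return False, traceback.format_exc()


def generate_with_feedback(problem, generate, max_attempts=3):
    """Call the generator repeatedly, feeding back the most recent runtime error."""
    error_text = ""
    history = []
    for attempt in range(1, max_attempts + 1):
        source = generate(problem, error_text)  # hypothetical model call
        ok, error_text = run_script(source)
        history.append((attempt, ok))
        if ok:
            return source, history
    return None, history


# Hypothetical stub standing in for a fine-tuned model: the first draft
# references an undefined name; the corrected draft is returned only
# once an error message has been fed back.
def stub_generator(problem, error_text):
    if not error_text:
        return "result = undefined_name + 1"
    return "result = 1 + 1"


source, history = generate_with_feedback("toy problem", stub_generator)
print(history)
```

Note that this loop only establishes that a script runs without raising; it says nothing about whether the numerical result is correct, which is the distinction the referee report presses on.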
If this is right
- Engineers can obtain working simulation code from natural-language problem statements without writing or debugging the code themselves.
- The same agentic pattern supplies a template for automating other computational-science workflows that require both code generation and runtime verification.
- Smaller, fine-tuned models become competitive with much larger general models once they are embedded in domain-specific agent loops.
- Rapid iteration over geometries, material laws, and boundary conditions becomes feasible for design exploration in manufacturing and research.
Where Pith is reading between the lines
- The approach could be tested on industrial-scale meshes and nonlinear solvers to measure how far the success rate drops when problem size increases.
- Integration with geometry kernels or CAD files would let the system accept drawings rather than text descriptions as input.
- Similar fine-tuning plus agent orchestration might apply to other open-source simulation libraries beyond FEniCS.
Load-bearing premise
The 39 benchmarks and the automated verification process capture the full range of real-world finite-element problems without missing subtle numerical or physical errors that would appear only in production use.
What would settle it
Running the generated codes on a new set of problems outside the 39 benchmarks and checking whether their numerical solutions match independent reference results or experimental data to within engineering tolerances.
Original abstract
Finite element (FE) analysis guides the design and verification of nearly all manufactured objects. It is at the core of computational engineering, enabling simulation of complex physical systems, from fluids and solids to multiphysics systems. However, implementing FE codes and analyzing simulation results demands expertise across numerical analysis, continuum mechanics, and programming. Conventional Large Language Models (LLMs) can generate FE code, but they hallucinate, lack awareness of variational structures, and cannot close the loop from problem statement to a verified solution. Here, we propose ALL-FEM, an autonomous simulation system that integrates agentic AI with domain-specific, fine-tuned LLMs for FEniCS code generation across solid, fluid, and multiphysics applications. We construct a corpus of 1000+ verified FEniCS scripts by combining 500+ curated expert codes with a retrieval-augmented, multi-LLM pipeline that generates and filters codes for diverse PDEs, geometries, and boundary conditions. We used the corpus to fine-tune LLMs with 3B to 120B parameters. Our agentic framework orchestrates specialized agents, powered by fine-tuned LLMs, to formulate problems as PDEs, generate and debug code and visualize the results. We evaluated the system on 39 benchmarks that include problems of linear/nonlinear elasticity, plasticity, Newtonian/non-Newtonian flow, thermofluids, fluid-structure interaction, phase separation, and transport on moving domains. Embedded in a multi-agent workflow with runtime feedback, the best fine-tuned model (GPT OSS 120B) achieves code-level success of 71.79%, outperforming a non-agentic deployment of GPT 5 Thinking. By showing that relatively small, fine-tuned LLMs, orchestrated through agentic frameworks, can automate FE workflows, ALL-FEM offers a blueprint for autonomous simulation systems in computational science and engineering.
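To make the "variational structure" mentioned in the abstract concrete, here is the textbook passage from strong form to weak form for a model Poisson problem. The problem is chosen for illustration and is not taken from the paper's benchmark set; the symbols u, v, f, and Ω are the usual ones.

```latex
% Strong form (model problem, chosen here for illustration)
-\Delta u = f \quad \text{in } \Omega, \qquad u = 0 \quad \text{on } \partial\Omega

% Weak form: multiply by a test function v, integrate by parts,
% and use v = 0 on the boundary
\text{find } u \in H_0^1(\Omega) \text{ such that } \quad
\int_\Omega \nabla u \cdot \nabla v \, dx = \int_\Omega f \, v \, dx
\quad \forall\, v \in H_0^1(\Omega)
```

In legacy FEniCS the two sides of this identity are written almost literally in UFL as `a = inner(grad(u), grad(v))*dx` and `L = f*v*dx`, which is why a wrong sign or a missing boundary term can yield a script that runs yet solves a different problem.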
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ALL-FEM, an autonomous agentic system that integrates fine-tuned LLMs (3B–120B parameters) with specialized agents for FEniCS code generation in finite element analysis. A corpus of 1000+ verified scripts is built from 500+ expert codes plus a retrieval-augmented multi-LLM pipeline. The system is evaluated on 39 benchmarks spanning linear/nonlinear elasticity, plasticity, Newtonian/non-Newtonian flows, thermofluids, fluid-structure interaction, phase separation, and moving-domain transport. The best model (GPT OSS 120B) embedded in a multi-agent workflow with runtime feedback achieves 71.79% code-level success, outperforming a non-agentic GPT 5 Thinking baseline.
Significance. If the verification process ensures that generated codes produce numerically accurate and physically correct solutions (rather than merely executing without runtime errors), the work would offer a practical blueprint for automating FE workflows in computational engineering. The combination of domain-specific fine-tuning and agentic orchestration with feedback demonstrates measurable gains over direct LLM prompting on variational-form and solver tasks. The scale of the corpus and the breadth of benchmark categories (solids, fluids, multiphysics) would make the result relevant to researchers seeking to reduce manual coding effort in simulation-driven design.
major comments (3)
- [Abstract / Evaluation] Abstract and evaluation description: the headline 71.79% code-level success rate for the 120B model is reported without any quantitative specification of the verification criteria. It is unclear whether success requires only runtime execution without errors or includes checks such as L2-norm agreement with analytical solutions, residual norms on the weak form, or observed mesh-convergence rates. This distinction is load-bearing because codes can execute while containing incorrect variational formulations, improper quadrature, or boundary-condition errors that only manifest under refinement or in coupled problems.
- [Corpus construction] Corpus construction: the manuscript states that 1000+ scripts were obtained by combining expert codes with a multi-LLM retrieval-augmented pipeline, yet supplies no numbers on how many candidates were generated, how many were discarded by the filtering step, per-PDE-category error rates, or the exact verification procedure used to label a script “verified.” Without these statistics the quality and diversity of the fine-tuning data cannot be assessed, undermining claims about the robustness of the resulting models.
- [Benchmarks] Benchmarks: the 39 problems are listed by category (elasticity, plasticity, flows, FSI, etc.) but no table or appendix enumerates the individual problems, their governing equations, mesh types, or whether analytical solutions exist for quantitative validation. Per-benchmark success rates are also absent, making it impossible to determine whether the aggregate 71.79% figure is driven by a few easy cases or holds across the full range of nonlinear and multiphysics problems.
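The mesh-convergence check named in the first major comment can be illustrated on a problem with a known solution. The sketch below is a pure-Python stand-in for a FEniCS solve, under stated assumptions: a 1D Poisson problem with exact solution sin(πx), linear elements with a lumped load vector, and a nodal L2-type error. The function names and the choice of 16 and 32 cells are illustrative.

```python
import math


def solve_poisson_1d(n):
    """Solve -u'' = pi^2 sin(pi x) on (0, 1) with u(0) = u(1) = 0 on n uniform cells.

    Uses the tridiagonal P1 stiffness matrix with a lumped load vector and the
    Thomas algorithm; returns the mesh size and a nodal L2-type error against
    the exact solution u(x) = sin(pi x).
    """
    h = 1.0 / n
    m = n - 1  # number of interior nodes
    x = [(i + 1) * h for i in range(m)]
    diag = [2.0 / h] * m
    off = -1.0 / h  # constant sub- and super-diagonal entry
    rhs = [h * math.pi ** 2 * math.sin(math.pi * xi) for xi in x]

    # Forward elimination
    for i in range(1, m):
        w = off / diag[i - 1]
        diag[i] -= w * off
        rhs[i] -= w * rhs[i - 1]

    # Back substitution
    u = [0.0] * m
    u[-1] = rhs[-1] / diag[-1]
    for i in range(m - 2, -1, -1):
        u[i] = (rhs[i] - off * u[i + 1]) / diag[i]

    err = math.sqrt(h * sum((ui - math.sin(math.pi * xi)) ** 2 for ui, xi in zip(u, x)))
    return h, err


def observed_rate(h_coarse, e_coarse, h_fine, e_fine):
    """Observed convergence order from errors on two meshes."""
    return math.log(e_coarse / e_fine) / math.log(h_coarse / h_fine)


h1, e1 = solve_poisson_1d(16)
h2, e2 = solve_poisson_1d(32)
rate = observed_rate(h1, e1, h2, e2)
print(f"observed rate: {rate:.2f}")
```

For this discretization the expected order is 2, so a verification step could accept a script only if the observed rate falls within some tolerance of that value. The tolerance is a design choice that the manuscript would need to state.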
minor comments (2)
- Clarify the exact model names (GPT OSS 120B, GPT 5 Thinking) and provide references or parameter counts for the baseline models in the main text.
- Consider adding a summary table of the 39 benchmarks that includes problem type, domain, boundary conditions, and whether an analytical solution is available for verification.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. These have highlighted important areas for clarification and expansion. We address each major comment point by point below, indicating the revisions that will be incorporated in the next version of the manuscript.
- Referee: [Abstract / Evaluation] Abstract and evaluation description: the headline 71.79% code-level success rate for the 120B model is reported without any quantitative specification of the verification criteria. It is unclear whether success requires only runtime execution without errors or includes checks such as L2-norm agreement with analytical solutions, residual norms on the weak form, or observed mesh-convergence rates. This distinction is load-bearing because codes can execute while containing incorrect variational formulations, improper quadrature, or boundary-condition errors that only manifest under refinement or in coupled problems.
Authors: We agree that the verification criteria require explicit definition. In the current manuscript, code-level success is defined as the generated FEniCS script executing without runtime errors while producing results consistent with expected physical behavior; for the subset of benchmarks possessing analytical or high-fidelity reference solutions, this includes quantitative checks such as L2-norm agreement and residual norms on the weak form. To remove any ambiguity, we will revise the abstract and add a dedicated paragraph in the Evaluation section that enumerates the precise success criteria, including runtime execution, residual and convergence checks where applicable, and the distinction between problems with and without analytical solutions.
Revision: yes
- Referee: [Corpus construction] Corpus construction: the manuscript states that 1000+ scripts were obtained by combining expert codes with a multi-LLM retrieval-augmented pipeline, yet supplies no numbers on how many candidates were generated, how many were discarded by the filtering step, per-PDE-category error rates, or the exact verification procedure used to label a script “verified.” Without these statistics the quality and diversity of the fine-tuning data cannot be assessed, undermining claims about the robustness of the resulting models.
Authors: The referee is correct that these quantitative details are missing. The manuscript currently gives only aggregate figures. In the revision we will expand the Corpus Construction section with: (i) the total number of candidate scripts generated by the retrieval-augmented multi-LLM pipeline, (ii) the number and percentage discarded at each filtering stage together with the primary rejection reasons, (iii) per-PDE-category verification success rates, and (iv) a step-by-step description of the verification procedure, which combines automated syntax/execution tests with expert manual review for variational correctness and physical fidelity.
Revision: yes
- Referee: [Benchmarks] Benchmarks: the 39 problems are listed by category (elasticity, plasticity, flows, FSI, etc.) but no table or appendix enumerates the individual problems, their governing equations, mesh types, or whether analytical solutions exist for quantitative validation. Per-benchmark success rates are also absent, making it impossible to determine whether the aggregate 71.79% figure is driven by a few easy cases or holds across the full range of nonlinear and multiphysics problems.
Authors: We accept that the benchmark documentation is insufficiently granular. We will add a new appendix (Appendix B) that provides a table enumerating all 39 benchmarks, including the governing PDEs, mesh types, boundary conditions, and the availability of analytical or reference solutions. In addition, the Evaluation section will be augmented with a table of per-benchmark success rates for the GPT OSS 120B model (and the non-agentic baseline) so that readers can assess performance variation across linear/nonlinear, single-physics, and multiphysics cases.
Revision: yes
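The "automated syntax/execution tests" mentioned in the corpus-construction response can be sketched as a two-stage filter. This is a minimal illustration under stated assumptions, not the authors' pipeline: `check_script`, the 30-second timeout, and the three sample scripts are all hypothetical, and the expert review for variational correctness is outside what code like this can cover.

```python
import ast
import subprocess
import sys


def check_script(source, timeout_s=30):
    """Return (stage, ok): the first stage that fails, or ("execution", True)."""
    # Stage 1: does the source parse as Python at all?
    try:
        ast.parse(source)
    except SyntaxError:
        return "syntax", False
    # Stage 2: does it run to completion in a separate interpreter?
    try:
        proc = subprocess.run(
            [sys.executable, "-c", source],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "execution", False
    return "execution", proc.returncode == 0


# Hypothetical candidate scripts, one per outcome.
samples = {
    "ok": "x = 1 + 1",
    "bad_syntax": "x = (1 +",
    "runtime_error": "raise ValueError('diverged')",
}
results = {name: check_script(src) for name, src in samples.items()}
print(results)
```

Recording the failing stage for each candidate is what would make the per-stage discard counts promised in the response straightforward to report.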
Circularity Check
No significant circularity in the ALL-FEM evaluation pipeline
The paper constructs a training corpus via a multi-LLM retrieval-augmented pipeline and fine-tunes models on it, then reports empirical success rates on a separate set of 39 held-out benchmarks using runtime verification. This performance metric is measured directly against external benchmark problems rather than being derived by construction from the corpus-generation or verification loop. No self-definitional reductions, fitted inputs renamed as predictions, load-bearing self-citations, or ansatzes appear in the described chain; the central claim retains independent empirical content.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Fine-tuned LLMs can generate syntactically and semantically correct FEniCS code for a wide range of PDEs when trained on verified examples.
- domain assumption: Multi-agent workflows with runtime execution feedback can close the loop from problem statement to verified solution without external human intervention.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "We construct a corpus of 1000+ verified FEniCS scripts... fine-tune LLMs with 3B to 120B parameters... multi-agent workflow... 71.79% code-level success"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear
Passage: "agentic AI with domain-specific, fine-tuned LLMs for FEniCS code generation across solid, fluid, and multiphysics applications"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- From Perception to Autonomous Computational Modeling: A Multi-Agent Approach
A multi-agent LLM framework autonomously completes the full computational mechanics pipeline from a photograph to a code-compliant engineering report on a steel L-bracket example.