FormuLLA: A Large Language Model Approach to Generating Novel 3D Printable Formulations
Pith reviewed 2026-05-21 16:35 UTC · model grok-4.3
The pith
Fine-tuned Llama2 recommends excipients for 3D printable pharmaceutical formulations better than the other large language models tested.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Large language models fine-tuned on over 1400 FDM pharmaceutical formulations can recommend suitable excipients for given API doses and predict filament properties, with Llama2 emerging as the most suitable architecture among those tested, though performance depends heavily on model choice and parameterization, and smaller models suffer from catastrophic forgetting.
What carries the argument
Fine-tuning and evaluation of multiple LLM architectures including Llama2 on a fused deposition modelling (FDM) formulation dataset for excipient recommendation and property prediction.
If this is right
- Llama2 provides superior recommendations for excipients in FDM formulations compared to other models tested.
- Adjusting fine-tuning and generation parameters can significantly improve or degrade model performance.
- Smaller LLMs are prone to catastrophic forgetting when trained on this relatively small dataset.
- Standard linguistic metrics for LLMs do not assess the actual processability or printability of the generated formulations.
- Models pre-trained on biomedical data do not necessarily yield the best results for this pharmaceutical application.
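To make the point about generation parameters concrete, here is a minimal sketch of one common generation parameter, sampling temperature. The paper's abstract does not say which generation parameters were swept, so temperature is an assumption chosen for illustration, and the candidate names and scores are hypothetical.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def sample(candidates, logits, temperature, rng):
    """Draw one candidate according to the temperature-scaled probabilities."""
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical candidates and scores, not taken from the paper.
candidates = ["excipient_a", "excipient_b", "excipient_c", "excipient_d"]
logits = [2.0, 1.5, 0.5, 0.1]

cold = softmax_with_temperature(logits, 0.2)  # concentrates mass on the top score
hot = softmax_with_temperature(logits, 2.0)   # spreads mass across candidates
pick = sample(candidates, logits, 0.2, random.Random(0))
```

At a low temperature the model almost always returns its top-scored candidate, while a high temperature produces more varied and potentially less reliable recommendations, which is one way a generation setting alone can shift measured performance.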
Where Pith is reading between the lines
- Integrating these LLM recommendations with experimental validation could accelerate development of personalized 3D printed drugs.
- Future systems might optimize directly for formulation success rather than language scores.
- This method could be adapted to other additive manufacturing techniques in pharma with appropriate datasets.
- Techniques to mitigate catastrophic forgetting would allow smaller models to retain broad knowledge while specializing.
Load-bearing premise
The assumption that good performance on linguistic and generative tasks in LLMs will translate to formulations that are actually processable and printable in practice.
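The gap between linguistic scores and domain validity can be illustrated with a toy example. The sketch below is not the paper's evaluation: the overlap metric is a simplified unigram precision, the weight-sum check is a hypothetical stand-in that is far weaker than any printability test, and the formulation strings are invented.

```python
from collections import Counter


def unigram_overlap(candidate, reference):
    """Fraction of candidate tokens that also appear in the reference (clipped counts)."""
    cand = candidate.lower().split()
    ref = Counter(reference.lower().split())
    if not cand:
        return 0.0
    matched = 0
    for token, count in Counter(cand).items():
        matched += min(count, ref[token])
    return matched / len(cand)


def weights_sum_to_100(formulation, tol=0.5):
    """Toy sanity check: component weight percentages should total 100."""
    return abs(sum(formulation.values()) - 100.0) <= tol


# Invented example: the generated text differs from the reference by one number.
reference_text = "api 10 % polymer_a 80 % plasticiser_b 10 %"
generated_text = "api 10 % polymer_a 80 % plasticiser_b 40 %"
generated = {"api": 10.0, "polymer_a": 80.0, "plasticiser_b": 40.0}

score = unigram_overlap(generated_text, reference_text)  # high surface similarity
valid = weights_sum_to_100(generated)                    # fails the sanity check
```

Here the generated text shares eight of its nine tokens with the reference yet describes a formulation whose components total 130%. Surface overlap and physical plausibility are separate questions, which is the concern this premise raises.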
What would settle it
Laboratory experiments printing filaments from the model's recommended excipient combinations to verify mechanical properties and printability.
Original abstract
Pharmaceutical three-dimensional (3D) printing is an advanced fabrication technology with the potential to enable truly personalised dosage forms. Recent studies have integrated artificial intelligence (AI) to accelerate formulation and process development, drastically transforming current approaches to pharmaceutical 3D printing. To date, most AI-driven efforts remain narrowly focused, while failing to account for the broader formulation challenges inherent to the technology. Recent advances in AI have introduced artificial general intelligence concepts, wherein systems extend beyond conventional predictive modelling toward more generalised, human-like reasoning. In this work, we investigate the application of large language models (LLMs), fine-tuned on a fused deposition modelling (FDM) dataset comprising over 1400 formulations, to recommend suitable excipients based on active pharmaceutical ingredient (API) dose, and predict filament mechanical properties. Four LLM architectures were fine-tuned, with systematic evaluation of both fine-tuning and generative parameter configurations. Our results demonstrate that Llama2 was best suited for recommending excipients for FDM formulations. Additionally, model selection and parameterisation significantly influence performance, with smaller LLMs exhibiting instances of catastrophic forgetting. Furthermore, we demonstrate: (i) even with relatively small dataset of over 1400 formulations, it can lead to model catastrophic forgetting; (ii) standard LLM metrics only evaluate linguistic performance but not formulation processability; and (iii) LLMs trained on biomedically-related data do not always produce the best results. Addressing these challenges is essential to advancing LLMs beyond linguistic proficiency and toward reliable systems for pharmaceutical formulation development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FormuLLA, which fine-tunes four LLM architectures on a fused deposition modelling (FDM) dataset of over 1400 formulations to recommend excipients for given API doses and to predict filament mechanical properties. Systematic sweeps of fine-tuning and generation parameters are performed; the results indicate that Llama2 is best suited for excipient recommendation, that model choice and parameterization materially affect outcomes, and that smaller models can exhibit catastrophic forgetting. The manuscript also notes that standard linguistic metrics do not capture formulation processability and that biomedically pre-trained models are not always optimal.
Significance. If the linguistic outputs could be shown to correspond to physically viable filaments, the work would offer a novel LLM-based route to personalized pharmaceutical 3D printing. The systematic multi-architecture comparison and explicit parameter sweeps constitute a clear methodological strength and provide reproducible evidence on model behavior for this domain-specific task. The significance is currently limited by the absence of any direct experimental link between the reported metrics and actual FDM processability or printability.
major comments (2)
- [Abstract] Abstract: the central claim that the fine-tuned LLMs produce recommendations for 'novel 3D printable formulations' is not supported by the presented evidence. The abstract itself states that 'standard LLM metrics only evaluate linguistic performance but not formulation processability,' yet no additional mechanical, rheological, or printing experiments are reported to close this gap.
- [Results] Results section (comparative evaluation of Llama2 vs. other architectures): the reported superiority of Llama2 for excipient recommendation and the observations of catastrophic forgetting rest entirely on linguistic scores. Because the paper acknowledges that these scores do not establish filament extrudability or mechanical suitability for FDM, the practical-utility conclusions cannot be drawn from the data shown.
minor comments (2)
- [Methods] Methods: provide the precise train/validation/test split ratios and any deduplication steps applied to the >1400-entry dataset so that the fine-tuning experiments can be reproduced.
- [Discussion] Discussion: the statement that 'LLMs trained on biomedically-related data do not always produce the best results' would benefit from a short table listing the biomedical pre-training status of each of the four architectures evaluated.
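The reproducibility request in the [Methods] comment can be made concrete with a short sketch of deduplication followed by a seeded split. This is an illustration only: the record layout, the normalisation rule, and the 80/10/10 ratios are assumptions, since the review does not report what the authors did.

```python
import random


def normalise(record):
    """Canonical key so the same formulation written in a different order or case matches."""
    return tuple(sorted((name.strip().lower(), round(float(pct), 2))
                        for name, pct in record.items()))


def deduplicate(records):
    """Keep the first occurrence of each normalised formulation."""
    seen, unique = set(), []
    for rec in records:
        key = normalise(rec)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


def split(records, ratios=(0.8, 0.1, 0.1), seed=0):
    """Shuffle with a fixed seed, then cut into train/validation/test."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    shuffled = records[:]
    random.Random(seed).shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * ratios[0])
    n_val = int(n * ratios[1])
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_val],
            shuffled[n_train + n_val:])


# Hypothetical records: the first two describe the same formulation.
records = [{"api": 10, "polymer_a": 90}, {"Polymer_A": 90.0, "API": 10.0}] + [
    {"api": i, "polymer_a": 100 - i} for i in range(11, 30)]
unique = deduplicate(records)
train, val, test = split(unique)
```

Deduplicating before splitting matters because a formulation that appears in both the training and test sets would inflate test scores. Reporting the seed and ratios alongside the split lets others regenerate the same partitions.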
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the scope and limitations of our work. We address each major comment below and indicate where revisions will be made to the manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract: the central claim that the fine-tuned LLMs produce recommendations for 'novel 3D printable formulations' is not supported by the presented evidence. The abstract itself states that 'standard LLM metrics only evaluate linguistic performance but not formulation processability,' yet no additional mechanical, rheological, or printing experiments are reported to close this gap.
  Authors: We agree that the abstract phrasing risks overstating the direct applicability to physically printable formulations. The term 'novel' in the manuscript refers to LLM-generated excipient combinations that are not verbatim copies from the training data but are produced through the model's learned patterns. The work is positioned as a computational demonstration of LLM utility for formulation recommendation rather than a complete end-to-end printable solution. We will revise the abstract to more explicitly state that recommendations are evaluated via linguistic metrics and that physical processability validation remains a necessary subsequent step. This revision will better align the claims with the evidence presented.
  Revision: partial
- Referee: [Results] Results section (comparative evaluation of Llama2 vs. other architectures): the reported superiority of Llama2 for excipient recommendation and the observations of catastrophic forgetting rest entirely on linguistic scores. Because the paper acknowledges that these scores do not establish filament extrudability or mechanical suitability for FDM, the practical-utility conclusions cannot be drawn from the data shown.
  Authors: The comparative results are grounded in standard metrics for evaluating text generation quality in LLMs, which is the appropriate evaluation paradigm for this task. We already note in the manuscript that these metrics do not equate to FDM processability. The superiority of Llama2 is demonstrated through better alignment with dataset distributions in terms of coherence and relevance of generated recommendations. We will add explicit language in the results and discussion sections to reinforce that practical utility for filament production would require experimental confirmation and to avoid implying direct processability from linguistic performance alone.
  Revision: partial
- Direct experimental validation of the generated formulations' mechanical properties, rheological behavior, or actual FDM printability is absent from the current study.
Circularity Check
No circularity in empirical LLM fine-tuning for FDM formulations
Full rationale
The paper describes an empirical workflow of fine-tuning four LLM architectures on an external dataset of over 1400 FDM formulations, followed by comparative evaluation using standard generative and predictive metrics to identify Llama2 as best suited and to note effects of model size and parameterization. No equations, derivations, or self-definitional steps appear in the provided text; performance claims rest on direct experimental outputs rather than any reduction of a 'prediction' to a fitted input by construction. The paper itself flags that linguistic metrics do not capture processability, but this is an acknowledged limitation on external validity, not evidence of internal circularity. The work is self-contained against the dataset benchmarks and does not rely on load-bearing self-citations or imported uniqueness theorems.
Axiom & Free-Parameter Ledger
free parameters (1)
- fine-tuning and generative parameter configurations
axioms (1)
- domain assumption: A dataset of over 1400 FDM formulations is representative enough for LLM fine-tuning to produce useful excipient recommendations.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: Four LLM architectures were fine-tuned... Llama2 was best suited for recommending excipients... standard LLM metrics only evaluate linguistic performance but not formulation processability
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: dataset comprising over 1400 formulations... PEFT with LoRA on Q/V (and optionally K/O) projections
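The second quoted passage mentions PEFT with LoRA on the Q/V (and optionally K/O) projections. For readers unfamiliar with the technique, the standard LoRA formulation is sketched below. The symbols follow the usual LoRA notation and are not taken from the paper; the rank and scaling values the authors used are not stated in this review.

```latex
% Low-rank adaptation of one frozen projection matrix W_0
% (for example the query or value projection of an attention layer).
h = W_0 x + \frac{\alpha}{r}\, B A x,
\qquad W_0 \in \mathbb{R}^{d \times k},\quad
B \in \mathbb{R}^{d \times r},\quad
A \in \mathbb{R}^{r \times k},\quad
r \ll \min(d, k)
```

Only $A$ and $B$ are trained while $W_0$ stays frozen, so each adapted projection adds $r(d + k)$ trainable parameters instead of $dk$. Extending adaptation from Q/V to K/O therefore roughly doubles the number of trainable parameters per attention layer.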
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.