arXiv: 2507.22558 · v2 · submitted 2025-07-30 · cond-mat.mtrl-sci · cs.AI

aLLoyM: A large language model for alloy phase diagram prediction

Pith reviewed 2026-05-19 03:06 UTC · model grok-4.3

classification cond-mat.mtrl-sci, cs.AI
keywords alloy phase diagrams, large language model, fine-tuning, materials discovery, CALPHAD, phase prediction, binary alloys, ternary alloys

The pith

A fine-tuned large language model generates phase diagrams for new alloy systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces aLLoyM, a version of the Mistral model fine-tuned on question-and-answer pairs about binary and ternary alloy phase diagrams, curated from the Computational Phase Diagram Database (CPDDB) and CALPHAD-based assessments. The goal is to show that this training lets the model answer phase-related questions and generate diagrams for alloy combinations it has not seen before. A sympathetic reader would care because traditional methods for mapping phase diagrams are time-consuming, and a working LLM could quickly suggest promising new compositions for materials with desired properties. The generative capability is demonstrated with the short-answer model, not the multiple-choice one.

Core claim

By curating Q&A pairs on alloy compositions, temperatures, and phases from existing sources and fine-tuning Mistral on both multiple-choice and short-answer formats, the resulting aLLoyM model shows improved accuracy on benchmark questions and the short-answer version can produce phase diagrams for novel alloy systems based solely on their components.
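
To make the two Q&A formats concrete, here is a minimal sketch of how a single record of composition, temperature, and phases could be turned into a short-answer pair and a multiple-choice pair. The record fields, the question wording, and the function names are assumptions made for illustration; the paper's actual prompt templates are not reproduced here.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class PhaseRecord:
    """One equilibrium point. All field names are hypothetical."""
    composition: dict      # element symbol -> mole percent
    temperature_k: float   # temperature in kelvin
    phases: tuple          # names of the stable phases


def describe(record):
    """Render the composition and temperature as plain text."""
    parts = " + ".join(
        f"{el} ({pct:g}%)" for el, pct in sorted(record.composition.items())
    )
    return f"{parts} at {record.temperature_k:g} K"


def short_answer_pair(record):
    """Free-text format: the answer is the list of stable phases."""
    question = f"Which phases are stable for {describe(record)}?"
    return {"question": question, "answer": " + ".join(sorted(record.phases))}


def multiple_choice_pair(record, distractors, rng):
    """Multiple-choice format: one correct option plus up to three distractors."""
    correct = " + ".join(sorted(record.phases))
    options = [correct] + [d for d in distractors if d != correct][:3]
    rng.shuffle(options)
    labels = "ABCD"
    lines = [f"{labels[i]}. {opt}" for i, opt in enumerate(options)]
    question = (
        f"Which phases are stable for {describe(record)}?\n" + "\n".join(lines)
    )
    return {"question": question, "answer": labels[options.index(correct)]}
```

Passing a seeded `random.Random` instance keeps the option order reproducible, which matters when the same benchmark is rerun against a baseline and a fine-tuned model.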

What carries the argument

Fine-tuned Mistral LLM using Q&A pairs from phase diagram databases to predict or generate phase information.

If this is right

  • Performance on multiple-choice phase diagram questions improves substantially after fine-tuning.
  • The short-answer model generates phase diagrams for alloy systems not present in the training data.
  • Public release of the model and dataset supports further development in this area.
  • This method offers a way to accelerate exploration of unexplored materials systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such models might eventually handle quaternary or higher-order alloys if more data is incorporated.
  • Integration with experimental validation loops could refine the predictions iteratively.
  • The approach could complement existing computational tools like CALPHAD by providing rapid initial guesses.

Load-bearing premise

The curated set of Q&A pairs from known phase diagrams supplies enough varied examples for the model to learn general rules that apply accurately to entirely new alloy compositions.

What would settle it

Compare the phase diagrams generated by the short-answer model for a selection of unseen ternary alloys against independent experimental measurements or detailed CALPHAD calculations to check for agreement in phase fields and transition temperatures.
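
One way to quantify such a comparison is pointwise agreement over a shared composition and temperature grid. The sketch below assumes that both the model output and the reference have already been reduced to a mapping from grid point to a set of phase names; that representation and the function name are illustrative assumptions, not the paper's evaluation code.

```python
def phase_field_agreement(predicted, reference):
    """Compare two phase maps on the grid points they share.

    Both arguments map a grid point, e.g. (mole_fraction, temperature_k),
    to an iterable of phase names. Returns a pair:
    (fraction of points with identical phase sets, mean Jaccard overlap).
    """
    shared = sorted(set(predicted) & set(reference))
    if not shared:
        raise ValueError("no common grid points to compare")

    exact = 0
    jaccard_total = 0.0
    for point in shared:
        pred = frozenset(predicted[point])
        ref = frozenset(reference[point])
        exact += pred == ref
        union = pred | ref
        # Two empty sets count as full agreement.
        jaccard_total += len(pred & ref) / len(union) if union else 1.0

    n = len(shared)
    return exact / n, jaccard_total / n
```

The exact-match fraction is strict, while the Jaccard score gives partial credit when a prediction recovers only one phase of a two-phase region.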

Figures

Figures reproduced from arXiv: 2507.22558 by Guillaume Deffrennes, Koji Tsuda, Ryo Tamura, Taichi Abe, Yuna Oikawa.

Figure 1. Schematic of fine-tuned LLM for phase diagram generation: aLLoyM. Q&As were ...
Figure 2. Accuracies of the baseline model (Mistral) and the fine-tuned model (aLLoyM) on ...
Figure 3. Average scores of the fine-tuned models (aLLoyM) for short answer Q&As. Results ...
Figure 4. Representative binary phase diagrams exhibiting varying predictive performance, ...
Figure 5. Representative 800 K ternary isothermal sections exhibiting varying predictive ...
Figure 6. Examples of unknown binary phase diagrams and 800 K ternary isothermal sections ...
Original abstract

Large Language Models (LLMs) are general-purpose tools with wide-ranging applications, including in materials science. In this work, we introduce aLLoyM, a fine-tuned LLM specifically trained on alloy compositions, temperatures, and their corresponding phase information. To develop aLLoyM, we curated question-and-answer (Q&A) pairs for binary and ternary phase diagrams using the open-source Computational Phase Diagram Database (CPDDB) and assessments based on CALPHAD (CALculation of PHAse Diagrams). We fine-tuned Mistral, an open-source pre-trained LLM, for two distinct Q&A formats: multiple-choice and short-answer. Benchmark evaluations demonstrate that fine-tuning substantially enhances performance on multiple-choice phase diagram questions. Moreover, the short-answer model of aLLoyM exhibits the ability to generate novel phase diagrams from its components alone, underscoring its potential to accelerate the discovery of previously unexplored materials systems. To promote further research and adoption, we have publicly released the short-answer fine-tuned version of aLLoyM, along with the complete benchmarking Q&A dataset, on Hugging Face.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces aLLoyM, a fine-tuned Mistral LLM for alloy phase diagram prediction. It curates Q&A pairs from CPDDB and CALPHAD assessments for binary and ternary systems, fine-tunes the model on multiple-choice and short-answer formats, reports improved benchmark performance on multiple-choice questions, and claims that the short-answer version can generate novel phase diagrams from alloy components alone. The short-answer model and Q&A dataset are released on Hugging Face.

Significance. If the short-answer generation capability were quantitatively validated on unseen systems, aLLoyM could offer a new route to rapid phase-diagram exploration that complements traditional CALPHAD workflows. At present the significance is limited by the absence of error metrics or held-out tests for the generative outputs.

major comments (2)
  1. [Abstract] Abstract: the claim that the short-answer model 'exhibits the ability to generate novel phase diagrams from its components alone' is unsupported by any quantitative metrics (e.g., MAE on invariant temperatures or phase-boundary compositions), error analysis, or comparison to independent CALPHAD/experimental references for alloy systems absent from the CPDDB/CALPHAD Q&A corpus.
  2. [Results] Results / evaluation section: no held-out test set of truly unseen binaries or ternaries is described, nor are systematic comparisons provided against established baselines (standard CALPHAD assessments or other ML models) for the short-answer extrapolation task; this leaves open the possibility that outputs are interpolations or hallucinations rather than reliable predictions.
minor comments (2)
  1. [Methods] The manuscript should specify the total number of Q&A pairs, the train/validation/test split ratios, and the exact fine-tuning hyperparameters used for both formats.
  2. [Figures] Figure or table captions for any example generated diagrams should include the source of the reference data against which the output is compared.
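
The error metric named in major comment 1 could be computed along the following lines. This is a minimal sketch that assumes invariant reactions in the predicted and reference diagrams can be matched by a shared label; the labels, the dictionary layout, and the function name are hypothetical, and no values from the paper are used.

```python
def invariant_temperature_mae(predicted_k, reference_k):
    """Mean absolute error, in kelvin, over invariant reactions found in both inputs.

    Both arguments map a reaction label (e.g. "eutectic") to a temperature
    in kelvin. Returns (mae, missed), where `missed` lists the reference
    reactions that the prediction did not contain at all.
    """
    matched = sorted(set(predicted_k) & set(reference_k))
    missed = sorted(set(reference_k) - set(predicted_k))
    if not matched:
        raise ValueError("no invariant reactions in common")

    total_error = sum(abs(predicted_k[r] - reference_k[r]) for r in matched)
    return total_error / len(matched), missed
```

Reporting the missed reactions next to the MAE prevents a model from scoring well simply by omitting the invariants it gets wrong.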

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments, which correctly highlight the need for stronger quantitative support behind the generative claims for the short-answer model. We have revised the manuscript to temper the language in the abstract, clarify the scope of the evaluation, and add explicit discussion of limitations. These changes preserve the core contribution on fine-tuning and multiple-choice benchmarking while addressing the identified gaps.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that the short-answer model 'exhibits the ability to generate novel phase diagrams from its components alone' is unsupported by any quantitative metrics (e.g., MAE on invariant temperatures or phase-boundary compositions), error analysis, or comparison to independent CALPHAD/experimental references for alloy systems absent from the CPDDB/CALPHAD Q&A corpus.

    Authors: We agree that the original abstract phrasing overstated the strength of evidence for the generative capability. The examples in the manuscript are qualitative demonstrations rather than quantitatively validated predictions on held-out systems. In the revised manuscript we have updated the abstract to state that the short-answer model 'can generate phase diagram descriptions for alloy systems from component inputs, as shown in illustrative examples.' We have also inserted a dedicated limitations paragraph that notes the lack of MAE or similar metrics, the absence of comparisons to independent references for systems outside the training corpus, and the exploratory nature of the generation results. These revisions align the claims with the presented evidence.

    Revision: yes

  2. Referee: [Results] Results / evaluation section: no held-out test set of truly unseen binaries or ternaries is described, nor are systematic comparisons provided against established baselines (standard CALPHAD assessments or other ML models) for the short-answer extrapolation task; this leaves open the possibility that outputs are interpolations or hallucinations rather than reliable predictions.

    Authors: The quantitative evaluation reported in the manuscript is confined to the multiple-choice benchmark, which does employ held-out questions. For the short-answer model the outputs are presented as illustrative generations on component combinations not directly copied from training Q&A pairs. We acknowledge that this does not constitute a rigorous held-out extrapolation test and that no baseline comparisons to other ML approaches or full CALPHAD assessments are provided for the generative task. In the revision we have added clarifying text in the results section describing the selection of generation examples, a short discussion of possible hallucinations, and a statement that systematic quantitative validation on truly novel systems is an important avenue for future work. We maintain that the primary contribution remains the fine-tuning and benchmarking results, with generation shown as a promising direction rather than a fully validated capability.

    Revision: partial
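
A system-level split is one way to build the held-out test discussed in this exchange: every Q&A pair about a given set of elements goes entirely to training or entirely to testing, so the test systems are unseen by construction. The sketch below is an illustration under stated assumptions. The `elements` field, the hash-based assignment, and the default fraction are hypothetical choices, not the procedure used in the paper.

```python
import hashlib


def system_key(elements):
    """Order-independent identifier for an alloy system, e.g. 'Cu-Ni'."""
    return "-".join(sorted(elements))


def is_held_out(elements, test_fraction=0.1):
    """Deterministically assign a whole alloy system to the test set."""
    digest = hashlib.sha256(system_key(elements).encode("utf-8")).digest()
    # Map the first 8 bytes of the hash to a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return bucket < test_fraction


def split_by_system(qa_pairs, test_fraction=0.1):
    """Split Q&A dicts so that no alloy system appears on both sides.

    Each item is assumed to carry an 'elements' list (hypothetical field).
    """
    train, test = [], []
    for qa in qa_pairs:
        target = test if is_held_out(qa["elements"], test_fraction) else train
        target.append(qa)
    return train, test
```

Hashing the sorted element list, instead of shuffling individual questions, means the assignment does not change when new questions about an existing system are added later.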

Circularity Check

0 steps flagged

No significant circularity in fine-tuning pipeline or claims

Full rationale

The paper's core process is curating Q&A pairs from external sources (CPDDB and independent CALPHAD assessments) followed by standard supervised fine-tuning of a pre-trained Mistral model. The claim that the short-answer model can generate novel phase diagrams rests on this conventional ML training against an independent database rather than any self-referential derivation, parameter fitting that is then re-used as a prediction, or load-bearing self-citation chains. No equations, uniqueness theorems, or ansatzes are invoked that reduce to the paper's own inputs by construction. The derivation chain is self-contained against external benchmarks and follows ordinary supervised learning practices.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The work relies on the external CPDDB database and prior CALPHAD assessments as ground truth; no new physical axioms or invented entities are introduced. The only free parameters are those implicit in the fine-tuning process itself.

free parameters (1)
  • fine-tuning hyperparameters
    Learning rate, number of epochs, and prompt formatting choices that control how the base Mistral weights are updated on the Q&A pairs.
axioms (1)
  • domain assumption: The curated Q&A pairs derived from CPDDB and CALPHAD assessments accurately represent thermodynamic phase equilibria for the binary and ternary systems considered.
    Invoked when the model is trained and evaluated; the abstract states the data source but does not discuss validation against independent experimental measurements.




Acknowledgements

The authors would like to thank Etsuko Ogamino for data collection. This study was supported by a project subsidized by JSPS KAKENHI (25K01492 and 25KJ0870), JST-CREST (JPMJCR21O2), and MEXT Program: Data Creation and Utilizatio...