LLM-guided phase diagram construction through high-throughput experimentation
Pith reviewed 2026-05-10 00:19 UTC · model grok-4.3
The pith
Large language models can guide construction of ternary alloy phase diagrams via iterative high-throughput experiments.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using an LLM as the experimental planner in a closed loop with high-throughput synthesis and XRD phase identification, the ternary phase diagram of the Co-Al-Ge system at 900 °C was constructed. The domain-specific LLM (aLLoyM) directed early measurements toward compositionally complex interior regions, enabling the earliest discovery of all three novel phases that form only in the ternary system. The general-purpose LLM adopted a textbook-like approach that efficiently identified a larger number of phases in fewer cycles. A simulated benchmark confirmed that the LLM approach achieves more efficient exploration than conventional machine learning.
What carries the argument
The closed-loop framework in which an LLM iteratively suggests alloy compositions for high-throughput synthesis followed by X-ray diffraction phase identification.
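The loop described above can be sketched in a few lines. This is only an illustration under stated assumptions: `propose` is a hypothetical stand-in for the LLM planner (here it picks unmeasured compositions at random), `measure` is a hypothetical stand-in for synthesis plus XRD phase identification and returns made-up labels rather than Co-Al-Ge data, and the batch size and cycle count are arbitrary.

```python
import random


def candidate_grid(step=5):
    """All (Co, Al, Ge) compositions in at.% on a regular grid summing to 100."""
    return [(co, al, 100 - co - al)
            for co in range(0, 101, step)
            for al in range(0, 101 - co, step)]


def propose(candidates, history, batch_size, rng):
    """Hypothetical planner stub standing in for the LLM call.

    It ignores the measured phases and samples unmeasured compositions at random.
    """
    measured = {comp for comp, _ in history}
    pool = [c for c in candidates if c not in measured]
    return rng.sample(pool, min(batch_size, len(pool)))


def measure(comp):
    """Hypothetical stand-in for synthesis + XRD; labels are invented."""
    co, al, ge = comp
    labels = set()
    if co >= 50:
        labels.add("Co-rich")
    if al >= 50:
        labels.add("Al-rich")
    if ge >= 50:
        labels.add("Ge-rich")
    return frozenset(labels or {"mixed"})


def run_loop(n_cycles=5, batch_size=4, seed=0):
    """Alternate between proposing a batch and measuring it."""
    rng = random.Random(seed)
    candidates = candidate_grid()
    history = []  # list of (composition, observed phases)
    for _ in range(n_cycles):
        for comp in propose(candidates, history, batch_size, rng):
            history.append((comp, measure(comp)))
    return history


if __name__ == "__main__":
    for comp, phases in run_loop():
        print(comp, sorted(phases))
```

Swapping `propose` for a function that formats `history` into a prompt and parses the model's reply would turn the stub into an LLM-driven planner; how the paper does this is not specified in the text above.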
If this is right
- The domain-specific LLM enables early discovery of all ternary-specific phases by focusing on complex interior compositions.
- The general-purpose LLM identifies a larger total number of phases in fewer experimental cycles by following systematic sampling.
- LLM planners achieve more efficient coverage of composition space than conventional machine-learning methods in simulated benchmarks.
Where Pith is reading between the lines
- Similar LLM loops could shorten the time needed to map phase diagrams in quaternary or higher-component alloys where manual planning becomes impractical.
- Running both general and domain-specific LLMs in parallel might combine early ternary-phase detection with broad overall coverage.
- The same planner-plus-high-throughput loop could be tested on other materials tasks that require iterative composition selection, such as property optimization.
Load-bearing premise
LLM-suggested compositions, together with automated synthesis and phase identification, will reliably produce a complete and accurate phase diagram without missing stable phases.
What would settle it
Observation of any stable phase in the Co-Al-Ge system at 900 °C that was not detected by the LLM-guided experiments.
Original abstract
Constructing phase diagrams for multicomponent alloys requires extensive experimental measurements and is a time-consuming task. Here we investigate whether large language models (LLMs) can guide experimental planning for phase diagram construction. In our framework, a general-purpose LLM serves as the experimental planner, suggesting compositions for measurement at each cycle in a closed loop with high-throughput synthesis and X-ray diffraction phase identification. Using this framework, we experimentally constructed the ternary phase diagram of the Co-Al-Ge system at 900 °C through iterative synthesis and characterization. We compared two strategies that differ in how the initial compositions are selected: one uses predictions from a domain-specific LLM trained on phase diagram data (aLLoyM), while the other relies solely on the general-purpose LLM. The two strategies exhibited complementary strengths. aLLoyM directed the initial measurements toward compositionally complex regions in the interior of the ternary diagram, enabling the earliest discovery of all three novel phases that form only in the ternary system. In contrast, the general-purpose LLM adopted a textbook-like approach, which efficiently identified a larger number of phases in fewer cycles. In addition, a simulated benchmark comparing the LLM against conventional machine learning confirmed that the LLM achieves more efficient exploration. The results demonstrate that LLMs have high potential as experimental planners for phase diagram construction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a closed-loop framework in which a general-purpose LLM acts as an experimental planner, iteratively suggesting alloy compositions for high-throughput synthesis and XRD-based phase identification to construct ternary phase diagrams. It reports the experimental construction of the Co-Al-Ge phase diagram at 900 °C and compares two strategies that differ in how the initial compositions are selected: one uses predictions from a domain-specific LLM (aLLoyM), which steers early measurements toward complex interior compositions, and the other relies solely on the general-purpose LLM, which follows a more systematic approach. The paper notes their complementary strengths in discovering the three novel ternary phases and includes a simulated benchmark in which the LLM outperforms conventional machine-learning exploration strategies.
Significance. If the central claims hold, the work demonstrates a practical route to accelerating phase-diagram mapping, a traditionally labor-intensive task in materials science. The real-system experimental demonstration on Co-Al-Ge, the head-to-head comparison of two LLM prompting strategies, and the simulated benchmark against ML baselines constitute concrete, falsifiable evidence that LLMs can serve as effective planners. These elements, together with the integration of automated synthesis and characterization, provide a reproducible template that other groups could adapt.
major comments (3)
- [Results (experimental Co-Al-Ge phase diagram)] Results section on experimental Co-Al-Ge construction: the manuscript asserts that the LLM-guided loop successfully constructed the phase diagram and discovered all three novel ternary phases, yet provides no comparison of the final diagram against independent literature data for the Co-Al-Ge system at 900 °C and no targeted follow-up measurements in unsampled composition regions. Given that XRD is known to be insensitive to low-volume-fraction phases and can yield ambiguous patterns, this omission leaves the completeness claim unsupported and is load-bearing for the central assertion of successful diagram construction.
- [Results (strategy comparison)] Section comparing the two LLM strategies: while qualitative descriptions state that aLLoyM enabled earliest discovery of ternary phases and the general-purpose LLM identified a larger number of phases in fewer cycles, the text supplies no quantitative metrics (cycle counts, phase-identification accuracy, or statistical tests of difference) and no table summarizing these outcomes. Without such data the efficiency claims cannot be rigorously evaluated.
- [Simulated benchmark] Simulated benchmark section: the claim that the LLM achieves more efficient exploration than conventional ML is presented separately from the experimental results; the simulation protocol does not appear to incorporate realistic experimental noise sources such as XRD pattern ambiguity or synthesis yield variability, weakening its ability to corroborate the experimental findings.
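The quantitative summary requested in the second major comment could be computed from an experiment log roughly as follows. This is a minimal sketch: the log format (pairs of cycle number and observed phase labels), the function names, and the placeholder labels `A`, `B`, `T1`, `T2`, `T3` are all assumptions made for illustration, and the numbers are invented, not results from the paper.

```python
def first_discovery_cycle(log):
    """Map each phase label to the earliest cycle in which it was observed.

    `log` is an iterable of (cycle, phases) pairs, one per measured sample.
    """
    first = {}
    for cycle, phases in log:
        for phase in phases:
            if phase not in first or cycle < first[phase]:
                first[phase] = cycle
    return first


def cycles_to_find_all(log, targets):
    """Cycle at which the last of `targets` was first seen, or None if any is missing."""
    first = first_discovery_cycle(log)
    if not all(t in first for t in targets):
        return None
    return max(first[t] for t in targets)


# Invented log with placeholder phase labels (not data from the paper).
example_log = [
    (1, {"A", "B"}),
    (1, {"A"}),
    (2, {"B", "T1"}),
    (3, {"T2"}),
    (4, {"T1", "T3"}),
]

if __name__ == "__main__":
    for phase, cycle in sorted(first_discovery_cycle(example_log).items()):
        print(f"{phase}: first seen in cycle {cycle}")
    print("all ternary placeholders found by cycle",
          cycles_to_find_all(example_log, {"T1", "T2", "T3"}))
```

Running the same two functions on the logs of both strategies would give the per-phase discovery cycles and the cycles-to-all-ternary-phases figure side by side.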
minor comments (3)
- [Abstract] The abstract would be strengthened by inclusion of at least one quantitative outcome (e.g., total cycles or number of phases identified) to support the stated success.
- [Introduction / Methods] Notation for the two strategies (aLLoyM versus general-purpose LLM) should be introduced with a clear definition and abbreviation table at first use.
- [Figures] Phase-diagram figures would benefit from explicit labeling of all identified phases, including the three novel ternary compounds, and from a supplementary table listing their approximate compositions and XRD signatures.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which identify key areas where the manuscript can be strengthened. We address each major comment point by point below, indicating the revisions we will make.
Point-by-point responses
- Referee: Results section on experimental Co-Al-Ge construction: the manuscript asserts that the LLM-guided loop successfully constructed the phase diagram and discovered all three novel ternary phases, yet provides no comparison of the final diagram against independent literature data for the Co-Al-Ge system at 900 °C and no targeted follow-up measurements in unsampled composition regions. Given that XRD is known to be insensitive to low-volume-fraction phases and can yield ambiguous patterns, this omission leaves the completeness claim unsupported and is load-bearing for the central assertion of successful diagram construction.
  Authors: We thank the referee for highlighting this important point. Our primary focus was on demonstrating the LLM-guided iterative process and the discovery of the three novel phases. Comprehensive literature data for the full Co-Al-Ge ternary phase diagram at 900 °C is not extensively available, particularly for the interior regions. In the revised manuscript, we will include a comparison of our results with the known binary phase diagrams (Co-Al, Co-Ge, Al-Ge) and any reported ternary phases from the literature. We will also add a discussion acknowledging the limitations of XRD in detecting minor phases and will qualify the claim of diagram construction to emphasize the phases identified in the explored composition space. Targeted follow-up experiments in unsampled regions will be suggested as future work.
  Revision: yes
- Referee: Section comparing the two LLM strategies: while qualitative descriptions state that aLLoyM enabled earliest discovery of ternary phases and the general-purpose LLM identified a larger number of phases in fewer cycles, the text supplies no quantitative metrics (cycle counts, phase-identification accuracy, or statistical tests of difference) and no table summarizing these outcomes. Without such data the efficiency claims cannot be rigorously evaluated.
  Authors: We agree that the comparison would benefit from quantitative metrics. The current text describes the outcomes based on the sequence of experiments performed. In the revised manuscript, we will add a table that provides quantitative details, including the cycle number at which each phase was first identified for both strategies, the total number of phases discovered, and the number of cycles needed to identify all three novel ternary phases. This will enable a more rigorous assessment of the complementary strengths of the two approaches.
  Revision: yes
- Referee: Simulated benchmark section: the claim that the LLM achieves more efficient exploration than conventional ML is presented separately from the experimental results; the simulation protocol does not appear to incorporate realistic experimental noise sources such as XRD pattern ambiguity or synthesis yield variability, weakening its ability to corroborate the experimental findings.
  Authors: The simulated benchmark was conducted separately to evaluate the intrinsic efficiency of the LLM-based planner compared to standard ML exploration methods in a controlled environment without experimental uncertainties. This helps to isolate the contribution of the LLM. We will revise the manuscript to better explain the rationale for this separation and to discuss the potential impact of experimental noise on the benchmark results. We will also attempt to incorporate a simplified noise model in the simulation to strengthen the connection to the experimental findings.
  Revision: partial
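One possible reading of the "simplified noise model" mentioned in the last response is a missed-detection model, in which each phase truly present in a sample fails to show up in the XRD result with some probability. The sketch below assumes exactly that and nothing more: `noisy_xrd`, `detection_rate`, the independence of misses, and the parameter `p_miss` are hypothetical and are not taken from the paper or the rebuttal.

```python
import random


def noisy_xrd(true_phases, p_miss, rng):
    """Return the phases that survive detection.

    Each true phase is dropped independently with probability p_miss,
    mimicking XRD insensitivity to low-volume-fraction phases.
    """
    return {p for p in true_phases if rng.random() >= p_miss}


def detection_rate(true_phases, target, p_miss, n_trials, seed=0):
    """Fraction of simulated measurements in which `target` is reported."""
    rng = random.Random(seed)
    hits = sum(target in noisy_xrd(true_phases, p_miss, rng)
               for _ in range(n_trials))
    return hits / n_trials


if __name__ == "__main__":
    sample = {"A", "T1"}  # placeholder labels, not real phases
    for p in (0.0, 0.2, 0.5):
        print(p, detection_rate(sample, "T1", p, n_trials=1000))
```

Wrapping a noise-free benchmark oracle with `noisy_xrd` would let the same planner comparison be rerun at several values of `p_miss`; pattern ambiguity (mislabeling one phase as another) and synthesis yield variability would need separate terms that this sketch does not cover.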
Circularity Check
No circularity: experimental results rest on physical measurements, not self-referential derivations
The paper reports an experimental closed-loop process in which an LLM proposes compositions, followed by actual high-throughput synthesis and XRD characterization to build the Co-Al-Ge phase diagram at 900 °C. The central claim (LLMs as effective planners) is supported by the observed discovery of three novel ternary phases and comparative cycle counts, none of which reduce to fitted parameters or prior outputs by construction. The simulated benchmark is presented separately and does not underpin the experimental conclusions. No load-bearing self-citation chain or ansatz smuggling is required for the reported outcomes; the work is self-contained against external physical benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLM outputs can be reliably translated into valid synthesis compositions that advance phase diagram coverage.
- domain assumption: X-ray diffraction provides unambiguous phase identification for the synthesized samples.
Supplementary Fig. S1. Ternary phase diagram of the Co-Al-Ge system at 900 °C predicted by aLLoyM. The 231 candidate compositions on a 5 at.% grid are colored according to the predicted phases. aLLoyM outputs a standardized nom...
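The figure of 231 candidates follows from the grid spacing: with n = 100/5 = 20 steps per axis, a ternary grid has (n+1)(n+2)/2 = 231 points. The short check below enumerates the grid and splits it into interior, binary-edge, and corner points, which is one way to make the "interior versus textbook-like" distinction concrete. The function names and the three category labels are illustrative choices, not terminology from the paper.

```python
from collections import Counter


def ternary_grid(step=5):
    """All (Co, Al, Ge) compositions in at.% on a regular grid summing to 100."""
    return [(co, al, 100 - co - al)
            for co in range(0, 101, step)
            for al in range(0, 101 - co, step)]


def classify(comp):
    """Label a composition by how many elements are absent."""
    zeros = sum(1 for x in comp if x == 0)
    # Compositions sum to 100, so at most two entries can be zero.
    return {0: "interior", 1: "binary edge", 2: "corner"}[zeros]


grid = ternary_grid(5)
counts = Counter(classify(c) for c in grid)

if __name__ == "__main__":
    print(len(grid))     # 231
    print(dict(counts))  # 171 interior, 57 binary edge, 3 corner
```

On this grid, 171 of the 231 candidates are true ternary (interior) compositions, so a planner that starts from the edges and corners is initially sampling from only 60 of them.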