Predicting Scale-Up of Metal-Organic Framework Syntheses with Large Language Models
Pith reviewed 2026-05-10 02:09 UTC · model grok-4.3
The pith
Fine-tuned large language models predict scalability of metal-organic framework syntheses at 91.4 percent accuracy using a literature-mined dataset.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce ESU-MOF, a dataset extracted from the scientific literature, together with a positive-unlabeled learning procedure that fine-tunes large language models to classify synthesis recipes as scalable or non-scalable. The fine-tuned model reaches 91.4 percent accuracy on held-out test cases, which the authors present as enabling rapid, data-driven triage of candidates for industrial MOF production.
What carries the argument
The ESU-MOF literature-mined dataset combined with positive-unlabeled learning, which supplies training signal for scalability labels even when negative examples are absent or unreliable in published reports.
If this is right
- Thousands of reported MOF recipes can be ranked for scale-up likelihood before any new laboratory work begins.
- Industrial teams gain a quantitative filter that reduces the fraction of candidates advanced to pilot-scale trials.
- Discovery pipelines can integrate the model as an early gate after initial synthesis screening.
- The same literature-mining plus positive-unlabeled approach could be repeated for other classes of porous materials.
Where Pith is reading between the lines
- If literature bias toward successful outcomes is strong, the model may over-estimate scalability for recipes that appear only in small-scale papers.
- Pairing the predictor with automated synthesis robots could create an iterative loop that tests and refines predictions in real time.
- The method might transfer to predicting manufacturability of other framework or nanoparticle materials once comparable datasets are assembled.
Load-bearing premise
Literature reports supply sufficiently reliable and unbiased labels for scalable versus non-scalable syntheses so that positive-unlabeled learning can extract a generalizable signal without direct experimental ground truth for the non-scalable class.
What would settle it
A blind experimental campaign that follows the model's top predictions on previously unreported MOF syntheses, then measures whether the predicted scalable recipes actually produce the target material at kilogram scale without major changes in yield or purity.
Original abstract
Scalable synthesis remains the gate between MOF discovery and industrial deployment, as scale-up know-how is fragmented across disparate reports. We introduce ESU-MOF, a literature-mined dataset and a positive-unlabeled learning strategy that fine-tunes large language models to predict scalability potential with 91.4% accuracy, enabling rapid data-driven triage for industrial MOF discovery.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ESU-MOF, a dataset mined from the literature on metal-organic framework (MOF) syntheses, and applies a positive-unlabeled learning strategy to fine-tune large language models for predicting scalability potential, reporting 91.4% accuracy to enable data-driven triage for industrial MOF discovery.
Significance. If the accuracy claim and generalization hold after proper validation, the work could meaningfully accelerate MOF scale-up by prioritizing candidates from literature reports, addressing a key bottleneck in translating discoveries to industrial use. The positive-unlabeled approach on mined data is a pragmatic response to the scarcity of explicit negative examples. However, without external experimental grounding, the practical significance remains uncertain.
major comments (2)
- Abstract: The central claim of 91.4% accuracy is stated without any information on ESU-MOF dataset size, train-test split, definition of positive (scalable) examples, class imbalance handling, or external validation. These details are load-bearing for assessing whether the positive-unlabeled learning strategy produces a reliable classifier rather than capturing reporting biases.
- The positive-unlabeled learning strategy (as described in the abstract and implied methods): No external validation, such as blind experimental scale-up tests on model-predicted negatives, is reported. This leaves open the possibility that the model learns literature reporting patterns rather than true scalability, directly undermining the triage utility claimed.
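The first comment can be made concrete with a minimal sketch. The counts below are invented for illustration and are not taken from the paper or from ESU-MOF; they only show that the same headline accuracy can come from a trivial majority-class predictor or from an informative one, which is why the class balance and a balance-aware metric (here the Matthews correlation coefficient) are needed to interpret it.

```python
import math


def accuracy(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fraction of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def mcc(tp: int, fp: int, tn: int, fn: int) -> float:
    """Matthews correlation coefficient; defined as 0.0 when a margin is empty."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# HYPOTHETICAL test set of 1000 recipes: 86 scalable, 914 non-scalable.
# Classifier A labels every recipe "non-scalable".
trivial = dict(tp=0, fp=0, tn=914, fn=86)
# Classifier B recovers most scalable recipes at the cost of some false alarms.
informative = dict(tp=60, fp=60, tn=854, fn=26)

for name, counts in [("trivial", trivial), ("informative", informative)]:
    print(name, round(accuracy(**counts), 3), round(mcc(**counts), 3))
```

Both hypothetical classifiers score the same accuracy, but only the second has a non-zero correlation with the true labels.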
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review of our manuscript. We address each of the major comments below and have revised the manuscript accordingly where appropriate.
- Referee: Abstract: The central claim of 91.4% accuracy is stated without any information on ESU-MOF dataset size, train-test split, definition of positive (scalable) examples, class imbalance handling, or external validation. These details are load-bearing for assessing whether the positive-unlabeled learning strategy produces a reliable classifier rather than capturing reporting biases.
  Authors: We agree with the referee that including these details in the abstract would enhance its informativeness and allow for better assessment of the work. The full details regarding the ESU-MOF dataset size, the train-test split used, the definition of positive examples, and the handling of class imbalance via the positive-unlabeled learning framework are provided in the Methods section. To address this, we will revise the abstract to concisely incorporate summaries of these elements, ensuring the central claim is better contextualized.
  revision: yes
- Referee: The positive-unlabeled learning strategy (as described in the abstract and implied methods): No external validation, such as blind experimental scale-up tests on model-predicted negatives, is reported. This leaves open the possibility that the model learns literature reporting patterns rather than true scalability, directly undermining the triage utility claimed.
  Authors: We acknowledge this valid concern. It is correct that our study does not include external experimental validation, such as performing blind scale-up experiments on MOFs predicted as non-scalable by the model. Our methodology is based on mining existing literature reports and applying positive-unlabeled learning to account for the absence of explicit negative examples. We have used internal validation metrics, including accuracy on held-out data, to evaluate the model. In the revised manuscript, we will expand the discussion section to explicitly discuss the potential for the model to learn reporting biases and the limitations this imposes on claiming true scalability prediction. We maintain that the approach offers a practical tool for prioritizing candidates, but we agree that experimental grounding would further strengthen the claims.
  revision: partial
- Provision of external experimental validation through blind scale-up tests, as this would require new laboratory experiments not present in the current study.
Circularity Check
No circularity in the ML training pipeline
The paper describes construction of the ESU-MOF literature-mined dataset followed by positive-unlabeled fine-tuning of LLMs to classify scalability potential, with reported accuracy on held-out literature examples. No equations, derivations, or self-referential definitions appear; the output is an empirical classifier rather than a quantity algebraically forced by its own fitted parameters. No load-bearing self-citations or uniqueness theorems are invoked in the provided text, and the central claim does not reduce to renaming or smuggling an ansatz. The pipeline is therefore self-contained as a standard supervised learning workflow against external literature benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Literature reports provide reliable labels for scalable syntheses
- domain assumption: Positive-unlabeled learning can extract a generalizable scalability signal from mostly unlabeled MOF literature
Reference graph
Works this paper leans on
- [46] Scan all pages to locate keyword occurrences
- [47] Define context windows around each occurrence
- [48] Merge overlapping windows
- [49] Your output had error:{error}. Fix and retry
Extract the merged text regions, subject to a configurable maximum character limit. This ensures that synthesis procedure sections receive priority over unrelated content (e.g., characterisation data, discussion), while respecting the token budget of the extraction LLM. Main text and supporting information are processed independently and then combined bef...
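The scan, window, merge, and extract steps listed above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the symmetric `radius` window, the default `max_chars`, and the treatment of a document as one string (rather than page by page) are all assumptions made for the example.

```python
from typing import List, Tuple


def find_occurrences(text: str, keywords: List[str]) -> List[int]:
    """Return sorted start offsets of every case-insensitive keyword hit."""
    lowered = text.lower()
    hits: List[int] = []
    for keyword in keywords:
        needle = keyword.lower()
        if not needle:
            continue  # skip empty keywords
        start = lowered.find(needle)
        while start != -1:
            hits.append(start)
            start = lowered.find(needle, start + 1)
    return sorted(hits)


def merge_windows(hits: List[int], radius: int, text_len: int) -> List[Tuple[int, int]]:
    """Build [hit - radius, hit + radius) windows and merge any that overlap or touch."""
    merged: List[Tuple[int, int]] = []
    for hit in sorted(hits):
        lo, hi = max(0, hit - radius), min(text_len, hit + radius)
        if merged and lo <= merged[-1][1]:
            # Overlaps the previous window: extend it instead of adding a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], hi))
        else:
            merged.append((lo, hi))
    return merged


def extract_regions(text: str, keywords: List[str],
                    radius: int = 200, max_chars: int = 4000) -> str:
    """Concatenate merged regions in document order, stopping at max_chars."""
    pieces: List[str] = []
    used = 0
    for lo, hi in merge_windows(find_occurrences(text, keywords), radius, len(text)):
        remaining = max_chars - used
        if remaining <= 0:
            break
        chunk = text[lo:hi][:remaining]
        pieces.append(chunk)
        used += len(chunk)
    return "\n".join(pieces)
```

Merging before extraction avoids emitting the same passage twice when several keywords cluster in one synthesis paragraph, which matters when the budget is counted in characters.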
- [51] This score is corrected under the PU-learning framework to account for incomplete positive labelling in the literature-derived training set
- [52] The corrected score is optionally calibrated by Platt scaling on the validation gold split, and a decision threshold selected on the same split is used to obtain the final binary prediction. In addition to the scalar score, the log-probability extraction module records diagnostic metadata, including the generated token sequence, top-k alternative tokens at...
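The scoring chain described in the two entries above (PU correction, optional Platt scaling, threshold selection) might look like the sketch below. Several pieces are assumptions rather than details from the source: the correction uses the constant-rescaling estimator of Elkan and Noto (2008), the Platt parameters are fitted by plain gradient descent, the threshold is chosen to maximise validation accuracy, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def pu_correct(scores: Sequence[float], labelled_positive_scores: Sequence[float]) -> List[float]:
    """ASSUMED Elkan-Noto style correction: divide by c, the mean raw score
    on held-out labelled positives, and clip the result to 1.0."""
    c = sum(labelled_positive_scores) / len(labelled_positive_scores)
    return [min(1.0, s / c) for s in scores]


def fit_platt(scores: Sequence[float], labels: Sequence[int],
              lr: float = 0.1, steps: int = 2000) -> Tuple[float, float]:
    """Fit p = sigmoid(a * score + b) by gradient descent on the log loss."""
    a, b = 1.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s
            grad_b += p - y
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b


def apply_platt(scores: Sequence[float], a: float, b: float) -> List[float]:
    """Map scores to calibrated probabilities with the fitted sigmoid."""
    return [1.0 / (1.0 + math.exp(-(a * s + b))) for s in scores]


def pick_threshold(probs: Sequence[float], labels: Sequence[int]) -> float:
    """Return the lowest candidate threshold with the best validation accuracy
    (accuracy is an ASSUMED selection criterion)."""
    best_t, best_acc = 0.5, -1.0
    for t in sorted(set(probs)):
        acc = sum(int(p >= t) == y for p, y in zip(probs, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t
```

Because the calibration and the threshold are both fitted on the same validation split, any accuracy measured on that split is optimistic; a separate test split is needed for the reported figure.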
- [53] Experimental section headers
- [54] Reactor/vessel types (solvothermal, autoclave ...)
- [55] Scale-up & quantity signals
- [56] Reagent/stoichiometry terms (mmol, equiv ...)
- [57] Solvents & reaction media
- [58] Reaction conditions (T, time, stirring ...)
- [59] Work-up & post-processing
- [60] Yield & characterization
- [61] Supporting-info markers
- [62] MOF-specific names & topologies
- [63] Mass/unit evidence (mg, g, kg ...)
OUTPUT: a single JSON array of unique lowercase strings. No prose, no fences.
{examples_block}
EXISTING (preserve all, only add new): {existing_keywords_json}
Solvent Alias Mapping Planner
The SolventAliasPlanner builds the solvent canonicalisation table (Section S2.3).
SolventAliasPlanner – system prompt
You are an expert i...
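The OUTPUT contract in the keyword prompt above (a single JSON array of unique lowercase strings, with existing keywords preserved and only new ones added) lends itself to a small validation loop. The sketch below is hypothetical: `ask_llm` stands in for whatever client the pipeline uses, the function names are invented, and pairing the check with the "Your output had error:{error}. Fix and retry" message is an assumption, since the excerpt does not say which module that message belongs to.

```python
import json
from typing import Callable, List


def parse_keywords(raw: str) -> List[str]:
    """Validate that raw is a JSON array of unique lowercase strings."""
    data = json.loads(raw)  # raises json.JSONDecodeError, a ValueError subclass
    if not isinstance(data, list):
        raise ValueError("output is not a JSON array")
    keywords: List[str] = []
    for item in data:
        if not isinstance(item, str):
            raise ValueError(f"non-string entry: {item!r}")
        if item != item.lower():
            raise ValueError(f"entry is not lowercase: {item!r}")
        if item in keywords:
            raise ValueError(f"duplicate entry: {item!r}")
        keywords.append(item)
    return keywords


def merge_keywords(existing: List[str], new: List[str]) -> List[str]:
    """Preserve every existing keyword in order and append only unseen ones."""
    merged = list(existing)
    for keyword in new:
        if keyword not in merged:
            merged.append(keyword)
    return merged


def plan_keywords(ask_llm: Callable[[str], str], prompt: str,
                  existing: List[str], max_retries: int = 3) -> List[str]:
    """Query the model, feeding validation errors back until the output parses."""
    message = prompt
    for _ in range(max_retries):
        raw = ask_llm(message)
        try:
            return merge_keywords(existing, parse_keywords(raw))
        except ValueError as error:
            message = f"Your output had error:{error}. Fix and retry"
    raise RuntimeError("model never produced a valid keyword array")
```

Validating before merging keeps a malformed reply from silently shrinking or corrupting the existing keyword set.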