arxiv: 2604.20899 · v1 · submitted 2026-04-21 · cond-mat.mtrl-sci · cs.AI

Predicting Scale-Up of Metal-Organic Framework Syntheses with Large Language Models

Pith reviewed 2026-05-10 02:09 UTC · model grok-4.3

classification: cond-mat.mtrl-sci · cs.AI
keywords: metal-organic frameworks · scale-up prediction · large language models · positive-unlabeled learning · literature mining · materials discovery · synthesis scalability

The pith

Fine-tuned large language models predict scalability of metal-organic framework syntheses at 91.4 percent accuracy using a literature-mined dataset.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that large language models can be fine-tuned to forecast whether a given metal-organic framework synthesis will succeed at industrial scales. Authors create a dataset called ESU-MOF by mining published reports and use positive-unlabeled learning to handle cases where only successful scale-ups are clearly labeled. This yields 91.4 percent accuracy and provides a computational filter that ranks candidates before costly lab trials. A sympathetic reader cares because most MOF discoveries remain stuck at milligram scales, and reliable early prediction would shorten the path from lab finding to applications in gas storage or catalysis.

Core claim

The authors introduce the ESU-MOF dataset extracted from the scientific literature together with a positive-unlabeled learning procedure that fine-tunes large language models to classify synthesis recipes as scalable or non-scalable, achieving 91.4 percent accuracy on held-out test cases and thereby enabling rapid, data-driven triage of candidates for industrial MOF production.

What carries the argument

The ESU-MOF literature-mined dataset combined with positive-unlabeled learning, which supplies training signal for scalability labels even when negative examples are absent or unreliable in published reports.
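
As a rough illustration of how positive-unlabeled learning can produce usable scores without reliable negatives, the sketch below applies the Elkan-Noto correction, one common PU formulation. Whether the paper uses this exact correction is an assumption. The function names, recipe identifiers, and raw scores are hypothetical, and the correction itself assumes that labeled positives are a random sample of all true positives.

```python
from statistics import mean


def estimate_label_frequency(scores_on_known_positives):
    """Estimate c = P(labeled | truly positive).

    Under the Elkan-Noto assumption (labeled positives are selected
    completely at random), c is approximated by the mean raw score that a
    labeled-vs-unlabeled classifier gives to held-out labeled positives.
    """
    if not scores_on_known_positives:
        raise ValueError("need at least one held-out labeled positive")
    return mean(scores_on_known_positives)


def pu_correct(raw_score, c):
    """Turn P(labeled | x) into an estimate of P(positive | x), clipped to [0, 1]."""
    if not 0.0 < c <= 1.0:
        raise ValueError("c must lie in (0, 1]")
    return max(0.0, min(1.0, raw_score / c))


# Hypothetical raw scores from a labeled-vs-unlabeled classifier.
held_out_positive_scores = [0.6, 0.5, 0.7]
c = estimate_label_frequency(held_out_positive_scores)

# Hypothetical unlabeled recipes and their raw scores.
unlabeled_scores = {"recipe_a": 0.45, "recipe_b": 0.06, "recipe_c": 0.9}
corrected = {name: pu_correct(s, c) for name, s in unlabeled_scores.items()}

# Rank unlabeled recipes from most to least likely scalable.
ranked = sorted(corrected, key=corrected.get, reverse=True)
print(ranked)
```

Dividing by c raises every score, which reflects the idea that many unlabeled recipes may in fact be scalable but were never reported as such.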

If this is right

  • Thousands of reported MOF recipes can be ranked for scale-up likelihood before any new laboratory work begins.
  • Industrial teams gain a quantitative filter that reduces the fraction of candidates advanced to pilot-scale trials.
  • Discovery pipelines can integrate the model as an early gate after initial synthesis screening.
  • The same literature-mining plus positive-unlabeled approach could be repeated for other classes of porous materials.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If literature bias toward successful outcomes is strong, the model may over-estimate scalability for recipes that appear only in small-scale papers.
  • Pairing the predictor with automated synthesis robots could create an iterative loop that tests and refines predictions in real time.
  • The method might transfer to predicting manufacturability of other framework or nanoparticle materials once comparable datasets are assembled.

Load-bearing premise

Literature reports supply sufficiently reliable and unbiased labels for scalable versus non-scalable syntheses so that positive-unlabeled learning can extract a generalizable signal without direct experimental ground truth for the non-scalable class.

What would settle it

A blind experimental campaign that follows the model's top predictions on previously unreported MOF syntheses, then measures whether the predicted scalable recipes actually produce the target material at kilogram scale without major changes in yield or purity.

Figures

Figures reproduced from arXiv: 2604.20899 by Bin Feng, Harrison Kayal, Hongrui Sheng, Kyle Smith, Peter Walther, Reid Coyle, Shyam Chand Pal, Xinhua Yan, Xinxin Liu, Zhiling Zheng.

Figure 1
Figure 1: Literature-to-prediction workflow for scalable MOF synthesis. (a) End-to-end pipeline: literature retrieval, schema-constrained LLM extraction, protocol normalization, label construction (Ps, Pa, U, N), paper-level splitting, PU fine-tuning, and calibrated scoring. (b) Example of mined synthesis inputs as a structured protocol record emphasizing metal, linker, solvent system (up to three), temperature, t…
Figure 2
Figure 2: Predictive performance evaluation of the fine-tuned LLM ESU-MOF. (a) Receiver operating characteristic (ROC) curves comparing the ESU-MOF with baseline model. The ESU-MOF achieves the highest area under the curve (AUC = 0.958) and the shaded region represents the confidence interval. (b) Confusion matrix of the ESU-MOF, illustrating the distribution of true versus predicted labels for the negative (N) an…
Figure 3
Figure 3: Performance comparison of ESU-MOF with baseline models. ESU-MOF consistently outperforms traditional ML and untuned LLM baselines across both metrics (BA, AP), demonstrating superior accuracy. Detailed comparison is given in the Supporting Information.
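
Figure 3 reports balanced accuracy (BA) and average precision (AP). For readers unfamiliar with these metrics, the following minimal sketch computes both from scratch. The labels, scores, and the 0.5 threshold are made-up illustration values, not data from the paper, and the AP routine assumes no tied scores.

```python
def balanced_accuracy(y_true, y_pred):
    """Mean of recall on the positive class and recall on the negative class."""
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return 0.5 * (tp / n_pos + tn / n_neg)


def average_precision(y_true, scores):
    """Average of the precision values measured at the rank of each true positive."""
    n_pos = sum(1 for t in y_true if t == 1)
    if n_pos == 0:
        raise ValueError("at least one positive is required")
    # Sort indices from highest to lowest score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / n_pos


# Hypothetical labels (1 = scalable) and model scores.
y_true = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.2]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

ba = balanced_accuracy(y_true, y_pred)
ap = average_precision(y_true, scores)
print(f"BA = {ba:.3f}, AP = {ap:.3f}")
```

BA weights both classes equally regardless of their frequency, while AP depends only on the ranking of scores, which is why the two can disagree on imbalanced data.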
Original abstract

Scalable synthesis remains the gate between MOF discovery and industrial deployment, as scale-up know-how is fragmented across disparate reports. We introduce ESU-MOF, a literature-mined dataset and a positive-unlabeled learning strategy that fine-tunes large language models to predict scalability potential with 91.4% accuracy, enabling rapid data-driven triage for industrial MOF discovery.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces ESU-MOF, a dataset mined from the literature on metal-organic framework (MOF) syntheses, and applies a positive-unlabeled learning strategy to fine-tune large language models for predicting scalability potential, reporting 91.4% accuracy to enable data-driven triage for industrial MOF discovery.

Significance. If the accuracy claim and generalization hold after proper validation, the work could meaningfully accelerate MOF scale-up by prioritizing candidates from literature reports, addressing a key bottleneck in translating discoveries to industrial use. The positive-unlabeled approach on mined data is a pragmatic response to the scarcity of explicit negative examples. However, without external experimental grounding, the practical significance remains uncertain.

major comments (2)
  1. Abstract: The central claim of 91.4% accuracy is stated without any information on ESU-MOF dataset size, train-test split, definition of positive (scalable) examples, class imbalance handling, or external validation. These details are load-bearing for assessing whether the positive-unlabeled learning strategy produces a reliable classifier rather than capturing reporting biases.
  2. The positive-unlabeled learning strategy (as described in the abstract and implied methods): No external validation, such as blind experimental scale-up tests on model-predicted negatives, is reported. This leaves open the possibility that the model learns literature reporting patterns rather than true scalability, directly undermining the triage utility claimed.
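
Comment 1 asks how the train-test split was made. The Figure 1 caption mentions paper-level splitting, which keeps every record mined from one paper on the same side of the split so that near-duplicate recipes cannot leak into the test set. A minimal sketch of that idea follows, assuming a hash-based assignment. The `paper_id` and `recipe` field names, the 20 percent test fraction, and the example records are hypothetical rather than taken from the paper.

```python
import hashlib


def split_of(paper_id, test_fraction=0.2):
    """Assign a paper to 'train' or 'test' from a stable hash of its ID."""
    digest = hashlib.sha256(paper_id.encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "test" if bucket < test_fraction else "train"


def paper_level_split(records, test_fraction=0.2):
    """Split records so that all records sharing a paper_id stay together."""
    train, test = [], []
    for rec in records:
        if split_of(rec["paper_id"], test_fraction) == "test":
            test.append(rec)
        else:
            train.append(rec)
    return train, test


# Hypothetical mined records; several recipes can come from one paper.
records = [
    {"paper_id": f"paper-{i // 3:03d}", "recipe": f"recipe-{i}"}
    for i in range(30)
]
train, test = paper_level_split(records)

train_ids = {r["paper_id"] for r in train}
test_ids = {r["paper_id"] for r in test}
print(len(train), len(test), train_ids.isdisjoint(test_ids))
```

Because the assignment depends only on the paper identifier, adding new papers later does not move existing ones between splits.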

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the detailed and constructive review of our manuscript. We address each of the major comments below and have revised the manuscript accordingly where appropriate.

Point-by-point responses
  1. Referee: Abstract: The central claim of 91.4% accuracy is stated without any information on ESU-MOF dataset size, train-test split, definition of positive (scalable) examples, class imbalance handling, or external validation. These details are load-bearing for assessing whether the positive-unlabeled learning strategy produces a reliable classifier rather than capturing reporting biases.

    Authors: We agree with the referee that including these details in the abstract would enhance its informativeness and allow for better assessment of the work. The full details regarding the ESU-MOF dataset size, the train-test split used, the definition of positive examples, and the handling of class imbalance via the positive-unlabeled learning framework are provided in the Methods section. To address this, we will revise the abstract to concisely incorporate summaries of these elements, ensuring the central claim is better contextualized. revision: yes

  2. Referee: The positive-unlabeled learning strategy (as described in the abstract and implied methods): No external validation, such as blind experimental scale-up tests on model-predicted negatives, is reported. This leaves open the possibility that the model learns literature reporting patterns rather than true scalability, directly undermining the triage utility claimed.

    Authors: We acknowledge this valid concern. It is correct that our study does not include external experimental validation, such as performing blind scale-up experiments on MOFs predicted as non-scalable by the model. Our methodology is based on mining existing literature reports and applying positive-unlabeled learning to account for the absence of explicit negative examples. We have used internal validation metrics, including accuracy on held-out data, to evaluate the model. In the revised manuscript, we will expand the discussion section to explicitly discuss the potential for the model to learn reporting biases and the limitations this imposes on claiming true scalability prediction. We maintain that the approach offers a practical tool for prioritizing candidates, but we agree that experimental grounding would further strengthen the claims. revision: partial

standing simulated objections not resolved
  • Provision of external experimental validation through blind scale-up tests, as this would require new laboratory experiments not present in the current study.

Circularity Check

0 steps flagged

No circularity in the ML training pipeline

full rationale

The paper describes construction of the ESU-MOF literature-mined dataset followed by positive-unlabeled fine-tuning of LLMs to classify scalability potential, with reported accuracy on held-out literature examples. No equations, derivations, or self-referential definitions appear; the output is an empirical classifier rather than a quantity algebraically forced by its own fitted parameters. No load-bearing self-citations or uniqueness theorems are invoked in the provided text, and the central claim does not reduce to renaming or smuggling an ansatz. The pipeline is therefore self-contained as a standard learning workflow evaluated on held-out literature examples.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The claim rests on two unverified domain assumptions about label quality in the mined literature and the suitability of positive-unlabeled learning for this task; no free parameters or invented entities are visible in the abstract.

axioms (2)
  • domain assumption: Literature reports provide reliable labels for scalable syntheses
    The ESU-MOF dataset is constructed by mining papers, so the training signal depends on the accuracy of those extracted labels.
  • domain assumption: Positive-unlabeled learning can extract a generalizable scalability signal from mostly unlabeled MOF literature
    The strategy explicitly uses this semi-supervised technique because explicit negative examples are scarce.

pith-pipeline@v0.9.0 · 5377 in / 1445 out tokens · 43454 ms · 2026-05-10T02:09:53.472767+00:00 · methodology
