
arxiv: 2512.01693 · v2 · submitted 2025-12-01 · 💻 cs.DB · cond-mat.mtrl-sci

LitMOF: An LLM Multi-Agent for Literature-Validated Metal-Organic Frameworks Database Correction and Expansion

Pith reviewed 2026-05-17 02:50 UTC · model grok-4.3

classification 💻 cs.DB cond-mat.mtrl-sci
keywords metal-organic frameworks · database curation · LLM multi-agent · literature validation · structural error repair · MOF screening · computation-ready structures · direct air capture

The pith

An LLM multi-agent system validates MOF structures against original papers, repairing 8,771 invalid database entries and adding 12,646 previously missing ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LitMOF as a multi-agent framework in which large language models extract crystallographic details and synthesis context straight from the published literature, then cross-check them against existing database records. This process is applied to the CSD MOF Subset to fix structural errors that recent analyses find in nearly half of MOF database entries. The result is LitMOF-DB, a collection of 186,773 computation-ready structures that includes thousands of repairs and new additions. A screening case study for direct air capture shows that uncorrected errors produce wrong adsorption energies, flipped selectivity rankings, and missed high-performing candidates. If the approach holds, it offers a route to keep materials databases current through ongoing literature validation rather than one-time manual fixes.
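The review does not say how the cross-check scores agreement between a database entry and the literature. As a hedged sketch only, one simple form of that comparison is a tolerance test on unit-cell parameters and space group. The `Cell` type, the `cells_agree` helper, and both tolerances are hypothetical illustrations, not LitMOF's actual logic.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Cell:
    """Hypothetical record: lengths in angstroms, angles in degrees."""
    a: float
    b: float
    c: float
    alpha: float
    beta: float
    gamma: float
    space_group: str


def cells_agree(db: Cell, lit: Cell, len_tol: float = 0.01, ang_tol: float = 0.5) -> bool:
    """Hypothetical check: does a database cell match the literature-reported cell?

    len_tol is a relative tolerance on a, b, c (1% by default); ang_tol is an
    absolute tolerance in degrees. Both defaults are illustrative assumptions.
    """
    if db.space_group != lit.space_group:
        return False
    lengths_ok = all(
        math.isclose(x, y, rel_tol=len_tol)
        for x, y in ((db.a, lit.a), (db.b, lit.b), (db.c, lit.c))
    )
    angles_ok = all(
        abs(x - y) <= ang_tol
        for x, y in ((db.alpha, lit.alpha), (db.beta, lit.beta), (db.gamma, lit.gamma))
    )
    return lengths_ok and angles_ok


db_entry = Cell(10.00, 10.00, 10.00, 90.0, 90.0, 90.0, "Fm-3m")
paper = Cell(10.05, 10.00, 10.00, 90.0, 90.0, 90.0, "Fm-3m")
print(cells_agree(db_entry, paper))  # True: a differs by 0.5%, within 1%
```

A real pipeline would also need to recognise alternative settings of the same space group (different origin choices or axis labels), which a plain string comparison does not.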

Core claim

LitMOF is an LLM-driven multi-agent system that pulls crystallographic files and synthesis descriptions directly from the original literature, cross-validates them with database entries, and thereby repairs structural errors. When applied to the CSD MOF Subset, the system produces LitMOF-DB, containing 186,773 computation-ready structures, successfully repairing 8,771 invalid entries (65.3 percent of the not-computation-ready MOFs in the latest CoRE MOF database) and identifying 12,646 experimentally reported MOFs absent from prior resources. A direct-air-capture screening demonstration establishes that these structural errors distort predicted adsorption energies and CO2/H2O selectivity, causing systematic misranking of materials, false positives, and the omission of high-performance candidates.
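The 65.3 percent figure is easier to interpret as a count. The arithmetic below is a back-of-the-envelope check, not a number reported by the paper: it assumes the percentage is the plain ratio of repaired entries to not-computation-ready (NCR) entries in the CoRE MOF release the authors used.

```python
# Back out the size of the NCR CoRE MOF pool implied by
# "8,771 repairs = 65.3% of NCR MOFs". Assumes 65.3% is that exact ratio,
# rounded to one decimal place of a percent.
repaired = 8771
reported_share = 0.653

implied_ncr = repaired / reported_share
# A share that rounds to 65.3% lies in [65.25%, 65.35%), so:
ncr_min = repaired / 0.6535
ncr_max = repaired / 0.6525

print(round(implied_ncr), round(ncr_min), round(ncr_max))  # 13432 13422 13442
```

Under that assumption, roughly 13,400 NCR entries were in scope, leaving about 4,650 to 4,670 of them unrepaired.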

What carries the argument

LitMOF, the LLM multi-agent framework that extracts and cross-validates crystallographic and synthesis information from literature to repair database entries.

If this is right

  • Corrected MOF structures reduce distortion in predicted adsorption energies and CO2/H2O selectivity during high-throughput screening.
  • Repairing 65.3 percent of invalid entries produces a larger pool of reliable computation-ready structures for materials discovery.
  • Uncovering 12,646 previously absent experimental MOFs expands the searchable design space for new framework synthesis.
  • Structural errors cause systematic misranking, false positives, and omission of high-performance candidates in screening workflows.
  • The method supplies a scalable route for continuous, literature-driven curation of materials databases.
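The misranking, false-positive, and omission effects in the list above can be made concrete by comparing screening shortlists built from uncorrected and corrected structures. This is a hypothetical illustration: the material IDs and scores are invented placeholders, and `screening_shift` is not a function from the paper.

```python
def top_k(scores: dict[str, float], k: int) -> list[str]:
    """Return the k material IDs with the highest score (ties broken by ID)."""
    return sorted(scores, key=lambda m: (-scores[m], m))[:k]


def screening_shift(before: dict[str, float], after: dict[str, float], k: int) -> dict:
    """Compare top-k shortlists from uncorrected (before) and corrected (after) structures.

    Only materials scored in both runs are compared.
    """
    common = before.keys() & after.keys()
    old = set(top_k({m: before[m] for m in common}, k))
    new = set(top_k({m: after[m] for m in common}, k))
    return {
        "false_positives": sorted(old - new),  # shortlisted only because of structural errors
        "missed": sorted(new - old),           # strong candidates hidden by structural errors
        "overlap": len(old & new) / k,
    }


# Toy CO2/H2O selectivities for four hypothetical materials.
before = {"A": 50.0, "B": 40.0, "C": 30.0, "D": 5.0}
after = {"A": 48.0, "B": 3.0, "C": 31.0, "D": 45.0}
print(screening_shift(before, after, k=2))
```

Here material B is a false positive created by a faulty structure, and D is a high performer that the uncorrected database would have hidden.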

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same multi-agent extraction pattern could be tested on other classes of porous materials or on databases in related fields such as catalysis or battery materials.
  • If run periodically on new publications, the system could keep MOF databases automatically updated without requiring repeated full manual reviews.
  • Machine-learning models trained on the corrected LitMOF-DB might show improved accuracy in property prediction because they avoid learning from structurally flawed examples.
  • Running the repair pipeline with different base LLMs could quantify how much performance depends on the choice of language model, and adding human-in-the-loop verification steps could show how many automated repairs survive expert review.

Load-bearing premise

The multi-agent LLM system can reliably pull accurate crystallographic and synthesis details from papers without creating new extraction mistakes or missing information that would invalidate a repair.
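One inexpensive guard against this failure mode is to confirm that every numeric value an agent extracts actually appears in the source text. The helper below is a hypothetical sketch, not part of LitMOF; it catches transcription slips and invented numbers, but not values that are present in the text yet attached to the wrong field.

```python
import re


def ungrounded_fields(extracted: dict[str, float], source_text: str, rel_tol: float = 1e-3) -> list[str]:
    """Return extracted fields whose value does not occur as a number in source_text.

    Hypothetical helper. Standard uncertainties such as the "(3)" in "25.832(3)"
    are parsed as separate numbers and simply ignored unless they match.
    """
    numbers = [float(tok) for tok in re.findall(r"-?\d+(?:\.\d+)?", source_text)]
    return [
        field
        for field, value in extracted.items()
        if not any(abs(value - n) <= rel_tol * max(abs(value), 1.0) for n in numbers)
    ]


text = "Crystal data: a = 25.832(3) Å, c = 10.114(2) Å, space group P6/mmm."
print(ungrounded_fields({"a": 25.832, "c": 10.141}, text))  # ['c']: digits transposed
```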

What would settle it

Randomly select 100 of the repaired MOF entries and have a crystallographer compare the LitMOF-extracted structures and synthesis conditions against the original published papers to measure the agreement rate.
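With only 100 audited entries, the measured agreement rate carries real sampling uncertainty, so it is worth reporting an interval alongside it. The sketch below uses the Wilson score interval, a standard choice for binomial proportions that the review itself does not specify; the 92-of-100 outcome is a made-up example, not a result.

```python
import math


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


# Hypothetical audit outcome: the crystallographer agrees with 92 of 100 repairs.
low, high = wilson_interval(92, 100)
print(f"{low:.3f} to {high:.3f}")
```

For that hypothetical 92 percent agreement, the interval spans roughly 0.85 to 0.96, so a 100-entry audit can bound the repair accuracy but not pin it down.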

Figures

Figures reproduced from arXiv: 2512.01693 by Dohoon Kim, Honghui Kim, Jihan Kim.

Figure 1. Schematic illustration of how the LitMOF multi-agent system interacts with a user and generates responses.
Figure 2. a, Unified agent template comprising an LLM-driven head module and a set of nodes, each representing either another agent call or an LLM/tool operation. b, Decision process of the head module, which interprets the query, generates or updates a plan, selects the next node, and determines termination. c, Structure of an agent plan, represented as an overall goal and an ordered list of nodes with associated de…
Figure 3. Example of a missing MOF case (refcode: TEQLIM). Missing MOFs refer to structures that were synthesized…
Figure 4. Three types of error correction handled by the Inspector & Editor agent.
Figure 5. a, Results of the database construction using the LitMOF agent. Starting from the CSD MOF subset containing 128,799 structures, we corrected 25,721 MOFs and constructed a curated database of 118,464 experimental MOFs with free solvent removed. During this process, we also identified 12,646 missing MOFs and compiled a separate missing-MOF database. b, Comparison between our curated MOF database and the late…
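The unified template in Figure 2 (a head module that plans, picks the next node, and decides when to stop, over nodes that are either sub-agents or LLM/tool operations) can be sketched as a small control loop. This is a hypothetical illustration: the LLM head is replaced by a scripted function, and `Agent`, `scripted_head`, and the node names are invented for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional

State = dict
Node = Callable[[State], State]          # an LLM call, a tool, or a wrapped sub-agent
Head = Callable[[State], Optional[str]]  # picks the next node name, or None to stop


@dataclass
class Agent:
    nodes: dict[str, Node]
    head: Head
    max_steps: int = 20  # hard cap so a confused head cannot loop forever

    def run(self, query: str) -> State:
        state: State = {"query": query, "trace": []}
        for _ in range(self.max_steps):
            nxt = self.head(state)
            if nxt is None:  # the head module decided to terminate
                break
            state = self.nodes[nxt](state)  # nodes must keep existing keys
            state["trace"].append(nxt)
        return state

    def as_node(self) -> Node:
        """Wrap this agent so another agent can call it as one of its nodes."""
        return lambda s: {**s, "sub_result": self.run(s["query"])}


def scripted_head(plan: list[str]) -> Head:
    """Stand-in for the LLM head: follow a fixed ordered plan, then stop."""
    def head(state: State) -> Optional[str]:
        done = len(state["trace"])
        return plan[done] if done < len(plan) else None
    return head


agent = Agent(
    nodes={
        "fetch_cif": lambda s: {**s, "cif": "stub CIF text"},  # hypothetical tool
        "compare": lambda s: {**s, "match": "cif" in s},       # hypothetical check
    },
    head=scripted_head(["fetch_cif", "compare"]),
)
result = agent.run("validate one CSD entry")
print(result["trace"], result["match"])
```

In the template as described, the head would re-plan after each node rather than follow a fixed list; keeping the head as a single callable is what lets a scripted stub and an LLM-backed planner be swapped freely.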
Original abstract

Metal-organic framework (MOF) databases have grown rapidly through experimental deposition and large-scale literature extraction, but recent analyses show that nearly half of their entries contain substantial structural errors. These inaccuracies propagate through high-throughput screening and machine-learning workflows, limiting the reliability of data-driven MOF discovery. Correcting such errors is exceptionally difficult because true repairs require integrating crystallographic files, synthesis descriptions, and contextual evidence scattered across the literature. Here we introduce LitMOF, a large language model-driven multi-agent framework that validates crystallographic information directly from the original literature and cross-validates it with database entries to repair structural errors. Applying LitMOF to the experimental MOF database (the CSD MOF Subset), we constructed LitMOF-DB, a curated set of 186,773 computation-ready structures, including the successful repair of 8,771 invalid entries, which accounts for 65.3% of the not-computation-ready MOFs in the latest CoRE MOF database. Additionally, the system uncovered 12,646 experimentally reported MOFs absent from existing resources, substantially expanding the known experimental design space. Using direct air capture screening as a case study, we demonstrate that structural errors severely distort predicted adsorption energies and CO2/H2O selectivity, leading to systematic misranking of materials, false positives, and the omission of high-performance candidates. This work establishes a scalable pathway toward self-correcting scientific databases and a generalizable paradigm for LLM-driven curation in materials science.
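For readers outside adsorption, the CO2/H2O selectivity in the abstract is commonly defined from mixture uptakes as S = (q_CO2 / q_H2O) / (y_CO2 / y_H2O). The paper's exact definition is not given on this page (it may instead use Henry coefficients or binding energies), so treat this as a hedged reference sketch; the uptakes and humidity in the example are invented.

```python
def adsorption_selectivity(q_co2: float, q_h2o: float, y_co2: float, y_h2o: float) -> float:
    """Mixture selectivity S = (q_CO2 / q_H2O) / (y_CO2 / y_H2O).

    q_* are adsorbed amounts in any consistent unit (e.g. mol/kg);
    y_* are gas-phase mole fractions.
    """
    if q_h2o <= 0 or y_co2 <= 0 or y_h2o <= 0:
        raise ValueError("q_h2o, y_co2 and y_h2o must be positive")
    return (q_co2 / q_h2o) / (y_co2 / y_h2o)


# Hypothetical humid-air case: 400 ppm CO2, 1% water vapour.
print(adsorption_selectivity(q_co2=0.5, q_h2o=2.0, y_co2=0.0004, y_h2o=0.01))
```

Because S scales linearly with q_CO2, a structural error that inflates a framework's simulated CO2 uptake inflates its selectivity by the same factor, which is one route to the false positives the abstract describes.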

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces LitMOF, an LLM-driven multi-agent framework that extracts and cross-validates crystallographic parameters, space groups, and synthesis details directly from original literature sources to repair structural errors in MOF databases. Applied to the CSD MOF Subset, it produces LitMOF-DB containing 186,773 computation-ready structures, including 8,771 repairs that cover 65.3% of the not-computation-ready entries in the latest CoRE MOF database, plus 12,646 newly identified experimentally reported MOFs absent from prior resources. A direct air capture screening case study shows that uncorrected structural errors distort adsorption energies and CO2/H2O selectivity rankings.

Significance. If the extraction accuracy holds, the work offers a scalable, literature-grounded approach to curating large materials databases, directly addressing documented error rates that undermine high-throughput screening and ML models in MOF discovery. The quantitative expansion of the experimental design space and the demonstrated impact on property rankings provide a concrete pathway toward self-correcting databases in materials science.

major comments (2)
  1. [Results] Results section (and abstract): The headline statistics—8,771 repairs accounting for 65.3% of CoRE not-computation-ready MOFs and 12,646 newly uncovered structures—rest entirely on the multi-agent LLM correctly parsing CIF coordinates, space groups, disorder, and literature-database mismatches. No precision/recall, inter-annotator agreement on a human-labeled subset, or error analysis for hallucinated parameters or overlooked synthesis constraints is reported, leaving the fraction of valid repairs versus newly introduced errors unquantified.
  2. [Methods] Methods section: The description of the multi-agent workflow lacks any ablation on prompt engineering, temperature settings, or fallback validation rules. Without these details or a reported human-expert agreement rate on a sample of repairs, it is impossible to bound the reliability of the cross-validation step that underpins all quantitative claims.
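The metrics the referee asks for are standard and cheap to compute once a human-labeled subset exists. The sketch below shows precision and recall of the pipeline's repair decisions against expert labels, plus Cohen's kappa between two annotators; the function names and label values are hypothetical, and the toy inputs are not data from the paper.

```python
from collections import Counter


def precision_recall(pred: list[bool], gold: list[bool]) -> tuple[float, float]:
    """pred[i]: pipeline flagged entry i as erroneous; gold[i]: the expert agrees it is."""
    tp = sum(p and g for p, g in zip(pred, gold))
    fp = sum(p and not g for p, g in zip(pred, gold))
    fn = sum(g and not p for p, g in zip(pred, gold))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def cohens_kappa(r1: list[str], r2: list[str]) -> float:
    """Chance-corrected agreement between two annotators' labels."""
    n = len(r1)
    observed = sum(a == b for a, b in zip(r1, r2)) / n
    c1, c2 = Counter(r1), Counter(r2)
    expected = sum(c1[label] * c2[label] for label in c1) / n**2
    if expected == 1:  # both annotators used a single identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


print(precision_recall([True, True, True, False, False], [True, True, False, True, False]))
print(cohens_kappa(["ok", "ok", "bad", "ok"], ["ok", "bad", "bad", "ok"]))  # 0.5
```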
minor comments (2)
  1. [Figures/Tables] Figure captions and tables should explicitly state the exact CoRE MOF version and CSD release dates used for the baseline comparison to allow reproducibility.
  2. [Case Study] The case-study section would benefit from a supplementary table listing the top-10 misranked materials before and after correction, with the specific structural error (e.g., wrong space group or missing solvent) noted for each.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important aspects of validation and methodological transparency that we address below. We have revised the manuscript to incorporate additional validation results and expanded methods descriptions.

Point-by-point responses
  1. Referee: [Results] Results section (and abstract): The headline statistics—8,771 repairs accounting for 65.3% of CoRE not-computation-ready MOFs and 12,646 newly uncovered structures—rest entirely on the multi-agent LLM correctly parsing CIF coordinates, space groups, disorder, and literature-database mismatches. No precision/recall, inter-annotator agreement on a human-labeled subset, or error analysis for hallucinated parameters or overlooked synthesis constraints is reported, leaving the fraction of valid repairs versus newly introduced errors unquantified.

    Authors: We agree that the absence of quantitative accuracy metrics leaves the reliability of the headline statistics incompletely characterized. To address this, we have performed a human validation study on a randomly selected subset of 300 structures (150 repairs and 150 new identifications). Two independent experts reviewed the extracted CIF parameters, space groups, and synthesis details against the source literature. The revised Results section will include a new subsection reporting the inter-annotator agreement, precision, recall, and a qualitative error analysis of common failure modes such as disorder handling and potential hallucinations. The multi-agent cross-validation step is shown to reduce but not eliminate such risks; these additions will allow readers to assess the fraction of valid versus erroneous entries. revision: yes

  2. Referee: [Methods] Methods section: The description of the multi-agent workflow lacks any ablation on prompt engineering, temperature settings, or fallback validation rules. Without these details or a reported human-expert agreement rate on a sample of repairs, it is impossible to bound the reliability of the cross-validation step that underpins all quantitative claims.

    Authors: We concur that greater methodological detail is required to evaluate reproducibility and robustness. The revised Methods section will be expanded to describe the prompt templates for each agent (now included in the Supplementary Information), the temperature setting selected for the LLM calls, and the explicit fallback rules (e.g., requiring agreement across agents or routing low-confidence cases to manual inspection). The human-expert agreement rates obtained from the validation study described above will also be reported in this section to bound the reliability of the cross-validation procedure. revision: yes
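The fallback rules named in this response (agreement across agents, manual inspection for low-confidence cases) can be sketched as a small routing function. The thresholds, the `route` name, and the per-agent confidence scores are assumptions for illustration; the actual rules are not described on this page.

```python
from collections import Counter
from typing import Optional


def route(
    candidates: list[str],
    confidences: list[float],
    min_votes: int = 2,
    min_conf: float = 0.8,
) -> tuple[str, Optional[str]]:
    """Accept a proposed repair only if enough agents agree with high confidence.

    candidates: each agent's proposed value (e.g. a space-group symbol).
    confidences: each agent's self-reported confidence in [0, 1] (an assumption).
    Returns ("accept", value) or ("manual_review", None).
    """
    if not candidates or len(candidates) != len(confidences):
        return "manual_review", None
    value, votes = Counter(candidates).most_common(1)[0]
    support = [c for v, c in zip(candidates, confidences) if v == value]
    if votes >= min_votes and min(support) >= min_conf:
        return "accept", value
    return "manual_review", None


print(route(["Fm-3m", "Fm-3m", "Pm-3m"], [0.90, 0.95, 0.60]))  # ('accept', 'Fm-3m')
print(route(["P1", "P-1"], [0.90, 0.90]))                      # ('manual_review', None)
```

Requiring the minimum confidence among agreeing agents, rather than the mean, is a deliberately conservative choice: one hesitant agent is enough to send a case to a human.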

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper introduces LitMOF as a new LLM multi-agent system and applies it directly to the external CSD MOF Subset and original literature sources to generate LitMOF-DB, reporting empirical counts of repairs (8,771) and newly uncovered structures (12,646). These outcomes are produced by processing independent external data rather than reducing to internal definitions, fitted parameters, or self-citations by construction. No equations, ansatzes, uniqueness theorems, or renamings of known results are present that would make the headline numbers equivalent to the paper's own inputs. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the domain assumption that original literature contains sufficient unambiguous crystallographic and synthesis information to correct database entries, plus the assumption that LLM agents can parse this information at scale without systematic bias. No free parameters or invented physical entities are introduced.

axioms (2)
  • Domain assumption: original literature papers contain extractable, accurate crystallographic information sufficient to validate or correct database entries.
    Invoked when the multi-agent system is described as validating crystallographic files directly from the original literature.
  • Ad hoc to paper: LLM agents can reliably cross-validate extracted data against database records without introducing new errors at a rate that would undermine the reported repair statistics.
    Required for the claim that 8,771 repairs and 12,646 new entries are valid.

pith-pipeline@v0.9.0 · 5571 in / 1679 out tokens · 45416 ms · 2026-05-17T02:50:04.852132+00:00 · methodology



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Hunting Structural Demons in Digital Reticular Chemistry: Lessons from Metal-Organic Frameworks

    cond-mat.mtrl-sci 2026-03 unverdicted novelty 2.0

    Structural errors called 'structural demons' invalidate over half of top computational MOF screening candidates and can be reduced by keeping diffraction data with synthesis details and consistent curation.

Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · cited by 1 Pith paper · 2 internal anchors

  1. [1]

    Reticular synthesis and the design of new materials

    Omar M Yaghi, Michael O’Keeffe, Nathan W Ockwig, et al. “Reticular synthesis and the design of new materials”. In: Nature 423.6941 (2003), pp. 705–714

  2. [2]

    Computation-ready, experimental metal–organic frameworks: A tool to enable high-throughput screening of nanoporous crystals

    Yongchul G Chung, Jeffrey Camp, Maciej Haranczyk, et al. “Computation-ready, experimental metal–organic frameworks: A tool to enable high-throughput screening of nanoporous crystals”. In: Chemistry of Materials 26.21 (2014), pp. 6185–6192

  3. [3]

    Advances, updates, and analytics for the computation-ready, experimental metal–organic framework database: CoRE MOF 2019

    Yongchul G Chung, Emmanuel Haldoupis, Benjamin J Bucior, et al. “Advances, updates, and analytics for the computation-ready, experimental metal–organic framework database: CoRE MOF 2019”. In: Journal of Chemical & Engineering Data 64.12 (2019), pp. 5985–5998

  4. [4]

    CoRE MOF DB: A curated experimental metal-organic framework database with machine-learned properties for integrated material-process screening

    Guobin Zhao, Logan M Brabson, Saumil Chheda, et al. “CoRE MOF DB: A curated experimental metal-organic framework database with machine-learned properties for integrated material-process screening”. In: Matter 8.6 (2025)

  5. [5]

    The Cambridge Structural Database: a quarter of a million crystal structures and rising

    Frank H Allen. “The Cambridge Structural Database: a quarter of a million crystal structures and rising”. In: Acta Crystallographica Section B: Structural Science 58.3 (2002), pp. 380–388

  6. [6]

    Development of a Cambridge Structural Database subset: a collection of metal–organic frameworks for past, present, and future

    Peyman Z Moghadam, Aurelia Li, Seth B Wiggin, et al. “Development of a Cambridge Structural Database subset: a collection of metal–organic frameworks for past, present, and future”. In: Chemistry of Materials 29.7 (2017), pp. 2618–2625

  7. [7]

    MOSAEC-DB: a comprehensive database of experimental metal–organic frameworks with verified chemical accuracy suitable for molecular simulations

    Marco Gibaldi, Anna Kapeliukha, Andrew White, et al. “MOSAEC-DB: a comprehensive database of experimental metal–organic frameworks with verified chemical accuracy suitable for molecular simulations”. In: Chemical Science 16.9 (2025), pp. 4085–4100

  8. [8]

    A multi-modal pre-training transformer for universal transfer learning in metal–organic frameworks

    Yeonghun Kang, Hyunsoo Park, Berend Smit, and Jihan Kim. “A multi-modal pre-training transformer for universal transfer learning in metal–organic frameworks”. In: Nature Machine Intelligence 5.3 (2023), pp. 309–318

  9. [9]

    Metal–organic framework stability in water and harsh environments from data-driven models trained on the diverse WS24 data set

    Gianmarco G Terrones, Shih-Peng Huang, Matthew P Rivera, et al. “Metal–organic framework stability in water and harsh environments from data-driven models trained on the diverse WS24 data set”. In: Journal of the American Chemical Society 146.29 (2024), pp. 20333–20348

  10. [10]

    High Structural Error Rates in “Computation-Ready” MOF Databases Discovered by Checking Metal Oxidation States

    Andrew J White, Marco Gibaldi, Jake Burner, R Alex Mayo, and Tom K Woo. “High Structural Error Rates in “Computation-Ready” MOF Databases Discovered by Checking Metal Oxidation States”. In: Journal of the American Chemical Society 147.21 (2025), pp. 17579–17583

  11. [11]

    Revealing the effect of structure curations on the simulated CO2 separation performances of MOFs

    Sadiye Velioglu and Seda Keskin. “Revealing the effect of structure curations on the simulated CO2 separation performances of MOFs”. In: Materials Advances 1.3 (2020), pp. 341–353

  12. [12]

    Effect of metal–organic framework (MOF) database selection on the assessment of gas storage and separation potentials of MOFs

    Hilal Daglar, Hasan Can Gulbalkan, Gokay Avci, et al. “Effect of metal–organic framework (MOF) database selection on the assessment of gas storage and separation potentials of MOFs”. In: Angewandte Chemie International Edition 60.14 (2021), pp. 7828–7837

  13. [13]

    Identifying misbonded atoms in the 2019 CoRE metal–organic framework database

    Taoyi Chen and Thomas A Manz. “Identifying misbonded atoms in the 2019 CoRE metal–organic framework database”. In: RSC Advances 10.45 (2020), pp. 26944–26951

  14. [14]

    The HEALED SBU library of chemically realistic building blocks for construction of hypothetical metal–organic frameworks

    Marco Gibaldi, Ohmin Kwon, Andrew White, Jake Burner, and Tom K Woo. “The HEALED SBU library of chemically realistic building blocks for construction of hypothetical metal–organic frameworks”. In: ACS Applied Materials & Interfaces 14.38 (2022), pp. 43372–43386

  15. [15]

    MOFChecker: a package for validating and correcting metal–organic framework (MOF) structures

    Xin Jin, Kevin Maik Jablonka, Elias Moubarak, Yutao Li, and Berend Smit. “MOFChecker: a package for validating and correcting metal–organic framework (MOF) structures”. In: Digital Discovery (2025)

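  16. [16]

    Incorporation of ligand charge and metal oxidation state considerations into the computational solvent removal and activation of experimental crystal structures preceding molecular simulation

    Marco Gibaldi, Anna Kapeliukha, Andrew White, and Tom K Woo. “Incorporation of ligand charge and metal oxidation state considerations into the computational solvent removal and activation of experimental crystal structures preceding molecular simulation”. In: Journal of Chemical Information and Modeling 65.1 (2024), pp. 275–287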

  17. [17]

    Computation-Ready Experimental Metal-Organic Framework (CoRE MOF) 2024 Dataset

    Guobin Zhao, Logan M. Brabson, Saumil Chheda, et al. Computation-Ready Experimental Metal-Organic Framework (CoRE MOF) 2024 Dataset. Version 1.1. Zenodo, Mar. 2025. DOI: 10.5281/zenodo.15055758. URL: https://doi.org/10.5281/zenodo.15055758

  18. [18]

    Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models

    Lei Wang, Wanyu Xu, Yihuai Lan, et al. “Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models”. In: arXiv preprint arXiv:2305.04091 (2023)

  19. [19]

    MACE-OFF: Short-range transferable machine learning force fields for organic molecules

    Dávid Péter Kovács, J Harry Moore, Nicholas J Browning, et al. “MACE-OFF: Short-range transferable machine learning force fields for organic molecules”. In: Journal of the American Chemical Society 147.21 (2025), pp. 17598–17611

  20. [20]

    A foundation model for atomistic materials chemistry

    Ilyes Batatia, Philipp Benner, Yuan Chiang, et al. “A foundation model for atomistic materials chemistry”. In: arXiv preprint arXiv:2401.00096 [physics.chem-ph] (2023)

  21. [21]

    Algorithms and tools for high-throughput geometry-based analysis of crystalline porous materials

    Thomas F Willems, Chris H Rycroft, Michaeel Kazi, Juan C Meza, and Maciej Haranczyk. “Algorithms and tools for high-throughput geometry-based analysis of crystalline porous materials”. In: Microporous and Mesoporous Materials 149.1 (2012), pp. 134–141

  22. [22]

    quacc – The Quantum Accelerator

    Andrew Rosen. quacc – The Quantum Accelerator. Version v1.0.6. Oct. 2025. DOI: 10.5281/zenodo.17373420. URL: https://doi.org/10.5281/zenodo.17373420

Supporting Information: LLM–Driven Multi-Agent Curation and Expansion of Metal–Organic Frameworks Database. Honghui Kim, Dohoon Kim, Jihan Kim* (Department of Chemical and Biomolecular Engineering, Kor...)