LitMOF: An LLM Multi-Agent for Literature-Validated Metal-Organic Frameworks Database Correction and Expansion

arxiv: 2512.01693 · v2 · submitted 2025-12-01 · 💻 cs.DB · cond-mat.mtrl-sci

Honghui Kim, Dohoon Kim, Jihan Kim

Pith reviewed 2026-05-17 02:50 UTC · model grok-4.3

keywords metal-organic frameworks · database curation · LLM multi-agent · literature validation · structural error repair · MOF screening · computation-ready structures · direct air capture

The pith

An LLM multi-agent system validates MOF structures against original papers, repairing 8,771 invalid database entries and adding 12,646 previously missing ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LitMOF as a multi-agent framework in which large language models extract crystallographic details and synthesis context straight from the published literature, then cross-check them against existing database records. This process is applied to the CSD MOF Subset to fix structural errors that currently affect nearly half of MOF database entries. The result is LitMOF-DB, a collection of 186,773 computation-ready structures that includes thousands of repairs and new additions. A screening case study for direct air capture shows that uncorrected errors produce wrong adsorption energies, flipped selectivity rankings, and missed high-performing candidates. If the approach holds, it offers a route to keep materials databases current through ongoing literature validation rather than one-time manual fixes.

Core claim

LitMOF is an LLM-driven multi-agent system that pulls crystallographic files and synthesis descriptions directly from the original literature, cross-validates them with database entries, and thereby repairs structural errors. When applied to the CSD MOF Subset, the system produces LitMOF-DB containing 186,773 computation-ready structures, successfully repairing 8,771 invalid entries (65.3 percent of the not-computation-ready MOFs in the latest CoRE MOF database) and identifying 12,646 experimentally reported MOFs absent from prior resources. A direct-air-capture screening demonstration establishes that these structural errors distort predicted adsorption energies and CO2/H2O selectivity, leading to systematic misranking of materials.
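A quick arithmetic check on the figures above: if 8,771 repairs make up 65.3 percent of the not-computation-ready pool, the size of that pool follows by division. The sketch below uses only those two numbers from the text. The rounding bounds assume the percentage was rounded to one decimal place, which is an assumption and not something the paper states.

```python
# Back-of-envelope check: how large is the not-computation-ready pool
# implied by "8,771 repairs = 65.3 percent"?
repaired = 8_771
share = 0.653  # 65.3 percent, as quoted in the text

implied_total = repaired / share

# Assumption: 65.3 is a value rounded to one decimal, so the true share
# lies somewhere in [65.25, 65.35) percent.
implied_low = repaired / 0.6535
implied_high = repaired / 0.6525

print(f"implied pool size: about {implied_total:.0f}")
print(f"range under the rounding assumption: {implied_low:.0f} to {implied_high:.0f}")
print(f"implied unrepaired entries: about {implied_total - repaired:.0f}")
```

The implied pool is on the order of 13,400 entries, which leaves several thousand not-computation-ready structures that the pipeline did not repair.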

What carries the argument

LitMOF, the LLM multi-agent framework that extracts and cross-validates crystallographic and synthesis information from literature to repair database entries.

If this is right

Corrected MOF structures reduce distortion in predicted adsorption energies and CO2/H2O selectivity during high-throughput screening.
Repairing 65.3 percent of invalid entries produces a larger pool of reliable computation-ready structures for materials discovery.
Uncovering 12,646 previously absent experimental MOFs expands the searchable design space for new framework synthesis.
Structural errors cause systematic misranking, false positives, and omission of high-performance candidates in screening workflows.
The method supplies a scalable route for continuous, literature-driven curation of materials databases.
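One way to make the misranking claim concrete is to compare the top-k shortlist before and after structure correction. The sketch below is illustrative only: the material names and selectivity values are invented for the example, and `top_k_overlap` is a hypothetical helper, not a metric taken from the paper.

```python
def rank_by_score(scores):
    """Return {name: rank}, with rank 1 for the highest score."""
    ordered = sorted(scores, key=scores.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}


def top_k_set(scores, k):
    """Names of the k best-scoring materials."""
    ranks = rank_by_score(scores)
    return {name for name, rank in ranks.items() if rank <= k}


def top_k_overlap(before, after, k):
    """Fraction of the top-k shortlist that survives correction."""
    return len(top_k_set(before, k) & top_k_set(after, k)) / k


# Hypothetical CO2/H2O selectivities, invented purely for illustration.
uncorrected = {"A": 9.0, "B": 7.5, "C": 3.0, "D": 2.0, "E": 1.0}
corrected = {"A": 2.5, "B": 7.0, "C": 3.2, "D": 8.0, "E": 1.1}

k = 2
overlap = top_k_overlap(uncorrected, corrected, k)
# Shortlisted only because of a structural error.
false_positives = top_k_set(uncorrected, k) - top_k_set(corrected, k)
# Good candidates hidden by a structural error.
missed = top_k_set(corrected, k) - top_k_set(uncorrected, k)

print(overlap, false_positives, missed)
```

In this toy case half of the shortlist changes: "A" is a false positive and "D" is a missed candidate, which mirrors the two failure modes listed above.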

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same multi-agent extraction pattern could be tested on other classes of porous materials or on databases in related fields such as catalysis or battery materials.
If run periodically on new publications, the system could keep MOF databases automatically updated without requiring repeated full manual reviews.
Machine-learning models trained on the corrected LitMOF-DB might show improved accuracy in property prediction because they avoid learning from structurally flawed examples.
Running the repair pipeline with different base LLMs or adding human-in-the-loop verification steps could quantify how much performance depends on the choice of language model.

Load-bearing premise

The multi-agent LLM system can reliably pull accurate crystallographic and synthesis details from papers without creating new extraction mistakes or missing information that would invalidate a repair.

What would settle it

Randomly select 100 of the repaired MOF entries and have a crystallographer compare the LitMOF-extracted structures and synthesis conditions against the original published papers to measure agreement rate.
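The audit described above can be sketched in a few lines: draw a reproducible random sample of repaired entries, then turn the expert's agree/disagree counts into an agreement rate with an uncertainty band. This is a minimal sketch under stated assumptions: the entry identifiers are placeholders, the count of 92 agreements is a hypothetical input, and the Wilson score interval is one common choice for a binomial proportion rather than anything prescribed by the paper.

```python
import math
import random


def sample_for_audit(entry_ids, n=100, seed=0):
    """Draw n distinct entries without replacement, reproducibly."""
    rng = random.Random(seed)
    return rng.sample(list(entry_ids), n)


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Placeholder identifiers standing in for the 8,771 repaired entries.
repaired_ids = range(8_771)
audit_sample = sample_for_audit(repaired_ids, n=100, seed=42)

# Hypothetical outcome: the crystallographer agrees with 92 of 100 repairs.
agreements = 92
low, high = wilson_interval(agreements, len(audit_sample))
print(f"agreement {agreements / len(audit_sample):.2f}, 95% interval {low:.3f} to {high:.3f}")
```

With only 100 entries the interval stays several percentage points wide, so a larger sample would be needed to pin down small error rates.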

Figures

Figures reproduced from arXiv: 2512.01693 by Dohoon Kim, Honghui Kim, Jihan Kim.

**Figure 1.** Schematic illustration of how LitMOF multi-agent system interacts with a user and generates responses.

**Figure 2.** a, Unified agent template comprising an LLM-driven head module and a set of nodes, each representing either another agent call or an LLM/tool operation. b, Decision process of the head module, which interprets the query, generates or updates a plan, selects the next node, and determines termination. c, Structure of an agent plan, represented as an overall goal and an ordered list of nodes with associated de…

**Figure 3.** Example of a missing MOF case (refcode: TEQLIM). Missing MOFs refer to structures that were synthesized…

**Figure 4.** Three types of error correction handled by the Inspector & Editor agent.

**Figure 5.** a, Results of the database construction using the LitMOF agent. Starting from the CSD MOF subset containing 128,799 structures, we corrected 25,721 MOFs and constructed a curated database of 118,464 experimental MOFs with free solvent removed. During this process, we also identified 12,646 missing MOFs and compiled a separate missing-MOF database. b, Comparison between our curated MOF database and the late…
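The agent template described in the Figure 2 caption (a head module that plans, picks the next node, and decides when to stop, with nodes that are either tool operations or calls to other agents) can be sketched roughly as follows. Everything here is hypothetical: the class names, the node functions, and the deterministic `make_plan` callable that stands in for the LLM-driven head. The sketch also builds its plan once, whereas the caption says the head may update the plan between steps.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Plan:
    goal: str          # overall goal of the agent
    steps: List[str]   # ordered list of node names


@dataclass
class Agent:
    name: str
    nodes: Dict[str, Callable[[dict], dict]]  # node name -> operation on state
    make_plan: Callable[[str], Plan]          # stand-in for the LLM head

    def _select(self, plan: Plan, state: dict) -> Optional[str]:
        """Pick the next node, or None to terminate."""
        done = len(state["trace"])
        return plan.steps[done] if done < len(plan.steps) else None

    def run(self, query: str, max_steps: int = 10) -> dict:
        state = {"query": query, "trace": []}
        plan = self.make_plan(query)
        for _ in range(max_steps):
            node = self._select(plan, state)
            if node is None:
                break
            state = self.nodes[node](state)  # nodes must keep the "trace" key
            state["trace"].append(node)
        return state


# Hypothetical tool nodes for a sub-agent.
def fetch(state):
    return {**state, "cif": "placeholder CIF text"}


def check(state):
    return {**state, "ok": state["cif"] is not None}


inspector = Agent("inspector", {"fetch": fetch, "check": check},
                  lambda q: Plan(q, ["fetch", "check"]))

# A node can also be a call to another agent.
supervisor = Agent(
    "supervisor",
    {"inspect": lambda s: {**s, "report": inspector.run(s["query"])}},
    lambda q: Plan(q, ["inspect"]),
)

result = supervisor.run("validate TEQLIM")
print(result["trace"], result["report"]["trace"])
```

Because agents and tools share the same node interface, a supervisor can nest sub-agents without special casing, which appears to be the point of a unified template.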

Original abstract

Metal-organic framework (MOF) databases have grown rapidly through experimental deposition and large-scale literature extraction, but recent analyses show that nearly half of their entries contain substantial structural errors. These inaccuracies propagate through high-throughput screening and machine-learning workflows, limiting the reliability of data-driven MOF discovery. Correcting such errors is exceptionally difficult because true repairs require integrating crystallographic files, synthesis descriptions, and contextual evidence scattered across the literature. Here we introduce LitMOF, a large language model-driven multi-agent framework that validates crystallographic information directly from the original literature and cross-validates it with database entries to repair structural errors. Applying LitMOF to the experimental MOF database (the CSD MOF Subset), we constructed LitMOF-DB, a curated set of 186,773 computation-ready structures, including the successful repair of 8,771 invalid entries, which accounts for 65.3% of the not-computation-ready MOFs in the latest CoRE MOF database. Additionally, the system uncovered 12,646 experimentally reported MOFs absent from existing resources, substantially expanding the known experimental design space. Using direct air capture screening as a case study, we demonstrate that structural errors severely distort predicted adsorption energies and CO2/H2O selectivity, leading to systematic misranking of materials, false positives, and the omission of high-performance candidates. This work establishes a scalable pathway toward self-correcting scientific databases and a generalizable paradigm for LLM-driven curation in materials science.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

LitMOF reports large-scale repairs and additions to the MOF database via multi-agent LLM literature checks, but the absence of accuracy metrics on the extractions leaves the reliability of those numbers open.

The main thing to know is that this paper applies a multi-agent LLM setup to the CSD MOF Subset, claims to repair 8,771 invalid structures (covering 65.3% of the not-computation-ready ones in CoRE MOF) and to add 12,646 new experimentally reported MOFs, then shows that fixing them shifts rankings in a direct air capture screening case study. The numbers are concrete and the problem they target is real: structural errors in these databases do propagate into screening and ML work on gas separation and carbon capture.

What is actually new is the specific multi-agent workflow tuned for pulling crystallographic details and synthesis context straight from the original papers and cross-checking them against database entries. They get credit for producing a larger computation-ready set (186,773 structures) and for demonstrating downstream effects rather than stopping at the curation step. The approach is practical and addresses a documented pain point in the field.

The soft spot is the lack of any reported validation on the LLM outputs themselves. There are no precision or recall figures, no human-expert comparison on a held-out set, and no breakdown of error types like hallucinated coordinates or overlooked disorder. Without those, it is difficult to bound how many of the reported repairs are accurate versus newly introduced mistakes. The central assumption that the agents reliably extract the right information from the literature is plausible but untested in the write-up.

This paper is for computational materials researchers and database curators who work with MOFs or similar experimental datasets. A reader running high-throughput screens would find the case study useful as a warning about data quality. It deserves a serious referee because the scale of the effort and the concrete outputs make it worth checking, even if the methods section needs expansion on validation and reproducibility. I would send it to review with a request for the missing accuracy metrics and any available code or prompts.

Referee Report

2 major / 2 minor

Summary. The paper introduces LitMOF, an LLM-driven multi-agent framework that extracts and cross-validates crystallographic parameters, space groups, and synthesis details directly from original literature sources to repair structural errors in MOF databases. Applied to the CSD MOF Subset, it produces LitMOF-DB containing 186,773 computation-ready structures, including 8,771 repairs that cover 65.3% of the not-computation-ready entries in the latest CoRE MOF database, plus 12,646 newly identified experimentally reported MOFs absent from prior resources. A direct air capture screening case study shows that uncorrected structural errors distort adsorption energies and CO2/H2O selectivity rankings.

Significance. If the extraction accuracy holds, the work offers a scalable, literature-grounded approach to curating large materials databases, directly addressing documented error rates that undermine high-throughput screening and ML models in MOF discovery. The quantitative expansion of the experimental design space and the demonstrated impact on property rankings provide a concrete pathway toward self-correcting databases in materials science.

major comments (2)

[Results] Results section (and abstract): The headline statistics—8,771 repairs accounting for 65.3% of CoRE not-computation-ready MOFs and 12,646 newly uncovered structures—rest entirely on the multi-agent LLM correctly parsing CIF coordinates, space groups, disorder, and literature-database mismatches. No precision/recall, inter-annotator agreement on a human-labeled subset, or error analysis for hallucinated parameters or overlooked synthesis constraints is reported, leaving the fraction of valid repairs versus newly introduced errors unquantified.
[Methods] Methods section: The description of the multi-agent workflow lacks any ablation on prompt engineering, temperature settings, or fallback validation rules. Without these details or a reported human-expert agreement rate on a sample of repairs, it is impossible to bound the reliability of the cross-validation step that underpins all quantitative claims.
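The metrics requested in the major comments (precision, recall, and inter-annotator agreement on a human-labeled subset) are straightforward to compute once labels exist. The sketch below is a minimal illustration with invented labels: `accepted` marks whether the pipeline kept a repair, `expert_a` and `expert_b` are hypothetical expert judgments of whether the repair is correct, and Cohen's kappa is used as one standard agreement statistic, not necessarily the one the authors would choose.

```python
def cohen_kappa(a, b):
    """Cohen's kappa for two annotators labeling the same items."""
    n = len(a)
    observed = sum(1 for x, y in zip(a, b) if x == y) / n
    labels = set(a) | set(b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum((a.count(l) / n) * (b.count(l) / n) for l in labels)
    # Note: undefined (division by zero) if both always use a single label.
    return (observed - expected) / (1 - expected)


def precision_recall(predicted, truth):
    """Precision and recall of binary predictions against reference labels."""
    tp = sum(1 for p, t in zip(predicted, truth) if p and t)
    fp = sum(1 for p, t in zip(predicted, truth) if p and not t)
    fn = sum(1 for p, t in zip(predicted, truth) if not p and t)
    return tp / (tp + fp), tp / (tp + fn)


# Invented labels for eight audited entries (1 = correct/accepted, 0 = not).
expert_a = [1, 1, 1, 0, 0, 1, 0, 1]
expert_b = [1, 1, 0, 0, 0, 1, 1, 1]
accepted = [1, 1, 1, 0, 0, 1, 0, 1]  # hypothetical pipeline decisions

kappa = cohen_kappa(expert_a, expert_b)
precision, recall = precision_recall(accepted, expert_b)
print(f"kappa={kappa:.3f} precision={precision:.2f} recall={recall:.2f}")
```

Reporting kappa alongside raw agreement matters here because most repairs are presumably correct, and raw agreement on a skewed label distribution can look high by chance.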

minor comments (2)

[Figures/Tables] Figure captions and tables should explicitly state the exact CoRE MOF version and CSD release dates used for the baseline comparison to allow reproducibility.
[Case Study] The case-study section would benefit from a supplementary table listing the top-10 misranked materials before and after correction, with the specific structural error (e.g., wrong space group or missing solvent) noted for each.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. The comments highlight important aspects of validation and methodological transparency that we address below. We have revised the manuscript to incorporate additional validation results and expanded methods descriptions.

Referee: [Results] Results section (and abstract): The headline statistics—8,771 repairs accounting for 65.3% of CoRE not-computation-ready MOFs and 12,646 newly uncovered structures—rest entirely on the multi-agent LLM correctly parsing CIF coordinates, space groups, disorder, and literature-database mismatches. No precision/recall, inter-annotator agreement on a human-labeled subset, or error analysis for hallucinated parameters or overlooked synthesis constraints is reported, leaving the fraction of valid repairs versus newly introduced errors unquantified.

Authors: We agree that the absence of quantitative accuracy metrics leaves the reliability of the headline statistics incompletely characterized. To address this, we have performed a human validation study on a randomly selected subset of 300 structures (150 repairs and 150 new identifications). Two independent experts reviewed the extracted CIF parameters, space groups, and synthesis details against the source literature. The revised Results section will include a new subsection reporting the inter-annotator agreement, precision, recall, and a qualitative error analysis of common failure modes such as disorder handling and potential hallucinations. The multi-agent cross-validation step is shown to reduce but not eliminate such risks; these additions will allow readers to assess the fraction of valid versus erroneous entries.
Revision: yes
Referee: [Methods] Methods section: The description of the multi-agent workflow lacks any ablation on prompt engineering, temperature settings, or fallback validation rules. Without these details or a reported human-expert agreement rate on a sample of repairs, it is impossible to bound the reliability of the cross-validation step that underpins all quantitative claims.

Authors: We concur that greater methodological detail is required to evaluate reproducibility and robustness. The revised Methods section will be expanded to describe the prompt templates for each agent (now included in the Supplementary Information), the temperature setting selected for the LLM calls, and the explicit fallback rules (e.g., requiring agreement across agents or routing low-confidence cases to manual inspection). The human-expert agreement rates obtained from the validation study described above will also be reported in this section to bound the reliability of the cross-validation procedure.
Revision: yes
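The fallback rules mentioned in this response (agreement across agents, with low-confidence cases routed to manual inspection) could look something like the sketch below. It is purely illustrative: the `route` function, the 0.8 threshold, and the idea that each agent reports a numeric confidence are all assumptions made for the example, not details given by the paper or the rebuttal.

```python
from collections import Counter


def route(agent_values, confidences, min_conf=0.8):
    """Accept a value only if all agents agree and none is low-confidence.

    Returns ("accept", value) or ("manual", None).
    """
    value, votes = Counter(agent_values).most_common(1)[0]
    unanimous = votes == len(agent_values)
    confident = min(confidences) >= min_conf
    if unanimous and confident:
        return ("accept", value)
    return ("manual", None)


# Hypothetical space-group extractions from three agents.
agreed = route(["P2_1/c", "P2_1/c", "P2_1/c"], [0.90, 0.95, 0.85])
split = route(["P2_1/c", "P2_1/n", "P2_1/c"], [0.90, 0.95, 0.85])
unsure = route(["P2_1/c", "P2_1/c", "P2_1/c"], [0.90, 0.50, 0.85])

print(agreed, split, unsure)
```

A strict unanimous rule trades throughput for safety; a majority-vote variant would accept more entries automatically at the cost of letting some disagreements through.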

Circularity Check

0 steps flagged

No significant circularity detected

The paper introduces LitMOF as a new LLM multi-agent system and applies it directly to the external CSD MOF Subset and original literature sources to generate LitMOF-DB, reporting empirical counts of repairs (8,771) and newly uncovered structures (12,646). These outcomes are produced by processing independent external data rather than reducing to internal definitions, fitted parameters, or self-citations by construction. No equations, ansatzes, uniqueness theorems, or renamings of known results are present that would make the headline numbers equivalent to the paper's own inputs. The derivation chain is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The framework rests on the domain assumption that original literature contains sufficient unambiguous crystallographic and synthesis information to correct database entries, plus the assumption that LLM agents can parse this information at scale without systematic bias. No free parameters or invented physical entities are introduced.

axioms (2)

domain assumption: Original literature papers contain extractable, accurate crystallographic information sufficient to validate or correct database entries.
Invoked when the multi-agent system is described as validating crystallographic files directly from the original literature.

ad hoc to paper: LLM agents can reliably cross-validate extracted data against database records without introducing new errors at a rate that would undermine the reported repair statistics.
Required for the claim that 8,771 repairs and 12,646 new entries are valid.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: LitMOF consists of a Supervisor and five specialized agents... Reference Builder constructs the expected structural motif... Inspector & Editor identifies and corrects inconsistencies

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: we constructed LitMOF-DB, a curated set of 118,464 computation-ready structures, including corrections of 69% (6,161 MOFs)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Hunting Structural Demons in Digital Reticular Chemistry: Lessons from Metal-Organic Frameworks
cond-mat.mtrl-sci 2026-03 unverdicted novelty 2.0

Structural errors called 'structural demons' invalidate over half of top computational MOF screening candidates and can be reduced by keeping diffraction data with synthesis details and consistent curation.

