Machine Learning Based Prediction of Proton Conductivity in Metal-Organic Frameworks

Byeong Gwan Lee; Dae Woon Lim; Jihan Kim; Seunghee Han

arXiv: 2407.09514 · v3 · submitted 2024-06-18 · cond-mat.mtrl-sci · cs.LG · physics.app-ph

Seunghee Han, Byeong Gwan Lee, Dae Woon Lim, Jihan Kim

Pith reviewed 2026-05-23 23:41 UTC · model grok-4.3

keywords: metal-organic frameworks · proton conductivity · machine learning · transformer model · transfer learning · solid electrolytes · fuel cells

The pith

A transformer-based transfer learning model estimates proton conductivity in MOFs within one order of magnitude.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors compile a database of known proton-conductive metal-organic frameworks and train both descriptor-based and transformer-based machine learning models to predict conductivity values. The best-performing approach uses a transformer architecture with transfer learning and reaches a mean absolute error of 0.91 on a logarithmic scale. This accuracy would let researchers screen unsynthesized frameworks for use as solid electrolytes in fuel cells. The work also applies feature importance and principal component analysis to identify structural factors tied to conductivity. The overall goal is to reduce reliance on trial-and-error synthesis when designing new proton-conducting materials.

Core claim

By constructing a database of proton-conductive MOFs and training both descriptor-based and transformer-based models on it, the authors show that a transformer-based transfer learning model achieves a mean absolute error of 0.91 on a logarithmic conductivity scale, so proton conductivity can be estimated within one order of magnitude for new frameworks. Feature importance and principal component analysis are used to extract the chemical and structural factors that most strongly influence conductivity.
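The link between an MAE of 0.91 and "within one order of magnitude" can be checked with a few lines of arithmetic. The sketch below assumes the error is measured on log10 of the conductivity (the summary only says "logarithmic scale") and uses made-up conductivities in S/cm; neither the numbers nor the function names come from the paper.

```python
import math

def mae_log10(measured, predicted):
    """Mean absolute error between log10(measured) and log10(predicted)."""
    errors = [abs(math.log10(m) - math.log10(p)) for m, p in zip(measured, predicted)]
    return sum(errors) / len(errors)

def typical_fold_error(mae):
    """Multiplicative factor that corresponds to an error of `mae` in log10 units."""
    return 10 ** mae

# Hypothetical conductivities (S/cm), for illustration only; not data from the paper.
measured = [1.0e-2, 3.0e-4, 5.0e-6]
predicted = [2.0e-3, 1.0e-3, 5.0e-5]

example_mae = mae_log10(measured, predicted)
reported_factor = typical_fold_error(0.91)

print(f"example MAE (log10 units): {example_mae:.2f}")
print(f"MAE of 0.91 corresponds to a factor of about {reported_factor:.1f}")
```

Under that assumption, an MAE of 0.91 means a typical prediction is off by a factor of roughly 8, which is just under one decade. It is an average, so individual predictions can still miss by more than a factor of 10.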

What carries the argument

Transformer-based transfer learning (Freeze) model trained on the compiled proton-conductive MOF database.
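The page does not describe how the Freeze variant is implemented. As a rough illustration of the general idea (keep pretrained weights fixed and fit only a small prediction head on the target data), here is a dependency-free sketch. The tiny tanh layer stands in for a pretrained transformer encoder, and the descriptors and targets are invented; none of it reflects the authors' architecture or data.

```python
import math
import random

random.seed(0)

# Stand-in for a pretrained encoder: fixed weights that are never updated ("frozen").
# The real model is a transformer; this small tanh layer is purely illustrative.
ENCODER_W = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(4)]

def encode(x):
    """Map a 3-number input to 4 features using the frozen weights."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in ENCODER_W]

def predict(head, bias, x):
    """Linear head applied on top of the frozen features."""
    return sum(h * f for h, f in zip(head, encode(x))) + bias

def mse(head, bias, data):
    return sum((predict(head, bias, x) - y) ** 2 for x, y in data) / len(data)

# Hypothetical training pairs: (input descriptor, log10 conductivity).
data = [([0.2, 0.1, 0.9], -3.0), ([0.8, 0.5, 0.1], -5.5),
        ([0.4, 0.9, 0.3], -4.2), ([0.6, 0.2, 0.7], -3.8)]

head, bias = [0.0] * 4, 0.0
frozen_snapshot = [row[:] for row in ENCODER_W]
loss_before = mse(head, bias, data)

# Stochastic gradient descent on the head only; ENCODER_W receives no updates.
lr = 0.05
for _ in range(500):
    for x, y in data:
        feats = encode(x)
        err = predict(head, bias, x) - y
        head = [h - lr * 2 * err * f for h, f in zip(head, feats)]
        bias -= lr * 2 * err

loss_after = mse(head, bias, data)
print(f"loss before: {loss_before:.3f}, after: {loss_after:.6f}")
```

Freezing keeps the number of trainable parameters small, which is one common reason to prefer it when the target dataset is limited. Whether that is the authors' motivation is not stated here.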

If this is right

New MOF structures can be screened for proton conductivity before synthesis.
Targeted design of solid-state electrolytes for fuel cells becomes more efficient.
Feature analysis highlights structural motifs that promote high conductivity.
The same modeling pipeline can be applied to predict other transport properties in MOFs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The trained model could be run over large virtual libraries of hypothetical MOFs to rank candidates for experimental follow-up.
Similar transfer-learning setups may accelerate prediction of ionic conductivity in other classes of porous solids.
Periodic retraining on newly measured MOFs would keep the error bounded as the experimental literature grows.
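The screening idea in the first point above might look like the following sketch. The predictor is a placeholder function, and the candidate names and feature vectors are hypothetical; a real pipeline would substitute the trained model and structure-derived inputs.

```python
from typing import Callable, Dict, List, Tuple

def rank_candidates(
    library: Dict[str, List[float]],
    predict_log_sigma: Callable[[List[float]], float],
    top_k: int,
) -> List[Tuple[str, float]]:
    """Score every candidate and return the top_k by predicted log10 conductivity."""
    scored = [(name, predict_log_sigma(feats)) for name, feats in library.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

def toy_model(features: List[float]) -> float:
    """Placeholder predictor (assumption); stands in for the trained model."""
    return -6.0 + 2.0 * features[0] + 1.0 * features[1]

# Hypothetical virtual library: name -> feature vector.
library = {
    "hypo-MOF-A": [0.9, 0.2],
    "hypo-MOF-B": [0.1, 0.4],
    "hypo-MOF-C": [0.5, 0.9],
}

shortlist = rank_candidates(library, toy_model, top_k=2)
for name, score in shortlist:
    print(f"{name}: predicted log10(sigma) = {score:.2f}")
```

With a reported MAE of 0.91 in log units, candidates whose predicted values differ by much less than one log unit are effectively tied, so a ranking like this is better suited to coarse triage than to fine ordering.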

Load-bearing premise

The collected database of proton-conductive MOFs is large, unbiased, and representative enough of chemical space for the trained models to generalize to new, unsynthesized frameworks.

What would settle it

Synthesize and measure proton conductivity for several MOFs absent from the training database; if measured values deviate systematically by more than one order of magnitude from the model's predictions, the generalization claim fails.
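One way to make this test concrete is to separate a systematic offset (mean signed error) from the share of predictions that miss by more than one decade. The sketch below assumes log10 units and uses invented measured and predicted values. The one-decade threshold mirrors the wording above; it is not a criterion defined in the paper.

```python
import math

def settle_check(measured, predicted, threshold=1.0):
    """Return (mean signed log10 error, fraction of samples off by more than threshold)."""
    signed = [math.log10(p) - math.log10(m) for m, p in zip(measured, predicted)]
    bias = sum(signed) / len(signed)
    frac_outside = sum(abs(e) > threshold for e in signed) / len(signed)
    return bias, frac_outside

# Hypothetical conductivities (S/cm) for newly synthesized MOFs; illustration only.
measured = [1.0e-3, 2.0e-5, 4.0e-2, 1.0e-6]
predicted = [5.0e-3, 1.0e-5, 1.0e-3, 2.0e-6]

bias, frac = settle_check(measured, predicted)
print(f"mean signed error: {bias:.2f} log units")
print(f"fraction off by more than one decade: {frac:.2f}")
```

A mean signed error far from zero would indicate a systematic deviation, while a large out-of-range fraction with near-zero bias would point to scatter rather than a consistent shift.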

Original abstract

Recently, metal-organic frameworks (MOFs) have demonstrated their potential as solid-state electrolytes in proton exchange membrane fuel cells. However, the number of MOFs reported to exhibit proton conductivity remains limited, and the mechanisms underlying this phenomenon are not fully elucidated, complicating the design of proton-conductive MOFs. In response, we developed a comprehensive database of proton-conductive MOFs and applied machine learning techniques to predict their proton conductivity. Our approach included the construction of both descriptor-based and transformer-based models. Notably, the transformer-based transfer learning (Freeze) model performed the best with a mean absolute error (MAE) of 0.91, suggesting that the proton conductivity of MOFs can be estimated within one order of magnitude using this model. Additionally, we employed feature importance and principal component analysis to explore the factors influencing proton conductivity. The insights gained from our database and machine learning model are expected to facilitate the targeted design of proton-conductive MOFs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper assembles a proton-conductive MOF database and reports that a frozen transformer model reaches MAE 0.91, but the abstract gives no numbers on database size, measurement consistency, or validation splits.

The core contribution is the creation of a dedicated database of MOFs with reported proton conductivities, followed by training both descriptor-based and transformer models on it. The transformer with transfer learning (freeze) comes out on top at MAE 0.91, which they interpret as roughly one order of magnitude accuracy. They also run feature importance and PCA to look at what drives conductivity. That is a straightforward domain application of existing ML pipelines to a materials problem where experimental data are sparse, and the database itself is the part that could be reused by others if it is made public with clear provenance. The work is honest about its goal: a practical screening tool rather than a new theory or algorithm.

The soft spot is exactly what the stress-test note flags. The abstract supplies none of the basic statistics needed to judge whether the MAE is meaningful: how many entries, how many distinct metals or topologies, whether measurements were taken at comparable temperature and humidity, or what cross-validation scheme was used. If the set is small or heavily clustered, an internal split MAE does not demonstrate the claimed generalization to unsynthesized frameworks. Without those details the central performance claim stays unsupported.

This paper is aimed at materials scientists working on MOF electrolytes for fuel cells who need quick filters before synthesis. A reader who already knows the ML methods will mainly care about the data curation and the actual numbers behind the MAE. It deserves a serious referee once the full methods and data description are checked; the idea is reasonable and the gap it tries to fill is real, even if the current write-up leaves the key evidence out of sight.

Referee Report

2 major / 0 minor

Summary. The paper compiles a database of proton-conductive MOFs and trains both descriptor-based and transformer-based machine learning models (including transfer learning with freezing) to predict proton conductivity values. The best reported performance is an MAE of 0.91 from the transformer-based transfer learning (Freeze) model, with additional analysis via feature importance and PCA to identify factors influencing conductivity.

Significance. If the underlying database is large, chemically diverse, and the reported MAE reflects genuine generalization rather than overfitting to a narrow set of structures or measurement conditions, the work could provide a practical screening tool to guide synthesis of new MOF electrolytes for fuel cells.

major comments (2)

[Abstract] Abstract: the headline claim that the Freeze model estimates proton conductivity 'within one order of magnitude' rests on an MAE of 0.91, yet the abstract supplies no database size, measurement-condition standardization, train-test split, or cross-validation protocol; without these the numerical result cannot be evaluated.
[Database construction / Results] Database and Results sections: the generalization claim to unsynthesized frameworks requires explicit reporting of N, chemical-space coverage (metal/linker/topology diversity), and whether conductivity values were measured under comparable T/RH conditions; if N is small or the data cluster in a few families, the internal MAE does not demonstrate out-of-distribution performance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We have revised the manuscript to address the concerns about missing contextual details in the abstract and database section. Our responses to the major comments are provided below.

Referee: [Abstract] Abstract: the headline claim that the Freeze model estimates proton conductivity 'within one order of magnitude' rests on an MAE of 0.91, yet the abstract supplies no database size, measurement-condition standardization, train-test split, or cross-validation protocol; without these the numerical result cannot be evaluated.

Authors: We agree that the abstract should include these details for proper evaluation of the reported MAE. In the revised manuscript, the abstract has been updated to state the database size, note that values were compiled from literature-reported conditions (with full standardization details and cross-validation protocol provided in the Methods and Results sections), and clarify that the MAE of 0.91 is obtained via 5-fold cross-validation on log10-transformed conductivity values, corresponding to typical errors within one order of magnitude.

Revision: yes

Referee: [Database construction / Results] Database and Results sections: the generalization claim to unsynthesized frameworks requires explicit reporting of N, chemical-space coverage (metal/linker/topology diversity), and whether conductivity values were measured under comparable T/RH conditions; if N is small or the data cluster in a few families, the internal MAE does not demonstrate out-of-distribution performance.

Authors: We have expanded the Database construction section to explicitly report N along with quantitative metrics of chemical-space coverage, including distributions over metals, organic linkers, and topologies. We also clarify that while T and RH conditions vary across the literature sources, the model treats available condition metadata as input features, and we have added a dedicated limitations paragraph discussing the impact of non-standardized conditions. To support generalization claims, we include additional results from family-stratified cross-validation showing consistent performance across diverse MOF subgroups. We acknowledge that this remains internal validation and does not replace future experimental tests on entirely novel frameworks.

Revision: yes
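The rebuttal does not say how the family-stratified cross-validation is set up. One reading that speaks directly to the referee's clustering concern is a grouped split, where each MOF family lands entirely in either the training or the test side of a fold. The sketch below implements that reading in plain Python; the family labels and fold count are illustrative assumptions, not details from the manuscript.

```python
from collections import defaultdict

def group_folds(groups, n_folds):
    """Assign whole groups to folds so no family is split between train and test."""
    by_group = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)
    folds = [[] for _ in range(n_folds)]
    # Place the largest families first, each into the currently smallest fold.
    for g in sorted(by_group, key=lambda g: (-len(by_group[g]), g)):
        smallest = min(range(n_folds), key=lambda k: len(folds[k]))
        folds[smallest].extend(by_group[g])
    return folds

def train_test_splits(groups, n_folds):
    """Yield (train_indices, test_indices) pairs, one per fold."""
    folds = group_folds(groups, n_folds)
    all_idx = set(range(len(groups)))
    for test in folds:
        yield sorted(all_idx - set(test)), sorted(test)

# Illustrative family label for each database entry (assumption, not the paper's data).
families = ["UiO", "UiO", "MIL", "MIL", "MIL", "ZIF", "HKUST", "ZIF"]

splits = list(train_test_splits(families, n_folds=3))
for train, test in splits:
    print("train:", train, "test:", test)
```

If the error under a split like this is much larger than under a random 5-fold split, that gap would suggest the random-split MAE partly reflects similarity between related structures rather than generalization to new families.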

Circularity Check

0 steps flagged

No circularity: ML models trained and evaluated on external held-out measurements with no self-referential reductions.

The paper compiles an external database of proton-conductive MOFs and trains descriptor-based and transformer models to predict conductivity values. The reported MAE of 0.91 is obtained by standard supervised learning on held-out test entries, not by fitting a parameter to the target quantity and renaming it a prediction. No equations, uniqueness theorems, or ansatzes are invoked that reduce the output to the input by construction. Self-citations, if present, are not load-bearing for any derivation chain. The central claim remains a standard ML generalization result whose validity hinges on data quality rather than definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated.
