
arxiv: 2604.13897 · v1 · submitted 2026-04-15 · 💻 cs.LG · physics.comp-ph

MolCryst-MLIPs: A Machine-Learned Interatomic Potentials Database for Molecular Crystals

Pith reviewed 2026-05-10 13:19 UTC · model grok-4.3

classification 💻 cs.LG physics.comp-ph
keywords molecular crystals · machine-learned interatomic potentials · MACE models · molecular dynamics simulations · polymorphism · database · fine-tuning · energy accuracy

The pith

An open database of fine-tuned machine-learned potentials is released for nine molecular crystal systems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents MolCryst-MLIPs as an open database containing machine-learned interatomic potentials for molecular crystals. It focuses on fine-tuned MACE models for nine specific systems, reporting average errors of 0.141 kJ/mol/atom in energy and 0.648 kJ/mol/Angstrom in forces. The models are tested for stability using molecular dynamics simulations that check energy conservation, orientational order, and atomic distributions. This matters because it supplies validated tools for simulating how these crystals behave under different conditions, particularly for exploring different crystal forms.

Core claim

The authors introduce the MolCryst-MLIPs database, whose first release contains fine-tuned MACE models for nine molecular crystal systems: Benzamide, Benzoic acid, Coumarin, Durene, Isonicotinamide, Niacinamide, Nicotinamide, Pyrazinamide, and Resorcinol. These models, created with the Automated Machine Learning Pipeline from the MACE-MH-1 foundation model, achieve a mean energy MAE of 0.141 kJ/mol/atom and a mean force MAE of 0.648 kJ/mol/Angstrom across all systems. Molecular dynamics simulations confirm dynamical stability and structural integrity through assessments of energy conservation, P2 orientational order parameters, and radial distribution functions.
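The two error metrics quoted above can be illustrated with a minimal sketch. It assumes that the energy MAE is normalised per atom, that the force MAE is taken over all Cartesian components, and that raw values are in eV and eV/Å and are converted to kJ/mol. The paper's exact conventions are not given in this summary, and the toy numbers below are invented for illustration only.

```python
# Conversion factor: 1 eV per particle is about 96.485 kJ/mol.
EV_TO_KJ_PER_MOL = 96.485

def energy_mae_per_atom(e_ref, e_pred, n_atoms):
    """MAE of total energies, each error divided by the atom count of its frame."""
    errors = [abs(p - r) / n for r, p, n in zip(e_ref, e_pred, n_atoms)]
    return sum(errors) / len(errors)

def force_mae(f_ref, f_pred):
    """MAE over every Cartesian force component of every atom in every frame."""
    diffs = [abs(p - r)
             for frame_ref, frame_pred in zip(f_ref, f_pred)
             for atom_ref, atom_pred in zip(frame_ref, frame_pred)
             for r, p in zip(atom_ref, atom_pred)]
    return sum(diffs) / len(diffs)

# Toy data (assumed units: eV and eV/Å); not taken from the paper.
e_ref = [-100.0, -200.0]
e_pred = [-100.02, -199.96]
n_atoms = [10, 20]
f_ref = [[(0.0, 0.1, -0.1)], [(0.2, 0.0, 0.0)]]
f_pred = [[(0.01, 0.1, -0.12)], [(0.2, 0.01, 0.0)]]

e_mae = energy_mae_per_atom(e_ref, e_pred, n_atoms) * EV_TO_KJ_PER_MOL  # kJ/mol/atom
f_mae = force_mae(f_ref, f_pred) * EV_TO_KJ_PER_MOL                     # kJ/mol/Å
print(f"energy MAE: {e_mae:.3f} kJ/mol/atom, force MAE: {f_mae:.3f} kJ/mol/Å")
```

A mean across the nine systems, as reported in the paper, would then be an average of nine such per-system values.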

What carries the argument

The fine-tuned MACE models, developed through the Automated Machine Learning Pipeline (AMLP), which automates reference data generation, training, and validation for the nine molecular crystal systems.

If this is right

  • Researchers can use these models directly for molecular dynamics studies of the listed crystals.
  • The database supports investigations of polymorphism under varying thermodynamic conditions.
  • The automated pipeline allows for easy extension to additional molecular crystal systems.
  • Open release enables community validation and further development of the models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such a database could accelerate the discovery of new crystal forms by making simulations more accessible.
  • Combining these potentials with experimental data might improve predictions for real-world applications.
  • Similar approaches could be applied to other classes of materials beyond molecular crystals.

Load-bearing premise

The reported average errors and simulation stability checks are enough to make the models ready for production use in studying crystal polymorphism without further validation against experiments.

What would settle it

Observing significant deviations in simulated crystal lattice parameters or failure to maintain stability in long molecular dynamics runs at room temperature for any of the nine systems would indicate the models are not sufficiently accurate.

Figures

Figures reproduced from arXiv: 2604.13897 by Adam Lahouari, Amara McCune, Andrea Vergara, Charlotte Infante, Hypatia Newton, Jihye Han, Jillian Hoffstadt, Jonathan Raghoonanan, Jutta Rogal, Mark E. Tuckerman, Maya M. Martirossyan, Oliver Tan, Philipp Hoellmer, Pulkita Jain, Sangram Kadam, Shen Ai, Shlok J. Paul, Sumon Sahu, Willmor Pena.

Figure 1. DFT vs. MACE correlation plots for (a) energies and (b) forces across all nine …
Figure 2. Relative lattice energies (ΔE_latt, kJ mol⁻¹) for four molecular crystal systems: Coumarin, Isonicotinamide, Niacinamide, and Pyrazinamide. Polymorphs are ordered by increasing DFT stability. For each polymorph, the DFT reference (colored bar), the MolCryst-MLIPs fine-tuned MACE model (faded colored bar), and the MACE-MH-1 foundation model (omol head, dashed black line) are shown. The reference zero is se…
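The quantity plotted in Figure 2 can be sketched as follows. This is a minimal illustration under stated assumptions: energies are compared per molecule, and the zero is placed at the lowest-energy polymorph. The caption is truncated before it says where the zero is set, so that choice is a guess. The polymorph labels and energies are hypothetical. Because all polymorphs of one compound share the same isolated-molecule energy, that term cancels in the differences and is left out.

```python
def relative_lattice_energies(e_cell, z):
    """Per-molecule energies relative to the lowest-energy polymorph.

    e_cell: dict mapping polymorph label -> total unit-cell energy (kJ/mol)
    z:      dict mapping polymorph label -> number of molecules per unit cell
    """
    per_molecule = {name: e_cell[name] / z[name] for name in e_cell}
    e_min = min(per_molecule.values())  # assumed reference zero
    return {name: e - e_min for name, e in per_molecule.items()}

# Hypothetical polymorphs and energies, for illustration only.
e_cell = {"form_I": -400.0, "form_II": -198.0, "form_III": -790.0}
z = {"form_I": 4, "form_II": 2, "form_III": 8}

delta = relative_lattice_energies(e_cell, z)
for name in sorted(delta, key=delta.get):
    print(name, delta[name])
```

Running the same function on DFT and on model energies gives the two sets of bars that such a figure compares.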
Figure 3. Relative lattice energy ΔE_latt as a function of density for optimized crystal structures across nine polymorphic systems. Each point represents a distinct polymorph, including structures excluded from the training set due to large unit cells.
  Surrounding text (Section 3.3, Dynamical Stability for Molecular Dynamics): To assess whether the trained models support stable MD simulations, a series of dynamical benchmarks were performed. T…
Figure 4. Cumulative energy drift ΔE from NVE simulations following 1 ps of equilibration, shown for the most stable polymorph of nine systems. Solid lines represent the cumulative drift, while shaded regions reflect instantaneous deviations.
  Surrounding text: Following the NVE energy conservation tests, the thermal stability of each model was assessed through canonical (NVT) MD simulations using the Berendsen thermostat, with a time…
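An energy-conservation check of the kind shown in Figure 4 could be sketched as below. The definitions are assumptions, since the paper's exact ones are not reproduced here: cumulative drift is taken as E(t) minus E(0) after equilibration, and the drift rate is a least-squares slope of total energy against time. The time series is a toy example.

```python
def cumulative_drift(energies):
    """Drift of the total energy relative to the first post-equilibration frame."""
    e0 = energies[0]
    return [e - e0 for e in energies]

def drift_rate(times, energies):
    """Least-squares slope of total energy versus time."""
    n = len(times)
    t_mean = sum(times) / n
    e_mean = sum(energies) / n
    num = sum((t - t_mean) * (e - e_mean) for t, e in zip(times, energies))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den

# Toy NVE trace (assumed units: ps and eV per atom); not from the paper.
times = [0.0, 1.0, 2.0, 3.0, 4.0]
energies = [-50.000, -49.998, -49.996, -49.994, -49.992]

drift = cumulative_drift(energies)
rate = drift_rate(times, energies)
print(f"final drift: {drift[-1]:.4f}, drift rate: {rate:.4f} per ps")
```

A model that conserves energy well should give a slope close to zero relative to the size of the instantaneous fluctuations.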
Figure 5. P2 orientational order parameter as a function of simulation time for all Pyrazinamide polymorphs at 300 K, 500 K, and 600 K. Each line corresponds to a distinct polymorph.
  Surrounding text: … near zero indicates random orientational disorder, as observed for PYRZIN17, whose general packing cannot be assigned to a well-defined motif. Values between 0.1 and 0.2 are indicative of a herringbone packing arrangement, as seen fo…
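One common way to compute a P2 orientational order parameter is sketched below. It averages the second Legendre polynomial, (3 cos²θ − 1) / 2, over all pairs of molecular axes. Whether the paper uses this pairwise form or a director-based one is not stated in this summary, so the definition here is an assumption, as is the choice of one axis vector per molecule.

```python
import math

def normalize(v):
    """Return v scaled to unit length."""
    norm = math.sqrt(sum(c * c for c in v))
    return tuple(c / norm for c in v)

def p2_pairwise(axes):
    """Average of (3 cos^2(theta_ij) - 1) / 2 over all distinct pairs of axes."""
    units = [normalize(v) for v in axes]
    total = 0.0
    n_pairs = 0
    for i in range(len(units)):
        for j in range(i + 1, len(units)):
            cos_ij = sum(a * b for a, b in zip(units[i], units[j]))
            total += 0.5 * (3.0 * cos_ij * cos_ij - 1.0)
            n_pairs += 1
    return total / n_pairs

# Toy orientations: perfectly aligned axes (sign does not matter) and a crossed pair.
parallel = [(0, 0, 1), (0, 0, 2), (0, 0, -1)]
crossed = [(1, 0, 0), (0, 1, 0)]

p2_parallel = p2_pairwise(parallel)
p2_crossed = p2_pairwise(crossed)
print(p2_parallel, p2_crossed)
```

With this definition, perfectly aligned molecules give 1, perpendicular pairs give −0.5, and an isotropic distribution of orientations averages to zero, which matches the reading of near-zero values as orientational disorder.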
Figure 6. P2 orientational order parameter as a function of simulation time for all Benzoic acid (BENZAC) polymorphs at 300 K, 500 K, and 600 K. Each line corresponds to a distinct polymorph.
  Surrounding text: …ing. EHOWIH shows larger P2 fluctuations already at 300 K, suggesting an incipient thermal effect on orientational order at this temperature. Niacinamide (NICOAM) remains stable across most of its polymorphs, which predominantl…
Figure 7. Radial distribution functions for selected atomic pairs of Pyrazinamide …
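For readers unfamiliar with the observable in Figure 7, a minimal radial distribution function can be sketched as follows. The simplifications are assumptions and not the paper's setup: a single particle species, an orthorhombic periodic box with the minimum-image convention, and normalisation by the ideal-gas pair count. The paper reports pair-specific RDFs for selected atom types, which would need species filtering on top of this.

```python
import math

def rdf(positions, box, r_max, n_bins):
    """Histogram-based g(r) for one species in an orthorhombic periodic box.

    r_max should not exceed half the shortest box edge.
    """
    n = len(positions)
    dr = r_max / n_bins
    hist = [0] * n_bins
    for i in range(n):
        for j in range(i + 1, n):
            d2 = 0.0
            for a in range(3):
                d = positions[i][a] - positions[j][a]
                d -= box[a] * round(d / box[a])  # minimum-image convention
                d2 += d * d
            r = math.sqrt(d2)
            if r < r_max:
                hist[int(r / dr)] += 1
    volume = box[0] * box[1] * box[2]
    n_pairs = n * (n - 1) / 2
    g = []
    for k, count in enumerate(hist):
        shell = 4.0 / 3.0 * math.pi * (((k + 1) * dr) ** 3 - (k * dr) ** 3)
        g.append(count / (n_pairs * shell / volume))  # ratio to ideal-gas expectation
    r_mid = [(k + 0.5) * dr for k in range(n_bins)]
    return r_mid, g

# Toy system: two particles 1.0 apart across the periodic boundary.
positions = [(0.5, 0.5, 0.5), (9.5, 0.5, 0.5)]
box = (10.0, 10.0, 10.0)
r_mid, g = rdf(positions, box, r_max=5.0, n_bins=10)
peak_bin = g.index(max(g))
print("peak at r ≈", r_mid[peak_bin])
```

In an MD stability check, an RDF of this kind would be averaged over trajectory frames and compared across temperatures to see whether characteristic peaks persist.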
Original abstract

We present an open Molecular Crystal (MC) database of Machine-Learned Interatomic Potentials (MLIP) called MolCryst-MLIPs. The first release comprises fine-tuned MACE models for nine molecular crystal systems -- Benzamide, Benzoic acid, Coumarin, Durene, Isonicotinamide, Niacinamide, Nicotinamide, Pyrazinamide, and Resorcinol -- developed using the Automated Machine Learning Pipeline (AMLP), which streamlines the entire MLIP development workflow, from reference data generation to model training and validation, into a reproducible and user-friendly pipeline. Models are fine-tuned from the MACE-MH-1 foundation model (omol head), yielding a mean energy MAE of 0.141 kJ/mol/atom and a mean force MAE of 0.648 kJ/mol/Angstrom across all systems. Dynamical stability and structural integrity, as assessed through energy conservation, P2 orientational order parameters, and radial distribution functions, are evaluated using molecular dynamics simulations. The released models and datasets constitute a growing open database of validated MLIPs, ready for production MD simulations of molecular crystal polymorphism under different thermodynamic conditions.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces MolCryst-MLIPs, an open database of machine-learned interatomic potentials (MLIPs) for molecular crystals. It describes fine-tuned MACE models for nine systems (Benzamide, Benzoic acid, Coumarin, Durene, Isonicotinamide, Niacinamide, Nicotinamide, Pyrazinamide, Resorcinol) developed via the Automated Machine Learning Pipeline (AMLP) from the MACE-MH-1 foundation model. Reported performance includes a mean energy MAE of 0.141 kJ/mol/atom and mean force MAE of 0.648 kJ/mol/Å. Validation consists of molecular dynamics simulations assessing energy conservation, P2 orientational order parameters, and radial distribution functions. The authors conclude that the models and datasets are ready for production MD simulations of molecular crystal polymorphism under different thermodynamic conditions.

Significance. If the validation were extended to demonstrate accurate relative polymorph energies and dynamics at non-ambient conditions, this open database would represent a useful contribution to computational chemistry and materials science. The automated pipeline and foundation-model fine-tuning approach could help standardize MLIP workflows for molecular crystals, supporting applications in pharmaceutical polymorphism and materials design. The open release of models and data is a clear strength.

major comments (3)
  1. [Abstract] The central claim that the models are 'ready for production MD simulations of molecular crystal polymorphism under different thermodynamic conditions' is not supported by the reported validation. The MD checks (energy conservation, P2 order parameters, RDFs) address only short-term structural integrity of reference structures and provide no results on polymorph energy differences, phase-transition barriers, or simulations at elevated temperatures/pressures.
  2. [Abstract] The mean MAEs are stated without error bars, per-system breakdowns, details on training/validation splits, reference data quality metrics, or data exclusion criteria. These omissions prevent assessment of whether the models generalize to the polymorphism use case.
  3. [Abstract] It is unclear whether the MD stability tests were performed on held-out structures or conditions distinct from the training data; without this, the dynamical stability metrics cannot confirm readiness for varied thermodynamic conditions.
minor comments (2)
  1. The abstract would be strengthened by a table or explicit list of per-system MAEs rather than reporting only the mean across all nine systems.
  2. Consider adding explicit citations to prior MLIP work on molecular crystals to better contextualize the novelty of the AMLP and the nine-system database.

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive feedback, which has prompted us to moderate overstated claims and improve the clarity of our reporting. We address each major comment below and have revised the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract] The central claim that the models are 'ready for production MD simulations of molecular crystal polymorphism under different thermodynamic conditions' is not supported by the reported validation. The MD checks (energy conservation, P2 order parameters, RDFs) address only short-term structural integrity of reference structures and provide no results on polymorph energy differences, phase-transition barriers, or simulations at elevated temperatures/pressures.

    Authors: We agree that the original abstract phrasing overstated the immediate applicability to polymorphism and non-ambient conditions. The reported MD validation confirms short-term dynamical stability and structural integrity for the reference structures at standard conditions, but does not include polymorph energy differences or elevated temperature/pressure simulations. We have revised the abstract to state that the models provide a validated foundation for MD studies of these molecular crystals and can serve as a starting point for polymorphism investigations, rather than claiming full readiness for production use under varied thermodynamic conditions.
    Revision: yes

  2. Referee: [Abstract] The mean MAEs are stated without error bars, per-system breakdowns, details on training/validation splits, reference data quality metrics, or data exclusion criteria. These omissions prevent assessment of whether the models generalize to the polymorphism use case.

    Authors: The abstract reports summary mean MAEs for brevity. Per-system MAEs (with standard deviations), training/validation/test splits (80/10/10), reference DFT settings, data quality metrics, and exclusion criteria (e.g., removal of configurations with forces exceeding 10 eV/Å) are provided in the Methods section, Results tables, and Supplementary Information. We have added a brief clause to the abstract directing readers to these sections for full assessment of model generalization.
    Revision: partial

  3. Referee: [Abstract] It is unclear whether the MD stability tests were performed on held-out structures or conditions distinct from the training data; without this, the dynamical stability metrics cannot confirm readiness for varied thermodynamic conditions.

    Authors: The MD simulations were performed on the optimized reference structures from the training configurations to verify energy conservation and maintenance of order parameters/RDFs without drift. We have clarified this distinction in the revised Methods and Results sections, noting that these tests validate stability for the trained systems but do not constitute fully independent held-out thermodynamic conditions. The open release of models and data enables users to conduct additional validation for specific applications.
    Revision: yes
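The data handling mentioned in the second response (an 80/10/10 split and removal of configurations with forces exceeding 10 eV/Å) could look roughly like the sketch below. It assumes the cutoff applies to the largest per-atom force magnitude and that the split is a seeded random shuffle. Neither detail is specified in the text above, and the configuration records are hypothetical.

```python
import math
import random

def max_force_norm(forces):
    """Largest per-atom force magnitude in one configuration."""
    return max(math.sqrt(fx * fx + fy * fy + fz * fz) for fx, fy, fz in forces)

def filter_and_split(configs, f_max=10.0, fractions=(0.8, 0.1, 0.1), seed=0):
    """Drop configurations above the force cutoff, then split train/val/test."""
    kept = [c for c in configs if max_force_norm(c["forces"]) <= f_max]
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(kept)
    n_train = round(fractions[0] * len(kept))
    n_val = round(fractions[1] * len(kept))
    train = kept[:n_train]
    val = kept[n_train:n_train + n_val]
    test = kept[n_train + n_val:]
    return train, val, test

# Hypothetical dataset: 20 ordinary configurations plus 2 with very large forces (eV/Å).
configs = [{"id": i, "forces": [(0.1, 0.0, 0.0), (0.0, -0.2, 0.0)]} for i in range(20)]
configs += [{"id": 20 + i, "forces": [(12.0, 0.0, 0.0)]} for i in range(2)]

train_set, val_set, test_set = filter_and_split(configs)
print(len(train_set), len(val_set), len(test_set))
```

Filtering before splitting keeps the three subsets drawn from the same cleaned distribution, which is one reasonable reading of the procedure described.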

Circularity Check

0 steps flagged

No significant circularity in model training, evaluation, or MD validation pipeline

Full rationale

The paper describes fine-tuning MACE foundation models on reference data for nine molecular crystals, reporting separate MAE metrics on energies and forces, followed by independent MD runs that compute energy conservation, P2 order parameters, and RDFs as stability checks. These evaluation observables are not algebraically or statistically forced by the training loss or fitted parameters; they constitute external tests of the resulting potential. No self-definitional equations, fitted-input predictions, or load-bearing self-citations that reduce the central claims to tautology are present in the provided text. The derivation chain from reference data to reported MAEs and MD metrics remains self-contained and non-circular.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central performance claims rest on the transferability of the MACE-MH-1 foundation model to molecular crystals and on the adequacy of the chosen MD observables to certify production readiness; the neural-network weights themselves constitute a large set of fitted parameters.

free parameters (1)
  • fine-tuned MACE model weights
    Neural network parameters adjusted during fine-tuning on reference data for each of the nine crystal systems.
axioms (1)
  • Domain assumption: The MACE-MH-1 (omol head) foundation model is a suitable base for fine-tuning on molecular crystal reference data.
    Invoked by the choice to start from this pre-trained model rather than training from scratch.

pith-pipeline@v0.9.0 · 5598 in / 1480 out tokens · 42787 ms · 2026-05-10T13:19:47.601922+00:00 · methodology

