
arxiv: 2605.14973 · v1 · pith:MOT4FLV4new · submitted 2026-05-14 · ⚛️ physics.chem-ph

THEMol dataset: Torsion, Hessian, and Energy of Molecules

Pith reviewed 2026-06-30 19:31 UTC · model grok-4.3

classification ⚛️ physics.chem-ph
keywords THEMol dataset · Hessian matrices · torsion scans · DFT calculations · organic molecules · molecular potentials · conformational sampling · atomic multipoles

The pith

THEMol supplies more than 3 million Hessians, nearly 100 million torsion-scan geometries, and about 3 billion DFT calculations for closed-shell organic molecules with up to 50 heavy atoms.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents THEMol as an open-source collection of quantum mechanical properties for closed-shell organic molecules. It assembles more than 3 million relaxed geometries with full Hessian matrices, nearly 100 million constrained geometries from exhaustive torsion scans with energies and forces, and relaxation trajectories that bring the total to about 3 billion DFT calculations. The data cover twelve elements and molecular architectures drawn from drug discovery, electrolytes, and ionic liquids, and they include electron-density-derived atomic multipoles. The authors organize the material into five subsets and state that the scale and diversity of second-derivative and conformational information will support construction of accurate, transferable molecular potentials.

Core claim

THEMol is a dataset of optimized geometries, relaxation trajectories, Hessian matrices at those geometries, torsion-scan energies and forces, and MBIS atomic multipoles, all obtained from DFT calculations on closed-shell organic molecules containing up to 50 heavy atoms drawn from twelve elements; the collection comprises a Hessian subset exceeding 3 million entries, a TorsionScan subset approaching 100 million entries, and two relaxation-trajectory subsets whose combined DFT work totals roughly 3 billion calculations.

What carries the argument

The THEMol dataset, which stores relaxed geometries, full Hessian matrices, constrained torsion-scan geometries with energies and forces, full relaxation trajectories, and MBIS multipoles, all generated by DFT for diverse organic molecules.

If this is right

  • Machine-learning force fields can be trained directly on the Hessian matrices to reproduce vibrational frequencies and curvatures at minima.
  • Torsion-scan data enable direct fitting of dihedral parameters that capture both ring and chain conformational preferences.
  • The billions of relaxation-trajectory points supply dense sampling of the potential-energy surface for improving molecular-dynamics accuracy.
  • Atomic multipoles derived from the electron density support development of models that include higher-order electrostatic interactions.
  • The overall scale permits construction of potentials whose transferability can be tested across the stated application domains.
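
As an illustration of the first bullet, here is a minimal sketch of how harmonic vibrational wavenumbers follow from a Cartesian Hessian: mass-weight the matrix, diagonalize it, and convert the eigenvalues to cm^-1. The function name, the assumed input units (hartree/bohr^2 and amu), and the toy diatomic in the usage note are assumptions for illustration; this page does not state THEMol's storage units or file layout. Translations and rotations are not projected out, so they appear as near-zero modes.

```python
import numpy as np

# Unit-conversion constants (CODATA 2018 values)
HARTREE_J = 4.3597447222e-18   # joules per hartree
BOHR_M = 5.29177210903e-11     # metres per bohr
AMU_KG = 1.66053906660e-27     # kilograms per atomic mass unit
C_CM_S = 2.99792458e10         # speed of light in cm/s


def harmonic_wavenumbers(hessian, masses_amu):
    """Return harmonic wavenumbers in cm^-1 from a Cartesian Hessian.

    hessian    : (3N, 3N) array, ASSUMED to be in hartree/bohr^2
    masses_amu : (N,) atomic masses in amu
    Imaginary modes (negative eigenvalues) are returned as negative numbers.
    """
    hessian = np.asarray(hessian, dtype=float)
    # Each atom contributes three Cartesian coordinates with the same mass.
    m = np.repeat(np.asarray(masses_amu, dtype=float), 3)
    inv_sqrt_m = 1.0 / np.sqrt(m)
    # Mass-weighted Hessian: H_ij / sqrt(m_i * m_j)
    mw = hessian * inv_sqrt_m[:, None] * inv_sqrt_m[None, :]
    mw = 0.5 * (mw + mw.T)  # remove small numerical asymmetry
    eigvals = np.linalg.eigvalsh(mw)  # units: hartree / (bohr^2 * amu)
    to_si = HARTREE_J / (BOHR_M**2 * AMU_KG)  # converts eigenvalues to s^-2
    omega = np.sign(eigvals) * np.sqrt(np.abs(eigvals) * to_si)  # rad/s
    return omega / (2.0 * np.pi * C_CM_S)
```

For a toy diatomic with a single force constant k along the bond axis, the only nonzero mode returned by this function matches the textbook value sqrt(k/mu), with mu the reduced mass, after unit conversion.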

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The dataset size suggests it could serve as a benchmark collection against which future quantum-chemistry approximations are validated.
  • Because the data stop at closed-shell organic species, extensions to open-shell or metal-containing systems would require new calculations.
  • Combining THEMol with existing smaller datasets could produce hybrid training sets that improve coverage of edge cases.
  • The presence of both Hessian and force data at the same geometries allows direct comparison of first- and second-derivative consistency in learned models.
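
The last point can be made concrete with a small sketch: differentiate a model's forces numerically and compare the result with a reference Hessian at the same geometry. The toy quartic potential below stands in for a learned model, and all names are hypothetical; nothing here reflects THEMol's actual file format or any specific training code.

```python
import numpy as np


def fd_hessian_from_forces(force_fn, x, step=1e-4):
    """Central-difference Hessian from forces, using H[i, j] = -dF_i/dx_j.

    force_fn : callable mapping flat coordinates (n,) to forces (n,)
    x        : flat coordinate vector
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    hess = np.empty((n, n))
    for j in range(n):
        dx = np.zeros(n)
        dx[j] = step
        f_plus = np.asarray(force_fn(x + dx), dtype=float)
        f_minus = np.asarray(force_fn(x - dx), dtype=float)
        hess[:, j] = -(f_plus - f_minus) / (2.0 * step)
    # Symmetrize to average out finite-difference noise.
    return 0.5 * (hess + hess.T)


def hessian_mismatch(force_fn, x, reference_hessian, step=1e-4):
    """Largest absolute deviation between the force-derived and reference Hessian."""
    fd = fd_hessian_from_forces(force_fn, x, step)
    return float(np.max(np.abs(fd - np.asarray(reference_hessian, dtype=float))))


# Toy stand-in for a model: E(x) = 0.5 x^T A x + 0.25 * sum(x^4)
A = np.array([[2.0, -1.0], [-1.0, 3.0]])


def toy_forces(x):
    return -(A @ x + x**3)


def toy_hessian(x):
    return A + np.diag(3.0 * x**2)


x0 = np.array([0.3, -0.2])
print(hessian_mismatch(toy_forces, x0, toy_hessian(x0)))
```

A small mismatch means the forces and curvatures describe the same surface; a large one flags a model whose second derivatives have drifted from its first derivatives.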

Load-bearing premise

The sampled molecules and their conformations adequately represent the chemical space needed for drug discovery, electrolytes, and similar applications so that potentials trained on the data will transfer to new molecules.

What would settle it

Training a potential on THEMol and then measuring its error on energies or forces for a test set of molecules that use elements outside the twelve covered or that contain ring systems or functional groups absent from the torsion scans.

Original abstract

We present THEMol (Torsion, Hessian, Energy of Molecules), a massive open-source collection of quantum mechanical properties tailored for closed-shell organic molecules, with up to 50 heavy atoms. THEMol includes a Hessian subset with more than 3 million relaxed geometries with Hessian matrices, a TorsionScan subset with nearly 100 million constrained relaxed geometries with energies and forces, and relaxation-trajectory subsets (HessianRelax and TorsionScanRelax) that together comprise about 3 billion DFT calculations. The chemical space sampling is comprehensive, spanning twelve essential elements and diverse molecular architectures relevant to drug discovery, electrolytes, ionic liquids, and beyond. The dataset also features exhaustive conformational sampling through the TorsionScan and TorsionScanRelax subsets, including comprehensive in-ring and non-ring torsional scans. Furthermore, it contains an extensive library of Hessian matrices, computed at relaxed geometries, to capture critical second-derivative information of the potential energy landscape. Additionally, we supply electron density-derived atomic multipoles computed via the Minimal Basis Iterative Stockholder partition scheme. Organized into five distinct subsets (Hessian, TorsionScan, HessianRelax, TorsionScanRelax, and MBIS), the data encompasses optimized geometries, relaxation trajectories, and derived molecular properties. We anticipate that this massive and diverse dataset will significantly empower the development of highly accurate and transferable molecular potentials.
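
Editorial sketch, not part of the abstract: one common way to use torsion-scan energies like those described above is to fit a truncated Fourier series in the dihedral angle by linear least squares. The function names, the number of terms, and the synthetic scan in the usage note are assumptions for illustration. In force-field work the fit is usually applied to the QM energy minus the other molecular-mechanics terms, a step this sketch omits.

```python
import numpy as np


def fit_torsion_series(phi_rad, energies, n_terms=3):
    """Least-squares fit of E(phi) = c0 + sum_n [a_n cos(n phi) + b_n sin(n phi)].

    Returns coefficients ordered as [c0, a_1, b_1, a_2, b_2, ...].
    """
    phi = np.asarray(phi_rad, dtype=float)
    cols = [np.ones_like(phi)]
    for n in range(1, n_terms + 1):
        cols.append(np.cos(n * phi))
        cols.append(np.sin(n * phi))
    design = np.column_stack(cols)
    coeffs, *_ = np.linalg.lstsq(design, np.asarray(energies, dtype=float), rcond=None)
    return coeffs


def eval_torsion_series(phi_rad, coeffs):
    """Evaluate the fitted series at the given dihedral angles (radians)."""
    phi = np.asarray(phi_rad, dtype=float)
    n_terms = (len(coeffs) - 1) // 2
    energy = np.full_like(phi, coeffs[0])
    for n in range(1, n_terms + 1):
        energy += coeffs[2 * n - 1] * np.cos(n * phi) + coeffs[2 * n] * np.sin(n * phi)
    return energy


# Synthetic 15-degree scan with a threefold and a onefold term (made-up values).
phi = np.deg2rad(np.arange(-180, 180, 15))
scan = 1.5 * (1.0 + np.cos(3 * phi)) + 0.4 * (1.0 - np.cos(phi))
coeffs = fit_torsion_series(phi, scan, n_terms=3)
```

On this synthetic scan the fit recovers the generating coefficients, which is a useful sanity check before applying the same routine to real scan data.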

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript presents THEMol, a large open-source dataset of quantum mechanical properties for closed-shell organic molecules (up to 50 heavy atoms). It comprises five subsets: Hessian (>3 million relaxed geometries with Hessian matrices), TorsionScan (~100 million constrained relaxed geometries with energies and forces), HessianRelax and TorsionScanRelax (together ~3 billion DFT calculations), and MBIS (electron density-derived atomic multipoles via MBIS partitioning). The data emphasize comprehensive sampling across 12 elements and diverse architectures, with exhaustive torsional scans including in-ring and non-ring cases.

Significance. If the generation protocol, validation, and error controls are sound and fully documented, the dataset would be a substantial resource for machine-learned interatomic potentials, supplying rare large-scale Hessian data and extensive conformational sampling at a scale that could improve transferability for applications in drug discovery, electrolytes, and ionic liquids.

major comments (1)
  1. The abstract enumerates subset sizes and contents but supplies no computational details (DFT functional/basis, software, convergence thresholds, or error analysis). This information is load-bearing for assessing whether the stated volumes and properties can be reproduced or trusted; without it the central factual claim cannot be evaluated.
minor comments (1)
  1. The abstract is lengthy; consider moving some descriptive sentences to a dedicated methods or data-generation section for improved readability.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their review and constructive comment. We address the major comment below.

Point-by-point responses
  1. Referee: The abstract enumerates subset sizes and contents but supplies no computational details (DFT functional/basis, software, convergence thresholds, or error analysis). This information is load-bearing for assessing whether the stated volumes and properties can be reproduced or trusted; without it the central factual claim cannot be evaluated.

    Authors: We agree that the abstract would benefit from a concise statement of the core computational parameters. The full manuscript already details the DFT functional, basis set, software package, convergence thresholds, and validation/error controls in the Computational Methods and Validation sections. In the revised manuscript we will add one sentence to the abstract summarizing these parameters (functional, basis, software, and key thresholds) while respecting length constraints. revision: yes

Circularity Check

0 steps flagged

No significant circularity identified

Full rationale

This is a dataset release paper whose central claims consist of factual enumerations of generated subset sizes (e.g., >3 million Hessian geometries, ~100 million TorsionScan entries, ~3 billion DFT calculations) and chemical-space coverage. These are presented as direct outcomes of the computational generation process rather than as derived predictions or inferences from equations. No load-bearing derivations, self-citations, ansatzes, or uniqueness theorems appear in the provided text; the argument is self-contained as a description of produced data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are introduced; the work consists of computational data generation and organization.

pith-pipeline@v0.9.1-grok · 5824 in / 1087 out tokens · 40119 ms · 2026-06-30T19:31:58.949435+00:00 · methodology

