
arxiv: 1708.04987 · v4 · pith:GPM27EDEnew · submitted 2017-08-16 · ⚛️ physics.chem-ph · cs.LG · physics.data-an

ANI-1: A data set of 20M off-equilibrium DFT calculations for organic molecules

classification ⚛️ physics.chem-ph · cs.LG · physics.data-an
keywords data · chemistry · fitting · methods · models · molecules · organic · potentials

One of the grand challenges in modern theoretical chemistry is designing and implementing approximations that expedite ab initio methods without loss of accuracy. Machine learning (ML) methods, in particular neural networks, are emerging as a powerful approach to constructing various forms of transferable atomistic potentials. They have been applied successfully in chemistry, biology, catalysis, and solid-state physics. However, these models depend heavily on the quality and quantity of the data used to fit them. Fitting highly flexible ML potentials comes at a cost: a vast amount of reference data is required to train these models properly. We address this need by providing access to a large computational DFT database, which consists of 20M conformations for 57,454 small organic molecules. We believe it will become a new standard benchmark for comparison of current and future methods in the ML potential community.

