arxiv: 2603.23536 · v2 · pith:DU2PQIWUnew · submitted 2026-03-12 · 💻 cs.DB · cond-mat.mtrl-sci

optimade-maker: Automated generation of interoperable materials APIs from static datasets

Pith reviewed 2026-05-22 10:43 UTC · model grok-4.3

keywords OPTIMADE · materials data · API generation · data interoperability · atomistic structures · FAIR data · database integration · REST API

The pith

The optimade-maker toolkit automatically generates OPTIMADE-compliant APIs from raw atomistic structure and property data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

optimade-maker is a lightweight toolkit that turns static collections of atomistic structures and properties into standardized REST APIs following the OPTIMADE specification. This automation removes the need for data providers to build and maintain compliant services from scratch, which has been a technical hurdle for many repositories. The toolkit includes conversion steps to a common data representation and supports quick setup for both local testing and live production use. Demonstrations include an automated service for user-contributed datasets on the Materials Cloud Archive plus transformation pipelines that bring the Cambridge Structural Database and the Inorganic Crystal Structure Database into the same interoperable framework. A sympathetic reader would see this as a practical step that makes cross-repository searches and combined analyses more routine in materials research.

Core claim

optimade-maker enables automated generation of OPTIMADE-compliant APIs directly from raw atomistic structure and property data. The toolkit supports a wide range of raw datasets, enables conversion to a standardised OPTIMADE data representation, and allows for rapid deployment of APIs in both local and production environments, as shown through an automated service on the Materials Cloud Archive and data transformation pipelines for the Cambridge Structural Database and the Inorganic Crystal Structure Database.

What carries the argument

optimade-maker toolkit, which automates conversion of heterogeneous raw datasets to a standardised OPTIMADE data representation followed by API generation and deployment.

If this is right

  • Data providers gain the ability to publish interoperable APIs from existing static files with reduced technical effort.
  • Community archives can automatically expose new contributed datasets through OPTIMADE immediately upon upload.
  • Curated resources such as the Cambridge Structural Database and Inorganic Crystal Structure Database become accessible through a single standardized query interface.
  • Both local development and production hosting of compliant services become feasible without bespoke infrastructure.
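To make the "single standardized query interface" point concrete, the sketch below builds the same OPTIMADE filter query for two providers. The base URLs and the PROVIDERS table are placeholders invented for illustration and do not come from the paper. The /v1/structures endpoint, the filter and page_limit query parameters, and the filter grammar follow the OPTIMADE specification. No request is sent.

```python
from urllib.parse import urlencode

# Hypothetical base URLs; real providers publish their own.
PROVIDERS = {
    "archive": "https://example.org/archive/optimade",
    "curated": "https://example.org/curated/optimade",
}

# One OPTIMADE filter string, reused unchanged for every provider.
FILTER = 'elements HAS ALL "Si","O" AND nelements=2'


def structures_url(base_url: str, optimade_filter: str, page_limit: int = 10) -> str:
    """Build a /v1/structures query URL for one provider."""
    query = urlencode({"filter": optimade_filter, "page_limit": page_limit})
    return f"{base_url.rstrip('/')}/v1/structures?{query}"


urls = {name: structures_url(base, FILTER) for name, base in PROVIDERS.items()}
for name, url in urls.items():
    print(name, url)
```

Only the base URL changes between providers; the filter is written once, which is what lets a client stay independent of any one provider.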

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Broader use of this approach could make routine cross-database queries a standard part of materials discovery workflows.
  • The method might be extended to additional raw data formats or linked to automated validation checks for data quality.
  • Wider adoption would reduce repeated effort in building one-off APIs and support the growth of a unified materials data layer.

Load-bearing premise

A wide range of heterogeneous raw datasets can be automatically converted to a standardized OPTIMADE data representation with sufficient fidelity without requiring extensive manual curation or custom code per dataset.
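As a rough illustration of what such a conversion involves, the sketch below derives a subset of the standard OPTIMADE structure attributes from a raw record. The raw layout (keys "symbols", "positions", "cell") and the function name are assumptions made for this example. This is not optimade-maker's code or input format. A complete OPTIMADE entry needs further fields (for example species, dimension_types, and structure_features) that are omitted here.

```python
from collections import Counter
from functools import reduce
from math import gcd


def to_optimade_attributes(raw: dict) -> dict:
    """Derive a subset of OPTIMADE structure attributes from a raw record.

    The raw layout (keys "symbols", "positions", "cell") is an assumption
    made for this sketch, not a format defined by the paper.
    """
    symbols = raw["symbols"]
    counts = Counter(symbols)
    elements = sorted(counts)  # OPTIMADE requires alphabetical order
    divisor = reduce(gcd, counts.values())

    def term(el: str) -> str:
        # A proportion of 1 is omitted in chemical_formula_reduced.
        n = counts[el] // divisor
        return el if n == 1 else f"{el}{n}"

    return {
        "elements": elements,
        "nelements": len(elements),
        "elements_ratios": [counts[el] / len(symbols) for el in elements],
        "chemical_formula_reduced": "".join(term(el) for el in elements),
        "nsites": len(symbols),
        "species_at_sites": list(symbols),
        "cartesian_site_positions": raw["positions"],
        "lattice_vectors": raw["cell"],
    }


raw_record = {
    "symbols": ["Si", "O", "O", "Si", "O", "O"],
    "positions": [[0.0, 0.0, 0.0]] * 6,  # placeholder coordinates, not a real structure
    "cell": [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]],
}
attributes = to_optimade_attributes(raw_record)
print(attributes["chemical_formula_reduced"])  # O2Si
```

The derived fields are the easy part. The premise is really tested by raw inputs that lack an explicit cell, use partial occupancies, or carry properties with no standard OPTIMADE counterpart.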

What would settle it

Processing a diverse collection of contributed raw datasets and finding that many lose critical properties, require substantial per-dataset custom code, or fail to produce working APIs would show the automation does not scale as claimed.
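One way to run such a test is sketched below, assuming each conversion attempt is logged with the property names found in the raw data and those present after conversion. The ConversionResult record, the audit function, and the toy inputs are all hypothetical. They are not taken from the paper and the numbers are not measured results.

```python
from dataclasses import dataclass, field


@dataclass
class ConversionResult:
    """Outcome of converting one dataset (hypothetical record type)."""
    dataset: str
    succeeded: bool
    raw_properties: set = field(default_factory=set)
    converted_properties: set = field(default_factory=set)


def audit(results):
    """Summarise success rate and property loss over a batch of conversions."""
    ok = [r for r in results if r.succeeded]
    # Properties present in the raw data but missing after conversion.
    lost = {
        r.dataset: r.raw_properties - r.converted_properties
        for r in ok
        if r.raw_properties - r.converted_properties
    }
    retention = [
        len(r.raw_properties & r.converted_properties) / len(r.raw_properties)
        for r in ok
        if r.raw_properties
    ]
    return {
        "success_rate": len(ok) / len(results),
        "mean_property_retention": sum(retention) / len(retention) if retention else None,
        "lost_properties": lost,
    }


# Made-up inputs for illustration only; these are not results from the paper.
toy_results = [
    ConversionResult("a", True, {"energy", "band_gap"}, {"energy", "band_gap"}),
    ConversionResult("b", True, {"energy", "magmom"}, {"energy"}),
    ConversionResult("c", False, {"energy"}, set()),
]
report = audit(toy_results)
print(report)
```

A low success rate or low retention across a diverse batch would count against the automation claim. A third signal, the amount of per-dataset custom code, would have to be tracked separately.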

Original abstract

Atomistic structural data are central to materials science, condensed matter physics, and chemistry, and are increasingly digitised across diverse repositories and databases. Interoperable access to these heterogeneous data sources enables reusable clients and tools, and is essential for cross-database analyses and data-driven materials discovery. Toward this aim, the OPTIMADE (Open Databases Integration for Materials Design) specification defines a standard REST API for atomistic structures and related properties. However, deploying and maintaining compliant services remains technically demanding and poses a significant barrier for many data providers. Here, we present optimade-maker, a lightweight toolkit for the automated generation of OPTIMADE-compliant APIs directly from raw atomistic structure and property data. The toolkit supports a wide range of raw datasets, enables conversion to a standardised OPTIMADE data representation, and allows for rapid deployment of APIs in both local and production environments. We further demonstrate it through an automated service on the Materials Cloud Archive, which automatically creates and publishes OPTIMADE APIs for contributed datasets, enabling immediate discoverability and interoperability. In addition, we implement data transformation pipelines for the Cambridge Structural Database (CSD) and the Inorganic Crystal Structure Database (ICSD), enabling unified access to these curated resources through the OPTIMADE framework. By lowering the technical barriers to interoperable data publication, optimade-maker represents an important step toward a scalable, FAIR materials data ecosystem integrating both community-contributed and curated databases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper presents optimade-maker, a lightweight toolkit for automated generation of OPTIMADE-compliant REST APIs directly from static raw atomistic structure and property datasets. It claims to support a wide range of heterogeneous sources, enable standardized conversion and rapid deployment in local or production environments, and demonstrates the approach via an automated service on the Materials Cloud Archive plus specific transformation pipelines for the CSD and ICSD databases.

Significance. If the automation and fidelity claims hold with minimal per-dataset effort, the work would meaningfully reduce barriers to OPTIMADE adoption and support a more scalable, interoperable materials data ecosystem. The software contribution is practical and timely for data providers; explicit release of code, examples, and validation would strengthen its reproducibility and impact.

major comments (1)
  1. [Abstract] The central claim that optimade-maker enables 'automated generation of OPTIMADE-compliant APIs directly from raw atomistic structure and property data' and 'lowers the technical barriers' is load-bearing, yet the text states that the authors 'implement data transformation pipelines for the Cambridge Structural Database (CSD) and the Inorganic Crystal Structure Database (ICSD)'. This indicates dataset-specific mapping logic; clarification is required on how much custom code or manual curation remains necessary for arbitrary new heterogeneous sources versus truly generic automation.
minor comments (2)
  1. The abstract refers to support for 'a wide range of raw datasets' without enumerating formats, schemas, or limitations; a table or section listing supported input types and conversion coverage would improve clarity.
  2. No validation metrics (e.g., conversion success rates, error rates on heterogeneous fields, or fidelity checks) are mentioned in the provided description; adding quantitative evidence of reliability would strengthen the demonstration claims.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on the manuscript. We address the major comment below and will revise the manuscript to improve clarity on the level of automation provided.

Point-by-point responses
  1. Referee: [Abstract] The central claim that optimade-maker enables 'automated generation of OPTIMADE-compliant APIs directly from raw atomistic structure and property data' and 'lowers the technical barriers' is load-bearing, yet the text states that the authors 'implement data transformation pipelines for the Cambridge Structural Database (CSD) and the Inorganic Crystal Structure Database (ICSD)'. This indicates dataset-specific mapping logic; clarification is required on how much custom code or manual curation remains necessary for arbitrary new heterogeneous sources versus truly generic automation.

    Authors: We agree that the abstract would benefit from clarification to distinguish the generic toolkit capabilities from the specific demonstrations. optimade-maker provides a reusable framework with built-in parsers, schema validation, and deployment tools that automate the majority of the API generation process once a mapping is defined. For arbitrary new sources, users typically provide lightweight configuration files or short mapping scripts rather than implementing full custom code; this is substantially less effort than developing an OPTIMADE service from scratch. The CSD and ICSD pipelines represent more complex cases due to proprietary formats and extensive property mappings, serving as templates rather than the norm. The Materials Cloud Archive integration demonstrates near-full automation for contributed datasets. We will revise the abstract and add a dedicated subsection on typical effort levels and configuration examples for new heterogeneous sources.

    revision: yes
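For a sense of scale, a "short mapping script" of the kind the response describes might look like the sketch below. The MAPPING table, the raw column names, and apply_mapping are invented for illustration and are not optimade-maker's actual configuration format. The only convention taken from OPTIMADE is that provider-specific properties carry an underscore-delimited provider prefix, with "exmpl" being the prefix the specification reserves for examples.

```python
# "exmpl" is the prefix the OPTIMADE specification reserves for examples.
PREFIX = "_exmpl_"

# Hypothetical declarative mapping: raw column name -> (property name, caster).
MAPPING = {
    "Egap (eV)": ("band_gap", float),
    "E_tot": ("total_energy", float),
    "spacegroup": ("space_group_number", int),
}


def apply_mapping(raw_row: dict, mapping: dict = MAPPING, prefix: str = PREFIX):
    """Rename and cast raw columns; report columns the mapping does not cover."""
    converted, unmapped = {}, []
    for key, value in raw_row.items():
        if key not in mapping:
            unmapped.append(key)  # surfaced rather than silently dropped
            continue
        name, cast = mapping[key]
        converted[prefix + name] = cast(value)
    return converted, unmapped


# Made-up raw row for illustration.
row = {"Egap (eV)": "1.12", "E_tot": "-10.5", "comment": "relaxed"}
converted, unmapped = apply_mapping(row)
print(converted)
print(unmapped)
```

Returning the unmapped columns makes the remaining manual effort visible, and that effort is the quantity the referee asks the authors to report.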

Circularity Check

0 steps flagged

No circularity: software engineering tool with no derivation chain

full rationale

The paper describes the design and implementation of a software toolkit (optimade-maker) for converting static atomistic datasets into OPTIMADE-compliant REST APIs. It contains no mathematical derivations, first-principles predictions, fitted models, or uniqueness theorems. The central contribution is a practical engineering artifact whose correctness is established by code execution and deployment examples rather than by any chain of logical steps that could reduce to self-definition or self-citation. The mention of specific transformation pipelines for CSD and ICSD is simply documentation of implemented functionality, not a load-bearing premise that loops back on itself. Because the work is self-contained as a software description with externally verifiable outputs (running APIs), the circularity score is 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is a software toolkit paper with no free parameters, mathematical axioms, or invented physical entities; it builds directly on the pre-existing OPTIMADE specification without introducing new fitted values or postulates.

pith-pipeline@v0.9.0 · 5816 in / 1172 out tokens · 42580 ms · 2026-05-22T10:43:18.631247+00:00 · methodology
