OpenCitations Meta
Pith reviewed 2026-05-24 08:25 UTC · model grok-4.3
The pith
OpenCitations Meta merges metadata from Crossref, DataCite and PubMed into the largest Semantic Web bibliographic database and assigns its own persistent identifiers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OpenCitations Meta stores bibliographic metadata for scholarly publications cited within the OpenCitations infrastructure, following the OpenCitations Data Model and published under CC0. It ingests data from Crossref, DataCite and PubMed to become the largest bibliographic metadata collection that uses Semantic Web technologies. It creates OMIDs for every resource so that publications described by different external PIDs can be unified and so that works without external PIDs can still participate in citations. Metadata is hosted internally rather than fetched on demand, and an automated pipeline performs deduplication, error correction, enrichment and complete provenance tracking.
What carries the argument
OpenCitations Meta Identifiers (OMIDs) together with the automated curation pipeline that follows the OpenCitations Data Model.
If this is right
- Publications described by different external PIDs such as a DOI and a PMID become a single record.
- Citations involving publications that lack any external PID can still be recorded and queried.
- Query responses no longer depend on live calls to external APIs, which improves query performance.
- Every metadata change carries full provenance, making data integrity traceable.
- Access is available through SPARQL, REST APIs and bulk dumps while remaining fully interoperable with other Semantic Web resources.
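The first two consequences can be illustrated with a minimal sketch. It assumes that two source records describe the same publication whenever they share at least one external identifier, and it groups them with a union-find structure. The function name `merge_by_shared_ids`, the sample identifiers and the `local:N` labels are all hypothetical; the labels are not the real OMID syntax, and this is not the pipeline the paper describes.

```python
def merge_by_shared_ids(records):
    """Group records that share at least one external identifier.

    `records` is a list of sets of identifier strings such as "doi:10.1000/a".
    Returns one hypothetical local label per record, in input order.
    """
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            # Keep the smaller index as root so labels follow input order.
            parent[max(ra, rb)] = min(ra, rb)

    first_seen = {}  # external identifier -> index of first record carrying it
    for idx, ids in enumerate(records):
        for ext_id in ids:
            if ext_id in first_seen:
                union(idx, first_seen[ext_id])
            else:
                first_seen[ext_id] = idx

    labels = {}
    result = []
    for idx in range(len(records)):
        root = find(idx)
        if root not in labels:
            labels[root] = f"local:{len(labels) + 1}"  # hypothetical label format
        result.append(labels[root])
    return result


records = [
    {"doi:10.1000/a"},              # known only by DOI
    {"pmid:111", "doi:10.1000/a"},  # links the DOI to a PMID
    set(),                          # no external PID at all
    {"pmid:222"},                   # unrelated work
    {"pmid:111"},                   # known only by PMID
]
assigned = merge_by_shared_ids(records)
print(assigned)
```

Records 0, 1 and 4 end up under one label because record 1 bridges the DOI and the PMID, while the record with no external PID still receives a label of its own and can therefore take part in citations.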
Where Pith is reading between the lines
- Analyses that combine citation links with bibliographic details can be performed inside a single local store rather than across multiple external services.
- The CC0 release and provenance records create a foundation that other projects could reuse or extend without legal or technical friction.
- If the curation pipeline proves reliable over time, the database could serve as a reference point for checking completeness of other open metadata collections.
- The same internal-hosting pattern could be applied to citation data itself to further reduce external dependencies.
Load-bearing premise
The automated curation pipeline can deduplicate records, correct errors and enrich metadata from heterogeneous sources without introducing systematic new errors or losing coverage.
What would settle it
A sample audit that finds the same publication assigned two different OMIDs or that finds source records from Crossref, DataCite or PubMed that are absent from the Meta database after the claimed ingestion.
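Such an audit could be sketched as follows, assuming the database can export pairs of external PID and internal identifier. The function `audit`, its parameters and the sample data are hypothetical. An external PID mapped to more than one internal identifier is used here only as a proxy for a publication that has been split across two records.

```python
from collections import defaultdict


def audit(sample_pids, pid_id_pairs):
    """Check a sample of external PIDs against exported (pid, internal_id) pairs.

    Returns (split, missing):
      split   - sampled PIDs mapped to more than one internal identifier
      missing - sampled PIDs that do not appear in the export at all
    """
    seen = defaultdict(set)
    for pid, internal_id in pid_id_pairs:
        seen[pid].add(internal_id)

    split = {
        pid: sorted(seen[pid])
        for pid in sample_pids
        if len(seen.get(pid, ())) > 1
    }
    missing = sorted(pid for pid in sample_pids if pid not in seen)
    return split, missing


# Invented export and sample, for illustration only.
pairs = [
    ("doi:10.1000/a", "id-1"),
    ("pmid:111", "id-1"),
    ("doi:10.1000/b", "id-2"),
    ("doi:10.1000/b", "id-3"),  # same DOI under two internal identifiers
]
sample = ["doi:10.1000/a", "doi:10.1000/b", "doi:10.1000/c"]
split, missing = audit(sample, pairs)
print(split, missing)
```

A non-empty `split` would point to a deduplication failure, and a non-empty `missing` would point to lost coverage, which are the two outcomes the premise above rules out.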
Original abstract
OpenCitations Meta is a new database for open bibliographic metadata of scholarly publications involved in the citations indexed by the OpenCitations infrastructure, adhering to Open Science principles and published under a CC0 license to promote maximum reuse. It presently incorporates bibliographic metadata for publications recorded in Crossref, DataCite and PubMed, making it the largest bibliographic metadata source using Semantic Web technologies. It assigns new globally persistent identifiers (PIDs), known as OpenCitations Meta Identifiers (OMIDs), to all bibliographic resources, enabling it both to disambiguate publications described using different external PIDs (e.g., a DOI in Crossref and a PMID in PubMed), and to handle citations involving publications lacking external PIDs. By hosting bibliographic metadata internally, OpenCitations Meta eliminates its former reliance on API calls to external resources and thus enhances performance in response to user queries. Its automated data curation, following the OpenCitations Data Model, includes deduplication, error correction, metadata enrichment and full provenance tracking, ensuring transparency and traceability of data and bolstering confidence in data integrity, a feature unparalleled in other bibliographic databases. Its commitment to Semantic Web standards ensures superior interoperability compared to other machine-readable formats, with availability via a SPARQL endpoint, REST APIs and data dumps.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents OpenCitations Meta, a new open bibliographic metadata database adhering to Open Science principles and published under CC0. It aggregates metadata for publications from Crossref, DataCite and PubMed, assigns OMIDs to enable disambiguation across external PIDs and to handle publications without PIDs, hosts metadata internally to eliminate external API calls, performs automated curation (deduplication, error correction, enrichment, provenance tracking) per the OpenCitations Data Model, and exposes data via SPARQL endpoint, REST APIs and dumps. The abstract asserts that this makes it the largest bibliographic metadata source using Semantic Web technologies.
Significance. If the scale, curation accuracy and provenance claims hold, the work provides a substantial open infrastructure contribution: a large-scale, interoperable Semantic Web bibliographic resource that improves query performance over prior external-API reliance and offers transparent, traceable data not matched by other bibliographic databases. This directly supports reuse, interoperability and scholarly analysis under open-science principles.
major comments (1)
- [Abstract] The claim that OpenCitations Meta is 'the largest bibliographic metadata source using Semantic Web technologies' is supported neither by reported counts of unique publications, OMIDs or citations, nor by an explicit comparison to other RDF-based collections (e.g., Wikidata scholarly items). Without these figures the size assertion cannot be evaluated.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback and positive assessment of the significance of OpenCitations Meta. We address the single major comment below and will revise the manuscript to strengthen the unsupported claim.
Point-by-point responses
- Referee: [Abstract] the claim that OpenCitations Meta is 'the largest bibliographic metadata source using Semantic Web technologies' is unsupported by any reported counts of unique publications, OMIDs or citations, and by any explicit comparison to other RDF-based collections (e.g., Wikidata scholarly items). Without these figures the size assertion remains unevaluated.
Authors: We agree that the size claim in the abstract is currently unsupported, as the manuscript provides no explicit counts of unique publications, OMIDs or citations, nor any direct comparison to other Semantic Web resources such as Wikidata. In the revised manuscript we will add these quantitative figures (drawn from the integrated Crossref, DataCite and PubMed sources) together with a concise comparison to relevant RDF collections, either substantiating the claim or qualifying it appropriately.
Revision: yes
Circularity Check
No derivation chain or fitted results; database construction paper with no self-referential predictions
Full rationale
The paper describes construction of OpenCitations Meta by ingesting and curating bibliographic metadata from external sources (Crossref, DataCite, PubMed). It assigns OMIDs, performs deduplication and enrichment, and exposes data via SPARQL/REST. No equations, parameters, predictions, or derivations appear in the provided text. Claims about size and uniqueness are presented as direct consequences of the aggregation process rather than outputs derived from the database itself. No self-citation load-bearing steps, ansatz smuggling, or renaming of known results are present. The work is self-contained as a report of infrastructure building.
Forward citations
Cited by 1 Pith paper
- Mapping bibliographic metadata collections: the case of OpenCitations Meta and OpenAlex. Authors map entities between OpenCitations Meta and OpenAlex to add identifiers and evaluate bibliographic metadata consistency.