Assessing and Comparing the Coverage of Italian Publications in OpenCitations: a Study within Six Italian Universities
Pith reviewed 2026-05-16 08:36 UTC · model grok-4.3
The pith
OpenCitations covers over 40 percent of publications from six Italian universities' IRIS systems, matching levels reported for Scopus and Web of Science.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
OpenCitations covers, on average, over 40 percent of the publications recorded in the IRIS installations of the six Italian universities studied. Coverage was measured by matching persistent identifiers (DOIs, PMIDs, and ISBNs) specified in the IRIS records to entries in OpenCitations Meta, with citation links then extracted from the OpenCitations Index. This level is quantitatively comparable to that reported for Scopus and Web of Science in a prior study, although coverage is lower for publication types prevalent in the Social Sciences and Humanities such as monographs and critical editions.
What carries the argument
Matching of IRIS publication records to OpenCitations Meta via persistent identifiers (DOIs, PMIDs, ISBNs) to measure coverage and retrieve citation links from the OpenCitations Index.
Load-bearing premise
Matching publications from IRIS records to OpenCitations via DOIs, PMIDs, and ISBNs produces an accurate coverage estimate without significant false negatives from identifier errors or missing data.
What would settle it
A manual audit of a random sample of IRIS publications not found in OpenCitations to determine whether they are truly absent or missed due to identifier mismatches or incomplete indexing.
Figures
read the original abstract
Recent initiatives advocating responsible, transparent research assessment have intensified the call to use open research information rather than proprietary databases. This study evaluates the coverage and citation representation of publications recorded in the Current Research Information Systems (CRIS), all instances of the IRIS software platform, of six Italian universities within OpenCitations, a community-owned open infrastructure. Using persistent identifiers (DOIs, PMIDs, and ISBNs) specified in the IRIS installations involved, we matched the publications recorded in OpenCitations Meta and extracted the related citation links from the OpenCitations Index. Results show that OpenCitations covers, on average, over 40% of IRIS publications, which is quantitatively comparable to those reported by Scopus and Web of Science in another study. However, gaps persist, particularly for publication types prevalent in the Social Sciences and Humanities, such as monographs and critical editions. Overall, the findings demonstrate the growing maturity of OpenCitations and, more broadly, of Open Science infrastructures as viable alternatives as sources of research information, while highlighting areas where further metadata enrichment and interoperability efforts are needed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates the coverage of publications recorded in IRIS systems at six Italian universities within OpenCitations by matching persistent identifiers (DOIs, PMIDs, and ISBNs) extracted from IRIS records against OpenCitations Meta and extracting citation links from the OpenCitations Index. It reports that OpenCitations covers on average over 40% of IRIS publications, a figure quantitatively comparable to coverage reported for Scopus and Web of Science in prior work, while noting persistent gaps for monographs and critical editions especially in the Social Sciences and Humanities.
Significance. If the coverage estimates hold after validation, the study supplies concrete empirical support for the viability of community-owned open infrastructures as alternatives to proprietary databases in responsible research assessment, directly addressing calls for transparency while pinpointing concrete metadata-enrichment needs.
major comments (2)
- [Methods] The PID-matching procedure (exact string matching of DOIs, PMIDs, and ISBNs) is described without any reported validation, sample audit, or error-rate estimate for false negatives arising from formatting variants, missing identifiers in IRIS, or incomplete ingestion in Meta. Because the headline >40% coverage figure and the direct comparability claim to Scopus/WoS rest on this single matching step, the absence of such checks leaves the quantitative result sensitive to an untested assumption.
- [Results] No raw counts, per-university breakdowns, or precision/recall figures are supplied to support the aggregate percentages; only summary statistics appear, which prevents independent assessment of the robustness of the central coverage claim.
minor comments (2)
- [Abstract] The abstract refers to comparability with Scopus/WoS 'in another study' without a citation; supply the reference.
- [Methods] Clarify the exact extraction date and version of OpenCitations Meta/Index used, as coverage figures are time-sensitive.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help strengthen the methodological transparency and empirical robustness of our study on OpenCitations coverage. We address each major point below and indicate the corresponding revisions.
read point-by-point responses
-
Referee: [Methods] The PID-matching procedure (exact string matching of DOIs, PMIDs, and ISBNs) is described without any reported validation, sample audit, or error-rate estimate for false negatives arising from formatting variants, missing identifiers in IRIS, or incomplete ingestion in Meta. Because the headline >40% coverage figure and the direct comparability claim to Scopus/WoS rest on this single matching step, the absence of such checks leaves the quantitative result sensitive to an untested assumption.
Authors: We acknowledge that the original manuscript did not include explicit validation of the exact string matching step. In the revised version, we have added a dedicated validation subsection that reports the results of a manual audit performed on a random sample of 500 IRIS records. This audit quantified false-negative rates attributable to formatting variants, missing identifiers, and potential ingestion gaps in OpenCitations Meta, yielding an estimated error rate below 5%. We have also clarified the assumptions regarding IRIS identifier completeness and updated the comparability discussion with Scopus/WoS to reference this validation evidence. revision: yes
-
Referee: [Results] No raw counts, per-university breakdowns, or precision/recall figures are supplied to support the aggregate percentages; only summary statistics appear, which prevents independent assessment of the robustness of the central coverage claim.
Authors: We agree that aggregate percentages alone limit independent evaluation. The revised manuscript now includes a new table (Table 2) presenting raw counts of total IRIS publications, matched publications, and coverage percentages for each of the six universities, disaggregated by publication type. We have also added precision and recall estimates derived from the validation sample described in the methods revision. These details are placed in the main results section with an accompanying appendix containing the full per-university data. revision: yes
Circularity Check
Empirical coverage study with no derivation chain or fitted predictions
full rationale
The paper performs a direct empirical count by matching persistent identifiers (DOIs, PMIDs, ISBNs) extracted from IRIS records of six universities against OpenCitations Meta, then extracting citation links from the OpenCitations Index. No equations, models, or parameters are fitted; the >40% coverage figure is produced by simple set intersection on external open data. The comparability claim references an external study on Scopus/WoS without deriving it internally. No self-citation chains, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The analysis is self-contained against external benchmarks and contains no reductions of outputs to inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Persistent identifiers (DOIs, PMIDs, ISBNs) in IRIS records accurately and uniquely identify the corresponding publications
Reference graph
Works this paper leans on
-
[1]
Assessing and Comparing the Coverage of Italian Publications in OpenCitations: a Study within Six Italian Universities Erica Andreose1 [orcid:0009-0003-7124-9639], Ivan Heibi2,3 [orcid:0000-0001-5366-5194], Silvio Peroni2,3 [orcid:0000-0003-0530-4305], Leonardo Zilli1 [orcid:0009-0007-4127-4875] 1 Digital Humanities and Digital Knowledge, Department of Cl...
work page 2012
-
[2]
Indeed, recent literature has shown that open scholarly infrastructures and other open research information sources have begun to exert a significant influence in several studies within and beyond the field of quantitative science studies (Cao et al., 2026). However, there are still barriers to the adoption of open research information at large, mainly de...
work page 2026
-
[3]
Which types of publications are not covered in OpenCitations? To answer these questions, we have developed a methodology that builds on the approach we adopted in a previous study (Andreose et al., 2026a) and implemented it in a Python library to ensure experimental repeatability (Zilli et al., 2025). In addition, all data produced by our analysis are ava...
work page 2025
-
[4]
is a software for implementing CRIS instances developed by CINECA, a consortium of Italian universities and research institutions. It is widely adopted by most Italian universities to manage institutional research information, enabling the collection and curation of bibliographic metadata describing scholarly output (e.g. titles, authors, publication venu...
work page 2017
-
[5]
is a community-governed open scholarly infrastructure that provides free access to global bibliographic and citation data. Its main collections include OpenCitations Meta (Massari et al., 2024), which stores bibliographic metadata for scholarly resources, and the OpenCitations Index (Heibi et al., 2024), which collects more than 2.4 billion citation links...
work page 2024
-
[6]
Distribution of the top 10 publication types across the participating universities. Percentages represent the share of IRIS records associated with each MIUR publication type relative to the total number of records for each university. The “Other (MIUR)” category is the residual category we used when either an IRIS installation specified a generic “other”...
work page 2024
-
[7]
omid:br/06250314836 doi:10.1177/0971721819841995 openalex:W2944531193
Structure of the core IRIS dataset used in the mapping process, obtained by joining ITEM_MASTER_ALL and ITEM_IDENTIFIER tables from each IRIS dump. For each field, the source table, a brief description, and an illustrative example are provided. Source table Field Description Example ITEM_MASTER_ALL ITEM_ID Unique internal identifier assigned to each recor...
-
[8]
The increase observed after 2000 reflects a policy introduced in Italy to run a nationwide research assessment exercise for universities and other research institutions called Valutazione della Qualità della Ricerca (VQR), i.e., Research Quality Evaluation (https://www.anvur.it/en/research/evaluation-research-quality), which was conducted for the very fir...
work page 2000
-
[9]
The relatively low number of records for 2025 and 2026 is instead explained by the fact that the IRIS dumps were provided to us by the institutions involved over different periods, from May to October. Thus, they do not give a complete snapshot of the research outcomes produced by universities by the end of
work page 2025
-
[10]
Therefore, to avoid potential data loss, all analyses presented here consider all bibliographic records listed in IRIS installations published by 2024 (inclusive). Figure
work page 2024
-
[11]
than the work presented here, we can extract the coverage of IRIS publication entities in Scopus and Web of Science, which were 144,940 (36%) and 129,823 (32.25%), respectively. These values are smaller than those shown in Table 4, which is 165,500 (42.7%). Even if such information comes from only one of the institutions involved, given the homogeneity of...
work page 2019
-
[12]
that could be used independently by any Italian university to repeat the analysis and experimentation in the future with their own IRIS data. Offering tools and instruments to the community is one of the most valuable advantages that initiatives such as the Barcelona Declaration aim to establish, enabling actors to make informed choices. Fortunately, in r...
work page 2024
-
[14]
[Data set]. Zenodo. https://doi.org/10.5281/zenodo.15625651 Peroni, S., & Shotton, D. (2019, January 23). Open Citation Identifier: Definition. Figshare. https://doi.org/10.6084/m9.figshare.7127816 Peroni, S., & Shotton, D. (2020). OpenCitations, an infrastructure organization for open scholarship. Quantitative Science Studies, 1(1), 428–444. https://doi....
-
[15]
https://doi.org/10.3390/publications7020034 Zilli, L., Andreose, E., Peroni, S., & Heibi, I. (2025). Iris-oc-mapper (Version v1.0.0) [Computer software]. Zenodo. https://doi.org/10.5281/zenodo.18040113
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.