
arxiv: 2508.20747 · v3 · submitted 2025-08-28 · 💻 cs.DL

An analysis of the effects of open science indicators on citations in the French Open Science Monitor

Pith reviewed 2026-05-18 20:58 UTC · model grok-4.3

classification 💻 cs.DL
keywords open science indicators · citation impact · preprints · data sharing · software sharing · open access · French publications · citation prediction

The pith

French publications with pre-prints receive 19% more citations, while data sharing is associated with 14.3% more and software sharing with 13.5% more.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how open science indicators relate to citation counts in a dataset of roughly 900,000 French-authored publications from 2020 to 2022. It combines the French Open Science Monitor with data from OpenAlex and Crossref to track pre-prints, data sharing, and software sharing across 576,537 works. The most complete citation prediction model attributes a 19% positive effect to pre-prints, 13.5% to software sharing, and 14.3% to data sharing. Effects vary across disciplines, and open access status itself correlates with an 8.6% citation increase. The results are observational yet point to a consistent link between these practices and higher citation impact.

Core claim

By analyzing the French Open Science Monitor dataset linked to OpenAlex and Crossref, the study finds that, in its most complete citation prediction model, the presence of a pre-print is associated with 19% higher citation counts, software sharing with 13.5%, and data sharing with 14.3%. Large variations exist across disciplines, and open access status is associated with an 8.6% increase. The results suggest a consistent correlation between open science indicators and higher citations, though they remain observational.

What carries the argument

A citation prediction model that includes open science indicators as variables while accounting for publication characteristics and disciplinary differences.
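
One common way a model of this kind yields percentage figures is a log-linear regression, in which a coefficient β on a binary indicator corresponds to a (e^β − 1) × 100 percent change in expected citations. The sketch below assumes that form; the summary here does not state the paper's exact specification, and the function names are illustrative only.

```python
import math


def pct_effect(beta: float) -> float:
    """Percent change in expected citations implied by a coefficient
    on a 0/1 indicator, ASSUMING a log-linear model (not confirmed
    to be the paper's specification)."""
    return (math.exp(beta) - 1.0) * 100.0


def beta_from_pct(pct: float) -> float:
    """Inverse mapping: the log-scale coefficient that would
    correspond to a given percent change."""
    return math.log(1.0 + pct / 100.0)


# Percentages taken from the summary above; the coefficients are
# back-computed under the log-linear assumption, not reported values.
reported = {"pre-print": 19.0, "data sharing": 14.3,
            "software sharing": 13.5, "open access": 8.6}
for name, pct in reported.items():
    print(f"{name}: implied beta = {beta_from_pct(pct):.4f}")
```

Under this assumption, the reported percentages and the underlying coefficients carry the same information, which is why the later sections treat the headline numbers as fitted parameters.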

If this is right

  • Pre-prints are correlated with a 19% increase in citation counts.
  • Data sharing is correlated with a 14.3% increase in citation counts.
  • Software sharing is correlated with a 13.5% increase in citation counts.
  • Open access status is correlated with an 8.6% increase in citation counts.
  • The strength of these correlations differs substantially across research disciplines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Funding agencies could consider these patterns when designing incentives for open practices.
  • Researchers might weigh the observed associations when deciding whether to share data or code early.
  • Similar analyses in other national contexts could test whether the patterns generalize beyond France.

Load-bearing premise

The citation prediction model adequately controls for confounding factors such as research quality, discipline-specific norms, and publication characteristics.
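
To illustrate why this premise matters, the following simulation sketches omitted-variable bias on synthetic data. Everything in it is an assumption for illustration: a hypothetical latent "quality" variable raises both the chance of adopting an open practice and the citation count, while the true effect of the practice is set to zero. None of the numbers come from the paper.

```python
import math
import random


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(x, y):
    """Residuals of y after regressing it on x, with an intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = slope(x, y)
    return [yi - my - b * (xi - mx) for xi, yi in zip(x, y)]


def simulate(n=20000, true_beta=0.0, seed=0):
    """Return (naive, adjusted) estimates of the indicator coefficient."""
    rng = random.Random(seed)
    quality, osi, log_cit = [], [], []
    for _ in range(n):
        q = rng.gauss(0.0, 1.0)               # hypothetical latent quality
        p = 1.0 / (1.0 + math.exp(-q))        # better work adopts more often
        d = 1.0 if rng.random() < p else 0.0  # open science indicator (0/1)
        y = true_beta * d + 0.5 * q + rng.gauss(0.0, 1.0)  # log citations
        quality.append(q)
        osi.append(d)
        log_cit.append(y)
    naive = slope(osi, log_cit)  # ignores quality entirely
    # Frisch-Waugh-Lovell: partial quality out of both variables first,
    # which gives the indicator coefficient of the two-regressor model.
    adjusted = slope(residuals(quality, osi), residuals(quality, log_cit))
    return naive, adjusted


naive, adjusted = simulate()
print(f"naive: {naive:.3f}  adjusted for quality: {adjusted:.3f}")
```

In this setup the naive estimate should come out clearly positive even though the true effect is zero, while the adjusted estimate should sit near zero. In real data, quality is not directly observed, which is the core of the concern.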

What would settle it

A replication using randomized assignment of open practices or finer-grained quality controls that finds no citation differences would undermine the reported percentage effects.

Figures

Figures reproduced from arXiv: 2508.20747 by Giovanni Colavizza, Iain Hrynaszkiewicz, Lauren Cadwallader.

Figure 1: Adoption of open science indicators (OSI) over time in FOSM. Each OSI remains adopted by a fraction of …
Figure 2: Adoption of OSI by BSO class, as shown in Table 1. Each OSI remains adopted by a fraction of publications, …
Figure 3: Percentage change on citation counts linked to each OSI, divided by BSO class. We only consider journal …

Original abstract

This study investigates the correlation of citation impact with various open science indicators (OSI) within the French Open Science Monitor (FOSM), a dataset comprising approximately 900,000 publications authored by French authors from 2020 to 2022. By integrating data from OpenAlex and Crossref, we analyze open science indicators such as the presence of a pre-print, data sharing, and software sharing in 576,537 publications in the FOSM dataset. Our analysis reveals a positive correlation between these OSI and citation counts. Considering our most complete citation prediction model, we find pre-prints are correlated with a significant positive effect of 19% on citation counts, software sharing of 13.5%, and data sharing of 14.3%. We find large variations in the correlations of OSIs with citations in different research disciplines, and observe that open access status of publications is correlated with an 8.6% increase in citations in our model. While these results remain observational and are limited to the scope of the analysis, they suggest a consistent correlation between citation advantages and open science indicators. Our results may be valuable to policy makers, funding agencies, researchers, publishers, institutions, and other stakeholders who are interested in understanding the academic impacts, or effects, of open science practices.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript analyzes correlations between open science indicators (pre-prints, data sharing, software sharing) and citation counts for 576,537 publications (2020-2022) drawn from the French Open Science Monitor dataset of roughly 900,000 publications, integrating OpenAlex and Crossref data. Using a citation prediction model, it reports positive associations: 19% for pre-prints, 13.5% for software sharing, and 14.3% for data sharing in the most complete specification, along with an 8.6% open-access association and substantial disciplinary variation. The work remains explicitly observational and positions the results as potentially useful for policy and stakeholders.

Significance. If the regression adequately adjusts for confounders such as journal prestige, author metrics, and field norms, the large-scale national dataset and multi-source integration would add useful correlational evidence on open-science practices and citations. The explicit acknowledgment of observational limits and disciplinary heterogeneity is a strength; the results could inform funding and institutional decisions provided the modeling choices are transparent and robust.

major comments (2)
  1. [Methods] Methods section (citation prediction model description): the abstract and results present the 19%, 13.5%, and 14.3% figures as effects from the 'most complete' model, yet no details are supplied on the full covariate set, variable selection, inclusion of discipline fixed effects, or tests for selection/endogeneity. Given the large disciplinary variation already noted, omitted-variable bias remains a live concern for attributing the coefficients to the OSIs rather than to research quality or field norms.
  2. [Results] Results section (model output table or figure): the reported percentage effects are regression coefficients estimated on the same sample used to specify the model, with no accompanying standard errors, model-fit statistics, or robustness checks (e.g., alternative specifications or subsample analyses). This makes it difficult to evaluate whether the point estimates are stable or practically meaningful.
minor comments (2)
  1. [Abstract] Abstract: the phrasing 'significant positive effect' and 'effects of open science indicators' should be aligned with the later explicit statement that the study is observational and reports correlations, to avoid any implication of causality.
  2. [Data and methods] Data and methods: the exact sample construction (how the 576k subset was derived from the ~900k FOSM records) and any handling of missing OSI data should be stated more explicitly.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for these constructive comments, which highlight important areas for improving the clarity and rigor of our manuscript. Below we respond to each major comment.

Point-by-point responses
  1. Referee: [Methods] Methods section (citation prediction model description): the abstract and results present the 19%, 13.5%, and 14.3% figures as effects from the 'most complete' model, yet no details are supplied on the full covariate set, variable selection, inclusion of discipline fixed effects, or tests for selection/endogeneity. Given the large disciplinary variation already noted, omitted-variable bias remains a live concern for attributing the coefficients to the OSIs rather than to research quality or field norms.

    Authors: We agree that additional methodological transparency is needed. The revised manuscript will expand the Methods section to list all covariates in the most complete specification, describe the variable selection process, confirm the use of discipline fixed effects, and discuss any checks or limitations related to selection and endogeneity. We will also strengthen the text on the observational nature of the study and the risk of omitted-variable bias when interpreting associations with open science indicators. revision: yes

  2. Referee: [Results] Results section (model output table or figure): the reported percentage effects are regression coefficients estimated on the same sample used to specify the model, with no accompanying standard errors, model-fit statistics, or robustness checks (e.g., alternative specifications or subsample analyses). This makes it difficult to evaluate whether the point estimates are stable or practically meaningful.

    Authors: We accept this point. The revision will add standard errors (and confidence intervals) to all reported coefficients, include model-fit statistics, and report robustness checks such as alternative specifications and disciplinary subsample analyses. These additions will be placed in the Results section with supporting material in an appendix if needed. revision: yes
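
As a rough sketch of what reporting intervals alongside the percentages could look like, the snippet below converts a coefficient and its standard error into a confidence interval on the percentage scale. It assumes a log-linear model and a normal approximation; the standard error used is a made-up placeholder, since the summary reports none.

```python
import math


def pct_ci(beta: float, se: float, z: float = 1.96):
    """Approximate 95% interval for the percent change implied by a
    log-scale coefficient: build the interval on beta, then
    exponentiate each end."""
    lo = (math.exp(beta - z * se) - 1.0) * 100.0
    hi = (math.exp(beta + z * se) - 1.0) * 100.0
    return lo, hi


# beta is back-computed from the reported 19% pre-print figure under
# the log-linear assumption; se = 0.01 is HYPOTHETICAL, not from the paper.
beta_preprint = math.log(1.19)
hypothetical_se = 0.01
lo, hi = pct_ci(beta_preprint, hypothetical_se)
print(f"pre-print: 19.0% (approx. 95% CI {lo:.1f}% to {hi:.1f}%)")
```

Because the interval is built on the log scale and then transformed, it is slightly asymmetric around the point estimate on the percentage scale.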

Circularity Check

1 step flagged

Reported OSI citation effects are direct outputs of the fitted regression model

specific steps
  1. fitted input called prediction [Abstract]
    "Considering our most complete citation prediction model, we find pre-prints are correlated with a significant positive effect of 19% on citation counts, software sharing of 13.5%, and data sharing of 14.3%."

    The paper fits the citation prediction model (a regression on the 576,537-publication dataset) and then presents the resulting coefficients as the 'effects' of the open science indicators. These percentages are therefore the fitted parameters by construction, not independent predictions or external validations.

full rationale

The paper's central quantitative claims consist of percentage effects (19%, 13.5%, 14.3%) obtained by fitting a citation prediction model to the full observational dataset. These quantities are the estimated coefficients themselves rather than out-of-sample predictions or derivations from independent data. This matches the fitted-input-called-prediction pattern: the model is constructed on the same publications used to report the effects, so the headline results reduce to the fit by construction. No other circular patterns (self-definitional equations, load-bearing self-citations, or ansatz smuggling) are identifiable from the provided text.
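
The distinction drawn here, between a fitted coefficient and an out-of-sample prediction, can be sketched with a simple holdout split on synthetic data. The data-generating values, the 30% adoption rate, and the function names below are assumptions for illustration and are unrelated to the paper's dataset.

```python
import random


def fit_line(x, y):
    """OLS intercept and slope of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    b = sum((a - mx) * (c - my) for a, c in zip(x, y)) / sxx
    return my - b * mx, b


def mse(x, y, a, b):
    """Mean squared error of the line a + b*x on (x, y)."""
    return sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)) / len(x)


def holdout_check(n=5000, beta=0.17, seed=1, test_frac=0.2):
    """Fit on a training split, then score on held-out rows."""
    rng = random.Random(seed)
    osi = [1.0 if rng.random() < 0.3 else 0.0 for _ in range(n)]
    log_cit = [beta * d + rng.gauss(0.0, 1.0) for d in osi]
    cut = int(n * (1 - test_frac))
    a, b = fit_line(osi[:cut], log_cit[:cut])   # in-sample fit
    baseline = sum(log_cit[:cut]) / cut         # intercept-only model
    test_x, test_y = osi[cut:], log_cit[cut:]
    return b, mse(test_x, test_y, a, b), mse(test_x, test_y, baseline, 0.0)


coef, model_mse, baseline_mse = holdout_check()
print(f"fitted coefficient: {coef:.3f}")
print(f"held-out MSE: model {model_mse:.4f} vs baseline {baseline_mse:.4f}")
```

The fitted coefficient describes the training data, whereas the held-out comparison asks whether knowing the indicator improves predictions for unseen publications. The two questions can have different answers.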

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on an unspecified regression model whose coefficients are fitted to the observed citation data and on the unstated assumption that the integrated OpenAlex-Crossref data accurately captures open science indicators without systematic measurement error.

free parameters (1)
  • OSI regression coefficients
    The 19%, 13.5%, and 14.3% effects are parameters estimated by fitting the citation prediction model to the French publication data.
axioms (1)
  • [domain assumption] No major unmeasured confounders jointly affect both open science indicator adoption and citation counts.
    Required for interpreting regression coefficients as isolated effects of the indicators.

pith-pipeline@v0.9.0 · 5769 in / 1415 out tokens · 46586 ms · 2026-05-18T20:58:07.968960+00:00 · methodology

