An analysis of the effects of open science indicators on citations in the French Open Science Monitor
Pith reviewed 2026-05-18 20:58 UTC · model grok-4.3
The pith
In the French Open Science Monitor, publications with pre-prints are associated with 19% more citations, data sharing with 14.3% more, and software sharing with 13.5% more.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By analyzing the French Open Science Monitor dataset linked to OpenAlex and Crossref, the study establishes that the presence of a pre-print is correlated with a 19% positive effect on citation counts, software sharing with 13.5%, and data sharing with 14.3% in the most complete citation prediction model. Large variations exist across disciplines, and open access status adds an 8.6% increase. The results suggest a consistent correlation between open science indicators and higher citations, though they remain observational.
What carries the argument
A citation prediction model that includes open science indicators as variables while accounting for publication characteristics and disciplinary differences.
If this is right
- Pre-prints are correlated with a 19% increase in citation counts.
- Data sharing is correlated with a 14.3% increase in citation counts.
- Software sharing is correlated with a 13.5% increase in citation counts.
- Open access status is correlated with an 8.6% increase in citation counts.
- The strength of these correlations differs substantially across research disciplines.
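To make the percentage figures above concrete, here is a minimal sketch of how a regression coefficient maps to a percent change. It assumes, hypothetically, that the citation model is log-linear (the outcome is a logarithm of citation counts and each indicator enters as a 0/1 variable); the text above does not state the functional form, so the function names and the log-linear assumption are illustrative only.

```python
import math

def percent_to_log_coef(percent: float) -> float:
    """Log-scale coefficient that corresponds to a given percent change."""
    return math.log(1.0 + percent / 100.0)

def log_coef_to_percent(beta: float) -> float:
    """Percent change in the outcome implied by a log-scale coefficient."""
    return (math.exp(beta) - 1.0) * 100.0

# Percent figures quoted in the summary above.
reported_percent = {
    "pre-print": 19.0,
    "data sharing": 14.3,
    "software sharing": 13.5,
    "open access": 8.6,
}

# Implied coefficients under the ASSUMED log-linear form (not taken from the paper).
implied_coef = {name: percent_to_log_coef(p) for name, p in reported_percent.items()}

for name, beta in implied_coef.items():
    print(f"{name}: beta = {beta:.4f} -> {log_coef_to_percent(beta):.1f}%")
```

Under this assumption, a 19% association corresponds to a coefficient of about 0.174 on the log scale. If the paper uses a different link or transformation, the mapping would differ.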
Where Pith is reading between the lines
- Funding agencies could consider these patterns when designing incentives for open practices.
- Researchers might weigh the observed associations when deciding whether to share data or code early.
- Similar analyses in other national contexts could test whether the patterns generalize beyond France.
Load-bearing premise
The citation prediction model adequately controls for confounding factors such as research quality, discipline-specific norms, and publication characteristics.
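Why this premise matters can be illustrated with a small simulation on purely synthetic data. In the sketch below, a hypothetical unmeasured "quality" variable drives both the decision to share and the citation outcome, while the true effect of sharing is set to zero. All names, effect sizes, and the data-generating process are assumptions made for illustration; none come from the paper.

```python
import random

def mean(values):
    return sum(values) / len(values)

def ols_slope(x, y):
    """Slope of a one-predictor least-squares fit of y on x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(x, y):
    """Residuals of y after removing its linear fit on x."""
    b = ols_slope(x, y)
    mx, my = mean(x), mean(y)
    return [yi - (my + b * (xi - mx)) for xi, yi in zip(x, y)]

def simulate(n=20000, true_effect=0.0, seed=0):
    rng = random.Random(seed)
    # Hypothetical confounder: higher-quality work is more likely to be shared.
    quality = [rng.gauss(0.0, 1.0) for _ in range(n)]
    shares = [1.0 if q + rng.gauss(0.0, 1.0) > 0 else 0.0 for q in quality]
    # Log citations depend on quality; sharing contributes only true_effect.
    log_cites = [true_effect * s + 0.5 * q + rng.gauss(0.0, 1.0)
                 for s, q in zip(shares, quality)]
    # Naive estimate: quality is omitted from the model.
    naive = ols_slope(shares, log_cites)
    # Adjusted estimate: partial out quality from both variables first
    # (Frisch-Waugh-Lovell), equivalent to including quality as a covariate.
    adjusted = ols_slope(residualize(quality, shares),
                         residualize(quality, log_cites))
    return naive, adjusted

naive, adjusted = simulate()
print(f"naive estimate: {naive:.3f}, adjusted estimate: {adjusted:.3f}")
```

In this setup the naive estimate is clearly positive even though sharing has no effect, while the adjusted estimate is close to zero. The adjustment only works here because the confounder is observed; in the real dataset, research quality is not directly measured, which is the concern the premise names.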
What would settle it
A replication using randomized assignment of open practices or finer-grained quality controls that finds no citation differences would undermine the reported percentage effects.
Original abstract
This study investigates the correlation of citation impact with various open science indicators (OSI) within the French Open Science Monitor (FOSM), a dataset comprising approximately 900,000 publications authored by French authors from 2020 to 2022. By integrating data from OpenAlex and Crossref, we analyze open science indicators such as the presence of a pre-print, data sharing, and software sharing in 576,537 publications in the FOSM dataset. Our analysis reveals a positive correlation between these OSI and citation counts. Considering our most complete citation prediction model, we find pre-prints are correlated with a significant positive effect of 19% on citation counts, software sharing of 13.5%, and data sharing of 14.3%. We find large variations in the correlations of OSIs with citations in different research disciplines, and observe that open access status of publications is correlated with an 8.6% increase in citations in our model. While these results remain observational and are limited to the scope of the analysis, they suggest a consistent correlation between citation advantages and open science indicators. Our results may be valuable to policy makers, funding agencies, researchers, publishers, institutions, and other stakeholders who are interested in understanding the academic impacts, or effects, of open science practices.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript analyzes correlations between open science indicators (pre-prints, data sharing, software sharing) and citation counts for 576,537 publications drawn from the French Open Science Monitor dataset (approximately 900,000 publications, 2020-2022), integrating OpenAlex and Crossref data. Using a citation prediction model, it reports positive associations: 19% for pre-prints, 13.5% for software sharing, and 14.3% for data sharing in the most complete specification, along with an 8.6% open-access association and substantial disciplinary variation. The work remains explicitly observational and positions the results as potentially useful for policy makers and other stakeholders.
Significance. If the regression adequately adjusts for confounders such as journal prestige, author metrics, and field norms, the large-scale national dataset and multi-source integration would add useful correlational evidence on open-science practices and citations. The explicit acknowledgment of observational limits and disciplinary heterogeneity is a strength; the results could inform funding and institutional decisions provided the modeling choices are transparent and robust.
major comments (2)
- [Methods] Methods section (citation prediction model description): the abstract and results present the 19%, 13.5%, and 14.3% figures as effects from the 'most complete' model, yet no details are supplied on the full covariate set, variable selection, inclusion of discipline fixed effects, or tests for selection/endogeneity. Given the large disciplinary variation already noted, omitted-variable bias remains a live concern for attributing the coefficients to the OSIs rather than to research quality or field norms.
- [Results] Results section (model output table or figure): the reported percentage effects are regression coefficients estimated on the same sample used to specify the model, with no accompanying standard errors, model-fit statistics, or robustness checks (e.g., alternative specifications or subsample analyses). This makes it difficult to evaluate whether the point estimates are stable or practically meaningful.
minor comments (2)
- [Abstract] Abstract: the phrasing 'significant positive effect' and 'effects of open science indicators' should be aligned with the later explicit statement that the study is observational and reports correlations, to avoid any implication of causality.
- [Data and methods] Data and methods: the exact sample construction (how the 576k subset was derived from the ~900k FOSM records) and any handling of missing OSI data should be stated more explicitly.
Simulated Author's Rebuttal
We thank the referee for these constructive comments, which highlight important areas for improving the clarity and rigor of our manuscript. Below we respond to each major comment.
-
Referee: [Methods] Methods section (citation prediction model description): the abstract and results present the 19%, 13.5%, and 14.3% figures as effects from the 'most complete' model, yet no details are supplied on the full covariate set, variable selection, inclusion of discipline fixed effects, or tests for selection/endogeneity. Given the large disciplinary variation already noted, omitted-variable bias remains a live concern for attributing the coefficients to the OSIs rather than to research quality or field norms.
Authors: We agree that additional methodological transparency is needed. The revised manuscript will expand the Methods section to list all covariates in the most complete specification, describe the variable selection process, confirm the use of discipline fixed effects, and discuss any checks or limitations related to selection and endogeneity. We will also strengthen the text on the observational nature of the study and the risk of omitted-variable bias when interpreting associations with open science indicators. revision: yes
-
Referee: [Results] Results section (model output table or figure): the reported percentage effects are regression coefficients estimated on the same sample used to specify the model, with no accompanying standard errors, model-fit statistics, or robustness checks (e.g., alternative specifications or subsample analyses). This makes it difficult to evaluate whether the point estimates are stable or practically meaningful.
Authors: We accept this point. The revision will add standard errors (and confidence intervals) to all reported coefficients, include model-fit statistics, and report robustness checks such as alternative specifications and disciplinary subsample analyses. These additions will be placed in the Results section with supporting material in an appendix if needed. revision: yes
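The uncertainty reporting requested above can be sketched with a nonparametric bootstrap. The example uses synthetic data with one 0/1 indicator and a log-scale outcome; the sample size, noise level, and the planted coefficient of ln(1.19) are assumptions chosen for illustration and are not the paper's data or its estimation procedure.

```python
import math
import random

def ols_slope(x, y):
    """Slope of a one-predictor least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def bootstrap_ci(x, y, n_boot=300, alpha=0.05, seed=1):
    """Percentile bootstrap confidence interval for the slope."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    for _ in range(n_boot):
        # Resample publications with replacement and refit.
        idx = [rng.randrange(n) for _ in range(n)]
        slopes.append(ols_slope([x[i] for i in idx], [y[i] for i in idx]))
    slopes.sort()
    low = slopes[int((alpha / 2) * n_boot)]
    high = slopes[int((1 - alpha / 2) * n_boot) - 1]
    return low, high

# Synthetic data: hypothetical indicator and log-citation outcome.
rng = random.Random(0)
true_beta = math.log(1.19)  # planted coefficient, an assumption
indicator = [1.0 if rng.random() < 0.5 else 0.0 for _ in range(1000)]
log_cites = [true_beta * s + rng.gauss(0.0, 1.0) for s in indicator]

estimate = ols_slope(indicator, log_cites)
ci_low, ci_high = bootstrap_ci(indicator, log_cites)
print(f"estimate = {estimate:.3f}, 95% CI = [{ci_low:.3f}, {ci_high:.3f}]")
```

Reporting the interval alongside the point estimate lets a reader judge whether figures such as 13.5% and 14.3% are distinguishable from each other. With 576,537 publications the intervals may well be narrow, but that cannot be judged from the point estimates alone.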
Circularity Check
Reported OSI citation effects are direct outputs of the fitted regression model
specific steps
-
fitted input called prediction
[Abstract]
"Considering our most complete citation prediction model, we find pre-prints are correlated with a significant positive effect of 19% on citation counts, software sharing of 13.5%, and data sharing of 14.3%."
The paper fits the citation prediction model (a regression on the 576,537-publication dataset) and then presents the resulting coefficients as the 'effects' of the open science indicators. These percentages are therefore the fitted parameters by construction, not independent predictions or external validations.
full rationale
The paper's central quantitative claims consist of percentage effects (19%, 13.5%, 14.3%) obtained by fitting a citation prediction model to the full observational dataset. These quantities are the estimated coefficients themselves rather than out-of-sample predictions or derivations from independent data. This matches the fitted-input-called-prediction pattern: the model is constructed on the same publications used to report the effects, so the headline results reduce to the fit by construction. No other circular patterns (self-definitional equations, load-bearing self-citations, or ansatz smuggling) are identifiable from the provided text.
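The distinction drawn here between a fitted coefficient and an out-of-sample prediction can be sketched as follows. The code fits a one-indicator model on a training split of synthetic data and evaluates prediction error on a held-out split; the split ratio, data-generating process, and helper names are hypothetical and do not describe the paper's procedure.

```python
import math
import random

def fit_line(x, y):
    """Return (intercept, slope) of a one-predictor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return my - slope * mx, slope

def mse(x, y, intercept, slope):
    """Mean squared prediction error of the fitted line on (x, y)."""
    return sum((yi - (intercept + slope * xi)) ** 2
               for xi, yi in zip(x, y)) / len(x)

# Synthetic records: (indicator, log citations) with a planted coefficient.
rng = random.Random(42)
true_beta = math.log(1.19)  # assumption for illustration
records = []
for _ in range(10000):
    s = 1.0 if rng.random() < 0.5 else 0.0
    records.append((s, true_beta * s + rng.gauss(0.0, 1.0)))

# Hold out 20% of records that the fit never sees.
rng.shuffle(records)
train, test = records[:8000], records[8000:]
x_train, y_train = [r[0] for r in train], [r[1] for r in train]
x_test, y_test = [r[0] for r in test], [r[1] for r in test]

intercept, slope = fit_line(x_train, y_train)
train_mse = mse(x_train, y_train, intercept, slope)  # in-sample fit
test_mse = mse(x_test, y_test, intercept, slope)     # out-of-sample check
print(f"slope = {slope:.3f}, train MSE = {train_mse:.3f}, test MSE = {test_mse:.3f}")
```

The slope is the kind of quantity the paper reports; the held-out error is the kind of independent check the circularity note says is absent. A held-out check validates predictive stability only and would not by itself address confounding.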
Axiom & Free-Parameter Ledger
free parameters (1)
- OSI regression coefficients
axioms (1)
- domain assumption: No major unmeasured confounders jointly affect both open science indicator adoption and citation counts.
Reference graph
Works this paper leans on
-
[1]
Aricia Bassinet et al. “Large-Scale Machine-Learning Analysis of Scientific PDF for Monitoring the Production and the Openness of Research Data and Software in France”. 2023. (Visited on 05/08/2025)
-
[2]
PLOS Open Science Indicators Principles and Definitions
Iain Hrynaszkiewicz and Veronique Kiermer. “PLOS Open Science Indicators Principles and Definitions”. 2022. DOI: 10.6084/M9.FIGSHARE.21640889.V1. (Visited on 05/08/2025)
-
[3]
UNESCO Recommendation on Open Science
UNESCO. UNESCO Recommendation on Open Science. Tech. rep. UNESCO, 2021. DOI: 10.54677/MNMH8546. (Visited on 05/08/2025)
-
[4]
Principles of Open Science Monitoring
Open Science Monitoring Initiative. Principles of Open Science Monitoring. Tech. rep. Ministère de l’enseignement supérieur et de la recherche, 2024. DOI: 10.52949/49. (Visited on 05/08/2025)
-
[5]
Monitoring Open Science as Transformative Change: Towards a Systemic Framework
Ismael Rafols, Ingeborg Meijer, and Jordi Molas-Gallart. “Monitoring Open Science as Transformative Change: Towards a Systemic Framework”. In: F1000Research 13 (Apr. 2024), p. 320. ISSN: 2046-1402. DOI: 10.12688/f1000research.148290.1. (Visited on 05/08/2025)
-
[6]
S. Apartis et al. Open Science Impact Indicator Handbook. Jan. 2025. DOI: 10.5281/ZENODO.14538442. (Visited on 05/08/2025)
-
[7]
The Societal Impact of Open Science: A Scoping Review
Nicki Lisa Cole et al. “The Societal Impact of Open Science: A Scoping Review”. In: Royal Society Open Science 11.6 (June 2024), p. 240286. ISSN: 2054-5703. DOI: 10.1098/rsos.240286. (Visited on 05/08/2025). A PREPRINT - SEPTEMBER 4, 2025
-
[8]
The Academic Impact of Open Science: A Scoping Review
Thomas Klebel et al. “The Academic Impact of Open Science: A Scoping Review”. In: Royal Society Open Science 12.3 (Mar. 2025), p. 241248. ISSN: 2054-5703. DOI: 10.1098/rsos.241248. (Visited on 05/08/2025)
-
[9]
The Economic Impact of Open Science: A Scoping Review
Lena Tsipouri et al. The Economic Impact of Open Science: A Scoping Review. Feb. 2025. DOI: 10.31222/osf.io/kqse5_v1. (Visited on 05/11/2025)
-
[10]
An Analysis of the Effects of Sharing Research Data, Code, and Preprints on Citations
Giovanni Colavizza et al. “An Analysis of the Effects of Sharing Research Data, Code, and Preprints on Citations”. In: PLOS ONE 19.10 (Oct. 2024). Ed. by Yongli Tang, e0311493. ISSN: 1932-6203. DOI: 10.1371/journal.pone.0311493. (Visited on 05/08/2025)
-
[11]
Public Library Of Science. PLOS Open Science Indicators. 2023. DOI: 10.6084/M9.FIGSHARE.21687686.V5. (Visited on 05/08/2025)
-
[12]
The Citation Advantage of Linking Publications to Research Data
Giovanni Colavizza et al. “The Citation Advantage of Linking Publications to Research Data”. In: PLOS ONE 15.4 (Apr. 2020). Ed. by Jelte M. Wicherts, e0230416. ISSN: 1932-6203. DOI: 10.1371/journal.pone.0230416. (Visited on 12/15/2023)
-
[13]
Crossref Relationships between Preprints and Journal Articles
Dominika Tkaczyk. Crossref Relationships between Preprints and Journal Articles. Nov. 2023. DOI: 10.5281/ZENODO.10144857. (Visited on 06/09/2025)
-
[14]
Enabling Preprint Discovery, Evaluation, and Analysis with Europe PMC
Mariia Levchenko et al. “Enabling Preprint Discovery, Evaluation, and Analysis with Europe PMC”. In: PLOS ONE 19.9 (Sept. 2024). Ed. by Florian Naudet, e0303005. ISSN: 1932-6203. DOI: 10.1371/journal.pone.0303005. (Visited on 05/12/2025)
-
[15]
Monitoring Open Access at a National Level: French Case Study
Eric Jeangirard. “Monitoring Open Access at a National Level: French Case Study”. In: ELPUB 2019 23d International Conference on Electronic Publishing. OpenEdition Press, June 2019. DOI: 10.4000/proceedings.elpub.2019.20. (Visited on 05/12/2025)
-
[16]
Allison Langham-Putrow, Caitlin Bakker, and Amy Riegelman. “Is the Open Access Citation Advantage Real? A Systematic Review of the Citation of Open Access and Subscription-Based Articles”. In: PLOS ONE 16.6 (June 2021). Ed. by Sergi Lozano, e0253129. ISSN: 1932-6203. DOI: 10.1371/journal.pone.0253129. (Visited on 06/09/2025)
-
[17]
A Study on the Citation Impact of Open Science Indicators in the French Open Science Monitor
Giovanni Colavizza, Lauren Cadwallader, and Iain Hrynaszkiewicz. A Study on the Citation Impact of Open Science Indicators in the French Open Science Monitor. 2025. DOI: 10.6084/M9.FIGSHARE.27822663.V2. (Visited on 06/10/2025)
-
[18]
Code Sharing Is Associated with Research Impact in Image Processing
Patrick Vandewalle. “Code Sharing Is Associated with Research Impact in Image Processing”. In: Computing in Science & Engineering 14.4 (July 2012), pp. 42–47. ISSN: 1521-9615. DOI: 10.1109/MCSE.2012.63. (Visited on 05/12/2025)
-
[19]
An Analysis of the Suitability of OpenAlex for Bibliometric Analyses
Juan Pablo Alperin et al. An Analysis of the Suitability of OpenAlex for Bibliometric Analyses. 2024. DOI: 10.48550/ARXIV.2404.17663. (Visited on 05/12/2025)