
arxiv: 2411.00008 · v5 · submitted 2024-10-17 · ⚛️ physics.soc-ph · cs.DL

Women in Science: Measuring Participation in Europe Across Disciplines, Generations and Over Time

Pith reviewed 2026-05-23 19:23 UTC · model grok-4.3

keywords: women in science · STEMM disciplines · gender participation · Europe · age cohorts · bibliometric analysis · science growth · discipline differences

The pith

Women now comprise 50% of publishing scientists in four STEMM disciplines and over 50% of young scientists in five, while contributing only marginally to growth in mathematics, computer science, physics and engineering.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper uses publication records to quantify how women have entered science in Europe from 1990 to 2023 across 14 STEMM disciplines. It finds that the growth of science has brought rising numbers of women, but with a sharp split: parity or better in some fields among current or young researchers, and only a marginal contribution in a cluster of four highly mathematized ones. The analysis covers 1.74 million scientists across ten age cohorts in 32 European countries plus four comparators (the USA, Canada, Australia, and Japan). A sympathetic reader would care because it turns digital publication traces into concrete measures of demographic change in research.

Core claim

A monolithic segment of STEMM science emerges as divided between the disciplines in which the growth was powerfully driven by women and the disciplines in which the role of women was marginal. There are four disciplines in which 50% of currently publishing scientists are women, and five disciplines in which more than 50% of currently young scientists are women. But there is also a cluster of four highly mathematized disciplines (MATH, COMP, PHYS, and ENG) in which the growth of science is only marginally driven by women.

What carries the argument

Structured Big Data from global publication indexes are used to assign gender from author names, academic age from publication history, and discipline from field metadata. Together these allow women's contribution to the growth of science to be quantified by discipline and over time.
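As a rough illustration of the tabulation this implies, the sketch below computes women's shares by discipline from author-level records, optionally restricted to young scientists. It is not the authors' pipeline: the `Scientist` record, the definition of academic age as years since first indexed publication, the 10-year cutoff for "young", the `DISC_A` label, and the toy data are all assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Scientist:
    """Hypothetical author-level record; field names are illustrative."""
    discipline: str      # label taken from field metadata, e.g. "MATH"
    gender: str          # "F" or "M", as inferred from the author's name
    first_pub_year: int  # year of the first indexed publication


def academic_age(s, ref_year):
    # Assumption: academic age = years since first indexed publication.
    return ref_year - s.first_pub_year


def women_share_by_discipline(scientists, ref_year, max_age=None):
    """Return {discipline: share of women}, optionally capped by academic age."""
    totals, women = Counter(), Counter()
    for s in scientists:
        if max_age is not None and academic_age(s, ref_year) > max_age:
            continue  # skip scientists older than the cohort cutoff
        totals[s.discipline] += 1
        if s.gender == "F":
            women[s.discipline] += 1
    return {d: women[d] / totals[d] for d in totals}


# Toy data, invented for the example (not taken from the paper).
SAMPLE = [
    Scientist("DISC_A", "F", 2015), Scientist("DISC_A", "M", 1995),
    Scientist("DISC_A", "F", 2019), Scientist("DISC_A", "F", 2001),
    Scientist("MATH", "M", 2016), Scientist("MATH", "M", 1990),
    Scientist("MATH", "F", 2020), Scientist("MATH", "M", 2005),
]

all_ages = women_share_by_discipline(SAMPLE, ref_year=2023)
young = women_share_by_discipline(SAMPLE, ref_year=2023, max_age=10)
print(all_ages)  # shares among all publishing scientists
print(young)     # shares among those with academic age <= 10
```

Comparing the two dictionaries is the toy analogue of the paper's contrast between currently publishing scientists and currently young scientists.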

If this is right

  • Science growth in Europe has been accompanied by growth in the number of women scientists with powerful cross-disciplinary differentiations.
  • Younger age cohorts show higher female shares than older ones in several disciplines.
  • The four mathematical disciplines may see their expansion continue with limited additional input from women unless participation patterns shift.
  • The overall sample shows 39.40% women scientists across all disciplines and countries examined.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The sharp disciplinary split implies that aggregate targets for women in science may mask persistent shortfalls in specific high-mathematics fields.
  • Extending the same name-and-publication method to later years could test whether the current patterns in young cohorts persist as those scientists age.
  • The same data approach could be applied to non-European countries to check whether the observed divisions between disciplines are region-specific.

Load-bearing premise

Gender can be determined from author names in publication records with accuracy sufficient for the reported percentages, and indexed publications adequately represent the full population of active scientists in each discipline and age cohort.

What would settle it

A validation study that compares name-based gender assignments against self-reported gender data for a large random sample of scientists across the 14 disciplines would falsify the percentages if mismatch rates exceed a few percent.
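To see how much a given mismatch rate could move a reported share, a standard two-way misclassification correction can be sketched as follows. This is a generic textbook adjustment, not something the paper reports; the function names and the example error rates are assumptions, and a validation study of the kind described above would be the source of real values.

```python
def observed_share(true_share, err_f_to_m, err_m_to_f):
    """Share of records labeled 'woman' given the true share and error rates.

    err_f_to_m: fraction of women wrongly labeled as men (assumed known)
    err_m_to_f: fraction of men wrongly labeled as women (assumed known)
    """
    return true_share * (1 - err_f_to_m) + (1 - true_share) * err_m_to_f


def corrected_share(observed, err_f_to_m, err_m_to_f):
    """Invert observed_share to recover the implied true share."""
    denom = 1 - err_f_to_m - err_m_to_f
    if denom <= 0:
        raise ValueError("error rates too large to invert")
    return (observed - err_m_to_f) / denom


# Hypothetical scenarios: (observed share, women->men error, men->women error)
scenarios = [
    (0.50, 0.05, 0.05),  # symmetric 5% error at the 50% threshold
    (0.50, 0.10, 0.02),  # asymmetric error at the 50% threshold
    (0.20, 0.05, 0.05),  # symmetric 5% error in a low-share discipline
]
for obs, e_fm, e_mf in scenarios:
    print(obs, e_fm, e_mf, round(corrected_share(obs, e_fm, e_mf), 4))
```

Under this model, equal error rates in both directions leave an observed 50% share unchanged and pull shares away from 50% toward it, so the 50% claims are mainly sensitive to asymmetric errors.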

Figures

Figures reproduced from arXiv: 2411.00008 by Lukasz Szymula, Marek Kwiek.

Figure 1. Flowchart: Stages in constructing the study sample and the 2023 subsample. Methods: The raw data were made available to us by Elsevier under a multi-year agreement with the ICSR Lab. The Scopus database version for 2023 and before, dated March 29, 2024, was used. To obtain the results at the aggregate level, the operation in the ICSR Lab relied on the use of the Databricks environment, which allowed for man…
Figure 2. Growth in participation in publishing nonoccasional STEMM women scientists in European countries over time by discipline, 1990–2023 (in %) (N = 684,155). Our data show that women’s participation in science has different trends for different clusters of disciplines. In some disciplines, women’s participation was already high in 1990, our starting point (reaching about one-third of all scientists publishing …
Figure 3. Growth in participation in publishing nonoccasional STEMM women scientists in the four comparator countries over time by discipline, 1990–2023 (in %)
Figure 4. The percentage (Left) and numbers (Right) of publishing nonoccasional STEMM scientists in European countries by discipline and gender (in %) (row percentages: 100% horizontally), 2023 (N = 684,155, of which 275,204, or 40.23%, identified as women). To provide an example of using a high granularity level at our disposal, we know from our computations that, in ENG Engineering across Europe (in 2023), there wer…
Figure 5. Participation of women in science by academic generations. Horizontal approach: distribution of publishing nonoccasional STEMM scientists in European systems by discipline, age group, and gender (row percentages: 100% horizontally), 2023 (N = 1,740,985)
Original abstract

In this research, we quantify an inflow of women into science in the past three decades. Structured Big Data allow us to estimate the contribution of women scientists to the growth of science by disciplines (N = STEMM 14 disciplines) and over time (1990-2023). A monolithic segment of STEMM science emerges from this research as divided between the disciplines in which the growth was powerfully driven by women - and the disciplines in which the role of women was marginal. There are four disciplines in which 50% of currently publishing scientists are women; and five disciplines in which more than 50% of currently young scientists are women. But there is also a cluster of four highly mathematized disciplines (MATH, COMP, PHYS, and ENG) in which the growth of science is only marginally driven by women. Digital traces left by scientists in their publications indexed in global datasets open two new dimensions in large-scale academic profession studies: time and gender. The growth of science in Europe was accompanied by growth in the number of women scientists, but with powerful cross-disciplinary and cross-generational differentiations. We examined the share of women scientists coming from ten different age cohorts for 32 European and four comparator countries (the USA, Canada, Australia, and Japan). Our study sample was N = 1,740,985 scientists (including 39.40% women scientists). Three critical methodological challenges of using structured Big Data of the bibliometric type were discussed: gender determination, academic age determination, and discipline determination.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript uses bibliometric records from 1,740,985 scientists (39.40% women) across 14 STEMM disciplines in 32 European countries plus the USA, Canada, Australia, and Japan (1990-2023) to describe women's participation. It claims four disciplines currently have exactly 50% women among publishing scientists, five have >50% among young scientists, and four highly mathematized disciplines (MATH, COMP, PHYS, ENG) show only marginal female contribution to science growth. The work discusses three methodological challenges (gender, academic age, and discipline determination from publication data) but remains purely descriptive.

Significance. If the gender-inference and population-representation assumptions hold, the study supplies a large-scale, temporally resolved descriptive map of gender participation that reveals substantial cross-disciplinary heterogeneity within STEMM. The scale (N=1.74M) and coverage of multiple European countries and age cohorts constitute a concrete empirical contribution to the sociology of science, particularly by distinguishing disciplines in which women appear to drive growth from those in which their role remains marginal.

major comments (2)
  1. Abstract and Methods (gender determination section): The paper correctly flags gender determination from author names as a critical methodological challenge yet provides no validation metrics, accuracy rates, or sensitivity analysis for the name-based classifier applied to the 1.74M records. Documented error rates of 5-15% (higher for non-Western or unisex names) could shift disciplines across the exact 50% thresholds that underpin the headline claims of four disciplines at 50% and the marginal cluster in MATH/COMP/PHYS/ENG.
  2. Results (discipline and generational shares): No error bars, confidence intervals, or robustness checks are reported for the percentages that cross or approach 50%. Without these, it is impossible to assess whether the reported distinctions between disciplines and between current vs. young cohorts remain stable under plausible misclassification rates that may also vary systematically by discipline or cohort.
minor comments (2)
  1. Abstract: The total N is given but no per-discipline or per-country breakdown is supplied, which would help readers evaluate the precision of the 50% claims.
  2. Methods: The operational definitions of 'academic age' and 'young scientists' are listed as challenges but receive insufficient detail on how cohorts were assigned from publication records.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of methodological transparency. We respond to each major comment below and indicate the revisions that will be made.

  1. Referee: Abstract and Methods (gender determination section): The paper correctly flags gender determination from author names as a critical methodological challenge yet provides no validation metrics, accuracy rates, or sensitivity analysis for the name-based classifier applied to the 1.74M records. Documented error rates of 5-15% (higher for non-Western or unisex names) could shift disciplines across the exact 50% thresholds that underpin the headline claims of four disciplines at 50% and the marginal cluster in MATH/COMP/PHYS/ENG.

    Authors: We agree that explicit validation metrics and sensitivity analyses were not reported in the submitted version, even though the Methods section identifies gender determination as a methodological challenge. In the revision we will add the accuracy rates documented for the name-based classifier employed, together with a sensitivity analysis that re-computes the key shares under simulated misclassification rates of 5–15%. This will directly test whether the four disciplines at or above 50% and the marginal cluster in MATH, COMP, PHYS and ENG remain stable. Revision: yes

  2. Referee: Results (discipline and generational shares): No error bars, confidence intervals, or robustness checks are reported for the percentages that cross or approach 50%. Without these, it is impossible to assess whether the reported distinctions between disciplines and between current vs. young cohorts remain stable under plausible misclassification rates that may also vary systematically by discipline or cohort.

    Authors: We concur that uncertainty quantification is required for percentages near the 50% threshold. The revised manuscript will include binomial or bootstrap confidence intervals for all reported shares by discipline and cohort. We will also add robustness checks that incorporate plausible, discipline-specific misclassification rates to verify that the distinctions between disciplines and between current and young cohorts are not sensitive to these rates. Revision: yes
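For the binomial confidence intervals mentioned in the second response, one common choice is the Wilson score interval. The sketch below is an illustration only: the simulated rebuttal does not say which interval would be used, and the function name and the 95% level (z = 1.96) are assumptions. The counts in the usage lines are the 2023 European subsample figures quoted in the Figure 4 caption.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to an approximate 95% level (an assumption here).
    Returns (lower, upper) as fractions between 0 and 1.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Counts quoted in the Figure 4 caption: 275,204 women out of 684,155.
lo, hi = wilson_interval(275_204, 684_155)
print(f"share = {275_204 / 684_155:.4f}, interval = ({lo:.4f}, {hi:.4f})")

# A small hypothetical cell (50 of 100) shows how wide the interval gets
# when a discipline-by-cohort count is small.
print(wilson_interval(50, 100))
```

An interval of this kind reflects sampling variability only. It says nothing about systematic gender misclassification, which is why the response pairs it with separate robustness checks.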

Circularity Check

0 steps flagged

No circularity: purely descriptive counts from external publication dataset


The paper reports empirical shares of women scientists across 14 STEMM disciplines and age cohorts using N=1.74M publication records (1990-2023). It contains no equations, fitted parameters, predictions, or derivations. The central claims (four disciplines at 50% women among current publishers; five at >50% among young scientists; marginal role in MATH/COMP/PHYS/ENG) are direct tabulations from name-inferred gender on indexed records. Methodological challenges (gender determination, academic age, discipline assignment) are explicitly flagged as limitations but do not create self-referential loops or reduce any result to its inputs by construction. No self-citations, ansatzes, or uniqueness theorems are invoked to justify the reported percentages. The analysis is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The measurement rests on two standard bibliometric assumptions rather than new postulates; no free parameters or invented entities are introduced.

axioms (2)
  • Domain assumption: Author names allow sufficiently accurate gender inference for aggregate statistics.
    Paper explicitly lists gender determination as a methodological challenge yet proceeds to report precise percentages.
  • Domain assumption: Publications indexed in global databases represent the active scientific workforce by discipline and age.
    Core premise for using publication counts to measure participation and growth.


