
arxiv: 2301.06196 · v5 · submitted 2023-01-15 · 💻 cs.DL

Young Male and Female Scientists: A Quantitative Exploratory Study of the Changing Demographics of the Global Scientific Workforce

Pith reviewed 2026-05-24 10:09 UTC · model grok-4.3

classification 💻 cs.DL
keywords scientific workforce · gender demographics · STEMM disciplines · bibliometric analysis · young scientists · age groups · OECD countries · publication metadata

The pith

Publication data from 4.3 million scientists shows young women already outnumber young men in one-third of STEMM disciplines.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes generational shifts in the global scientific workforce by tracking age and gender distributions across 16 disciplines using publication records of 4.3 million nonoccasional scientists in 38 OECD countries from 1990 to 2021. It establishes that one-third of disciplines now have more women than men in the youngest age group, that most women scientists overall are young, and that medicine accounts for more than half of all women scientists. These patterns emerge only when data are broken down by age group and discipline rather than aggregated. The work tests the value of bibliometric sources for workforce studies and contrasts global patterns with national-level findings.

Core claim

Across 16 STEMM disciplines, one-third already have more women than men among the youngest scientists; most women scientists are young; and 55.02 percent of all women scientists work in medicine. These distributions are derived from large-scale cross-sectional and longitudinal analysis of publication metadata that distinguishes nonoccasional scientists by inferred age and gender.

What carries the argument

Large-scale generational analysis of publication metadata to map age-group and gender proportions by discipline and time period.
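
The figure captions distinguish a "horizontal" reading (row percentages: the gender split within one discipline and age group) from a "vertical" one (column percentages: how scientists of one gender in a discipline are spread across age groups). A minimal Python sketch of the two calculations follows. The records are invented toy data and the function names are hypothetical; this is not the authors' code or dataset.

```python
from collections import Counter

# Hypothetical toy records: (discipline, age_group, gender). Not data from the paper.
records = (
    [("MED", "young", "F")] * 3 + [("MED", "young", "M")] * 1
    + [("MED", "old", "F")] * 2 + [("MED", "old", "M")] * 2
    + [("MATH", "young", "F")] * 1 + [("MATH", "young", "M")] * 2
    + [("MATH", "old", "M")] * 1
)
counts = Counter(records)


def horizontal_share(discipline, age_group, gender):
    """Row percentage: share of one gender within a discipline/age-group cell."""
    total = sum(counts[(discipline, age_group, g)] for g in ("F", "M"))
    return 100.0 * counts[(discipline, age_group, gender)] / total if total else 0.0


def vertical_share(discipline, age_group, gender):
    """Column percentage: share of one age group among all scientists
    of the given gender in the given discipline."""
    total = sum(
        n for (d, _, g), n in counts.items() if d == discipline and g == gender
    )
    return 100.0 * counts[(discipline, age_group, gender)] / total if total else 0.0


# Women are 75% of young MED scientists (horizontal),
# while 60% of all MED women are young (vertical).
print(horizontal_share("MED", "young", "F"))  # 75.0
print(vertical_share("MED", "young", "F"))    # 60.0
```

The two numbers answer different questions, which is why a discipline can look balanced in one view and skewed in the other.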

If this is right

  • Aggregated workforce statistics conceal distinct gender dynamics that appear only when age groups and disciplines are examined separately.
  • Some disciplines are already numerically dominated by women while change remains slow in others.
  • Bibliometric datasets enable global workforce tracking along gender, age, discipline, and time dimensions.
  • Global patterns can be compared directly with findings from single-country studies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Discipline-specific policies may be more effective than broad STEMM initiatives if young-female dominance continues in certain fields.
  • Longitudinal follow-up on the same cohorts could show whether higher entry of young women translates into sustained participation at later career stages.
  • The concentration of women in medicine raises questions about whether similar shifts occur in non-STEMM fields not covered here.

Load-bearing premise

Publication metadata can be used to reliably infer scientist age, gender, and active nonoccasional status across countries and disciplines.

What would settle it

A national census or survey in one of the 38 countries that reports substantially different age-by-gender proportions within disciplines than those derived from the publication records.
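
One hedged way to frame such a check is to compare the share of women in a single discipline and age group as counted from publication records with the share reported by a survey. The sketch below uses a pooled two-proportion z statistic. That statistic is an illustrative choice, not a test the paper proposes; the counts are invented and the function name is hypothetical. If the comparison source is a full census rather than a sample, a plain difference in percentage points may be the more appropriate summary.

```python
import math


def two_proportion_z(women_a, total_a, women_b, total_b):
    """Pooled two-proportion z statistic for the difference in female share
    between source A (e.g. publication records) and source B (e.g. a survey)."""
    p_a = women_a / total_a
    p_b = women_b / total_b
    pooled = (women_a + women_b) / (total_a + total_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    return (p_a - p_b) / std_err


# Invented counts for one discipline and age group: 52% vs. 45% women.
z = two_proportion_z(520, 1000, 450, 1000)
print(round(z, 2))
```

A large absolute value of z on real data would indicate that the two sources disagree by more than sampling noise alone would explain.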

Figures

Figures reproduced from arXiv: 2301.06196 by Lukasz Szymula, Marek Kwiek.

Figure 1. Flowchart: stages in constructing the population and two subpopulations.
2.2. Methods. In this section, we present the five basic procedures to unambiguously define the attributes of the scientists in our population. We initially used raw data for 2020 and before, here based on the Scopus database version dated 18 August 2021. The raw data were made available to us by Elsevier under an agreement with the IC…
Figure 2. The number of publishing nonoccasional STEMM scientists in 38 OECD countries by discipline and gender (left top) and by country (20 biggest systems only) and gender (right top). The share by discipline and gender (left bottom) and by country (20 biggest OECD systems only) and gender (right bottom) (in %), 2021 (N = 1,502,792)
Figure 3. Ever-increasing participation of women in younger generations of scientists, with a few exceptions (e.g., COMP, MATH). Horizontal approach: distribution of publishing nonoccasional STEMM scientists by discipline, age group, and gender (row percentages: 100% horizontally), 2021 (N = 1,502,792)
Figure 4. Zooming in on young scientists only. More young men than young women in all STEMM disciplines except six (e.g., MED). Horizontal approach: young scientists only (academic age 10 years and less). Distribution of young publishing nonoccasional STEMM scientists by discipline, age group, and gender (row percentages: 100% horizontally), 2021 (N = 666,355)
Figure 5. The increasing participation of young female scientists for all disciplines over time. Overview of percentage change directions, 2000 vs. 2021: horizontal approach. Zooming in on young scientists only (academic age 10 years or less). Distribution of young publishing nonoccasional STEMM scientists by discipline, age group, and gender; dark blue percentage female scientists 2021, white lines percentage femal…
Figure 6. Young women in STEMM: in most disciplines, the majority of women belong to the two youngest age groups. Vertical approach: distribution of publishing nonoccasional STEMM scientists by discipline, age group, and gender (column percentages: 100% vertically, for all age groups combined), 2021 (N = 1,502,792)
Figure 7. Zooming in on young scientists only. Higher concentration of young women than young men across all disciplines. Vertical approach: zooming in on young scientists only (academic age 10 and less): distribution of publishing nonoccasional STEMM scientists by discipline, age group, and gender (column percentages, vertically: percentage of young female scientists among all women, and young men among all men; wo…
Figure 8. Shrinking percentages of the youngest male and female scientists among all male and female scientists over time, across all disciplines. Overview of change directions in percentages, 2000 vs. 2021: vertical approach. Distribution of nonoccasional publishing STEMM scientists by discipline, age group, and gender (column percentages: 100% vertically for all age groups combined, dark blue 2000, light blue 2021…
Figure 9. Shrinking base of young scientists, both men and women, over time. Overview of percentage change directions, 2000 vs. 2021: vertical approach. Zooming in on young scientists only (academic age 10 years or less). Distribution of young publishing nonoccasional STEMM scientists by discipline, age group, and gender, 2000 (dark blue) and 2021 (light blue) (based on column percentages) (N2021 = 666,355, N2000 = …
Figure 10. Expanding base of old scientists, both men and women, over time. Overview of change directions, 2000 vs. 2021: vertical approach. Zooming in on old scientists only: academic age of 31–50 years. Distribution of old publishing nonoccasional STEMM scientists by discipline, age group, and gender, 2000 (dark blue) and 2021 (light blue) (based on column percentages) (N2021 = 146,090, N2000 = 17,463)
Figure 11. Different starting points and growth in participation of women in science over time. The trend in the percentage of female scientists by discipline, 1990–2021 (N = 4,314,666)
Hypothetically, under stable conditions of professional access to disciplines and current trends in women’s participation in science by discipline, here based on the past three decades, none of which can be guaranteed in the future, …
Figure 12. Gender parity (50/50) vs. gender balance (40/60), time needed to achieve, in years, by discipline
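
The Figure 12 caption refers to the time needed to reach gender parity (50/50) or gender balance (40/60) by discipline. How the authors compute this is not stated on this page. As a rough illustration only, the sketch below assumes a straight-line least-squares trend in the percentage of women and extrapolates it forward. The yearly shares are invented and the function names are hypothetical.

```python
def linear_fit(years, shares):
    """Ordinary least-squares line through (year, share) points.
    Returns (slope, intercept)."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(shares) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, shares))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def years_to_target(years, shares, target, from_year):
    """Years after from_year until the fitted trend reaches target (in %).
    Returns 0.0 if already reached, None if the trend is flat or falling."""
    slope, intercept = linear_fit(years, shares)
    current = slope * from_year + intercept
    if current >= target:
        return 0.0
    if slope <= 0:
        return None
    return (target - current) / slope


# Invented shares of women (%) for one hypothetical discipline.
toy_years = [1990, 2000, 2010, 2020]
toy_shares = [20.0, 25.0, 30.0, 35.0]
print(years_to_target(toy_years, toy_shares, 40.0, 2021))  # balance: 9.0
print(years_to_target(toy_years, toy_shares, 50.0, 2021))  # parity: 29.0
```

Any such extrapolation inherits the caveat that past trends need not continue.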
Original abstract

In this study, the global scientific workforce is explored through large-scale, generational, cross-sectional, and longitudinal approaches. We examine 4.3 million nonoccasional scientists from 38 OECD countries publishing in 1990-2021. Our interest is in the changing distribution of young male and female scientists over time across 16 STEMM (science, technology, engineering, mathematics, medicine) disciplines. We unpack the details of the changing scientific workforce using age groups. Some disciplines are already numerically dominated by women, and the change is fast in some and slow in other disciplines. In one-third of disciplines, there are already more youngest female than male scientists. Across all disciplines combined, the majority of women are young women. And more than half of women scientists (55.02%) are located in medicine. The usefulness of global bibliometric data sources in analyzing the scientific workforce along gender, age, discipline, and time is tested. Traditional aggregated data about scientists in general hide a nuanced picture of the changing gender dynamics within and across disciplines and age groups. The limitations of bibliometric datasets are explored, and global studies are compared with national-level studies. The methodological choices and their implications are shown, and new opportunities for how to study scientists globally are discussed.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper analyzes 4.3 million nonoccasional scientists from 38 OECD countries (1990-2021) across 16 STEMM disciplines using bibliometric records. It employs generational, cross-sectional, and longitudinal approaches to track changes in the gender distribution of young scientists, reporting that one-third of disciplines already have more youngest females than males, that the majority of women scientists are young, and that 55.02% of women are in medicine. The study also evaluates the utility and limitations of global bibliometric data for workforce demographics compared to national studies.

Significance. If the age, gender, and activity-status inferences prove reliable after validation, the work provides a large-scale view of discipline-specific gender shifts that aggregated statistics obscure, with potential value for diversity policy. The scale (4.3M records) and multi-country, multi-decade scope are strengths, as is the explicit comparison of global versus national data and discussion of methodological implications.

major comments (3)
  1. [Abstract and §2] Abstract and §2 (Data and sample construction): No details are provided on the algorithms or thresholds used to infer gender from names, age groups from publication-year proxies, or the criteria defining 'nonoccasional' scientists and excluding occasional publishers. These processing steps are load-bearing for all reported fractions (e.g., one-third of disciplines, 55.02% in medicine).
  2. [§3 and Results] §3 (Methods) and Results on youngest scientists: The headline claims rest on name-based gender inference and publication-year age grouping without reported accuracy rates, country- or discipline-specific calibration, or sensitivity analysis to differential misclassification. Known error patterns in these proxies could alter the observed proportions without external validation.
  3. [Discussion] Discussion section: While limitations of bibliometric data are explored, the absence of quantitative tests (e.g., comparison to ground-truth national registries or error-rate estimates by field/country) leaves the central assertion that the data 'accurately reflect workforce demographics' untested at the level required to support the specific numerical claims.
minor comments (2)
  1. [Abstract] The abstract states the dataset size and high-level approach but does not preview the specific inference methods or validation steps; moving a concise methods summary to the abstract would improve clarity.
  2. [§2] Notation for age groups and 'young' cutoffs should be defined explicitly with the chosen thresholds and any robustness checks, rather than left as free parameters.
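
To make the second minor comment concrete, the sketch below shows what an explicit age-group definition could look like. Only two bands are taken from the figure captions: "young" (academic age 10 years or less) and "old" (academic age 31–50 years). Everything else is an assumption: academic age is taken to be the reference year minus the year of first publication, the function names are hypothetical, and ages outside the two stated bands are returned as "unspecified" because their thresholds are not given here.

```python
def academic_age(first_publication_year, reference_year):
    """Assumed definition: years elapsed since the first publication."""
    return reference_year - first_publication_year


def age_band(age):
    """Map an academic age to a band. Only the two bands named in the
    figure captions are encoded; other thresholds are not stated on this page."""
    if age < 0:
        raise ValueError("academic age cannot be negative")
    if age <= 10:
        return "young"
    if 31 <= age <= 50:
        return "old"
    return "unspecified"


print(age_band(academic_age(2015, 2021)))  # young
print(age_band(academic_age(1985, 2021)))  # old
print(age_band(academic_age(2000, 2021)))  # unspecified
```

Writing the cutoffs as code makes it straightforward to rerun the counts with alternative thresholds as a robustness check.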

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below with proposed revisions to improve methodological transparency and the robustness of our claims.

Point-by-point responses
  1. Referee: [Abstract and §2] Abstract and §2 (Data and sample construction): No details are provided on the algorithms or thresholds used to infer gender from names, age groups from publication-year proxies, or the criteria defining 'nonoccasional' scientists and excluding occasional publishers. These processing steps are load-bearing for all reported fractions (e.g., one-third of disciplines, 55.02% in medicine).

    Authors: We agree that these processing steps require explicit documentation. In the revised manuscript we will expand §2 with a dedicated subsection detailing the gender-inference algorithm and probability thresholds employed, the exact publication-year rules used to assign age groups, and the quantitative criteria (minimum publication counts and time windows) used to classify nonoccasional scientists. revision: yes

  2. Referee: [§3 and Results] §3 (Methods) and Results on youngest scientists: The headline claims rest on name-based gender inference and publication-year age grouping without reported accuracy rates, country- or discipline-specific calibration, or sensitivity analysis to differential misclassification. Known error patterns in these proxies could alter the observed proportions without external validation.

    Authors: We acknowledge the value of accuracy reporting and sensitivity checks. The revised Methods section will cite published accuracy rates for the gender-inference tool used and will include a sensitivity analysis that varies gender-probability thresholds and age-proxy cut-offs to quantify their effect on the reported discipline-level proportions. Country- and discipline-specific calibration data are not available to us; this limitation will be stated explicitly. revision: partial

  3. Referee: [Discussion] Discussion section: While limitations of bibliometric data are explored, the absence of quantitative tests (e.g., comparison to ground-truth national registries or error-rate estimates by field/country) leaves the central assertion that the data 'accurately reflect workforce demographics' untested at the level required to support the specific numerical claims.

    Authors: We agree that direct quantitative validation against national registries would strengthen the paper. Obtaining such registries for all 38 countries and 16 disciplines lies outside the feasible scope of the present study. In the revised Discussion we will incorporate field-specific error-rate estimates from the existing literature, qualify the strength of our numerical claims, and expand the existing comparison with national studies to highlight remaining uncertainties. revision: partial
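
The sensitivity analysis promised in the second response could take roughly the following shape. This is a sketch under stated assumptions, not the authors' procedure: each scientist is assumed to carry a probability of being female from some name-based tool, a scientist is labeled a woman if that probability is at least the threshold, a man if it is at most one minus the threshold, and left unassigned otherwise. The probabilities are invented and the function name is hypothetical.

```python
def female_share(prob_female, threshold):
    """Share of women (%) among scientists whose gender can be assigned
    at the given confidence threshold, plus the number assigned."""
    women = sum(1 for p in prob_female if p >= threshold)
    men = sum(1 for p in prob_female if p <= 1 - threshold)
    assigned = women + men
    share = 100.0 * women / assigned if assigned else None
    return share, assigned


# Invented name-based probabilities for ten hypothetical scientists.
toy_probs = [0.99, 0.97, 0.85, 0.70, 0.55, 0.35, 0.15, 0.10, 0.03, 0.01]

for t in (0.60, 0.80, 0.95):
    share, assigned = female_share(toy_probs, t)
    print(f"threshold={t:.2f} assigned={assigned} female_share={share:.1f}%")
```

On the toy data the female share moves from about 44% to 50% as the threshold tightens, while the assigned population shrinks from nine to four. Reporting both columns per discipline would show how much a headline proportion depends on the threshold.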

Circularity Check

0 steps flagged

No circularity: claims derived from direct counts on external publication records

full rationale

The paper conducts an exploratory descriptive analysis of 4.3 million publication records to compute observed proportions of young male/female scientists by discipline and age group. No equations, fitted parameters, or predictions are defined in terms of the target results. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify core counts. The derivation chain consists solely of data filtering and aggregation steps applied to independent bibliometric sources; results are not presupposed by definitions or prior author work.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The analysis rests on domain assumptions about data representativeness and demographic inference that are not independently verified in the abstract. Two free parameters (the age-group cutoffs and the criteria for 'nonoccasional' scientists) are used but not specified there; no invented entities are introduced.

free parameters (2)
  • Age group cutoffs for 'young' scientists
    Used to define generational cohorts but exact thresholds not stated in abstract.
  • Criteria defining 'nonoccasional' scientists
    Filters the 4.3 million sample but definition absent from abstract.
axioms (2)
  • domain assumption: Publication metadata can be used to reliably infer scientist gender and approximate age.
    Required for all generational and gender claims.
  • domain assumption: The selected bibliometric database covers a representative sample of the scientific workforce in the 38 OECD countries.
    Necessary to generalize findings beyond the observed publications.

pith-pipeline@v0.9.0 · 5758 in / 1420 out tokens · 41962 ms · 2026-05-24T10:09:11.645078+00:00 · methodology

