Young Male and Female Scientists: A Quantitative Exploratory Study of the Changing Demographics of the Global Scientific Workforce
Pith reviewed 2026-05-24 10:09 UTC · model grok-4.3
The pith
Publication data from 4.3 million scientists shows young women already outnumber young men in one-third of STEMM disciplines.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Across 16 STEMM disciplines, one-third already have more female than male scientists in the youngest age group; the majority of women scientists are young; and 55.02 percent of all women scientists work in medicine. These distributions come from a large-scale cross-sectional and longitudinal analysis of publication metadata in which nonoccasional scientists are classified by inferred age and gender.
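Each of the three headline figures is a ratio over counts of scientists grouped by discipline, age group, and gender. The Python sketch below shows that arithmetic on a small made-up table. The discipline codes, the two-level age grouping, and the counts are hypothetical placeholders, not data from the paper.

```python
# Hypothetical toy counts keyed by (discipline, age_group, gender).
# Real inputs would be the paper's processed author records, not these numbers.
COUNTS = {
    ("MED", "youngest", "F"): 120, ("MED", "youngest", "M"): 90,
    ("MED", "older", "F"): 80, ("MED", "older", "M"): 110,
    ("PHYS", "youngest", "F"): 30, ("PHYS", "youngest", "M"): 70,
    ("PHYS", "older", "F"): 10, ("PHYS", "older", "M"): 90,
    ("BIO", "youngest", "F"): 60, ("BIO", "youngest", "M"): 55,
    ("BIO", "older", "F"): 40, ("BIO", "older", "M"): 60,
}


def headline_stats(counts, young_groups=("youngest",), focus="MED"):
    """Return the three kinds of ratio reported in the core claim."""
    disciplines = {d for d, _, _ in counts}
    # Disciplines in which the youngest women outnumber the youngest men.
    female_led = [
        d for d in disciplines
        if counts.get((d, "youngest", "F"), 0) > counts.get((d, "youngest", "M"), 0)
    ]
    women = {key: n for key, n in counts.items() if key[2] == "F"}
    women_total = sum(women.values())
    young_women = sum(n for (_, age, _), n in women.items() if age in young_groups)
    women_in_focus = sum(n for (d, _, _), n in women.items() if d == focus)
    return {
        "share_of_disciplines_female_led": len(female_led) / len(disciplines),
        "share_of_women_who_are_young": young_women / women_total,
        "share_of_women_in_focus_discipline": women_in_focus / women_total,
    }


stats = headline_stats(COUNTS)
```

Which age groups count as "young" is passed in as `young_groups`, because the sketch does not know the paper's actual cutoffs.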
What carries the argument
Large-scale generational analysis of publication metadata to map age-group and gender proportions by discipline and time period.
If this is right
- Aggregated workforce statistics conceal distinct gender dynamics that appear only when age groups and disciplines are examined separately.
- Some disciplines are already numerically dominated by women while change remains slow in others.
- Bibliometric datasets enable global workforce tracking along gender, age, discipline, and time dimensions.
- Global patterns can be compared directly with findings from single-country studies.
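The first point can be illustrated with a hypothetical discipline and invented counts. Here the pooled female share is 40%, yet women form the majority of the youngest group.

```python
# Hypothetical counts for one discipline, split by age group and gender.
by_age = {
    "youngest": {"F": 300, "M": 250},
    "middle": {"F": 200, "M": 300},
    "oldest": {"F": 100, "M": 350},
}


def female_share(groups):
    """Female share after pooling the given age groups."""
    f = sum(g["F"] for g in groups)
    m = sum(g["M"] for g in groups)
    return f / (f + m)


aggregate = female_share(by_age.values())
per_age = {age: female_share([g]) for age, g in by_age.items()}
```

Here `aggregate` equals 0.4 while `per_age["youngest"]` is about 0.545, so a single pooled figure would hide the female majority among the youngest scientists.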
Where Pith is reading between the lines
- Discipline-specific policies may be more effective than broad STEMM initiatives if young-female dominance continues in certain fields.
- Longitudinal follow-up on the same cohorts could show whether higher entry of young women translates into sustained participation at later career stages.
- The concentration of women in medicine raises questions about whether similar shifts occur in non-STEMM fields not covered here.
Load-bearing premise
Publication metadata can be used to reliably infer scientist age, gender, and active nonoccasional status across countries and disciplines.
What would settle it
A national census or survey in one of the 38 countries that reports substantially different age-by-gender proportions within disciplines than those derived from the publication records.
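One hedged way to operationalize "substantially different" is to estimate the gap between the two shares with a confidence interval rather than a bare significance test. With millions of records, almost any gap is statistically significant. The sketch assumes two independent samples and uses a Wald interval. The counts in the example are invented, and the analyst still has to decide what size of gap counts as substantial. It also ignores that a census and a publication database define "scientist" differently, and that difference would itself need reconciling.

```python
import math


def share_difference_ci(x1, n1, x2, n2, z=1.96):
    """Difference p1 - p2 between two shares, with a Wald confidence interval.

    Uses the unpooled standard error and assumes two independent samples;
    z = 1.96 gives an approximate 95% interval.
    """
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff, (diff - z * se, diff + z * se)


# Invented example: youngest-female counts in one discipline,
# from publication records (5,200 of 10,000) versus a national survey (4,600 of 8,000).
diff, (low, high) = share_difference_ci(5200, 10000, 4600, 8000)
```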
Original abstract
In this study, the global scientific workforce is explored through large-scale, generational, cross-sectional, and longitudinal approaches. We examine 4.3 million nonoccasional scientists from 38 OECD countries publishing in 1990-2021. Our interest is in the changing distribution of young male and female scientists over time across 16 STEMM (science, technology, engineering, mathematics, medicine) disciplines. We unpack the details of the changing scientific workforce using age groups. Some disciplines are already numerically dominated by women, and the change is fast in some and slow in other disciplines. In one-third of disciplines, there are already more youngest female than male scientists. Across all disciplines combined, the majority of women are young women. And more than half of women scientists (55.02%) are located in medicine. The usefulness of global bibliometric data sources in analyzing the scientific workforce along gender, age, discipline, and time is tested. Traditional aggregated data about scientists in general hide a nuanced picture of the changing gender dynamics within and across disciplines and age groups. The limitations of bibliometric datasets are explored, and global studies are compared with national-level studies. The methodological choices and their implications are shown, and new opportunities for how to study scientists globally are discussed.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes 4.3 million nonoccasional scientists from 38 OECD countries (1990-2021) across 16 STEMM disciplines using bibliometric records. It employs generational, cross-sectional, and longitudinal approaches to track changes in the gender distribution of young scientists. It reports three main findings: in one-third of disciplines the youngest female scientists already outnumber the youngest male scientists; the majority of women scientists are young; and 55.02% of women scientists are in medicine. The study also evaluates the utility and limitations of global bibliometric data for workforce demographics compared with national studies.
Significance. If the age, gender, and activity-status inferences prove reliable after validation, the work provides a large-scale view of discipline-specific gender shifts that aggregated statistics obscure, with potential value for diversity policy. The scale (4.3 million scientists) and the multi-country, multi-decade scope are strengths, as are the explicit comparison of global versus national data and the discussion of methodological implications.
major comments (3)
- [Abstract and §2 (Data and sample construction)] No details are provided on the algorithms or thresholds used to infer gender from names, to assign age groups from publication-year proxies, or to define 'nonoccasional' scientists and exclude occasional publishers. These processing steps are load-bearing for every reported fraction (e.g., one-third of disciplines, 55.02% in medicine).
- [§3 (Methods) and Results on youngest scientists] The headline claims rest on name-based gender inference and publication-year age grouping. No accuracy rates, country- or discipline-specific calibration, or sensitivity analysis for differential misclassification are reported. Known error patterns in these proxies could shift the observed proportions, and without external validation that possibility cannot be ruled out.
- [Discussion] The limitations of bibliometric data are discussed, but no quantitative tests are reported (e.g., comparison with ground-truth national registries or error-rate estimates by field and country). As a result, the central assertion that the data 'accurately reflect workforce demographics' remains untested at the level needed to support the specific numerical claims.
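The documentation the referee asks for can be made concrete as a few explicit rules. The sketch below is hypothetical. The `Author` record, the ten-publication cutoff for "nonoccasional", the 0.85 probability threshold, and the years-since-first-publication age proxy are illustrative assumptions, not the paper's actual rules. `p_female` stands in for the output of some name-based gender tool, which is not shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Author:
    author_id: str
    pub_years: list[int]          # years of the author's publications
    p_female: Optional[float]     # hypothetical output of a name-based gender tool


def is_nonoccasional(a: Author, min_pubs: int = 10) -> bool:
    # Hypothetical rule: enough publications to count as an active researcher.
    return len(a.pub_years) >= min_pubs


def academic_age(a: Author, ref_year: int) -> int:
    # Proxy for age: years elapsed since the first publication.
    return ref_year - min(a.pub_years)


def gender(a: Author, threshold: float = 0.85) -> str:
    # Assign a gender only when the name-based probability is confident enough.
    if a.p_female is None:
        return "unknown"
    if a.p_female >= threshold:
        return "F"
    if a.p_female <= 1 - threshold:
        return "M"
    return "unknown"


authors = [
    Author("a1", list(range(2010, 2021)), 0.97),  # 11 publications
    Author("a2", [2015, 2016], 0.10),             # occasional publisher
    Author("a3", list(range(2000, 2012)), 0.50),  # ambiguous name
]
kept = [a for a in authors if is_nonoccasional(a)]
rows = [(a.author_id, academic_age(a, 2021), gender(a)) for a in kept]
```

Every default argument here is a free parameter of the kind the ledger lists, which is why reporting its value matters.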
minor comments (2)
- [Abstract] The abstract states the dataset size and high-level approach but does not preview the specific inference methods or validation steps; adding a one-sentence summary of these methods to the abstract would improve clarity.
- [§2] The notation for age groups and the cutoffs defining 'young' should be stated explicitly, together with the chosen thresholds and any robustness checks, rather than left as unreported free parameters.
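Writing the cutoffs down as explicit parameters makes this robustness check mechanical. In the sketch below, academic age is mapped to groups by a tuple of cutoffs, and the female share of the youngest group is recomputed as the first cutoff moves. The cutoffs and the six-person toy population are hypothetical.

```python
from bisect import bisect_right


def age_group(academic_age, cutoffs=(10, 20, 30)):
    """Map academic age (years since first publication) to a group index.

    Group 0 is the youngest. With cutoffs (10, 20, 30):
    ages 0-9 -> 0, 10-19 -> 1, 20-29 -> 2, 30 and above -> 3.
    """
    return bisect_right(cutoffs, academic_age)


def youngest_female_share(people, cutoffs):
    """Female share among people falling in group 0 under the given cutoffs."""
    young = [g for age, g in people if age_group(age, cutoffs) == 0]
    if not young:
        return float("nan")
    return sum(g == "F" for g in young) / len(young)


# Hypothetical toy population of (academic_age, gender) pairs.
people = [(2, "F"), (5, "F"), (8, "M"), (11, "M"), (13, "F"), (25, "M")]

# Robustness check: move the upper bound of the youngest group and watch the share.
shares = {first: youngest_female_share(people, (first, 20, 30)) for first in (8, 10, 12, 15)}
```

On this toy population the share moves from 1.0 to 0.5 as the first cutoff goes from 8 to 12 years. A reported robustness check would need to rule out swings of this kind at scale.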
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address each major comment below with proposed revisions to improve methodological transparency and the robustness of our claims.
- Referee: [Abstract and §2 (Data and sample construction)] No details are provided on the algorithms or thresholds used to infer gender from names, to assign age groups from publication-year proxies, or to define 'nonoccasional' scientists and exclude occasional publishers. These processing steps are load-bearing for every reported fraction (e.g., one-third of disciplines, 55.02% in medicine).
Authors: We agree that these processing steps require explicit documentation. In the revised manuscript we will expand §2 with a dedicated subsection detailing the gender-inference algorithm and probability thresholds employed, the exact publication-year rules used to assign age groups, and the quantitative criteria (minimum publication counts and time windows) used to classify nonoccasional scientists. revision: yes
- Referee: [§3 (Methods) and Results on youngest scientists] The headline claims rest on name-based gender inference and publication-year age grouping. No accuracy rates, country- or discipline-specific calibration, or sensitivity analysis for differential misclassification are reported. Known error patterns in these proxies could shift the observed proportions, and without external validation that possibility cannot be ruled out.
Authors: We acknowledge the value of accuracy reporting and sensitivity checks. The revised Methods section will cite published accuracy rates for the gender-inference tool used and will include a sensitivity analysis that varies gender-probability thresholds and age-proxy cut-offs to quantify their effect on the reported discipline-level proportions. Country- and discipline-specific calibration data are not available to us; this limitation will be stated explicitly. revision: partial
- Referee: [Discussion] The limitations of bibliometric data are discussed, but no quantitative tests are reported (e.g., comparison with ground-truth national registries or error-rate estimates by field and country). As a result, the central assertion that the data 'accurately reflect workforce demographics' remains untested at the level needed to support the specific numerical claims.
Authors: We agree that direct quantitative validation against national registries would strengthen the paper. Obtaining such registries for all 38 countries and 16 disciplines lies outside the feasible scope of the present study. In the revised Discussion we will incorporate field-specific error-rate estimates from the existing literature, qualify the strength of our numerical claims, and expand the existing comparison with national studies to highlight remaining uncertainties. revision: partial
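Published error rates could be carried into the discipline-level proportions with a standard misclassification correction. The algebra is the same as in the Rogan-Gladen estimator used for imperfect diagnostic tests. The sketch below is one possible approach, not the authors' stated method. It assumes the two error rates are known for the field in question, and it applies only to authors who were assigned a gender. Names left "unknown" would need separate treatment if they are not missing at random.

```python
def corrected_female_share(observed, f_to_m, m_to_f):
    """Estimate the true female share p from the observed share q.

    Assumes q = p * (1 - f_to_m) + (1 - p) * m_to_f, where
    f_to_m = P(labelled male | actually female) and
    m_to_f = P(labelled female | actually male), and solves for p.
    """
    denom = 1.0 - f_to_m - m_to_f
    if denom <= 0:
        raise ValueError("error rates too large for the correction to be identifiable")
    p = (observed - m_to_f) / denom
    return min(1.0, max(0.0, p))  # clip to a valid proportion


# Hypothetical sensitivity check for a discipline near parity.
observed = 0.52
scenarios = [(0.0, 0.0), (0.05, 0.02), (0.02, 0.08)]
corrected = {s: corrected_female_share(observed, *s) for s in scenarios}
```

Consider an observed share of 0.52. Error rates of (0.05, 0.02) give a corrected share of about 0.538, while (0.02, 0.08) give about 0.489. So in a discipline near parity, whether the youngest women "outnumber" the youngest men can depend on which error dominates.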
Circularity Check
No circularity: claims derived from direct counts on external publication records
The paper conducts an exploratory descriptive analysis of 4.3 million publication records to compute observed proportions of young male/female scientists by discipline and age group. No equations, fitted parameters, or predictions are defined in terms of the target results. No self-citation chains, uniqueness theorems, or ansatzes are invoked to justify core counts. The derivation chain consists solely of data filtering and aggregation steps applied to independent bibliometric sources; results are not presupposed by definitions or prior author work.
Axiom & Free-Parameter Ledger
free parameters (2)
- Age group cutoffs for 'young' scientists
- Criteria defining 'nonoccasional' scientists
axioms (2)
- Domain assumption: Publication metadata can be used to reliably infer scientist gender and approximate age.
- Domain assumption: The selected bibliometric database covers a representative sample of the scientific workforce in the 38 OECD countries.