Evaluating Gender Wage Inequality in Academia using Causal Inference Methods for Observational Data

Jan Hannig; Zihan Zhang

arXiv: 2505.24078 · v3 · submitted 2025-05-29 · 📊 stat.AP · econ.GN · q-fin.EC

Pith reviewed 2026-05-19 13:01 UTC · model grok-4.3

keywords: gender wage gap · causal inference · faculty salaries · propensity score matching · causal forests · observational data · academia

The pith

Causal analysis of 12,039 UNC faculty salaries finds women earn 6% less than comparable men.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper applies modern causal inference tools to observational salary records to isolate the effect of gender after accounting for measurable differences in academic positions. Using data from over twelve thousand tenure-track faculty in the University of North Carolina system together with publication metrics, the authors combine propensity score matching and causal forests to adjust for rank, discipline, research output, and career length. The resulting estimate shows female faculty receive roughly six percent lower pay than otherwise similar male colleagues. The gap is not fixed but changes depending on career stage and how much research the faculty member produces.

Core claim

Using records from 12,039 tenure-track faculty in the University of North Carolina system linked with bibliometric indicators and institutional classifications, the study estimates the causal effect of gender on faculty salaries by combining propensity score matching with causal forests to adjust for rank, discipline, research productivity, and career experience. Results indicate that female faculty earn approximately 6% less than comparable male colleagues, with variation in the gap across career stages and levels of research productivity.
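The claim that the gap varies across career stages refers to conditional (subgroup) effects, which the paper estimates with causal forests. A causal forest is too involved for a short example, so the sketch below swaps in a much cruder technique, a within-stratum difference in mean log salary, purely to show what a subgroup gap is. The record layout, the stratum labels, and the function name `gap_by_stratum` are hypothetical, and no adjustment beyond the stratum itself is performed.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, Tuple

# Each record: (stratum, is_female, log_salary). "stratum" is a hypothetical
# career-stage label such as "assistant"; the paper's covariates are richer.
Record = Tuple[str, bool, float]


def gap_by_stratum(records: Iterable[Record]) -> Dict[str, float]:
    """Female-minus-male difference in mean log salary within each stratum.

    Strata lacking either group are skipped, because no within-stratum
    comparison is possible there.
    """
    groups = defaultdict(lambda: {True: [], False: []})
    for stratum, female, y in records:
        groups[stratum][female].append(y)
    return {
        s: mean(g[True]) - mean(g[False])
        for s, g in groups.items()
        if g[True] and g[False]
    }
```

Unlike a causal forest, this stratified comparison cannot adjust for many covariates at once or borrow strength across strata; it only illustrates the estimand.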

What carries the argument

Propensity score matching combined with causal forests to estimate the gender salary effect while adjusting for observed confounders including rank, discipline, research productivity, and career experience.
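A minimal sketch of the matching step, assuming propensity scores have already been estimated (for example by a logistic regression of gender on the covariates) and that the outcome is log salary; the page specifies neither. The `Unit` class, `matched_att`, and the 0.05 caliper are illustrative choices, not the authors' settings. It performs 1-nearest-neighbour matching with replacement and drops treated units with no control close enough, a simple form of common-support trimming. Matching each treated unit this way estimates the average effect on the treated (ATT), which can differ from the population-average effect when the gap is heterogeneous.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Unit:
    treated: bool      # hypothetical coding: True = female faculty member
    pscore: float      # propensity score, assumed to be estimated beforehand
    log_salary: float  # outcome on the log scale (an assumption)


def matched_att(units: List[Unit], caliper: float = 0.05) -> Optional[float]:
    """Mean matched outcome difference (ATT) from 1-NN propensity-score matching.

    Controls may be reused (matching with replacement). Treated units whose
    nearest control lies farther than `caliper` are discarded. Returns None
    when no treated unit can be matched.
    """
    controls = [u for u in units if not u.treated]
    if not controls:
        return None
    diffs = []
    for t in units:
        if not t.treated:
            continue
        # Nearest control by absolute distance in propensity score.
        best = min(controls, key=lambda c: abs(c.pscore - t.pscore))
        if abs(best.pscore - t.pscore) <= caliper:
            diffs.append(t.log_salary - best.log_salary)
    return sum(diffs) / len(diffs) if diffs else None
```

For example, with treated units at scores 0.30, 0.60, and 0.95 and controls at 0.32, 0.58, and 0.10, the unit at 0.95 has no control within the caliper and is dropped, so the estimate averages only the two remaining matched differences.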

Load-bearing premise

All factors associated with both gender and salary are captured by the measured variables of rank, discipline, research productivity, and career experience, leaving no important unmeasured confounders.

What would settle it

Re-estimating the model after adding data on unmeasured factors such as family responsibilities or negotiation outcomes and finding that the six-percent gap shrinks to zero or changes sign.
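A toy simulation can make this test concrete. Under an assumed data-generating process in which a hidden binary factor `u` (standing in for something like an unrecorded negotiation outcome) is associated with the treatment and lowers the outcome, the naive comparison overstates the true effect, and stratifying on `u` once it is observed recovers it. Every coefficient below is invented for illustration and says nothing about the UNC data.

```python
import random
from statistics import mean


def simulate(n: int = 20000, seed: int = 0):
    """Toy rows (u, t, y): hidden factor u, treatment t, outcome y.

    u makes t more likely (70% vs 30%) and shifts y by -0.05; the true
    effect of t on y is fixed at -0.03. All numbers are hypothetical.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.random() < 0.5
        t = rng.random() < (0.7 if u else 0.3)
        y = -0.03 * t - 0.05 * u + rng.gauss(0.0, 0.01)
        rows.append((u, t, y))
    return rows


def diff_in_means(rows):
    """Treated-minus-control difference in mean outcome, ignoring u."""
    return mean(y for _, t, y in rows if t) - mean(y for _, t, y in rows if not t)


def adjusted_for_u(rows):
    """Stratify on u and weight within-stratum differences by stratum size."""
    total = 0.0
    for level in (False, True):
        stratum = [r for r in rows if r[0] == level]
        total += len(stratum) / len(rows) * diff_in_means(stratum)
    return total
```

With these settings the naive difference lands near -0.05 and the stratified estimate near the true -0.03: `u` is more common among treated units (70% versus 30%) and carries its own -0.05 shift, adding roughly 0.4 × -0.05 = -0.02 of bias.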

Original abstract

Observational studies often present challenges for causal inference due to confounding and heterogeneity. In this paper, we illustrate how modern causal inference methods can be applied to large-scale academic salary data. Using records from 12,039 tenure-track faculty in the University of North Carolina system, linked with bibliometric indicators and institutional classifications, we estimate the causal effect of gender on faculty salaries. Our analysis combines propensity score matching with causal forests to adjust for rank, discipline, research productivity, and career experience. Results indicate that female faculty earn approximately 6% less than comparable male colleagues, with variation in the gap across career stages and levels of research productivity. This case study demonstrates how causal inference methods for observational data can provide insight into structural disparities in complex social systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Applies PSM plus causal forests to UNC salary data and reports a 6% gender gap after adjusting for rank, discipline, productivity and experience, but the causal claim rests on an untested no-unmeasured-confounding assumption.

The paper's main result is that female tenure-track faculty in the UNC system earn roughly 6% less than comparable male colleagues once rank, discipline, bibliometric output, and career experience are accounted for. They reach this by first applying propensity score matching and then fitting causal forests to recover both the average effect and its variation by career stage and productivity level. The sample is large (over 12,000 faculty), so the estimates have reasonable precision on their face.

Referee Report

3 major / 1 minor

Summary. The manuscript applies propensity score matching followed by causal forests to administrative salary records from 12,039 tenure-track faculty in the University of North Carolina system, linked to bibliometric and institutional data. After conditioning on rank, discipline, research productivity, and career experience, the authors report that female faculty earn approximately 6% less than comparable male colleagues, with heterogeneity in the gap across career stages and productivity levels. The work is presented as a case study illustrating the use of modern causal inference tools on observational data to study structural disparities.

Significance. If the conditional ignorability assumption holds after the reported adjustments, the analysis supplies evidence that a gender salary gap persists in academia even after accounting for observable productivity and career factors. The combination of matching with causal forests is a reasonable choice for handling both selection and heterogeneity in a large administrative dataset. The large sample and explicit focus on variation by stage and productivity are strengths. However, the causal interpretation of the 6% figure rests on an untestable assumption whose plausibility is not quantified in the manuscript.

major comments (3)

[Abstract / Methods] Abstract and methods: No details are provided on variable definitions (e.g., exact construction of the research productivity measure or career experience variable), propensity-score matching diagnostics (balance tables, common support checks), or causal forest hyperparameters (number of trees, minimum node size, splitting criteria). These omissions prevent evaluation of whether the reported 6% average treatment effect is robust to implementation choices.
[Results] Results: The manuscript reports no sensitivity analyses (Rosenbaum bounds, e-values, or placebo tests) for unmeasured confounding. Because the central claim (that the 6% gap is the causal effect of gender) requires that rank, discipline, bibliometrics, and experience capture all relevant confounders, the absence of quantitative robustness checks leaves that interpretation untested.
[Results] Results: Standard errors or confidence intervals around the 6% estimate and the heterogeneity patterns are not described. Without these, it is impossible to assess whether the reported gap and its variation across subgroups are statistically distinguishable from zero.

minor comments (1)

[Abstract] The abstract states the headline result but does not indicate the functional form of the outcome (log salary or raw salary) or the precise definition of the treatment (for example, whether gender enters as a binary indicator).
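Why the functional form matters: if the outcome is log salary, a coefficient β corresponds to an exact percentage difference of 100(e^β - 1), so a coefficient of -0.06 and a 6% gap are not quite the same quantity. The helper names below are hypothetical; the conversion itself is standard.

```python
import math


def log_coef_to_percent(beta: float) -> float:
    """Exact percentage difference implied by a coefficient on log salary."""
    # expm1 computes e**beta - 1 accurately for small beta.
    return 100.0 * math.expm1(beta)


def percent_to_log_coef(pct: float) -> float:
    """Inverse mapping: the log-scale coefficient for a given percent gap."""
    # log1p computes log(1 + x) accurately for small x.
    return math.log1p(pct / 100.0)
```

For instance, `log_coef_to_percent(-0.06)` gives about -5.82%, while a gap of exactly -6% corresponds to a log coefficient of about -0.0619.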

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each major point below and will revise the manuscript to improve transparency, robustness, and statistical reporting.

Point-by-point responses

Referee: [Abstract / Methods] Abstract and methods: No details are provided on variable definitions (e.g., exact construction of the research productivity measure or career experience variable), propensity-score matching diagnostics (balance tables, common support checks), or causal forest hyperparameters (number of trees, minimum node size, splitting criteria). These omissions prevent evaluation of whether the reported 6% average treatment effect is robust to implementation choices.

Authors: We agree that these implementation details are necessary for reproducibility. In the revised manuscript we will add exact definitions of the research productivity measure (including the specific bibliometric indicators and any weighting) and the career experience variable (years since terminal degree and time in current rank). We will also include propensity score balance tables, common support diagnostics, and full causal forest hyperparameters (number of trees, minimum node size, and splitting criteria). revision: yes
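The balance tables promised here usually report the standardized mean difference (SMD) of each covariate between the matched groups. A minimal stdlib sketch, assuming one numeric covariate at a time and the averaged-variance denominator; other conventions (for example, using only the treated-group variance) also exist, and the function name is illustrative.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def standardized_mean_difference(treated: Sequence[float], control: Sequence[float]) -> float:
    """Difference in covariate means divided by the pooled standard deviation.

    The denominator averages the two sample variances, a common convention
    in matching diagnostics. Each group needs at least two values.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2.0)
    return (mean(treated) - mean(control)) / pooled_sd
```

An absolute SMD below about 0.1 after matching is a commonly used, if somewhat arbitrary, threshold for adequate balance.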
Referee: [Results] Results: The manuscript reports no sensitivity analyses (Rosenbaum bounds, e-values, or placebo tests) for unmeasured confounding. Because the central claim (that the 6% gap is the causal effect of gender) requires that rank, discipline, bibliometrics, and experience capture all relevant confounders, the absence of quantitative robustness checks leaves that interpretation untested.

Authors: We recognize the importance of quantifying sensitivity to unmeasured confounding. In the revision we will add e-value calculations for the main estimate and selected heterogeneity results, along with a discussion of how large an unobserved confounder would need to be to nullify the findings. We will also note the rich set of observed covariates (rank, discipline, productivity, and experience) that support the conditional ignorability assumption in this administrative setting. revision: yes
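A sketch of the E-value calculation proposed in this response, using the formula of VanderWeele and Ding (2017) for a ratio-scale estimate: E = RR + sqrt(RR × (RR - 1)), with ratios below 1 inverted first. For a continuous outcome such as log salary, one common approximation first converts a standardized effect d into RR ≈ exp(0.91d). Whether the authors would use that conversion is an assumption, and the page gives no standard deviation from which to compute d, so no numbers for this paper are plugged in. Function names are illustrative.

```python
import math


def e_value_ratio(rr: float) -> float:
    """E-value for a point estimate on the risk-ratio scale.

    Ratios below 1 are inverted first, so effects in either direction
    are handled symmetrically.
    """
    if rr <= 0:
        raise ValueError("ratio must be positive")
    if rr < 1:
        rr = 1.0 / rr
    return rr + math.sqrt(rr * (rr - 1.0))


def e_value_from_smd(d: float) -> float:
    """Approximate E-value for a continuous outcome via RR ~ exp(0.91 * d)."""
    return e_value_ratio(math.exp(0.91 * d))
```

The E-value is the minimum strength of association, on the risk-ratio scale, that an unmeasured confounder would need with both gender and salary, conditional on the measured covariates, to fully explain away the estimate; weaker confounding could not.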
Referee: [Results] Results: Standard errors or confidence intervals around the 6% estimate and the heterogeneity patterns are not described. Without these, it is impossible to assess whether the reported gap and its variation across subgroups are statistically distinguishable from zero.

Authors: The causal forest procedure we employed produces variance estimates and confidence intervals. We will report these for the overall average treatment effect and for the subgroup analyses by career stage and productivity level in the revised results section and figures. revision: yes
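Once the forest supplies a standard error, a normal-approximation (Wald) interval follows directly, and if the outcome is log salary its endpoints can be mapped to percentages one at a time because e^x - 1 is increasing. This sketch assumes the forest's variance estimate is valid and the estimator approximately normal; the function names are illustrative.

```python
from math import expm1
from typing import Tuple


def wald_interval(estimate: float, se: float, z: float = 1.959964) -> Tuple[float, float]:
    """Normal-approximation interval: estimate +/- z * se (95% by default)."""
    return estimate - z * se, estimate + z * se


def to_percent(interval: Tuple[float, float]) -> Tuple[float, float]:
    """Map a log-scale interval to percentage differences, endpoint by endpoint."""
    lo, hi = interval
    return 100.0 * expm1(lo), 100.0 * expm1(hi)
```

For a hypothetical estimate of -0.06 with standard error 0.01, `wald_interval` returns roughly (-0.080, -0.040), which `to_percent` maps to about (-7.7%, -4.0%).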

Circularity Check

0 steps flagged

No circularity: standard causal estimators applied to external administrative records

Full rationale

The paper applies propensity-score matching followed by causal forests to estimate the gender salary gap from UNC system records linked to bibliometric data. The 6% ATE and heterogeneity results are produced by these methods operating on the observed covariates and outcomes; they are not obtained by redefining the target quantity in terms of itself or by fitting a parameter whose value is then relabeled as a prediction. No self-citation chain, uniqueness theorem, or ansatz is invoked to justify the central estimate. The derivation therefore remains self-contained against the external data and does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The analysis rests on standard causal inference assumptions rather than new free parameters or invented entities.

axioms (1)

Domain assumption: ignorability (no unmeasured confounding) given the observed covariates.
Required for causal interpretation of observational salary data; invoked when applying propensity score matching and causal forests.
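In standard potential-outcomes notation, which the page does not spell out, let T be the gender indicator, X the observed covariates (rank, discipline, research productivity, career experience), Y(1) and Y(0) the potential log salaries, and e(X) the propensity score. This assumption, together with the overlap condition that the common-support checks probe, identifies both the average effect and the conditional effect that causal forests target:

```latex
\begin{aligned}
&\text{Ignorability: } \{Y(0), Y(1)\} \perp\!\!\!\perp T \mid X, \\
&\text{Overlap: } 0 < e(X) = \Pr(T = 1 \mid X) < 1, \\
&\tau(x) = \mathbb{E}[Y(1) - Y(0) \mid X = x] = \mathbb{E}[Y \mid T = 1, X = x] - \mathbb{E}[Y \mid T = 0, X = x], \\
&\tau = \mathbb{E}[Y(1) - Y(0)] = \mathbb{E}_X\big[\tau(X)\big].
\end{aligned}
```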

