Typical Physics PhD Admissions Criteria Limit Access to Underrepresented Groups but Fail to Predict Doctoral Completion, including some additional information
Pith reviewed 2026-05-25 13:57 UTC · model grok-4.3
The pith
Standard physics PhD admissions metrics such as GRE scores show little to no link with doctoral completion yet create large barriers for underrepresented groups.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Multivariate statistical analysis of a national sample of physics PhD entrants finds that only undergraduate GPA maintains a statistically significant association with PhD completion in every model tested. Neither GRE Physics nor GRE Verbal scores predict completion in any model. GRE Quantitative scores reach significance in two of four models, yet the difference in completion probability between students scoring at the 10th and 90th percentiles of US physics-major test takers is under 10 percentage points. The same scores show substantial race, gender, and citizenship gaps, so continued emphasis on them selects against already underrepresented groups while adding little predictive value for doctoral completion.
What carries the argument
Multivariate statistical models that relate undergraduate GPA and GRE Quantitative, Verbal, and Physics scores to binary PhD completion outcomes while testing for demographic covariates.
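One way to write such a model down is as a logistic regression on the binary completion outcome. In the block below, $Y_i = 1$ if student $i$ completes the PhD, $\sigma$ is the logistic function, and $\Delta p$ is the GRE Quantitative percentile contrast with all other terms held fixed. The covariate vector $\mathbf{z}_i$ (standing in for demographic and other controls), the coefficient names, and the percentile scores $q_{10}$ and $q_{90}$ are placeholders assumed for this sketch; the summary does not state the paper's exact covariates or which scores enter each of the four models.

```latex
\begin{aligned}
\Pr(Y_i = 1 \mid \mathbf{x}_i) &= \sigma(\eta_i), \qquad \sigma(t) = \frac{1}{1 + e^{-t}},\\
\eta_i &= \beta_0 + \beta_1\,\mathrm{GPA}_i + \beta_2\,\mathrm{GRE}^{Q}_i + \beta_3\,\mathrm{GRE}^{V}_i + \beta_4\,\mathrm{GRE}^{P}_i + \boldsymbol{\gamma}^{\top}\mathbf{z}_i,\\
\Delta p &= \sigma\!\left(\eta_i\big|_{\mathrm{GRE}^{Q}_i = q_{90}}\right) - \sigma\!\left(\eta_i\big|_{\mathrm{GRE}^{Q}_i = q_{10}}\right).
\end{aligned}
```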
If this is right
- Continued heavy use of GRE scores in admissions reduces participation by underrepresented racial and gender groups and by US citizens.
- Only undergraduate GPA shows consistent statistical linkage to PhD completion.
- Probability of completing the degree changes by less than 10 percentage points between the 10th and 90th percentiles of GRE Quantitative scores among US physics-major test takers.
- Admissions practices that weight GRE scores heavily lack justification from completion data.
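The percentile contrast can be made concrete with a small calculation: hold every predictor but one at a reference value, evaluate the predicted probability at a low and a high score, and subtract. The sketch below assumes a logistic model on standardized predictors; the intercept, coefficients, and function names are hypothetical and are not the paper's estimates.

```python
import math


def logistic(t: float) -> float:
    """Standard logistic function: maps log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-t))


def percentile_contrast(intercept, coefs, baseline, name, low, high):
    """Change in predicted completion probability when predictor `name` moves
    from `low` to `high`, with every other predictor held at its `baseline` value.

    `coefs` maps predictor names to log-odds coefficients; `baseline` maps the
    same names to reference values. All names here are hypothetical.
    """
    def prob(value):
        eta = intercept
        for key, beta in coefs.items():
            eta += beta * (value if key == name else baseline[key])
        return logistic(eta)

    return prob(high) - prob(low)


# Hypothetical coefficients on standardized (z-score) predictors, NOT the paper's estimates.
coefs = {"gpa": 0.40, "gre_q": 0.15}
baseline = {"gpa": 0.0, "gre_q": 0.0}

# If a standardized score were normally distributed, z = -1.28 and z = +1.28
# would mark roughly its 10th and 90th percentiles.
delta = percentile_contrast(1.0, coefs, baseline, "gre_q", -1.28, 1.28)
print(f"change in completion probability: {100 * delta:.1f} percentage points")
```

Because the logistic curve is steepest at a probability of 0.5 and flattens toward 0 and 1, the same coefficient yields a smaller probability change when baseline completion is high, which is one reason to report contrasts in percentage points rather than as raw coefficients.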
Where Pith is reading between the lines
- Departments could test whether replacing GRE thresholds with emphasis on research experience or recommendation letters improves both diversity and completion rates.
- Parallel analyses in other STEM fields might reveal whether the same pattern holds outside physics.
- Reducing GRE weight could increase the share of US citizens entering physics PhD programs without lowering overall completion percentages.
Load-bearing premise
The sampled students represent the full population of physics PhD entrants and the models have controlled for all important confounding influences on completion.
What would settle it
A follow-up study using complete national records for a recent cohort that finds GRE Physics or Verbal scores retain a large, statistically significant association with completion after the same controls would contradict the central result.
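Whether a single coefficient is "statistically significant" is commonly judged with a Wald test, which divides the estimate by its standard error and compares the result with a standard normal distribution. The sketch below assumes an estimate and standard error are already available from a fitted model; the values are hypothetical, and the paper may have used a different test.

```python
import math


def wald_test(coef: float, std_err: float) -> tuple[float, float]:
    """Two-sided Wald test for one regression coefficient.

    Returns (z, p): z = coef / std_err, and p = P(|Z| >= |z|) for a
    standard normal Z, computed as erfc(|z| / sqrt(2)).
    """
    if std_err <= 0:
        raise ValueError("standard error must be positive")
    z = coef / std_err
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# Hypothetical log-odds estimate and standard error (not from the paper).
z, p = wald_test(0.50, 0.25)
print(f"z = {z:.2f}, two-sided p = {p:.4f}")  # prints: z = 2.00, two-sided p = 0.0455
```

A small p-value indicates the association is unlikely to be exactly zero; it says nothing about its size, which is why the effect in percentage points matters as much as significance for the claim above.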
Original abstract
This work aims to understand how effective the typical admissions criteria used in physics are at identifying students who will complete the PhD. Through a multivariate statistical analysis of a sample that includes roughly one in eight students who entered physics PhD programs from 2000-2010, we find that the traditional admissions metrics of undergraduate GPA and the Graduate Record Examination (GRE) Quantitative, Verbal, and Physics Subject Tests do not predict completion in US physics graduate programs with the efficacy often assumed by admissions committees. We find only undergraduate GPA to have a statistically significant association with physics PhD completion across all models studied. In no model did GRE Physics or GRE Verbal predict PhD completion. GRE Quantitative scores had statistically significant relationships with PhD completion in two of four models studied. However, in practice, the probability of completing the PhD changed by less than 10 percentage points for students scoring at the 10th vs 90th percentile of US test takers who were physics majors. Noting the significant race, gender, and citizenship gaps in GRE scores, these findings indicate that the heavy reliance on these test scores within the typical PhD admissions process is a deterrent to increasing access, diversity, and equity in physics. Misuse of GRE scores selects against already-underrepresented groups and US citizens with tools that fail to meaningfully predict PhD completion. This is a draft; see the journal for the published version. Additionally included in blue text are several responses to queries about this work.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript reports a multivariate statistical analysis of a sample comprising roughly one in eight students entering US physics PhD programs from 2000-2010. It claims that undergraduate GPA shows a statistically significant association with PhD completion across models, while GRE Quantitative is significant in only two of four models, and GRE Physics and Verbal show no significant association in any model. Probability of completion shifts by less than 10 percentage points between the 10th and 90th percentiles of US physics-major test-takers, leading to the conclusion that heavy reliance on GRE scores in admissions limits access for underrepresented groups without improving prediction of completion.
Significance. If the central empirical claims hold after addressing sample selection, model specification, and range-restriction issues, the work would provide evidence-based support for revising physics PhD admissions criteria to reduce barriers for underrepresented groups. The finding that only GPA retains consistent significance while GRE scores add little predictive value would be relevant to ongoing debates on standardized testing in graduate admissions.
major comments (4)
- [Abstract/Methods] Abstract and Methods: The multivariate models are described only at a high level with no equations, no explicit list of covariates or interaction terms, no variable definitions (e.g., how completion is coded, how missing data are handled), and no robustness checks. Because the central claims rest entirely on the fitted models and their marginal effects, the absence of these details prevents evaluation of whether the reported null results for GRE Physics/Verbal are robust to specification choices.
- [Methods] Sample construction: The claim that the analytic sample represents 'roughly one in eight' entrants is presented without describing the sampling frame, response rate, or any correction for selection into the observed programs. This is load-bearing because the paper's inference about national test-taker percentiles depends on the representativeness of the restricted admitted-student sample.
- [Results] Results on percentile contrasts: The reported <10 pp probability shifts are computed from fitted models but reference the full national distribution of physics-major test-takers. No information is supplied on the actual min/max, interquartile range, or density of GRE/GPA scores within the analytic sample, nor on tests for linearity or interactions with admission thresholds. Given the range restriction inherent to an admitted-student sample, the marginal effects at the 10th percentile constitute untested extrapolation.
- [Methods/Results] Confounding and controls: The models are asserted to show that GRE scores 'fail to predict' completion, yet the manuscript provides no discussion of whether research experience, institutional resources, advisor quality, or program fixed effects are included or tested as confounders. This omission directly affects the interpretation of the null GRE coefficients.
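One way to address the range-restriction point is to report the observed spread of each score in the analytic sample and check whether the national percentile cut points used for contrasts fall inside it. The helper below is a hypothetical sketch using Python's standard `statistics` module; the scores and cut points are invented for illustration and are not on any particular GRE scale.

```python
import statistics


def support_report(sample, targets):
    """Summarize the observed range of a score within the analytic sample and
    flag target values (e.g. national percentile cut points) that fall outside it.

    Model predictions at flagged targets would be extrapolations beyond the data.
    """
    lo, hi = min(sample), max(sample)
    q1, median, q3 = statistics.quantiles(sample, n=4)  # quartile cut points
    outside = [t for t in targets if t < lo or t > hi]
    return {"min": lo, "max": hi, "q1": q1, "median": median, "q3": q3,
            "extrapolated": outside}


# Hypothetical admitted-student scores and hypothetical national cut points.
admitted = [155, 158, 160, 161, 162, 163, 164, 165, 166, 167, 168, 170]
national_p10, national_p90 = 150, 168
report = support_report(admitted, [national_p10, national_p90])
print(report)
```

Here the low cut point lies below every admitted score, so any probability reported there comes from the model's functional form rather than from observed students.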
minor comments (1)
- [Abstract] The abstract states 'This is a draft; see the journal for the published version'—this should be removed or clarified for the submitted manuscript.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight areas where the manuscript can be strengthened for clarity and rigor. We respond to each major comment below and indicate planned revisions.
Referee: [Abstract/Methods] Abstract and Methods: The multivariate models are described only at a high level with no equations, no explicit list of covariates or interaction terms, no variable definitions (e.g., how completion is coded, how missing data are handled), and no robustness checks. Because the central claims rest entirely on the fitted models and their marginal effects, the absence of these details prevents evaluation of whether the reported null results for GRE Physics/Verbal are robust to specification choices.
Authors: We agree that the methods description is insufficiently detailed. In the revised manuscript we will add the full logistic regression equations for each model, an explicit list of all covariates (including any interactions), precise variable definitions (e.g., binary coding of PhD completion and treatment of missing values), and results from additional robustness checks such as alternative specifications and specification tests. revision: yes
Referee: [Methods] Sample construction: The claim that the analytic sample represents 'roughly one in eight' entrants is presented without describing the sampling frame, response rate, or any correction for selection into the observed programs. This is load-bearing because the paper's inference about national test-taker percentiles depends on the representativeness of the restricted admitted-student sample.
Authors: The one-in-eight figure is an estimate based on the total number of U.S. physics PhD entrants 2000–2010 and the size of the survey sample. We will expand the methods section to describe the sampling frame, participating programs, response rates, and any weighting or selection adjustments. We will also add explicit discussion of the admitted-student restriction and its implications for representativeness. revision: yes
Referee: [Results] Results on percentile contrasts: The reported <10 pp probability shifts are computed from fitted models but reference the full national distribution of physics-major test-takers. No information is supplied on the actual min/max, interquartile range, or density of GRE/GPA scores within the analytic sample, nor on tests for linearity or interactions with admission thresholds. Given the range restriction inherent to an admitted-student sample, the marginal effects at the 10th percentile constitute untested extrapolation.
Authors: We will add descriptive statistics on the observed range, quartiles, and density of GRE and GPA scores within the analytic sample, plus tests for linearity (e.g., quadratic terms or splines). We acknowledge that contrasts involving the national 10th percentile involve extrapolation beyond the sample support; the revised text will note this limitation while emphasizing that probability differences remain modest even within the observed data range. revision: partial
Referee: [Methods/Results] Confounding and controls: The models are asserted to show that GRE scores 'fail to predict' completion, yet the manuscript provides no discussion of whether research experience, institutional resources, advisor quality, or program fixed effects are included or tested as confounders. This omission directly affects the interpretation of the null GRE coefficients.
Authors: The models incorporate the survey measures that are available, including undergraduate research experience and institutional characteristics. Detailed advisor-quality or program fixed-effect data are not present in the dataset. We will add a limitations subsection that explicitly discusses these unmeasured potential confounders and their possible bearing on the GRE coefficients. revision: yes
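A linearity check of the kind proposed can be sketched as a likelihood-ratio test between a logistic model that is linear in a score and one that adds a quadratic term. This is a minimal, dependency-free sketch under assumed conditions: a single standardized predictor, simulated outcomes, and plain gradient ascent in place of the Newton-type solvers that statistical packages normally use. The function names are hypothetical.

```python
import math
import random


def _prob(beta, row):
    """Predicted probability from the log-odds beta . row (row includes the intercept 1.0)."""
    return 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))


def fit_logistic(X, y, lr=0.5, iters=1500):
    """Fit a logistic regression by batch gradient ascent on the mean log-likelihood.

    X is a list of feature rows (an intercept column is added here) and y a list
    of 0/1 outcomes. Returns (coefficients, maximized log-likelihood).
    Assumes features are roughly standardized and classes are not separable.
    """
    rows = [[1.0] + list(r) for r in X]
    n, k = len(rows), len(rows[0])
    beta = [0.0] * k
    for _ in range(iters):
        grad = [0.0] * k
        for row, yi in zip(rows, y):
            resid = yi - _prob(beta, row)
            for j in range(k):
                grad[j] += resid * row[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    ll = sum(math.log(_prob(beta, r)) if yi else math.log(1.0 - _prob(beta, r))
             for r, yi in zip(rows, y))
    return beta, ll


def lr_test_1df(ll_reduced, ll_full):
    """Likelihood-ratio test for one extra parameter (chi-squared, 1 df)."""
    stat = max(0.0, 2.0 * (ll_full - ll_reduced))
    return stat, math.erfc(math.sqrt(stat / 2.0))


# Simulated data: outcomes generated from a model that IS linear in the log-odds.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]  # hypothetical standardized scores
ys = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(0.8 + 1.0 * x))) else 0 for x in xs]

beta_lin, ll_lin = fit_logistic([[x] for x in xs], ys)
beta_quad, ll_quad = fit_logistic([[x, x * x] for x in xs], ys)
stat, p = lr_test_1df(ll_lin, ll_quad)
print(f"LR statistic = {stat:.3f}, p = {p:.3f}")
```

With one added parameter the statistic is compared with a chi-squared distribution on 1 degree of freedom, whose upper tail equals `erfc(sqrt(stat / 2))`. A small statistic (large p) is consistent with a linear log-odds form, though it does not by itself address extrapolation beyond the observed scores.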
Circularity Check
No circularity: empirical regression on observational admissions data
This is an observational multivariate statistical study reporting associations between admissions metrics (GPA, GRE scores) and PhD completion in a sample of physics graduate students. The central claims rest on fitted regression models and significance tests applied directly to the data; no derivations, self-definitional relations, fitted parameters renamed as predictions, or load-bearing self-citations are present. The analysis is self-contained and externally falsifiable against the sample without reducing to its own inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The multivariate regression models assume linearity in the log-odds, no omitted-variable bias, and a correct functional form for the relationship between the predictors and PhD completion.