
arxiv: 1906.11618 · v2 · pith:HGDZWYTPnew · submitted 2019-06-27 · ⚛️ physics.soc-ph · physics.ed-ph

Typical Physics PhD Admissions Criteria Limit Access to Underrepresented Groups but Fail to Predict Doctoral Completion, including some additional information

Pith reviewed 2026-05-25 13:57 UTC · model grok-4.3

keywords physics PhD admissions · GRE predictive validity · doctoral completion · underrepresented groups · diversity in physics · admissions criteria · GPA and GRE

The pith

Standard physics PhD admissions metrics such as GRE scores show little to no link with doctoral completion yet create large barriers for underrepresented groups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines records for roughly one in eight students who entered US physics PhD programs between 2000 and 2010 and tests whether undergraduate GPA and GRE Quantitative, Verbal, and Physics scores predict who finishes the degree. Only undergraduate GPA shows a statistically significant association with completion across the models examined. GRE Physics and GRE Verbal scores show no relationship with completion in any model, while GRE Quantitative reaches significance in only two of four models and produces less than a 10-percentage-point shift in completion probability between the 10th and 90th percentiles. Because the same tests display sizable gaps by race, gender, and citizenship, the analysis concludes that heavy reliance on them restricts access without improving prediction of success.

Core claim

Multivariate statistical analysis of a national sample of physics PhD entrants finds that only undergraduate GPA maintains a statistically significant association with PhD completion in every model tested. Neither GRE Physics nor GRE Verbal scores predict completion in any model. GRE Quantitative scores reach significance in two models, yet the practical difference in completion probability between low and high scorers remains under 10 percentage points. The same scores exhibit substantial race, gender, and citizenship disparities, so continued emphasis on them selects against already underrepresented groups while adding little predictive value for doctoral completion.
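To make the "under 10 percentage points" statement concrete, the sketch below shows how a logistic-model coefficient translates into a change in completion probability between two score percentiles. The intercept, slope, and the use of a standardized score with 10th and 90th percentiles near -1.28 and +1.28 (the standard normal values) are hypothetical choices for illustration only; they are not estimates from the paper.

```python
import math


def completion_probability(intercept: float, slope: float, score: float) -> float:
    """Probability of completion under a one-predictor logistic model."""
    return 1.0 / (1.0 + math.exp(-(intercept + slope * score)))


def percentile_shift(intercept: float, slope: float,
                     low_score: float, high_score: float) -> float:
    """Percentage-point change in completion probability between two scores."""
    p_low = completion_probability(intercept, slope, low_score)
    p_high = completion_probability(intercept, slope, high_score)
    return 100.0 * (p_high - p_low)


# Hypothetical coefficients (assumptions, not values reported by the paper).
# Scores are standardized, so -1.28 and +1.28 approximate the 10th and 90th
# percentiles of a normally distributed score.
shift = percentile_shift(intercept=0.6, slope=0.15,
                         low_score=-1.28, high_score=1.28)
print(f"Shift between 10th and 90th percentile: {shift:.1f} percentage points")
```

A coefficient can be statistically distinguishable from zero and still imply a small shift of this kind, which is the distinction the core claim draws for GRE Quantitative.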

What carries the argument

Multivariate statistical models that relate undergraduate GPA and GRE Quantitative, Verbal, and Physics scores to binary PhD completion outcomes while testing for demographic covariates.
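This page does not reproduce the paper's model equations. Assuming a logistic specification (the form named in the simulated rebuttal further down), a generic version of such a model could be written as follows. The symbols $\beta_k$, $\boldsymbol{\gamma}$, and $\mathbf{z}_i$ are notation introduced here for illustration, with $\mathbf{z}_i$ standing for whatever demographic covariates a given model includes.

```latex
\[
\operatorname{logit}\, P(\text{complete}_i = 1)
  = \beta_0
  + \beta_1\,\mathrm{UGPA}_i
  + \beta_2\,\mathrm{GREQ}_i
  + \beta_3\,\mathrm{GREV}_i
  + \beta_4\,\mathrm{GREP}_i
  + \boldsymbol{\gamma}^{\top}\mathbf{z}_i ,
\qquad
\operatorname{logit}(p) = \ln\frac{p}{1-p}.
\]
```

Under this reading, "predicts completion" means that the corresponding $\beta_k$ is statistically distinguishable from zero once the other terms are held fixed.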

If this is right

  • Continued heavy use of GRE scores in admissions reduces participation by underrepresented racial, gender, and citizenship groups.
  • Only undergraduate GPA shows consistent statistical linkage to PhD completion.
  • Probability of completing the degree changes by less than 10 percentage points across wide ranges of GRE Quantitative scores.
  • Admissions practices that weight GRE scores heavily lack justification from completion data.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Departments could test whether replacing GRE thresholds with emphasis on research experience or recommendation letters improves both diversity and completion rates.
  • Parallel analyses in other STEM fields might reveal whether the same pattern holds outside physics.
  • Reducing GRE weight could increase the share of US citizens entering physics PhD programs without lowering overall completion percentages.

Load-bearing premise

The sampled students represent the full population of physics PhD entrants and the models have controlled for all important confounding influences on completion.

What would settle it

A follow-up study using complete national records for a recent cohort that finds GRE Physics or Verbal scores retain a large, statistically significant association with completion after the same controls would contradict the central result.

Figures

Figures reproduced from arXiv: 1906.11618 by Benjamin M. Zwickl, Casey W. Miller, Julie R. Posselt, Rachel T. Silvestrini, Theodore Hodapp.

Figure 1: The fraction of US test takers above a specified GRE Physics Score shows ...
Original abstract

This work aims to understand how effective the typical admissions criteria used in physics are at identifying students who will complete the PhD. Through a multivariate statistical analysis of a sample that includes roughly one in eight students who entered physics PhD programs from 2000-2010, we find that the traditional admissions metrics of undergraduate GPA and the Graduate Records Examination (GRE) Quantitative, Verbal, and Physics Subject Tests do not predict completion in US physics graduate programs with the efficacy often assumed by admissions committees. We find only undergraduate GPA to have a statistically significant association with physics PhD completion across all models studied. In no model did GRE Physics or GRE Verbal predict PhD completion. GRE Quantitative scores had statistically significant relationships with PhD completion in two of four models studied. However, in practice, probability of completing the PhD changed by less than 10 percentage points for students scoring in the 10th vs 90th percentile of US test takers that were physics majors. Noting the significant race, gender, and citizenship gaps in GRE scores, these findings indicate that the heavy reliance on these test scores within typical PhD admissions process is a deterrent to increasing access, diversity, and equity in physics. Misuse of GRE scores selects against already-underrepresented groups and US citizens with tools that fail to meaningfully predict PhD completion. This is a draft; see the journal for the published version. Additionally included in blue text are several responses to queries about this work.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

4 major / 1 minor

Summary. The manuscript reports a multivariate statistical analysis of a sample comprising roughly one in eight students entering US physics PhD programs from 2000-2010. It claims that undergraduate GPA shows a statistically significant association with PhD completion across models, while GRE Quantitative is significant in only two of four models, and GRE Physics and Verbal show no significant association in any model. Probability of completion shifts by less than 10 percentage points between the 10th and 90th percentiles of US physics-major test-takers, leading to the conclusion that heavy reliance on GRE scores in admissions limits access for underrepresented groups without improving prediction of completion.

Significance. If the central empirical claims hold after addressing sample selection, model specification, and range-restriction issues, the work would provide evidence-based support for revising physics PhD admissions criteria to reduce barriers for underrepresented groups. The finding that only GPA retains consistent significance while GRE scores add little predictive value would be relevant to ongoing debates on standardized testing in graduate admissions.

major comments (4)
  1. [Abstract/Methods] Abstract and Methods: The multivariate models are described only at a high level with no equations, no explicit list of covariates or interaction terms, no variable definitions (e.g., how completion is coded, how missing data are handled), and no robustness checks. Because the central claims rest entirely on the fitted models and their marginal effects, the absence of these details prevents evaluation of whether the reported null results for GRE Physics/Verbal are robust to specification choices.
  2. [Methods] Sample construction: The claim that the analytic sample represents 'roughly one in eight' entrants is presented without describing the sampling frame, response rate, or any correction for selection into the observed programs. This is load-bearing because the paper's inference about national test-taker percentiles depends on the representativeness of the restricted admitted-student sample.
  3. [Results] Results on percentile contrasts: The reported <10 pp probability shifts are computed from fitted models but reference the full national distribution of physics-major test-takers. No information is supplied on the actual min/max, interquartile range, or density of GRE/GPA scores within the analytic sample, nor on tests for linearity or interactions with admission thresholds. Given the range restriction inherent to an admitted-student sample, the marginal effects at the 10th percentile constitute untested extrapolation.
  4. [Methods/Results] Confounding and controls: The models are asserted to show that GRE scores 'fail to predict' completion, yet the manuscript provides no discussion of whether research experience, institutional resources, advisor quality, or program fixed effects are included or tested as confounders. This omission directly affects the interpretation of the null GRE coefficients.
minor comments (1)
  1. [Abstract] The abstract states 'This is a draft; see the journal for the published version'—this should be removed or clarified for the submitted manuscript.
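The range-restriction concern in major comment 3 can be illustrated with a small simulation. The sketch below generates synthetic applicants whose completion probability rises with a standardized score, then compares the score-outcome correlation in the full pool with the correlation among those above an admission cutoff. All parameters (sample size, slope, cutoff, seed) are hypothetical, and the data are simulated; nothing here is drawn from the paper's dataset.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def simulate_range_restriction(n=20000, slope=1.0, cutoff=0.5, seed=1):
    """Return (correlation in full pool, correlation among admitted)."""
    rng = random.Random(seed)
    scores, outcomes = [], []
    for _ in range(n):
        score = rng.gauss(0.0, 1.0)                      # standardized test score
        p_complete = 1.0 / (1.0 + math.exp(-slope * score))
        scores.append(score)
        outcomes.append(1.0 if rng.random() < p_complete else 0.0)
    # Keep only applicants at or above the (hypothetical) admission cutoff.
    admitted = [(s, o) for s, o in zip(scores, outcomes) if s >= cutoff]
    r_full = pearson(scores, outcomes)
    r_admitted = pearson([s for s, _ in admitted], [o for _, o in admitted])
    return r_full, r_admitted


r_full, r_admitted = simulate_range_restriction()
print(f"full pool r = {r_full:.2f}, admitted only r = {r_admitted:.2f}")
```

In this toy setting the underlying relationship is the same for everyone, yet the correlation observed among admitted students is weaker because their scores span a narrower range. The simulation shows only that the mechanism exists; it does not indicate how large the effect is in the paper's sample.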

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight areas where the manuscript can be strengthened for clarity and rigor. We respond to each major comment below and indicate planned revisions.

Point-by-point responses
  1. Referee: [Abstract/Methods] Abstract and Methods: The multivariate models are described only at a high level with no equations, no explicit list of covariates or interaction terms, no variable definitions (e.g., how completion is coded, how missing data are handled), and no robustness checks. Because the central claims rest entirely on the fitted models and their marginal effects, the absence of these details prevents evaluation of whether the reported null results for GRE Physics/Verbal are robust to specification choices.

    Authors: We agree that the methods description is insufficiently detailed. In the revised manuscript we will add the full logistic regression equations for each model, an explicit list of all covariates (including any interactions), precise variable definitions (e.g., binary coding of PhD completion and treatment of missing values), and results from additional robustness checks such as alternative specifications and specification tests. revision: yes

  2. Referee: [Methods] Sample construction: The claim that the analytic sample represents 'roughly one in eight' entrants is presented without describing the sampling frame, response rate, or any correction for selection into the observed programs. This is load-bearing because the paper's inference about national test-taker percentiles depends on the representativeness of the restricted admitted-student sample.

    Authors: The one-in-eight figure is an estimate based on the total number of U.S. physics PhD entrants 2000–2010 and the size of the survey sample. We will expand the methods section to describe the sampling frame, participating programs, response rates, and any weighting or selection adjustments. We will also add explicit discussion of the admitted-student restriction and its implications for representativeness. revision: yes

  3. Referee: [Results] Results on percentile contrasts: The reported <10 pp probability shifts are computed from fitted models but reference the full national distribution of physics-major test-takers. No information is supplied on the actual min/max, interquartile range, or density of GRE/GPA scores within the analytic sample, nor on tests for linearity or interactions with admission thresholds. Given the range restriction inherent to an admitted-student sample, the marginal effects at the 10th percentile constitute untested extrapolation.

    Authors: We will add descriptive statistics on the observed range, quartiles, and density of GRE and GPA scores within the analytic sample, plus tests for linearity (e.g., quadratic terms or splines). We acknowledge that contrasts involving the national 10th percentile involve extrapolation beyond the sample support; the revised text will note this limitation while emphasizing that probability differences remain modest even within the observed data range. revision: partial

  4. Referee: [Methods/Results] Confounding and controls: The models are asserted to show that GRE scores 'fail to predict' completion, yet the manuscript provides no discussion of whether research experience, institutional resources, advisor quality, or program fixed effects are included or tested as confounders. This omission directly affects the interpretation of the null GRE coefficients.

    Authors: The models incorporate the survey measures that are available, including undergraduate research experience and institutional characteristics. Detailed advisor-quality or program fixed-effect data are not present in the dataset. We will add a limitations subsection that explicitly discusses these unmeasured potential confounders and their possible bearing on the GRE coefficients. revision: yes
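The first response above refers to logistic regression with a binary coding of PhD completion. As a minimal sketch of what such a fit involves, the code below estimates a one-predictor logistic model by Newton-Raphson on synthetic data. The function names, the single standardized predictor, and the "true" coefficients used to generate the data are assumptions made for illustration; this is not the authors' code or specification.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def simulate(n, intercept, slope, seed=0):
    """Synthetic data: standardized predictor x, completion coded 1 or 0."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [1 if rng.random() < sigmoid(intercept + slope * x) else 0 for x in xs]
    return xs, ys


def fit_logistic(xs, ys, iterations=25):
    """Fit P(y=1) = sigmoid(b0 + b1*x) by Newton-Raphson; return (b0, b1)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0           # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0   # entries of the negated Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system by Cramer's rule.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical generating coefficients, used only to check the fit.
xs, ys = simulate(5000, intercept=0.5, slope=0.8)
b0, b1 = fit_logistic(xs, ys)
print(f"estimated intercept = {b0:.2f}, estimated slope = {b1:.2f}")
```

A multivariate version adds one coefficient per predictor and covariate; statistical packages normally handle the fitting and also report the standard errors needed for the significance tests the report discusses.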

Circularity Check

0 steps flagged

No circularity: empirical regression on observational admissions data

Full rationale

This is an observational multivariate statistical study reporting associations between admissions metrics (GPA, GRE scores) and PhD completion in a sample of physics graduate students. The central claims rest on fitted regression models and significance tests applied directly to the data; no derivations, self-definitional relations, fitted parameters renamed as predictions, or load-bearing self-citations are present. The analysis is self-contained and externally falsifiable against the sample without reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the chosen regression models correctly capture predictive relationships in the observational data without major omitted variables or selection bias in the one-in-eight sample.

axioms (1)
  • Domain assumption: Multivariate regression models assume linearity, no omitted variable bias, and correct functional form for the relationship between predictors and PhD completion.
    Standard assumption invoked when interpreting regression coefficients as evidence of predictive validity.

