Advancing clustering methods in physics education research: A case for mixture models

Karen Nylund-Gibson; Marsha Ing; Meagan Sundstrom; Minghui Wang

arXiv: 2506.11229 · v1 · pith:EV672LC5new · submitted 2025-06-12 · stat.ME · physics.ed-ph

Minghui Wang, Meagan Sundstrom, Karen Nylund-Gibson, Marsha Ing

Pith reviewed 2026-05-22 00:26 UTC · model grok-4.3

keywords: clustering methods, mixture models, latent class analysis, physics education research, k-modes, subgroup identification, model-based clustering, classification errors

The pith

Mixture models provide a probabilistic alternative to k-means that accounts for classification errors and integrates subgroup membership into broader analyses in physics education research.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that k-means and k-modes clustering, widely used in physics education research to find subgroups with similar responses, rely on algorithmic hard partitioning that assigns definite membership without modeling uncertainty. Mixture models such as latent class analysis instead estimate the probability that each person belongs to each subgroup, which directly incorporates classification error and lets researchers include group membership as part of a larger statistical model. If this approach holds, researchers could obtain more reliable links between student subgroups and other measured variables without needing separate post-hoc tests. The authors support the claim by laying out the theoretical differences and by running both methods side by side on the same research questions with real data.
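
The contrast with hard partitioning is easier to see with the k-modes loop written out. The following is a minimal sketch, assuming binary survey items and hand-picked starting modes; the function names and the toy `rows` data are hypothetical and are not taken from the paper's R code.

```python
from collections import Counter


def hamming(a, b):
    """Number of items on which two response patterns differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(rows, initial_modes, max_iter=100):
    """Minimal k-modes: assign each row to its nearest mode, then update the modes."""
    modes = [tuple(m) for m in initial_modes]
    labels = []
    for _ in range(max_iter):
        # Hard assignment: every respondent gets exactly one cluster label.
        labels = [
            min(range(len(modes)), key=lambda k: hamming(r, modes[k]))
            for r in rows
        ]
        new_modes = []
        for k, old_mode in enumerate(modes):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            if not members:
                new_modes.append(old_mode)  # leave an empty cluster's mode as is
                continue
            # Updated mode: the most common category on each item among members.
            new_modes.append(
                tuple(Counter(col).most_common(1)[0][0] for col in zip(*members))
            )
        if new_modes == modes:  # no mode changed, so assignments are stable
            break
        modes = new_modes
    return labels, modes


# Toy data (hypothetical): six respondents, four yes/no items.
rows = [(1, 1, 1, 0), (1, 1, 0, 0), (1, 1, 1, 1),
        (0, 0, 0, 1), (0, 0, 1, 1), (0, 0, 0, 0)]
labels, modes = k_modes(rows, [rows[0], rows[3]])
print(labels)  # [0, 0, 0, 1, 1, 1]
```

Every respondent leaves the loop with exactly one label, and nothing records how close a respondent was to the other cluster. That missing information is what a mixture model retains.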

Core claim

Mixture models, specifically latent class analysis for categorical data, serve as a model-based alternative to k-modes clustering. They account for classification errors and permit direct integration of subgroup membership into a broader latent variable framework, as shown through parallel analyses that address identical research questions.

What carries the argument

Latent class analysis, a mixture model that estimates the probability of each individual belonging to each latent class from observed response patterns rather than forcing a single assignment.
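
As an illustration of how such probabilities are computed, the sketch below applies Bayes' rule to one binary response pattern, assuming the class proportions and item-endorsement probabilities are already known and that items are conditionally independent within a class. The parameter values are invented for the example and the function name is hypothetical.

```python
def posterior(u, class_sizes, item_probs):
    """Return P(c = k | u) for each class k, given one binary response pattern u.

    class_sizes[k]   : P(c = k), the proportion of the population in class k
    item_probs[k][j] : P(u_j = 1 | c = k), the endorsement probability of item j
    """
    joint = []
    for size_k, probs_k in zip(class_sizes, item_probs):
        likelihood = size_k
        for u_j, p in zip(u, probs_k):
            # Conditional independence: multiply item probabilities within a class.
            likelihood *= p if u_j == 1 else 1.0 - p
        joint.append(likelihood)
    total = sum(joint)
    return [x / total for x in joint]


# Invented parameters: two equally sized classes, two items.
sizes = [0.5, 0.5]
probs = [[0.9, 0.9],   # class 0 tends to endorse both items
         [0.1, 0.1]]   # class 1 tends to endorse neither

clear_case = posterior((1, 1), sizes, probs)      # strongly class 0
ambiguous_case = posterior((1, 0), sizes, probs)  # evenly split between classes
```

A hard-partitioning method would still have to place the ambiguous respondent in one cluster, whereas the posterior keeps the even split visible.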

If this is right

Subgroup membership can be modeled jointly with other variables inside one framework instead of through separate post-hoc steps.
Classification uncertainty is quantified and carried forward rather than treated as zero.
Model fit to the observed data can be assessed directly.
The same workflow applies to the categorical survey responses that dominate education research.
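
One way to see how model fit becomes directly assessable is a bare-bones expectation-maximization (EM) fit of a latent class model with binary items, which yields a log-likelihood at every iteration. This is a sketch under stated assumptions, not the authors' estimation routine: it uses a single random start, a fixed number of iterations, and toy data, and `fit_lca` is a hypothetical name.

```python
import math
import random


def fit_lca(data, n_classes, n_iter=200, seed=0):
    """Fit a binary-item latent class model by EM (single random start)."""
    rng = random.Random(seed)
    n, n_items = len(data), len(data[0])
    sizes = [1.0 / n_classes] * n_classes
    probs = [[rng.uniform(0.25, 0.75) for _ in range(n_items)]
             for _ in range(n_classes)]
    history = []  # log-likelihood at the start of each iteration
    post = []
    for _ in range(n_iter):
        # E-step: posterior class probabilities under the current parameters.
        post, loglik = [], 0.0
        for u in data:
            joint = []
            for k in range(n_classes):
                lik = sizes[k]
                for j in range(n_items):
                    lik *= probs[k][j] if u[j] == 1 else 1.0 - probs[k][j]
                joint.append(lik)
            total = sum(joint)
            loglik += math.log(total)
            post.append([x / total for x in joint])
        history.append(loglik)
        # M-step: re-estimate class sizes and item probabilities from the posteriors.
        for k in range(n_classes):
            weight = sum(p[k] for p in post)
            sizes[k] = weight / n
            for j in range(n_items):
                endorsed = sum(p[k] for p, u in zip(post, data) if u[j] == 1)
                # Keep estimates off the 0/1 boundary so later logs stay finite.
                probs[k][j] = min(max(endorsed / weight, 1e-6), 1.0 - 1e-6)
    # Note: post reflects the parameters in use before the final M-step.
    return sizes, probs, post, history


# Toy data (hypothetical): 30 respondents, four yes/no items.
data = ([(1, 1, 1, 0)] * 10 + [(1, 1, 0, 0)] * 5
        + [(0, 0, 0, 1)] * 10 + [(0, 0, 1, 1)] * 5)
sizes, probs, post, history = fit_lca(data, n_classes=2)
```

In practice, mixture models are usually estimated from many random starts to guard against local solutions; the single start here only keeps the sketch short.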

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same shift from hard clustering to mixture models could be tested in psychology or sociology datasets that also rely on survey-based subgroups.
Researchers could examine whether mixture-model subgroups produce different predictions for student outcomes than k-means subgroups on held-out data.
Extensions might combine mixture models with continuous variables or multilevel structures common in classroom studies.

Load-bearing premise

That the probabilistic structure of mixture models will produce practically more useful insights than hard partitioning when applied to typical physics education research datasets and questions.

What would settle it

A replication in which the mixture-model and k-modes analyses produce identical subgroup interpretations and reach the same substantive conclusions on the same dataset, or in which modeling subgroup membership jointly with other variables yields no measurable improvement over a separate post-hoc analysis.
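
A simple check of whether the two analyses agree is to cross-tabulate each respondent's most likely latent class against their k-modes cluster. The sketch below assumes both sets of labels are already available; the function name and the example labels are hypothetical.

```python
from collections import Counter


def row_proportions(class_labels, cluster_labels):
    """For each latent class, the share of its members falling in each cluster."""
    pair_counts = Counter(zip(class_labels, cluster_labels))
    class_totals = Counter(class_labels)
    return {
        (cls, clu): count / class_totals[cls]
        for (cls, clu), count in pair_counts.items()
    }


# Invented labels for six respondents.
classes = [1, 1, 1, 1, 2, 2]
clusters = [1, 1, 2, 2, 2, 2]
props = row_proportions(classes, clusters)
# props[(1, 1)] is the share of class 1 assigned to cluster 1, and so on.
```

Label numbers are arbitrary in both methods, so agreement should be judged by how concentrated each row is, not by whether the numbers match.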

Figures

Figures reproduced from arXiv: 2506.11229 by Karen Nylund-Gibson, Marsha Ing, Meagan Sundstrom, Minghui Wang.

**Figure 4.** Two-cluster solution for …

**Figure 5.** Three-class solution for latent class analysis.

**Figure 6.** Path diagram of the model we used to measure the …

**Figure 8.** Three-cluster solution for the k-modes clustering method. Cluster 1 contains 43% of the sample (n = 243) and we label this cluster as “High professional and low identity-based support.” Similar to Cluster 2 in the two-cluster solution presented in the main text, many students in this cluster report receiving social support from professional sources (e.g., physics faculty at their institution). A …

**Figure 9.** Two-class solution for latent class analysis.

Original abstract

Clustering methods are often used in physics education research (PER) to identify subgroups of individuals within a population who share similar response patterns or characteristics. K-means (or k-modes, for categorical data) is one of the most commonly used clustering methods in PER. This algorithm, however, is not model-based: it relies on algorithmic partitioning and assigns individuals to subgroups with definite membership. Researchers must also conduct post-hoc analyses to relate subgroup membership to other variables. Mixture models offer a model-based alternative that accounts for classification errors and allows researchers to directly integrate subgroup membership into a broader latent variable framework. In this paper, we outline the theoretical similarities and differences between k-modes clustering and latent class analysis (one type of mixture model for categorical data). We also present parallel analyses using each method to address the same research questions in order to demonstrate these similarities and differences. We provide the data and R code to replicate the worked example presented in the paper for researchers interested in using mixture models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that k-modes clustering, commonly used in PER, is limited by its algorithmic hard partitioning and requirement for post-hoc analyses, whereas latent class analysis (LCA) as a mixture model is model-based, accounts for classification uncertainty, and permits direct integration of subgroup membership into larger latent variable models. It outlines theoretical similarities and differences between the approaches and illustrates them via parallel analyses addressing identical research questions on the same data, with accompanying R code and data for replication.

Significance. If the parallel analyses convincingly show that LCA's probabilistic features produce more reliable or distinct inferences about subgroups and their relations to other variables, the work could meaningfully shift PER practice toward model-based clustering. The explicit reproducibility materials strengthen the contribution by lowering barriers to adoption.

major comments (2)

[parallel analyses / empirical example] In the section presenting the parallel analyses, the manuscript reports broadly similar subgroup profiles and post-hoc relations under both methods but does not quantify or highlight any differences arising from LCA's soft assignments or explicit modeling of classification error; this leaves the claim of practical superiority as an untested assumption rather than a demonstrated outcome.
[abstract and methods] The abstract and methods description provide insufficient detail on data characteristics (e.g., sample size, number and distribution of categorical items), model selection criteria, and fit diagnostics (e.g., BIC, entropy, or classification probabilities for the LCA solution), which are necessary to evaluate whether the mixture model is well-identified and whether the reported differences are robust.

minor comments (2)

[theoretical comparison] Notation for posterior class probabilities and item-response probabilities in the theoretical comparison section could be introduced more explicitly to aid readers without prior mixture-model experience.
[results figures] Figure captions for the parallel-analysis results should include the exact number of classes retained and the criterion used for that choice.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us strengthen the manuscript. We address each major comment below and have revised the paper to incorporate additional details and clarifications while preserving the core contribution of comparing k-modes and latent class analysis in physics education research.

Point-by-point responses

Referee: In the section presenting the parallel analyses, the manuscript reports broadly similar subgroup profiles and post-hoc relations under both methods but does not quantify or highlight any differences arising from LCA's soft assignments or explicit modeling of classification error; this leaves the claim of practical superiority as an untested assumption rather than a demonstrated outcome.

Authors: We agree that the original parallel analyses section focused primarily on the broad similarities in subgroup profiles and relations, which was deliberate to show that both approaches can address the same research questions. To address this concern, we have revised the section to include quantitative metrics highlighting LCA-specific features, such as average posterior class probabilities, entropy values, and a brief discussion of how accounting for classification uncertainty can affect the precision of post-hoc relations. These additions demonstrate the practical value of the model-based approach without overstating superiority, as the profiles remain largely consistent in this dataset. revision: yes
Referee: The abstract and methods description provide insufficient detail on data characteristics (e.g., sample size, number and distribution of categorical items), model selection criteria, and fit diagnostics (e.g., BIC, entropy, or classification probabilities for the LCA solution), which are necessary to evaluate whether the mixture model is well-identified and whether the reported differences are robust.

Authors: We appreciate this observation and have expanded both the abstract and methods sections in the revised manuscript. We now report the sample size, the number and distribution of the categorical items, the model selection process (including BIC comparisons across class solutions), and key fit diagnostics such as entropy and average classification probabilities for the selected LCA model. These revisions allow readers to better assess model identification and the robustness of the results. revision: yes
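
The diagnostics named in the responses above can be computed directly from a matrix of posterior class probabilities. The sketch below uses the standard relative entropy index (1 means perfectly certain classification, 0 means none) and the average posterior probability among respondents modally assigned to each class. The function names and the example posteriors are hypothetical.

```python
import math


def relative_entropy(posteriors):
    """Relative entropy: 1 - sum_i sum_k(-p_ik * ln p_ik) / (n * ln K)."""
    n = len(posteriors)
    n_classes = len(posteriors[0])
    uncertainty = 0.0
    for row in posteriors:
        for p in row:
            if p > 0.0:  # treat 0 * ln(0) as 0
                uncertainty -= p * math.log(p)
    return 1.0 - uncertainty / (n * math.log(n_classes))


def average_posterior_by_modal_class(posteriors):
    """Mean posterior probability of class k among respondents assigned to k."""
    n_classes = len(posteriors[0])
    sums = [0.0] * n_classes
    counts = [0] * n_classes
    for row in posteriors:
        modal = max(range(n_classes), key=lambda c: row[c])
        sums[modal] += row[modal]
        counts[modal] += 1
    # None marks a class to which no respondent was modally assigned.
    return [s / c if c else None for s, c in zip(sums, counts)]


# Invented posteriors for three respondents and two classes.
example = [[0.9, 0.1], [0.7, 0.3], [0.2, 0.8]]
entropy = relative_entropy(example)
avg_pp = average_posterior_by_modal_class(example)
```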

Circularity Check

0 steps flagged

No circularity: standard methodological comparison with independent empirical illustration

full rationale

The paper's core argument rests on established distinctions between algorithmic hard partitioning (k-modes) and model-based approaches (LCA) that account for classification uncertainty and permit direct integration into latent variable models. These distinctions are presented as theoretical background rather than derived from any fitted quantities within the paper. The parallel analyses serve as an empirical demonstration of similarities and differences on the same research questions, with data and R code supplied for reproducibility; no step reduces a claimed prediction or result to a parameter fitted from the same dataset or to a self-citation chain. The derivation chain is therefore self-contained against external methodological literature and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on standard assumptions of mixture models for categorical data and the modeling choice of number of classes; no new entities are postulated.

free parameters (1)

number of latent classes
The number of subgroups is a modeling choice typically selected via fit criteria or theory and functions as a free parameter in the mixture model.
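
As a sketch of how a fit criterion can guide this choice, the code below computes the Bayesian information criterion (BIC) for latent class models with binary items, where a K-class model with J items has (K - 1) + K·J free parameters and a lower BIC is preferred. The log-likelihood values, item count, and sample size are made up for illustration and do not come from the paper.

```python
import math


def n_free_parameters(n_classes, n_items):
    """Free parameters in a binary-item LCA: class sizes plus item probabilities."""
    return (n_classes - 1) + n_classes * n_items


def bic(log_likelihood, n_classes, n_items, n_respondents):
    """BIC = -2 * log-likelihood + (number of parameters) * ln(sample size)."""
    p = n_free_parameters(n_classes, n_items)
    return -2.0 * log_likelihood + p * math.log(n_respondents)


# Made-up log-likelihoods for 1-, 2-, and 3-class models (10 items, 500 respondents).
candidate_logliks = {1: -1500.0, 2: -1400.0, 3: -1390.0}
bics = {k: bic(ll, k, n_items=10, n_respondents=500)
        for k, ll in candidate_logliks.items()}
best_k = min(bics, key=bics.get)  # the class count with the lowest BIC
```

In this invented example the third class improves the log-likelihood only slightly, so the parameter penalty outweighs the gain and the two-class model is preferred.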

axioms (1)

domain assumption: Response patterns arise from a finite mixture of categorical distributions corresponding to latent classes.
This is the core modeling assumption invoked for latent class analysis to represent subgroup structure in categorical survey data.
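
Written out for binary indicators, this assumption takes the following form. The notation is chosen here for illustration and is an assumption of this sketch: \pi_k stands for the class proportion P(c = k), \rho_{jk} for the conditional item probability P(u_j = 1 | c = k), and there are K classes, J indicators, and n respondents.

```latex
% Marginal probability of respondent i's response pattern under a K-class model
P(\mathbf{u}_i) \;=\; \sum_{k=1}^{K} \pi_k \prod_{j=1}^{J}
    \rho_{jk}^{\,u_{ij}} \,(1 - \rho_{jk})^{\,1 - u_{ij}},
\qquad \sum_{k=1}^{K} \pi_k = 1

% Log-likelihood maximized when the model is fit to n respondents
\ln L \;=\; \sum_{i=1}^{n} \ln P(\mathbf{u}_i)
```

The product over items inside each class is the conditional independence assumption: once class membership is known, the indicators are treated as unrelated.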

Reference graph

Works this paper leans on

57 extracted references · 57 canonical work pages

[1]

Structural parameters: the proportion of the population belonging to each class, P (c = k), which indicates the relative class size, and

work page
[2]

posterior

Measurement parameters: the conditional item proba- bilities, or the probability that a student in classk would endorse a specific indicator j, P (uj = 1|c = k) [20]. The basic LCA model assumes conditional independence, meaning that the latent class variable creating the subgroups explains all of the shared variance in the observed indicators. To estimat...

work page
[3]

What combinations of social support do undergraduate women and gender minorities in physics draw upon?

work page
[4]

What groups and/or people have you drawn support from in your journey through undergraduate physics (select all that apply)?

How does students’ combination of social support relate to their gender identity and physics identity? We note that we do not aim to completely address these re- search questions; rather, we use them to illustrate a practical application of clustering methods in PER. As such, we sim- plify our analysis to only include one gender identity as a pre- dictor ...

work page 2025
[5]

High professional and identity-based support,

Cluster identification We performed the k-modes clustering using the klaR pack- age in R (Version 4.3.2) [19]. The clustering was performed iteratively for a range of cluster values ( k) from 1 to 10. The maximum number of iterations allowed was set as 300 and a random seed was set for reproducibility. We used two metrics to determine the optimal number o...

work page
[6]

Relating social support cluster membership to gender identity and physics identity We related students’ cluster membership to other variables using a post-hoc analysis as in prior work [5, 8]. First, we used logistic regression to examine the relationship between student gender identity, particularly non-binary status, and k- modes cluster membership (rec...

work page
[7]

High professional and identity-based support

Class enumeration We estimated LCA models using maximum likelihood es- timation with robust standard errors in MplusAutomation in R (Version 4.3.2) [42]. We estimated the models with 200 random sets of starting values, as recommended by Hipp and Bauer [43], to ensure that the model converged on a global rather than a local solution. The algorithm first ra...

work page 2000
[8]

Here we demonstrate an example of this to address the second re- search question

Relating social support class membership to gender identity and physics identity Another advantage of LCA (and mixture modeling more broadly) is that it allows for integrating the identified classes into a larger system of auxiliary variables to understand how the emergent classes relate to other measured variables. Here we demonstrate an example of this ...

work page
[9]

Research commentary: A gap-gazing fetish in mathematics education? Problematizing research on the achievement gap

Rochelle Guti ´errez. Research commentary: A gap-gazing fetish in mathematics education? Problematizing research on the achievement gap. Journal for Research in Mathematics Edu- cation, 39(4):357–364, 2008

work page 2008
[10]

Harper and Andrew H

Shaun R. Harper and Andrew H. Nichols. Are they not all the same?: Racial heterogeneity among black male undergraduates. Journal of College Student Development, 49(3):199–214, 2008

work page 2008
[11]

Jarrad W. T. Pond and Jacquelyn J. Chini. Exploring student learning profiles in algebra-based studio physics: A person- centered approach. Physical Review Physics Education Re- search, 13:010119, 2017

work page 2017
[12]

Unsupervised quantitative methods to analyze student reasoning lines: Theoretical aspects and examples

Onofrio Rosario Battaglia, Benedetto Di Paola, and Claudio Fazio. Unsupervised quantitative methods to analyze student reasoning lines: Theoretical aspects and examples. Physical Review Physics Education Research, 15:020112, 2019

work page 2019
[13]

Quinn, Michelle M

Katherine N. Quinn, Michelle M. Kelley, Kathryn L. McGill, Emily M. Smith, Zachary Whipps, and N. G. Holmes. Group roles in unstructured labs show inequitable gender divide.Phys- ical Review Physics Education Research, 16:010129, 2020

work page 2020
[14]

Doty, Ashley A

Tong Wan, Constance M. Doty, Ashley A. Geraets, Christo- pher A. Nix, Erin K. H. Saitta, and Jacquelyn J. Chini. Evaluat- ing the impact of a classroom simulator training on graduate teaching assistants’ instructional practices and undergraduate student learning. Physical Review Physics Education Research, 17:010146, 2021

work page 2021
[15]

Wu, Cole Walsh, Ashley B

Meagan Sundstrom, David G. Wu, Cole Walsh, Ashley B. Heim, and N. G. Holmes. Examining the effects of lab instruc- tion and gender composition on intergroup interaction networks in introductory physics labs. Physical Review Physics Educa- tion Research, 18:010102, 2022

work page 2022
[16]

Gerd Kortemeyer and Wolfgang Bauer. Cheat sites and artificial intelligence usage in online introductory physics courses: What is the extent and what effect does it have on assessments?Phys- ical Review Physics Education Research, 20:010145, 2024

work page 2024
[17]

Char- acterizing active learning environments in physics using latent profile analysis

Kelley Commeford, Eric Brewe, and Adrienne Traxler. Char- acterizing active learning environments in physics using latent profile analysis. Physical Review Physics Education Research, 18(1):010113, 2022

work page 2022
[18]

Open Science Framework

Minghui Wang, Meagan Sundstrom, Karen Nylund-Gibson, and Marsha Ing. Open Science Framework. https://osf. io/7y9gf/, 2025

work page 2025
[19]

Some methods for classification and anal- ysis of multivariate observations

James MacQueen. Some methods for classification and anal- ysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probabil- ity, Volume 1: Statistics, volume 5, pages 281–298. University of California Press, 1967

work page 1967
[20]

Algorithms for hierarchi- cal clustering: An overview

Fionn Murtagh and Pedro Contreras. Algorithms for hierarchi- cal clustering: An overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1):86–97, 2012

work page 2012
[21]

Reynolds

Douglas A. Reynolds. Gaussian mixture models. Encyclopedia of Biometrics, 741(659-663):3, 2009

work page 2009
[22]

DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN

Erich Schubert, J ¨org Sander, Martin Ester, Hans Peter Kriegel, and Xiaowei Xu. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems (TODS), 42(3):1–21, 2017

work page 2017
[23]

Unsupervised deep embedding for clustering analysis

Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In International Con- ference on Machine Learning, pages 478–487. PMLR, 2016

work page 2016
[24]

Muth ´en

Bengt O. Muth ´en. Latent variable mixture modeling. In New Developments and Techniques in Structural Equation Model- ing, pages 21–54. Psychology Press, 2001

work page 2001
[25]

Katherine E. Masyn. Latent class analysis and finite mixture modeling. In Todd D. Little, editor, The Oxford Handbook of Quantitative Methods, volume 2, pages 551–611. Oxford Uni- versity Press, New York, 2013

work page 2013
[26]

An introduction to statistical learning, volume 112

Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshi- rani, et al. An introduction to statistical learning, volume 112. Springer, 2013

work page 2013
[27]

A fast clustering algorithm to cluster very large categorical data sets in data mining

Zhexue Huang. A fast clustering algorithm to cluster very large categorical data sets in data mining. Dmkd, 3(8):34–39, 1997

work page 1997
[28]

Ten frequently asked questions about latent class analysis.Translational Issues in Psychological Science, 4(4):440, 2018

Karen Nylund-Gibson and Andrew Young Choi. Ten frequently asked questions about latent class analysis.Translational Issues in Psychological Science, 4(4):440, 2018

work page 2018
[29]

Vermunt and Jay Magidson

Jeroen K. Vermunt and Jay Magidson. Latent class cluster anal- ysis. Applied Latent Class Analysis, 11(89-106):60, 2002

work page 2002
[30]

K-means clustering with outlier removal

Guojun Gan and Michael Kwok-Po Ng. K-means clustering with outlier removal. Pattern Recognition Letters , 90:8–14, 2017

work page 2017
[31]

k-means–: A unified ap- proach to clustering and outlier detection

Sanjay Chawla and Aristides Gionis. k-means–: A unified ap- proach to clustering and outlier detection. InProceedings of the 2013 SIAM International Conference on Data Mining , pages 189–197. SIAM, 2013

work page 2013
[32]

A latent transition mixture model using the three- step specification

Karen Nylund-Gibson, Ryan Grimm, Matt Quirk, and Michael Furlong. A latent transition mixture model using the three- step specification. Structural Equation Modeling: A Multidis- ciplinary Journal, 21(3):439–454, 2014

work page 2014
[33]

Grant, Jodi McCloskey, Meghan Hatfield, Con- nie Uratsu, James D

Richard W. Grant, Jodi McCloskey, Meghan Hatfield, Con- nie Uratsu, James D. Ralston, Elizabeth Bayliss, and Chris J. Kennedy. Use of latent class analysis and k-means cluster- ing to identify complex patient profiles. JAMA Network Open, 3(12):e2029068–e2029068, 2020

work page 2020
[34]

Cooper, Xiao Hu, Roma Maguire, Kathi Apostolidis, Jo Armes, Yvette P

Nikoloas Papachristou, Payam Barnaghi, Bruce A. Cooper, Xiao Hu, Roma Maguire, Kathi Apostolidis, Jo Armes, Yvette P. Conley, Marilyn Hammer, Stylianos Katsaragakis, et al. Congruence between latent class and k-modes analyses in the identification of oncology patients with distinct symp- tom experiences. Journal of Pain and Symptom Management , 55(2):318–...

work page 2018
[35]

Latent class models for clustering: A comparison with k-means

Jay Magidson and Jeroen Vermunt. Latent class models for clustering: A comparison with k-means. Canadian Journal of Marketing Research, 20(1):36–43, 2002

work page 2002
[36]

Women and science careers: Leaky pipeline or gender filter? Gender and Education , 17(4):369– 386, 2005

Jacob Clark Blickenstaff. Women and science careers: Leaky pipeline or gender filter? Gender and Education , 17(4):369– 386, 2005

work page 2005
[37]

Sax, Kathleen J

Linda J. Sax, Kathleen J. Lehman, Ram ´on S. Barthelemy, and Gloria Lim. Women in physics: A comparison to science, tech- nology, engineering, and math education over four decades. Physical Review Physics Education Research , 12(2):020108, 2016

work page 2016
[38]

Women in physics and 13 astronomy, 2019

Anne Marie Porter and Rachel Ivie. Women in physics and 13 astronomy, 2019. AIP Statistical Research Center, 2019

work page 2019
[39]

Maxwell Franklin, Eric Brewe, and Annette R. Ponnock. Ex- amining reasons undergraduate women join physics. Physical Review Physics Education Research, 19:010110, 2023

work page 2023
[40]

Gutzwa, Ram ´on S

Justin A. Gutzwa, Ram ´on S. Barthelemy, Camila Amaral, Madison Swirtz, Adrienne Traxler, and Charles Henderson. How women and lesbian, gay, bisexual, transgender, and queer physics doctoral students navigate graduate education: The roles of professional environments and social networks. Physi- cal Review Physics Education Research, 20:020115, 2024

work page 2024
[41]

Egocentric mixed-methods SNA: Analyzing inter- views with women and/or queer and LGBT+ Ph.D

Chase Hatcher, Lily Donis, Adrienne Traxler, Madison Swirtz, Camila Manni, Justin Gutzwa, Charles Henderson, and Ram ´on Barthelemy. Egocentric mixed-methods SNA: Analyzing inter- views with women and/or queer and LGBT+ Ph.D. physicists. arXiv preprint arXiv:2504.10621, 2025

work page arXiv 2025
[42]

What correlates with persis- tence of women in physics?Physical Review Physics Education Research, 21:010115, 2025

Maxwell Franklin and Eric Brewe. What correlates with persis- tence of women in physics?Physical Review Physics Education Research, 21:010115, 2025

work page 2025
[43]

Dropout from higher education: A theoretical synthesis of recent research

Vincent Tinto. Dropout from higher education: A theoretical synthesis of recent research. Review of Educational Research, 45(1):89–125, 1975

work page 1975
[44]

McCoy, Rachelle Winkle-Wagner, and Imani Barnes

Paris Wicker, Dorian L. McCoy, Rachelle Winkle-Wagner, and Imani Barnes. A web of support: A critical narrative analysis of black women’s relationships in stem disciplines.The Review of Higher Education, 47(1):93–125, 2023

work page 2023
[45]

Kim and Gale M

Ann Y . Kim and Gale M. Sinatra. Science identity development: An interactionist approach. International Journal of STEM Ed- ucation, 5:1–6, 2018

work page 2018
[46]

Social networks, social capital, social support and academic success in higher education: A systematic review with a special focus on ‘underrepresented’ students

Shweta Mishra. Social networks, social capital, social support and academic success in higher education: A systematic review with a special focus on ‘underrepresented’ students. Educa- tional Research Review, 29:100307, 2020

work page 2020
[47]

Sadler, and Marie- Claire Shanahan

Zahra Hazari, Gerhard Sonnert, Philip M. Sadler, and Marie- Claire Shanahan. Connecting high school physics experi- ences, outcome expectations, physics identity, and physics ca- reer choice: A gender study. Journal of Research in Science Teaching, 47(8):978–1003, 2010

work page 2010
[48]

Zahra Hazari, Deepa Chari, Geoff Potvin, and Eric Brewe. The context dependence of physics identity: Examining the role of performance/competence, recognition, interest, and sense of belonging for lower and upper female physics undergraduates. Journal of Research in Science Teaching , 57(10):1583–1607, 2020

work page 2020
[49]

Rousseeuw

Peter J. Rousseeuw. Silhouettes: A graphical aid to the inter- pretation and validation of cluster analysis. Journal of Compu- tational and Applied Mathematics, 20:53–65, 1987

work page 1987
[50]

Hallquist and Joshua F

Michael N. Hallquist and Joshua F. Wiley. MplusAutomation: An R package for facilitating large-scale latent variable anal- yses in M plus. Structural Equation Modeling: A Multidisci- plinary Journal, 25(4):621–638, 2018

work page 2018
[51]

Hipp and Daniel J

John R. Hipp and Daniel J. Bauer. Local solutions in the es- timation of growth mixture models. Psychological methods, 11(1):36, 2006

work page 2006
[52]

Daniel S. Nagin. Group-based modeling of development. Har- vard University Press, 2005

work page 2005
[53]

Jeroen K. Vermunt. Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4):450– 469, 2010

work page 2010
[54]

Latent profile analysis: A review and “how to” guide of its application within vocational behavior re- search

Daniel Spurk, Andreas Hirschi, Mo Wang, Domingo Valero, and Simone Kauffeld. Latent profile analysis: A review and “how to” guide of its application within vocational behavior re- search. Journal of Vocational Behavior, 120:103445, 2020

work page 2020
[55]

Garber, Delwin B

Karen Nylund-Gibson, Adam C. Garber, Delwin B. Carter, Meiki Chan, Dina A. N. Arch, Odelia Simon, Kelly Whaling, Erica Tartt, and Smaranda I. Lawrie. Ten frequently asked ques- tions about latent transition analysis. Psychological Methods, 28(2):284, 2023

work page 2023
[56]

High professional and low identity-based support

Gitta H. Lubke and Bengt Muth ´en. Investigating population heterogeneity with factor mixture models. Psychological Meth- ods, 10(1):21, 2005. 14 APPENDIX A. Model convergence rates for latent class analysis Tables V and VI show the convergence and log-likelihood replication rates for different numbers of initial and final ran- dom starts of the models. A...

work page 2005
[57]

There is strong alignment between Class 2 (Low professional and identity- based support) and Cluster 2, with about 94% of students in Class 2 being assigned to Cluster 2

The other half of Class 1 is assigned to Cluster 2 (High professional and low identity-based support). There is strong alignment between Class 2 (Low professional and identity- based support) and Cluster 2, with about 94% of students in Class 2 being assigned to Cluster 2. Therefore, the two al- gorithms produce slightly different classifications when cre...

work page

[1] [1]

Structural parameters: the proportion of the population belonging to each class, P (c = k), which indicates the relative class size, and

work page

[2] [2]

posterior

Measurement parameters: the conditional item proba- bilities, or the probability that a student in classk would endorse a specific indicator j, P (uj = 1|c = k) [20]. The basic LCA model assumes conditional independence, meaning that the latent class variable creating the subgroups explains all of the shared variance in the observed indicators. To estimat...

work page

[3] [3]

What combinations of social support do undergraduate women and gender minorities in physics draw upon?

work page

[4] [4]

What groups and/or people have you drawn support from in your journey through undergraduate physics (select all that apply)?

How does students’ combination of social support relate to their gender identity and physics identity? We note that we do not aim to completely address these re- search questions; rather, we use them to illustrate a practical application of clustering methods in PER. As such, we sim- plify our analysis to only include one gender identity as a pre- dictor ...

work page 2025

[5] [5]

High professional and identity-based support,

Cluster identification We performed the k-modes clustering using the klaR pack- age in R (Version 4.3.2) [19]. The clustering was performed iteratively for a range of cluster values ( k) from 1 to 10. The maximum number of iterations allowed was set as 300 and a random seed was set for reproducibility. We used two metrics to determine the optimal number o...

work page

[6] [6]

Relating social support cluster membership to gender identity and physics identity We related students’ cluster membership to other variables using a post-hoc analysis as in prior work [5, 8]. First, we used logistic regression to examine the relationship between student gender identity, particularly non-binary status, and k- modes cluster membership (rec...

work page

[7] [7]

High professional and identity-based support

Class enumeration We estimated LCA models using maximum likelihood es- timation with robust standard errors in MplusAutomation in R (Version 4.3.2) [42]. We estimated the models with 200 random sets of starting values, as recommended by Hipp and Bauer [43], to ensure that the model converged on a global rather than a local solution. The algorithm first ra...


Relating social support class membership to gender identity and physics identity

Another advantage of LCA (and mixture modeling more broadly) is that it allows for integrating the identified classes into a larger system of auxiliary variables to understand how the emergent classes relate to other measured variables. Here we demonstrate an example of this ...


[9] Rochelle Gutiérrez. Research commentary: A gap-gazing fetish in mathematics education? Problematizing research on the achievement gap. Journal for Research in Mathematics Education, 39(4):357–364, 2008

[10] Shaun R. Harper and Andrew H. Nichols. Are they not all the same?: Racial heterogeneity among black male undergraduates. Journal of College Student Development, 49(3):199–214, 2008

[11] Jarrad W. T. Pond and Jacquelyn J. Chini. Exploring student learning profiles in algebra-based studio physics: A person-centered approach. Physical Review Physics Education Research, 13:010119, 2017

[12] Onofrio Rosario Battaglia, Benedetto Di Paola, and Claudio Fazio. Unsupervised quantitative methods to analyze student reasoning lines: Theoretical aspects and examples. Physical Review Physics Education Research, 15:020112, 2019

[13] Katherine N. Quinn, Michelle M. Kelley, Kathryn L. McGill, Emily M. Smith, Zachary Whipps, and N. G. Holmes. Group roles in unstructured labs show inequitable gender divide. Physical Review Physics Education Research, 16:010129, 2020

[14] Tong Wan, Constance M. Doty, Ashley A. Geraets, Christopher A. Nix, Erin K. H. Saitta, and Jacquelyn J. Chini. Evaluating the impact of a classroom simulator training on graduate teaching assistants’ instructional practices and undergraduate student learning. Physical Review Physics Education Research, 17:010146, 2021

[15] Meagan Sundstrom, David G. Wu, Cole Walsh, Ashley B. Heim, and N. G. Holmes. Examining the effects of lab instruction and gender composition on intergroup interaction networks in introductory physics labs. Physical Review Physics Education Research, 18:010102, 2022

[16] Gerd Kortemeyer and Wolfgang Bauer. Cheat sites and artificial intelligence usage in online introductory physics courses: What is the extent and what effect does it have on assessments? Physical Review Physics Education Research, 20:010145, 2024

[17] Kelley Commeford, Eric Brewe, and Adrienne Traxler. Characterizing active learning environments in physics using latent profile analysis. Physical Review Physics Education Research, 18(1):010113, 2022

[18] Minghui Wang, Meagan Sundstrom, Karen Nylund-Gibson, and Marsha Ing. Open Science Framework. https://osf.io/7y9gf/, 2025

[19] James MacQueen. Some methods for classification and analysis of multivariate observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics, volume 5, pages 281–298. University of California Press, 1967

[20] Fionn Murtagh and Pedro Contreras. Algorithms for hierarchical clustering: An overview. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 2(1):86–97, 2012

[21] Douglas A. Reynolds. Gaussian mixture models. Encyclopedia of Biometrics, 741(659-663):3, 2009

[22] Erich Schubert, Jörg Sander, Martin Ester, Hans Peter Kriegel, and Xiaowei Xu. DBSCAN revisited, revisited: Why and how you should (still) use DBSCAN. ACM Transactions on Database Systems (TODS), 42(3):1–21, 2017

[23] Junyuan Xie, Ross Girshick, and Ali Farhadi. Unsupervised deep embedding for clustering analysis. In International Conference on Machine Learning, pages 478–487. PMLR, 2016

[24] Bengt O. Muthén. Latent variable mixture modeling. In New Developments and Techniques in Structural Equation Modeling, pages 21–54. Psychology Press, 2001

[25] Katherine E. Masyn. Latent class analysis and finite mixture modeling. In Todd D. Little, editor, The Oxford Handbook of Quantitative Methods, volume 2, pages 551–611. Oxford University Press, New York, 2013

[26] Gareth James, Daniela Witten, Trevor Hastie, Robert Tibshirani, et al. An introduction to statistical learning, volume 112. Springer, 2013

[27] Zhexue Huang. A fast clustering algorithm to cluster very large categorical data sets in data mining. Dmkd, 3(8):34–39, 1997

[28] Karen Nylund-Gibson and Andrew Young Choi. Ten frequently asked questions about latent class analysis. Translational Issues in Psychological Science, 4(4):440, 2018

[29] Jeroen K. Vermunt and Jay Magidson. Latent class cluster analysis. Applied Latent Class Analysis, 11(89-106):60, 2002

[30] Guojun Gan and Michael Kwok-Po Ng. K-means clustering with outlier removal. Pattern Recognition Letters, 90:8–14, 2017

[31] Sanjay Chawla and Aristides Gionis. k-means–: A unified approach to clustering and outlier detection. In Proceedings of the 2013 SIAM International Conference on Data Mining, pages 189–197. SIAM, 2013

[32] Karen Nylund-Gibson, Ryan Grimm, Matt Quirk, and Michael Furlong. A latent transition mixture model using the three-step specification. Structural Equation Modeling: A Multidisciplinary Journal, 21(3):439–454, 2014

[33] Richard W. Grant, Jodi McCloskey, Meghan Hatfield, Connie Uratsu, James D. Ralston, Elizabeth Bayliss, and Chris J. Kennedy. Use of latent class analysis and k-means clustering to identify complex patient profiles. JAMA Network Open, 3(12):e2029068–e2029068, 2020

[34] Nikolaos Papachristou, Payam Barnaghi, Bruce A. Cooper, Xiao Hu, Roma Maguire, Kathi Apostolidis, Jo Armes, Yvette P. Conley, Marilyn Hammer, Stylianos Katsaragakis, et al. Congruence between latent class and k-modes analyses in the identification of oncology patients with distinct symptom experiences. Journal of Pain and Symptom Management, 55(2):318–..., 2018

[35] Jay Magidson and Jeroen Vermunt. Latent class models for clustering: A comparison with k-means. Canadian Journal of Marketing Research, 20(1):36–43, 2002

[36] Jacob Clark Blickenstaff. Women and science careers: Leaky pipeline or gender filter? Gender and Education, 17(4):369–386, 2005

[37] Linda J. Sax, Kathleen J. Lehman, Ramón S. Barthelemy, and Gloria Lim. Women in physics: A comparison to science, technology, engineering, and math education over four decades. Physical Review Physics Education Research, 12(2):020108, 2016

[38] Anne Marie Porter and Rachel Ivie. Women in physics and astronomy, 2019. AIP Statistical Research Center, 2019

[39] Maxwell Franklin, Eric Brewe, and Annette R. Ponnock. Examining reasons undergraduate women join physics. Physical Review Physics Education Research, 19:010110, 2023

[40] Justin A. Gutzwa, Ramón S. Barthelemy, Camila Amaral, Madison Swirtz, Adrienne Traxler, and Charles Henderson. How women and lesbian, gay, bisexual, transgender, and queer physics doctoral students navigate graduate education: The roles of professional environments and social networks. Physical Review Physics Education Research, 20:020115, 2024

[41] Chase Hatcher, Lily Donis, Adrienne Traxler, Madison Swirtz, Camila Manni, Justin Gutzwa, Charles Henderson, and Ramón Barthelemy. Egocentric mixed-methods SNA: Analyzing interviews with women and/or queer and LGBT+ Ph.D. physicists. arXiv preprint arXiv:2504.10621, 2025

[42] Maxwell Franklin and Eric Brewe. What correlates with persistence of women in physics? Physical Review Physics Education Research, 21:010115, 2025

[43] Vincent Tinto. Dropout from higher education: A theoretical synthesis of recent research. Review of Educational Research, 45(1):89–125, 1975

[44] Paris Wicker, Dorian L. McCoy, Rachelle Winkle-Wagner, and Imani Barnes. A web of support: A critical narrative analysis of black women’s relationships in stem disciplines. The Review of Higher Education, 47(1):93–125, 2023

[45] Ann Y. Kim and Gale M. Sinatra. Science identity development: An interactionist approach. International Journal of STEM Education, 5:1–6, 2018

[46] Shweta Mishra. Social networks, social capital, social support and academic success in higher education: A systematic review with a special focus on ‘underrepresented’ students. Educational Research Review, 29:100307, 2020

[47] Zahra Hazari, Gerhard Sonnert, Philip M. Sadler, and Marie-Claire Shanahan. Connecting high school physics experiences, outcome expectations, physics identity, and physics career choice: A gender study. Journal of Research in Science Teaching, 47(8):978–1003, 2010

[48] Zahra Hazari, Deepa Chari, Geoff Potvin, and Eric Brewe. The context dependence of physics identity: Examining the role of performance/competence, recognition, interest, and sense of belonging for lower and upper female physics undergraduates. Journal of Research in Science Teaching, 57(10):1583–1607, 2020

[49] Peter J. Rousseeuw. Silhouettes: A graphical aid to the interpretation and validation of cluster analysis. Journal of Computational and Applied Mathematics, 20:53–65, 1987

[50] Michael N. Hallquist and Joshua F. Wiley. MplusAutomation: An R package for facilitating large-scale latent variable analyses in Mplus. Structural Equation Modeling: A Multidisciplinary Journal, 25(4):621–638, 2018

[51] John R. Hipp and Daniel J. Bauer. Local solutions in the estimation of growth mixture models. Psychological Methods, 11(1):36, 2006

[52] Daniel S. Nagin. Group-based modeling of development. Harvard University Press, 2005

[53] Jeroen K. Vermunt. Latent class modeling with covariates: Two improved three-step approaches. Political Analysis, 18(4):450–469, 2010

[54] Daniel Spurk, Andreas Hirschi, Mo Wang, Domingo Valero, and Simone Kauffeld. Latent profile analysis: A review and “how to” guide of its application within vocational behavior research. Journal of Vocational Behavior, 120:103445, 2020

[55] Karen Nylund-Gibson, Adam C. Garber, Delwin B. Carter, Meiki Chan, Dina A. N. Arch, Odelia Simon, Kelly Whaling, Erica Tartt, and Smaranda I. Lawrie. Ten frequently asked questions about latent transition analysis. Psychological Methods, 28(2):284, 2023

[56] Gitta H. Lubke and Bengt Muthén. Investigating population heterogeneity with factor mixture models. Psychological Methods, 10(1):21, 2005.

APPENDIX A. Model convergence rates for latent class analysis

Tables V and VI show the convergence and log-likelihood replication rates for different numbers of initial and final random starts of the models. A...


The other half of Class 1 is assigned to Cluster 2 (High professional and low identity-based support). There is strong alignment between Class 2 (Low professional and identity-based support) and Cluster 2, with about 94% of students in Class 2 being assigned to Cluster 2. Therefore, the two algorithms produce slightly different classifications when cre...
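A comparison of this kind amounts to a cross-tabulation of the two sets of labels with row percentages. The sketch below is a minimal Python illustration with hypothetical assignments for ten students, not the authors' data or code. It assumes each student's LCA class is taken as the modal (most likely) class, which sets aside the posterior uncertainty that LCA otherwise retains.

```python
from collections import Counter


def row_percentages(class_labels, cluster_labels):
    """Percentage of each LCA class that falls in each k-modes cluster."""
    pair_counts = Counter(zip(class_labels, cluster_labels))
    class_sizes = Counter(class_labels)
    return {
        (cls, clu): 100.0 * n / class_sizes[cls]
        for (cls, clu), n in pair_counts.items()
    }


# Hypothetical modal class and cluster assignments for ten students.
lca_class = [1, 1, 1, 1, 2, 2, 2, 2, 2, 2]
kmodes_cluster = [1, 1, 2, 2, 2, 2, 2, 2, 2, 1]

table = row_percentages(lca_class, kmodes_cluster)
for (cls, clu), pct in sorted(table.items()):
    print(f"Class {cls} -> Cluster {clu}: {pct:.1f}%")
```

Cluster and class numbers are arbitrary labels, so matching rows to columns relies on comparing the substantive profiles rather than the numbers themselves.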
