Discovering Misconceptions and Misunderstandings From Administrations of Research-Designed Multiple Choice Instruments
Pith reviewed 2026-06-27 14:25 UTC · model grok-4.3
The pith
A multidimensional model applied to 34,000 Force Concept Inventory responses extracts 22 coherent student misconceptions in Newtonian mechanics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a flexible multidimensional item-response model that lets different answer choices within each question point in different directions, the analysis of approximately 34,000 Force Concept Inventory administrations uncovers 22 robust, partly-overlapping dimensions. Each dimension is defined by distractors that share a coherent theme identifiable with a misconception or misunderstanding. These are sorted by historical era into Ancient, Medieval, and Post-Newtonian groups. Simple misconception scores are then computed for students and classes, revealing that some misconceptions remain largely unchanged by instruction while others are better remediated in below- or above-average students, with many misconceptions poorly remediated for students of average or lower ability.
What carries the argument
The flexible multidimensional item-response model for multiple-choice data, which allows answer choices to occupy different directions in the knowledge space so that distinct misconceptions encoded in distractors can be separated.
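To make "different directions" concrete, the sketch below assumes a multidimensional nominal response model in which option k of an item has its own slope vector a_k and intercept c_k, so that P(k | θ) = exp(a_k·θ + c_k) / Σ_j exp(a_j·θ + c_j). The function name and all parameter values are hypothetical and chosen for illustration; this is not the paper's fitted model or code.

```python
import numpy as np

def option_probabilities(theta, slopes, intercepts):
    """Choice probabilities for one item under a multidimensional nominal response model.

    theta:      (D,) latent position of one student
    slopes:     (K, D) one direction vector per answer option (hypothetical values)
    intercepts: (K,) one intercept per answer option
    Returns a (K,) vector of probabilities summing to 1 (a softmax over options).
    """
    logits = slopes @ theta + intercepts
    logits = logits - logits.max()  # subtract the max for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

# Hypothetical 5-option item in a 2-D space: option 0 (correct) points along
# dimension 0, distractors 1 and 2 point along dimension 1 (one shared
# misconception), and distractors 3 and 4 carry no direction at all.
slopes = np.array([
    [1.5, 0.0],
    [0.0, 1.2],
    [0.0, 0.8],
    [0.0, 0.0],
    [0.0, 0.0],
])
intercepts = np.zeros(5)

p_misconceived = option_probabilities(np.array([0.0, 2.0]), slopes, intercepts)
p_knowledgeable = option_probabilities(np.array([2.0, 0.0]), slopes, intercepts)
```

Because distractors 1 and 2 point along a second axis, a student far along that axis is drawn to them even with zero position on the correctness axis; a unidimensional model would have to place every option on a single line.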
If this is right
- Misconception scores can be calculated for individual students or entire classes to track specific errors.
- Instruction leaves some misconceptions largely unchanged while remediating others more effectively in students of higher or lower ability.
- Many misconceptions remain poorly addressed for students of average or below-average ability.
- Instructors gain a tool for class-level formative assessment focused on particular alternate ideas.
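The exact misconception-score formula is not reproduced here (the referee raises this point below), so the following is a hypothetical stand-in, assuming each dimension is summarized by a set of flagged (item, option) pairs: a student's score is the fraction of flagged items on which they chose a flagged distractor, and a class score is the mean over students. The names misconception_scores and flagged are illustrative only.

```python
import numpy as np

def misconception_scores(responses, flagged):
    """Hypothetical per-student misconception score (not the paper's definition).

    responses: (N, I) integer array; responses[n, i] is the option student n chose on item i
    flagged:   dict mapping item index -> set of option indices tied to one misconception
    Returns an (N,) array: fraction of flagged items where the student picked a flagged option.
    """
    items = sorted(flagged)
    hits = np.zeros((responses.shape[0], len(items)), dtype=bool)
    for j, i in enumerate(items):
        hits[:, j] = np.isin(responses[:, i], list(flagged[i]))
    return hits.mean(axis=1)

# Toy responses from three students on four items (invented data).
responses = np.array([
    [0, 2, 1, 3],
    [1, 0, 4, 3],
    [0, 0, 0, 0],
])
# Hypothetical dimension: option 2 on item 1, or options 3/4 on item 3.
flagged = {1: {2}, 3: {3, 4}}

scores = misconception_scores(responses, flagged)  # per-student scores
class_score = scores.mean()                         # class-level summary
```

A plain fraction keeps the score interpretable for formative use; a model-based alternative would instead project each student's estimated latent position onto the dimension.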
Where Pith is reading between the lines
- The same modeling approach could be applied to other multiple-choice concept inventories to surface hidden misconceptions in different domains.
- The historical classification implies that some persistent errors may benefit from teaching that explicitly contrasts pre-Newtonian ideas with modern ones.
- If the dimensions prove stable across populations, they could serve as targets for controlled experiments comparing different remediation strategies.
Load-bearing premise
The dimensions extracted by the model correspond to genuine, stable student misconceptions rather than statistical artifacts of the chosen parameterization or the specific distractors in the test items.
What would settle it
Repeating the full analysis with an alternate multidimensional parameterization or on a different concept inventory and finding that the resulting dimensions no longer group into coherent, historically recognizable misconceptions would falsify the central claim.
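One standard way to run such a comparison, sketched here under stated assumptions, is to rotate the loading matrix from the alternate fit onto the original with an orthogonal Procrustes rotation and then compute Tucker's congruence coefficient for each matched dimension; values near 1 indicate essentially the same dimension. The sketch assumes both fits produce loadings for the same rows (answer options) and the same number of dimensions; the helper names and the synthetic loadings are hypothetical.

```python
import numpy as np

def procrustes_rotate(A, B):
    """Orthogonal R minimizing ||A @ R - B||_F, from the SVD of A.T @ B."""
    U, _, Vt = np.linalg.svd(A.T @ B)
    return U @ Vt

def tucker_congruence(x, y):
    """Tucker's congruence coefficient between two (uncentered) loading columns."""
    return float(x @ y / np.sqrt((x @ x) * (y @ y)))

def matched_congruences(A, B):
    """Rotate loadings A onto target B, then score each column pair."""
    A_rot = A @ procrustes_rotate(A, B)
    return np.array([tucker_congruence(A_rot[:, d], B[:, d]) for d in range(B.shape[1])])

# Synthetic check: A is B under an arbitrary orthogonal transformation,
# so alignment should recover every dimension exactly.
rng = np.random.default_rng(0)
B = rng.normal(size=(20, 3))
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))
A = B @ Q.T
cong = matched_congruences(A, B)
```

Because the orthogonal group includes permutations and sign flips, the rotation also absorbs reordered or reflected dimensions, which are common between independent exploratory fits.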
Original abstract
Misconceptions are "alternate hypotheses" that are incorrect according to established theories of how the world works. Often held with confidence by students, they are relatively context-insensitive, can seem like common-sense views, and are noted for being resistant to remediation using traditional instruction. To find misconceptions in Newtonian mechanics, we analyze ~34,000 administrations of the pioneering Force Concept Inventory using a flexible multidimensional item-response model for multiple-choice data. In contrast to most earlier work, we allow answer choices within each question to have different directions in the multidimensional space of student knowledge, essential for concept inventories in which distractors often codify distinct misconceptions. We uncover 22 robust, partly-overlapping dimensions whose distractors share a coherent theme identifiable with a misconception or misunderstanding. Motivated by the realization that many mirror previously-accepted theories of mechanics, we broadly sort these by historical era: Ancient (learned by infants but codified by Greeks), Medieval (reactions and extensions of Aristotelian ideas), and Post-Newtonian (including known modern misconceptions as well as two which appear novel). We also present a simple approach for computing "misconception scores" for students and classes. Examining these scores before and after instruction reveals surprisingly varied patterns of remediation in our sample: some misconceptions persist largely unchanged by instruction, while others are better remediated in below- or above-average students. In general, we find that many misconceptions are poorly remediated for students of average or lower ability. We hope our work will serve as a guide for developing, evaluating, and improving interventions for these while providing physics instructors with a valuable tool for class-level formative assessment.
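The pre/post comparison described above can be sketched as a stratified difference: sort matched students by an ability proxy, split them into groups, and average the change in a misconception score within each group. Everything here is an assumption for illustration (the ability proxy, the equal-size groups, the function name, and the toy scores); it is not the paper's analysis.

```python
import numpy as np

def remediation_by_ability(pre_scores, post_scores, ability, n_groups=3):
    """Mean change (post - pre) in a misconception score per ability group.

    pre_scores, post_scores: (N,) scores for the same students before and after instruction
    ability: (N,) any ability proxy (e.g. pre-test total), used only for grouping
    Returns a list of group means, lowest-ability group first.
    """
    order = np.argsort(ability, kind="stable")
    groups = np.array_split(order, n_groups)  # near-equal-size groups
    change = post_scores - pre_scores
    return [float(change[g].mean()) for g in groups]

# Invented data for six students.
ability = np.array([1, 2, 3, 4, 5, 6])
pre = np.array([0.8, 0.8, 0.6, 0.6, 0.4, 0.4])
post = np.array([0.8, 0.7, 0.5, 0.4, 0.1, 0.0])
group_changes = remediation_by_ability(pre, post, ability)
```

A negative mean change indicates remediation; a group whose change stays near zero corresponds to a misconception that persists through instruction for those students.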
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper applies a flexible multidimensional item-response model (allowing answer choices to point in different directions) to ~34,000 Force Concept Inventory administrations. It claims to extract 22 robust, partly-overlapping dimensions whose distractors share coherent themes identifiable as misconceptions, sorts them historically into Ancient, Medieval, and Post-Newtonian categories, and introduces misconception scores whose pre/post-instruction changes reveal varied remediation patterns, with many misconceptions poorly remediated for average or lower-ability students.
Significance. If the dimensions prove stable and generalizable beyond the specific model and item set, the work would supply physics education researchers with a data-driven taxonomy of misconceptions and a practical scoring method for formative assessment, potentially guiding more targeted interventions than unidimensional FCI scoring.
major comments (2)
- [Methods (IRT model and dimension extraction)] The central claim that the 22 dimensions are 'robust' and reflect genuine misconceptions (rather than artifacts of the chosen multidimensional IRT parameterization or distractor correlations) is load-bearing, yet the abstract supplies no description of the robustness procedure, cross-validation, hold-out testing, sensitivity to dimension count or link function, or comparison against unidimensional baselines. This information is required to evaluate the claim.
- [Results (dimension extraction and robustness)] No model-fit statistics, information criteria, or details on how the dimensionality was selected or validated are reported. Without these, it is impossible to determine whether the 22 dimensions are overparameterized or whether the observed thematic coherence arises from the model structure itself.
minor comments (2)
- [Discussion (historical classification)] The historical-era sorting of dimensions is presented as motivated by prior theories but would benefit from an explicit decision rule or inter-rater procedure to avoid appearing post-hoc.
- [Methods (misconception scores)] The misconception-score formula is described as 'simple' but its exact definition, normalization, and handling of overlapping dimensions should be stated explicitly with an equation.
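For the inter-rater procedure suggested in the first minor comment, a common agreement statistic is Cohen's kappa, which corrects raw agreement for the agreement expected by chance from each rater's label frequencies. The era labels and ratings below are invented for illustration.

```python
import math
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning one categorical label per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: product of each rater's marginal label frequencies.
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / n**2
    if expected == 1.0:
        return math.nan  # kappa is undefined when chance agreement is 1
    return (observed - expected) / (1 - expected)

# Hypothetical era assignments for eight dimensions by two raters.
rater_a = ["Ancient", "Ancient", "Medieval", "Medieval", "Post", "Post", "Post", "Ancient"]
rater_b = ["Ancient", "Medieval", "Medieval", "Medieval", "Post", "Post", "Ancient", "Ancient"]
kappa = cohens_kappa(rater_a, rater_b)
```

Here raw agreement is 6/8, but chance agreement is 21/64, so kappa is noticeably lower (27/43, about 0.63).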
Simulated Author's Rebuttal
Thank you for the opportunity to respond to the referee's report. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.
Point-by-point responses
-
Referee: [Methods (IRT model and dimension extraction)] The central claim that the 22 dimensions are 'robust' and reflect genuine misconceptions (rather than artifacts of the chosen multidimensional IRT parameterization or distractor correlations) is load-bearing, yet the abstract supplies no description of the robustness procedure, cross-validation, hold-out testing, sensitivity to dimension count or link function, or comparison against unidimensional baselines. This information is required to evaluate the claim.
Authors: We agree that the abstract lacks a summary of the robustness checks. The full manuscript details the cross-validation and hold-out procedures used to establish the stability of the 22 dimensions, along with comparisons showing improved fit relative to unidimensional models. We will revise the abstract to include a concise description of these procedures and add explicit sensitivity analyses for dimension count in the methods section. We did not perform a full sensitivity analysis on the link function in the original work; this can be added as a supplementary check if requested. Revision: yes.
-
Referee: [Results (dimension extraction and robustness)] No model-fit statistics, information criteria, or details on how the dimensionality was selected or validated are reported. Without these, it is impossible to determine whether the 22 dimensions are overparameterized or whether the observed thematic coherence arises from the model structure itself.
Authors: We acknowledge that the manuscript would benefit from explicit reporting of model-fit statistics. In the revised version we will add AIC, BIC, and other information criteria, together with a step-by-step account of how dimensionality was selected through successive model comparisons and validation. These additions will allow readers to assess whether the 22 dimensions are overparameterized. Revision: yes.
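The information criteria promised in the second response can be sketched as follows, assuming a maximized marginal log-likelihood is available for each candidate dimensionality. The parameter count assumes one reference option per item with slopes and intercept fixed at zero, plus removal of D(D-1)/2 rotational degrees of freedom for an exploratory D-dimensional solution; other identification schemes give different counts, and the function names are hypothetical.

```python
import math

def n_free_params(options_per_item, n_dims):
    """Free parameters of an exploratory multidimensional nominal response model.

    Assumes one reference option per item (slopes and intercept fixed at 0), so an
    item with K options contributes (K - 1) * (n_dims + 1) parameters, and subtracts
    n_dims * (n_dims - 1) / 2 rotational degrees of freedom.
    """
    per_item = sum((k - 1) * (n_dims + 1) for k in options_per_item)
    return per_item - n_dims * (n_dims - 1) // 2

def aic(log_lik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * log_lik

def bic(log_lik, k, n_obs):
    """Bayesian information criterion: k ln N - 2 ln L."""
    return k * math.log(n_obs) - 2 * log_lik

fci_items = [5] * 30  # the FCI has 30 five-option items
k_1d = n_free_params(fci_items, 1)
k_22d = n_free_params(fci_items, 22)
```

Under these assumptions the count grows from 240 parameters at D = 1 to 2529 at D = 22, which is why a penalty such as BIC, whose cost per parameter grows with ln N, matters when comparing high-dimensional fits.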
Circularity Check
No significant circularity; derivation is data-driven extraction from external administrations.
full rationale
The paper applies a multidimensional IRT model to an external dataset of ~34,000 FCI administrations and extracts dimensions whose themes are identified post-hoc from the data. No step defines the target dimensions in terms of the fitted parameters themselves, renames a fitted quantity as a prediction, or relies on a self-citation chain whose content is unverified outside the present work. The central result is therefore an empirical finding rather than a tautological re-expression of inputs or prior author claims.
Axiom & Free-Parameter Ledger
free parameters (1)
- dimensionality of the IRT model
axioms (1)
- domain assumption: distractors that load on the same dimension share a coherent misconception theme