Revisiting the Regularity of Student Learning Rate: Sensitivity to Which Observations Are Included
Pith reviewed 2026-05-21 00:42 UTC · model grok-4.3
The pith
Estimates of student variation in learning rate from practice data change sharply depending on which observations the model includes, while initial knowledge estimates remain stable.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When the individual Additive Factors Model is applied to the same 27 datasets but with different rules for how many observations of each student's practice on a given skill are retained, the estimated variance across students in learning rate increases markedly while the estimated variance in initial knowledge does not. One specification raises the learning-rate variance by a median of 118 percent; the second raises it several-fold. The same model and data therefore produce substantially different pictures of how much students differ in learning speed solely because of the choice of which observations enter the fit.
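To make the headline figure concrete: a 118 percent increase means the alternative estimate is 2.18 times the baseline, and the median is taken over per-dataset percent changes. The sketch below shows that arithmetic only. The function names and the three variance values are hypothetical illustrations, not numbers from the paper.

```python
from statistics import median


def percent_change(baseline: float, alternative: float) -> float:
    """Percent change of `alternative` relative to `baseline`."""
    if baseline <= 0:
        raise ValueError("baseline variance must be positive")
    return 100.0 * (alternative - baseline) / baseline


def median_percent_change(baselines, alternatives) -> float:
    """Median of the per-dataset percent changes."""
    return median(percent_change(b, a) for b, a in zip(baselines, alternatives))


# Hypothetical learning-rate variance estimates for three datasets
# (illustrative values only, NOT taken from the paper).
baseline_var = [0.01, 0.02, 0.04]
alternative_var = [0.02, 0.05, 0.06]

# Per-dataset changes are roughly 100%, 150%, and 50%, so the median is about 100%.
median_change = median_percent_change(baseline_var, alternative_var)
print(round(median_change, 1))
```

Reporting the per-dataset changes alongside the median would show whether a few datasets drive the summary figure.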
What carries the argument
The individual Additive Factors Model (iAFM), a mixed-effects regression that fits student-specific intercepts for initial knowledge and student-specific slopes for learning rate to sequences of practice observations.
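For orientation, the iAFM is commonly written in the following form. This is a sketch of the usual formulation, not an equation quoted from the paper, and all symbols are assumptions introduced here: $p_{ij}$ is the probability that student $i$ answers step $j$ correctly, $q_{jk}$ indicates whether step $j$ exercises skill $k$, and $T_{ik}$ counts the prior opportunities student $i$ has had on skill $k$.

```latex
\log\frac{p_{ij}}{1 - p_{ij}}
  = \theta_i
  + \sum_{k} q_{jk}\,\beta_k
  + \sum_{k} q_{jk}\,(\gamma_k + \delta_i)\,T_{ik},
\qquad
\theta_i \sim \mathcal{N}(0, \sigma_\theta^2),
\quad
\delta_i \sim \mathcal{N}(0, \sigma_\delta^2)
```

Under this notation, $\theta_i$ is the student intercept (initial knowledge) and $\delta_i$ is the student slope (learning rate). The quantities at issue are the estimated $\sigma_\theta^2$, which stays stable across inclusion rules, and the estimated $\sigma_\delta^2$, which does not. Any correlation between $\theta_i$ and $\delta_i$ is left unspecified here.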
Load-bearing premise
The model is correctly specified, so that differences in estimates arise mainly from which observations are kept rather than from changes in sample composition or unmodeled data features.
What would settle it
A side-by-side test of which observation-inclusion rule yields better out-of-sample predictions of future student performance on held-out practice trials.
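One way such a test could be set up is sketched below: hold out the final trial of each student-skill sequence, fit the model on the remainder under each inclusion rule, and score the predictions on the same held-out trials with log loss. Everything here is an assumption for illustration. The row layout, the function names, and the two prediction vectors are hypothetical stand-ins for predictions from actual iAFM fits.

```python
import math
from collections import defaultdict


def split_last_trial(rows):
    """Hold out the final observation of each (student, skill) sequence.

    `rows` must be in chronological order. Sequences of length 1 stay in training.
    """
    by_pair = defaultdict(list)
    for row in rows:
        by_pair[(row["student"], row["skill"])].append(row)
    train, held_out = [], []
    for seq in by_pair.values():
        if len(seq) > 1:
            train.extend(seq[:-1])
            held_out.append(seq[-1])
        else:
            train.extend(seq)
    return train, held_out


def log_loss(outcomes, probs, eps=1e-12):
    """Mean negative log-likelihood of binary outcomes under predicted probabilities."""
    total = 0.0
    for y, p in zip(outcomes, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(outcomes)


# Hypothetical practice log (illustrative only).
rows = [
    {"student": "s1", "skill": "add", "correct": 0},
    {"student": "s1", "skill": "add", "correct": 1},
    {"student": "s1", "skill": "add", "correct": 1},
    {"student": "s2", "skill": "add", "correct": 0},
    {"student": "s2", "skill": "add", "correct": 1},
    {"student": "s2", "skill": "sub", "correct": 1},
]
train, held_out = split_last_trial(rows)
outcomes = [r["correct"] for r in held_out]

# Placeholder predictions standing in for two fits under different inclusion rules.
preds_rule_a = [0.8, 0.6]
preds_rule_b = [0.6, 0.5]
loss_a = log_loss(outcomes, preds_rule_a)
loss_b = log_loss(outcomes, preds_rule_b)
```

Scoring both rules on identical held-out trials matters: if the held-out set changed with the rule, the comparison would mix predictive accuracy with sample composition.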
Original abstract
Mixed-effects models fit to observational practice data are widely used in learning analytics to estimate student-level variation in initial knowledge and learning rate, and the resulting estimates increasingly inform substantive claims about learners. We examine whether such estimates can be read as properties of learners or whether they depend on choices about which observations the model is fit to. As a case study, we revisit the “astonishing regularity” reported by Koedinger et al. (2023): that students vary substantially in initial knowledge but much less in learning rate. The finding is based on fits of the individual Additive Factors Model (iAFM) to 27 educational datasets, and rests on a model-derived estimate of student-level learning-rate variation being small in absolute terms. We refit the same model on the same datasets under two specifications, each varying how much of each student's practice on a given skill is used in fitting. The estimate of student-level variation in initial knowledge stays approximately stable across both specifications. The estimate of student-level variation in learning rate does not: it inflates by a median of 118% under one specification and is several times larger under the other. The same model, fit to the same data, returns substantially different estimates of how much students vary in learning rate depending on which observations are included. When estimates from mixed-effects models on observational practice data are used to support substantive claims about learners, sensitivity to such choices deserves a central place in how those estimates are reported and read.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that student-level variation in learning rate estimated from the individual Additive Factors Model (iAFM) on 27 educational datasets is highly sensitive to choices about which observations are included in the fit. Refitting the identical model under two alternative specifications for observation inclusion (varying how much of each student's practice on a given skill is retained) produces a median 118% inflation in the learning-rate variance estimate under one rule and several-times-larger values under the other, while the estimate of student-level variation in initial knowledge remains approximately stable. The authors conclude that such estimates cannot be read as intrinsic properties of learners without explicit attention to sensitivity to data-inclusion decisions.
Significance. If the central sensitivity result holds after addressing sample-composition controls, the work provides a concrete, reproducible demonstration that mixed-effects estimates of learner differences in observational practice data can shift substantially with routine modeling choices. This strengthens the case for routine robustness reporting in learning analytics and supplies a direct counter-example to the 'astonishing regularity' claim in Koedinger et al. (2023). The use of publicly referenced datasets and identical model re-fits is a methodological strength.
major comments (2)
- [Methods / Results] Methods section (and any results tables reporting variance components): the manuscript must explicitly state and verify whether the set of students and the distribution of skill-opportunity counts are held fixed across the three inclusion specifications. If restricting observations per student-skill pair causes some students to drop below inclusion thresholds or alters the balance of observations per random-effect group, the reported inflation in learning-rate variance could partly reflect changes in effective sample size and shrinkage rather than pure model sensitivity. The abstract notes stability of initial-knowledge variance but does not report diagnostics confirming fixed student sets or within-student observation counts.
- [Results] Results (variance-component tables or figures): provide the per-dataset student counts, mean observations per student-skill pair, and effective sample sizes under each inclusion rule. Without these, it is impossible to rule out that the 118% median inflation (or larger multiples) arises from reduced shrinkage in the fuller-inclusion condition rather than intrinsic sensitivity of the iAFM random-slope variance.
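The shrinkage concern raised in these comments can be illustrated with the textbook random-intercept linear model. This is an analogy only: the iAFM is a logistic model with random slopes, so the expression below is a simplified sketch and not the paper's estimator. The symbols are assumptions introduced here: $\tau^2$ is the between-student variance, $\sigma^2$ the residual variance, $n_i$ the number of observations for student $i$, and $\bar{y}_i$ that student's mean outcome.

```latex
\hat{u}_i = \lambda_i\,(\bar{y}_i - \hat{\mu}),
\qquad
\lambda_i = \frac{\tau^2}{\tau^2 + \sigma^2 / n_i}
```

As $n_i$ grows, $\lambda_i$ approaches 1 and the student-level estimate is pulled less toward the overall mean. With few observations, $\lambda_i$ is small and the estimate is shrunk strongly. An inclusion rule that changes $n_i$ therefore changes how much information each student contributes, which is why the referee asks for observation counts under each rule.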
minor comments (2)
- [Methods] Clarify the exact two specifications used for 'varying how much of each student's practice' (e.g., first-N vs. all observations, or minimum-count thresholds) with precise pseudocode or equations.
- [Results] Add a short paragraph comparing the new variance estimates to the original Koedinger et al. (2023) numbers to make the magnitude of change immediately interpretable.
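As an illustration of the kind of pseudocode the first minor comment asks for, here is a minimal sketch of a "first k opportunities" rule, which is one of the referee's suggested examples. The paper's exact rules are not given in this summary, so the function name, the row layout, and the value of `k` are all assumptions.

```python
from collections import defaultdict


def keep_first_k(rows, k):
    """Keep the first `k` opportunities of each (student, skill) pair.

    `rows` is an iterable of dicts with keys 'student', 'skill', 'correct',
    given in chronological order. Each kept row gains a 0-based
    'opportunity' index counting prior practice on that skill.
    """
    seen = defaultdict(int)
    kept = []
    for row in rows:
        key = (row["student"], row["skill"])
        if seen[key] < k:
            kept.append({**row, "opportunity": seen[key]})
        seen[key] += 1
    return kept


# Hypothetical practice log (illustrative only).
rows = [
    {"student": "s1", "skill": "add", "correct": 0},
    {"student": "s1", "skill": "add", "correct": 1},
    {"student": "s1", "skill": "add", "correct": 1},
    {"student": "s1", "skill": "sub", "correct": 1},
    {"student": "s2", "skill": "add", "correct": 0},
    {"student": "s2", "skill": "add", "correct": 1},
]
truncated = keep_first_k(rows, k=2)
```

A rule of this form never drops a student or a skill as long as `k` is at least 1. It only shortens long sequences, which is the property the authors rely on in their rebuttal.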
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which help clarify important aspects of our sensitivity analysis. We address each major comment below and have revised the manuscript accordingly to improve transparency regarding sample composition.
- Referee: [Methods / Results] Methods section (and any results tables reporting variance components): the manuscript must explicitly state and verify whether the set of students and the distribution of skill-opportunity counts are held fixed across the three inclusion specifications. If restricting observations per student-skill pair causes some students to drop below inclusion thresholds or alters the balance of observations per random-effect group, the reported inflation in learning-rate variance could partly reflect changes in effective sample size and shrinkage rather than pure model sensitivity. The abstract notes stability of initial-knowledge variance but does not report diagnostics confirming fixed student sets or within-student observation counts.
Authors: We agree that explicitly confirming the fixed student set and reporting relevant diagnostics is necessary to isolate the effect of observation-inclusion rules from potential changes in effective sample size or shrinkage. In the original analysis, the set of students and skills was held constant across the three inclusion specifications; we varied only the number of observations retained per student-skill pair (e.g., limiting to the first k opportunities) without imposing minimum thresholds that would drop students. We have now added explicit statements to the Methods section verifying that no students were excluded due to the inclusion rules and that the student set remains identical. We have also included a brief verification note that the number of students per dataset is unchanged. While the distribution of observations per student-skill pair necessarily varies by design, the stability of the initial-knowledge variance component across specifications provides supporting evidence that sample-composition shifts are not the primary driver of the reported changes in learning-rate variance. revision: yes
- Referee: [Results] Results (variance-component tables or figures): provide the per-dataset student counts, mean observations per student-skill pair, and effective sample sizes under each inclusion rule. Without these, it is impossible to rule out that the 118% median inflation (or larger multiples) arises from reduced shrinkage in the fuller-inclusion condition rather than intrinsic sensitivity of the iAFM random-slope variance.
Authors: We accept this recommendation and have added the requested information to the revised manuscript. A new supplementary table now reports, for each of the 27 datasets, the number of students, the mean observations per student-skill pair, and the total observations (as a proxy for effective sample size) under each of the three inclusion rules. This allows direct assessment of how observation counts change and helps readers evaluate the potential contribution of differential shrinkage. We continue to interpret the differential sensitivity—large changes in learning-rate variance but stability in initial-knowledge variance—as evidence of intrinsic model sensitivity to inclusion decisions rather than a pure artifact of sample size. revision: yes
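The diagnostics discussed in this exchange are straightforward to compute. The sketch below reports, for one dataset under one inclusion rule, the student count, the mean observations per student-skill pair, and the total observations, and checks that two rules share the same student set. The row layout, the function names, and the example data are hypothetical and do not come from the paper's supplementary table.

```python
from collections import Counter


def inclusion_diagnostics(rows):
    """Summarize one dataset under one inclusion rule."""
    pair_counts = Counter((r["student"], r["skill"]) for r in rows)
    n_pairs = len(pair_counts)
    return {
        "n_students": len({r["student"] for r in rows}),
        "n_pairs": n_pairs,
        "mean_obs_per_pair": len(rows) / n_pairs if n_pairs else 0.0,
        "total_obs": len(rows),
    }


def same_student_set(rows_a, rows_b):
    """True when both inclusion rules retain exactly the same students."""
    return {r["student"] for r in rows_a} == {r["student"] for r in rows_b}


# Hypothetical practice log with a 0-based opportunity index (illustrative only).
full = [
    {"student": "s1", "skill": "add", "opportunity": 0},
    {"student": "s1", "skill": "add", "opportunity": 1},
    {"student": "s1", "skill": "add", "opportunity": 2},
    {"student": "s1", "skill": "sub", "opportunity": 0},
    {"student": "s2", "skill": "add", "opportunity": 0},
    {"student": "s2", "skill": "add", "opportunity": 1},
]
# A first-2-opportunities rule, used here as an assumed example of truncation.
first_two = [r for r in full if r["opportunity"] < 2]

diag_full = inclusion_diagnostics(full)
diag_first_two = inclusion_diagnostics(first_two)
students_fixed = same_student_set(full, first_two)
```

Total observations is only a rough proxy for effective sample size, as the authors note. The per-pair mean shows more directly how much each random-effect group loses under truncation.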
Circularity Check
No significant circularity in empirical sensitivity analysis
The paper conducts an empirical re-analysis by refitting the individual Additive Factors Model (iAFM) to the same 27 datasets under two alternative observation-inclusion specifications. The reported result—that student-level variance in learning rate inflates substantially (median 118% or more) while initial-knowledge variance remains stable—is obtained directly from the new parameter estimates on the altered data subsets. No step reduces by construction to a fitted input, self-definition, or load-bearing self-citation; the original Koedinger et al. (2023) finding is treated as an external benchmark that is then tested for robustness. The derivation chain is therefore self-contained and falsifiable against the public datasets.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: The individual Additive Factors Model (iAFM) is an appropriate mixed-effects model for estimating student-level initial knowledge and learning rate from observational practice data.
Reference graph
Works this paper leans on
Source paper: Revisiting the Regularity of Student Learning Rate. L@S ’26, June 29–July 3, 2026, Seoul, Republic of Korea
- [1] Vincent Aleven, Elmar Stahl, Silke Schworm, Frank Fischer, and Raven Wallace. 2003. Help seeking and help design in interactive learning environments. Review of Educational Research 73, 3 (2003), 277–320.
- [3]
- [4] Ryan S Baker and Aaron Hawn. 2022. Algorithmic bias in education. International Journal of Artificial Intelligence in Education 32, 4 (2022), 1052–1092.
- [5] Ryan S. J. d. Baker. 2007. Modeling and understanding students’ off-task behavior in intelligent tutoring systems. In Proceedings of the ACM CHI Conference on Human Factors in Computing Systems. 1059–1068.
- [6] Ryan S. J. d. Baker, Albert T. Corbett, Kenneth R. Koedinger, Shelley Evenson, Ido Roll, Angela Z. Wagner, Meghan Naim, Jay Raspat, Daniel J. Baker, and Joseph E. Beck. 2006. Adapting to when students game an intelligent tutoring system. In Proceedings of the International Conference on Intelligent Tutoring Systems. Springer, 392–401.
- [7] Douglas Bates, Martin Mächler, Ben Bolker, and Steve Walker. 2015. Fitting linear mixed-effects models using lme4. Journal of Statistical Software 67, 1 (2015), 1–48.
- [8] Joseph E Beck and Kai-min Chang. 2007. Identifiability: A fundamental problem of student modeling. In International Conference on User Modeling. Springer, 137–146.
- [9] Joseph E. Beck and Yue Gong. 2013. Wheel-spinning: Students who fail to master a skill. In Proceedings of the International Conference on Artificial Intelligence in Education. Springer, 431–440.
- [10] James H. Block and Robert B. Burns. 1976. Mastery learning. Review of Research in Education 4 (1976), 3–49.
- [11] Hao Cen, Kenneth Koedinger, and Brian Junker. 2006. Learning Factors Analysis: a general method for cognitive model evaluation and improvement. In Proceedings of the International Conference on Intelligent Tutoring Systems. Springer, 164–175.
- [12] Hao Cen, Kenneth R. Koedinger, and Brian Junker. 2007. Is over practice necessary?—Improving learning efficiency with the Cognitive Tutor through educational data mining. Frontiers in Artificial Intelligence and Applications 158 (2007), 511.
- [13] Min Chi, Kenneth R. Koedinger, Geoffrey J. Gordon, Pamela Jordan, and Kurt VanLehn. 2011. Instructional factors analysis: a cognitive model for multiple instructional interventions. In Proceedings of the 4th International Conference on Educational Data Mining (EDM). 61–70.
- [14] Peter Diggle and Michael G. Kenward. 1994. Informative drop-out in longitudinal data analysis. Journal of the Royal Statistical Society: Series C (Applied Statistics) 43, 1 (1994), 49–73.
- [15] Tomáš Effenberger, Radek Pelánek, and Jaroslav Čechák. 2020. Exploration of the robustness and generalizability of the additive factors model. In Proceedings of the Tenth International Conference on Learning Analytics & Knowledge. 472–479.
- [16] Nathan J. Evans, Scott D. Brown, Douglas J. K. Mewhort, and Andrew Heathcote. 2018. Refining the law of practice. Psychological Review 125, 4 (2018), 592–605.
- [18] Paul M. Fitts and Michael I. Posner. 1967. Human Performance. Brooks/Cole, Belmont, CA.
- [19] Theodore W. Frick. 1990. A comparison of three decision models for adapting the length of computer-based mastery tests. Journal of Educational Computing Research 6, 4 (1990), 479–513.
- [20] April Galyardt and Ilya Goldin. 2015. Move your lamp post: recent data reflects learner knowledge better than older data. Journal of Educational Data Mining 7, 2 (2015), 83–108.
- [21] Gillian Gold, Conrad Borchers, and Paulo F. Carvalho. 2024. Further evidence for regularity in student learning rates across demographic, academic proficiency, and motivational groups. In Companion Proceedings of the 14th International Conference on Learning Analytics & Knowledge (LAK24).
- [22] Cyril Goutte, Guillaume Durand, and Serge Léger. 2018. On the learning curve attrition bias in additive factor modeling. In Proceedings of the International Conference on Artificial Intelligence in Education. Springer, 109–113.
- [23] Andrew Heathcote, Scott Brown, and D. J. K. Mewhort. 2000. The power law repealed: the case for an exponential law of practice. Psychonomic Bulletin & Review 7, 2 (2000), 185–207.
- [24] Tanja Käser, Kenneth R. Koedinger, and Markus Gross. 2014. Different parameters—same prediction: an analysis of learning curves. In Proceedings of the 7th International Conference on Educational Data Mining. 52–59.
- [25] René F. Kizilcec and Hansol Lee. 2022. Algorithmic fairness in education. In The Ethics of Artificial Intelligence in Education. Routledge, 174–202.
- [26] Kenneth R. Koedinger, Ryan S. J. d. Baker, Kyle Cunningham, Alida Skogsholm, Brett Leber, and John Stamper. 2010. A data repository for the EDM community: the PSLC DataShop. In Handbook of Educational Data Mining, Cristóbal Romero, Sebastián Ventura, Mykola Pechenizkiy, and Ryan S. J. d. Baker (Eds.). CRC Press, 43–56.
- [27] Kenneth R. Koedinger, Emma Brunskill, Ryan S. J. d. Baker, Elizabeth A. McLaughlin, and John Stamper. 2013. New potentials for data-driven intelligent tutoring system development and optimization. AI Magazine 34, 3 (2013), 27–41.
- [28] Kenneth R. Koedinger, Paulo F. Carvalho, Ran Liu, and Elizabeth A. McLaughlin. 2023. An astonishing regularity in student learning rate. Proceedings of the National Academy of Sciences 120, 13 (2023), e2221311120. doi:10.1073/pnas.2221311120
- [30] Roderick J. A. Little. 1993. Pattern-mixture models for multivariate incomplete data. J. Amer. Statist. Assoc. 88, 421 (1993), 125–134.
- [31]
- [32]
- [33] R. Charles Murray, Steven Ritter, Tristan Nixon, Ryan Schwiebert, Robert G. M. Hausmann, Brendon Towle, Stephen E. Fancsali, and Annalies Vuong. 2013. Revealing the learning in learning curves. In International Conference on Artificial Intelligence in Education. Springer, 473–482.
- [34] Allen Newell and Paul S. Rosenbloom. 1981. Mechanisms of skill acquisition and the law of practice. In Cognitive Skills and Their Acquisition, John R. Anderson (Ed.). Psychology Press, 1–55.
- [35] Philip I. Pavlik, Hao Cen, and Kenneth R. Koedinger. 2009. Performance factors analysis—a new alternative to knowledge tracing. In Proceedings of the 14th International Conference on Artificial Intelligence in Education (AIED). IOS Press, 531–538.
- [36] Radek Pelánek. 2018. The details matter: methodological nuances in the evaluation of student models. User Modeling and User-Adapted Interaction 28, 3 (2018), 207–235.
- [37] Dimitris Rizopoulos. 2012. Joint Models for Longitudinal and Time-to-Event Data: With Applications in R. CRC Press.
- [38] Mary Ann Simpson, Kole A. Norberg, and Stephen E. Fancsali. 2024. Replicating an “astonishing regularity in student learning rate”. In Proceedings of the 17th International Conference on Educational Data Mining. 420–425.
- [39] Anastasios A. Tsiatis and Marie Davidian. 2004. Joint modeling of longitudinal and time-to-event data: an overview. Statistica Sinica 14 (2004), 809–834.
- [40] Kurt VanLehn. 2006. The behavior of tutoring systems. International Journal of Artificial Intelligence in Education 16, 3 (2006), 227–265.
- [41] Lang Wu, Wei Liu, Grace Y. Yi, and Yangxin Huang. 2012. Analysis of longitudinal and survival data: joint modeling, inference methods, and issues. Journal of Probability and Statistics 2012 (2012), 640153.
- [42] Margaret C. Wu and Raymond J. Carroll. 1988. Estimation and comparison of changes in the presence of informative right censoring by modeling the censoring process. Biometrics 44, 1 (1988), 175–188.