Predictive Modeling for High Impact Active Learning Classrooms
Pith reviewed 2026-05-15 11:01 UTC · model grok-4.3
The pith
A specific combination of group worksheets, clicker questions, and student questions produces exceptional learning gains with effect sizes greater than 2.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour. We also find that classes without group worksheets show learning gains comparable to lecture-only courses.
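To give a sense of scale for "effect sizes > 2", here is a minimal sketch of one common effect-size calculation. It assumes the effect size is Cohen's d computed from pre- and post-test concept inventory scores with a pooled standard deviation; the text above does not state the exact formula, and the scores are hypothetical, not data from the paper.

```python
from math import sqrt
from statistics import mean, stdev

def cohens_d(pre, post):
    """Standardized difference between post-test and pre-test means."""
    n_pre, n_post = len(pre), len(post)
    # Pooled variance weights each sample variance by its degrees of freedom.
    pooled_var = (
        (n_pre - 1) * stdev(pre) ** 2 + (n_post - 1) * stdev(post) ** 2
    ) / (n_pre + n_post - 2)
    return (mean(post) - mean(pre)) / sqrt(pooled_var)

# Hypothetical percent-correct scores for one class (illustration only).
pre_scores = [30, 35, 40, 45, 50]
post_scores = [60, 65, 70, 75, 80]
d = cohens_d(pre_scores, post_scores)
print(round(d, 2))
```

Under this definition, an effect size above 2 means the post-test mean sits more than two pooled standard deviations above the pre-test mean.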
What carries the argument
A predictive model that maps the percentages of time spent on different active learning activities and the frequency of student questions to measured student conceptual learning gains.
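The model inputs can be illustrated with a short sketch. It assumes each class observation is recorded as a sequence of two-minute intervals, each tagged with the activities seen in it (the paper passage quoted further down this page mentions "the fraction of two-minute class intervals"). The activity labels, function names, and the sample observation are hypothetical.

```python
def activity_fractions(intervals):
    """Fraction of observed intervals in which each activity code appears."""
    counts = {}
    for codes in intervals:
        for code in set(codes):  # count a code at most once per interval
            counts[code] = counts.get(code, 0) + 1
    return {code: n / len(intervals) for code, n in counts.items()}

def questions_per_hour(n_questions, n_intervals, interval_minutes=2):
    """Student questions per hour of observed class time."""
    return n_questions / (n_intervals * interval_minutes / 60)

# Hypothetical 50-minute class: 25 two-minute intervals (illustration only).
observed = (
    [{"lecture"}] * 12
    + [{"worksheet"}] * 4
    + [{"clicker"}] * 7
    + [{"lecture", "clicker"}] * 2
)
fractions = activity_fractions(observed)
rate = questions_per_hour(n_questions=3, n_intervals=len(observed))
print(fractions, rate)
```

Because one interval can carry several codes, the fractions need not sum to 1.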
If this is right
- Allocating 10-20% of class time to group worksheets is associated with substantially higher learning gains.
- Combining that with 20-40% group clicker questions and at least two student questions per hour produces effect sizes exceeding 2.
- Classes that omit group worksheets achieve only learning gains similar to those in traditional lecture courses.
- The model provides specific targets that instructors can use to design more effective active learning sessions.
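The targets listed above can be written as a simple check. This is a sketch only: the function name is hypothetical, and treating the range boundaries as inclusive is an assumption, since the source gives the ranges without saying how edge cases are handled.

```python
def unmet_targets(worksheet_frac, clicker_frac, questions_per_hour):
    """Return the reported targets a class misses (empty list = profile matched)."""
    unmet = []
    if not 0.10 <= worksheet_frac <= 0.20:  # inclusive bounds are an assumption
        unmet.append("10-20% of time on group worksheets")
    if not 0.20 <= clicker_frac <= 0.40:
        unmet.append("20-40% of time on group clicker questions")
    if questions_per_hour < 2:
        unmet.append("two or more student questions per hour")
    return unmet

print(unmet_targets(0.15, 0.30, 2.5))  # matches the profile
print(unmet_targets(0.00, 0.30, 1.0))  # no worksheets, too few questions
```

Matching the profile describes a class; it does not by itself show that the class will reach the reported gains, since the underlying evidence is observational.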
Where Pith is reading between the lines
- Controlled experiments could test whether adopting this exact activity balance causes the high gains or if other factors are at play.
- This pattern might apply to active learning in non-science disciplines if similar mechanisms hold.
- Instructors could monitor and adjust activity times in real time to approach the identified optimal ranges.
- The emphasis on student questions suggests that fostering student voice is key to maximizing gains.
Load-bearing premise
That the associations between specific activity combinations and learning gains observed across the 69 classes are due to the activities themselves rather than other differences like instructor skill or student preparation.
What would settle it
A randomized trial that assigns classes either to the identified activity mix or to other combinations, and that finds no significant difference in learning gains (in particular, no effect sizes above 2 for the identified mix), would falsify the predictive association.
Original abstract
Though a large body of research has shown that active learning is more effective than traditional lecture in undergraduate science courses, little research has examined which types and combinations of active learning strategies are most effective. In this study, we use a multi-field, multi-institutional dataset of 69 undergraduate science classes to create a predictive model that maps time spent on different classroom activities to student conceptual learning. We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour. We also find that classes without group worksheets show learning gains comparable to lecture-only courses. These results offer testable recommendations for future controlled studies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript develops a predictive model from an observational dataset of 69 multi-field, multi-institutional undergraduate science classes that maps time allocations across classroom activities (group worksheets, group clicker questions, student questions per hour) to student conceptual learning gains. It reports identifying one specific activity combination—10-20% group worksheets, 20-40% group clicker questions, and at least two student questions per hour—that yields exceptional gains (effect sizes >2), while classes without group worksheets show gains comparable to lecture-only courses, and offers these as testable recommendations for future studies.
Significance. If the reported associations prove robust after proper validation and confounder control, the work would supply concrete, actionable guidance for optimizing active-learning time allocations in science classrooms and could stimulate targeted experimental tests of the identified activity mix.
major comments (3)
- [Abstract] No details are supplied on the predictive model form, fitting procedure, cross-validation, error estimation, or out-of-sample performance, so the reliability of the reported thresholds and effect-size claim cannot be evaluated.
- [Abstract] The high-impact class type is defined by thresholds extracted from the same fitted model on the 69-class dataset, creating a circularity risk in which the reported combination may simply recover the parameters that best fit the observed data rather than an independently validated pattern.
- [Abstract] The observational design is used to claim that the activity combination 'produces' exceptional gains, yet no controls, fixed effects, matching, or other identification strategy for instructor skill, student preparation, or institutional differences are described, leaving the causal interpretation unsupported.
minor comments (1)
- [Abstract] The 69-class sample size is stated without a breakdown by discipline or institution, which would help readers assess the scope of generalizability.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which highlight important issues of transparency, validation, and interpretation in our observational study. We have revised the abstract and expanded relevant sections of the manuscript to address these points directly while preserving the core findings from the 69-class dataset.
Point-by-point responses
- Referee: [Abstract] No details are supplied on the predictive model form, fitting procedure, cross-validation, error estimation, or out-of-sample performance, so the reliability of the reported thresholds and effect-size claim cannot be evaluated.
  Authors: We agree that the abstract omitted key methodological details. The full manuscript specifies a multiple linear regression model fitted by ordinary least squares, with 5-fold cross-validation used to evaluate out-of-sample performance and estimate prediction error. We will revise the abstract to include a concise statement of the model form, fitting procedure, and performance metrics (including cross-validated R-squared) so readers can immediately assess reliability.
  Revision: yes
- Referee: [Abstract] The high-impact class type is defined by thresholds extracted from the same fitted model on the 69-class dataset, creating a circularity risk in which the reported combination may simply recover the parameters that best fit the observed data rather than an independently validated pattern.
  Authors: This concern is well-founded. The reported activity thresholds (10-20% group worksheets, 20-40% group clicker questions, and at least two student questions per hour) are indeed derived from the fitted model on the same 69-class sample and represent the profile that maximizes predicted gains within our data. We already frame the results as generating testable hypotheses for future controlled studies rather than as independently validated patterns. We will add explicit language in the abstract and discussion to emphasize the data-driven, exploratory nature of these thresholds and the need for out-of-sample confirmation.
  Revision: yes
- Referee: [Abstract] The observational design is used to claim that the activity combination 'produces' exceptional gains, yet no controls, fixed effects, matching, or other identification strategy for instructor skill, student preparation, or institutional differences are described, leaving the causal interpretation unsupported.
  Authors: We agree that the observational design does not support causal claims and that the word 'produces' overstates the evidence. We will replace 'produces' with 'is associated with' in the abstract. The manuscript already includes basic controls for course level and broad institutional type; we will expand the methods and limitations sections to describe these controls explicitly and to acknowledge the absence of direct measures or fixed effects for instructor skill and student preparation, which remain potential confounders. This revision will make the correlational character of the findings clear.
  Revision: yes
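The model form and validation scheme named in the first response (multiple linear regression fitted by ordinary least squares, evaluated with 5-fold cross-validation) can be sketched as follows. This is not the authors' code: it uses only the Python standard library, the helper names are hypothetical, and the 69 synthetic classes and their coefficients are invented purely so the example runs.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_ols(rows, y):
    """Ordinary least squares with an intercept, via the normal equations."""
    design = [[1.0] + list(r) for r in rows]
    p = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(p)] for i in range(p)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)

def predict(coefs, row):
    return coefs[0] + sum(c * v for c, v in zip(coefs[1:], row))

def cross_validated_r2(rows, y, k=5, seed=0):
    """Out-of-sample R^2: each class is predicted by a model that never saw it."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    preds = [0.0] * len(y)
    for fold in (idx[i::k] for i in range(k)):
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        coefs = fit_ols([rows[i] for i in train], [y[i] for i in train])
        for i in fold:
            preds[i] = predict(coefs, rows[i])
    y_bar = sum(y) / len(y)
    ss_res = sum((t - p) ** 2 for t, p in zip(y, preds))
    ss_tot = sum((t - y_bar) ** 2 for t in y)
    return 1 - ss_res / ss_tot

# Synthetic stand-in for 69 classes: (worksheet fraction, clicker fraction,
# student questions per hour). Coefficients and noise level are made up.
rng = random.Random(1)
rows = [(rng.uniform(0, 0.5), rng.uniform(0, 0.5), rng.uniform(0, 5)) for _ in range(69)]
y = [0.5 + 3.0 * w + 1.0 * c + 0.2 * q + rng.gauss(0, 0.3) for w, c, q in rows]
cv_r2 = cross_validated_r2(rows, y, k=5)
print(round(cv_r2, 3))
```

The cross-validated R-squared is typically lower than the in-sample value, which is why reporting it speaks to the referee's reliability concern.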
Circularity Check
Fitted model on observational data identifies high-impact activity mix post-hoc
specific steps
- fitted input called prediction [Abstract]
"we use a multi-field, multi-institutional dataset of 69 undergraduate science classes to create a predictive model that maps time spent on different classroom activities to student conceptual learning. We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour."
The identification of the precise activity combination and its claimed exceptional gains is performed by fitting the model to the full dataset and then highlighting the subset of activity proportions that exhibit effect sizes >2 within that fit; the reported 'predictive' result is therefore a post-hoc description of the fitted parameters rather than an independent out-of-sample prediction.
full rationale
The paper constructs a predictive model by fitting to the same 69-class observational dataset used to identify the specific activity thresholds (10-20% worksheets, 20-40% clickers, >=2 questions/hour) that yield effect sizes >2. This matches the 'fitted input called prediction' pattern at a minor level because the reported exceptional class type is extracted from the fitted associations rather than tested on held-out data or external benchmarks. No self-citation chain, self-definition, or ansatz smuggling reduces the central claim to its inputs by construction; the derivation is a standard regression-style mapping of observed activity proportions to measured gains and remains self-contained against external benchmarks. The causal language ('produces') raises a separate validity concern but does not create circularity.
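The general concern behind the "fitted input called prediction" pattern can be illustrated with a small simulation. It is not a reanalysis of the paper: the outcome is pure noise, the bin structure and function names are hypothetical, and the only point is that a group picked because it looks best in a sample will tend to look worse on data that played no part in picking it.

```python
import random
from statistics import mean

def bin_means(sample):
    """Mean outcome per bin label."""
    by_bin = {}
    for label, value in sample:
        by_bin.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_bin.items()}

def selection_optimism(n_classes=69, n_bins=5, n_trials=2000, seed=0):
    """Average gap between the winning bin's in-sample mean and its held-out
    mean when bin membership has no true effect on the outcome."""
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_trials):
        # Each simulated class: (activity-mix bin, noise-only effect size).
        classes = [(rng.randrange(n_bins), rng.gauss(0, 1)) for _ in range(n_classes)]
        half = n_classes // 2
        found = bin_means(classes[:half])   # data used to pick the best bin
        check = bin_means(classes[half:])   # data never used for picking
        best = max(found, key=found.get)
        if best in check:
            gaps.append(found[best] - check[best])
    return mean(gaps)

optimism = selection_optimism()
print(round(optimism, 2))
```

A positive gap here arises with no real effect at all, which is why held-out data or a follow-up controlled study is needed before treating the identified mix as a confirmed pattern.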
Axiom & Free-Parameter Ledger
free parameters (1)
- activity time percentages = 10-20% group worksheets, 20-40% group clicker questions
axioms (1)
- domain assumption: The multi-institutional dataset of 69 classes is representative and free of major selection bias for building a predictive model of learning gains.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  We use multiple linear regression to train a model that maps the fraction of two-minute class intervals... to the concept inventory effect size
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Classes that spend 10-20% of class time on group worksheets, 20-40% on group clicker questions...
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.