Pith · machine review for the scientific record

arxiv: 2603.14335 · v2 · submitted 2026-03-15 · ⚛️ physics.ed-ph

Recognition: 2 theorem links · Lean Theorem

Predictive Modeling for High Impact Active Learning Classrooms

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 11:01 UTC · model grok-4.3

classification ⚛️ physics.ed-ph
keywords active learning · predictive model · group worksheets · clicker questions · student questions · learning gains · undergraduate science

The pith

A specific combination of group worksheets, clicker questions, and student questions produces exceptional learning gains with effect sizes greater than 2.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Using data from 69 undergraduate science classes across multiple fields and institutions, the authors create a predictive model linking time spent on classroom activities to student conceptual learning gains. They identify a particular mix—10 to 20 percent of class time on group worksheets, 20 to 40 percent on group clicker questions, plus at least two student questions per hour—that yields effect sizes over 2, much larger than the gains typical of active learning. Classes lacking group worksheets perform no better than traditional lectures. These findings translate observational patterns into concrete, testable guidance for improving active-learning effectiveness in science courses.

Core claim

We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour. We also find that classes without group worksheets show learning gains comparable to lecture-only courses.

What carries the argument

A predictive model that maps the percentages of time spent on different active learning activities and the frequency of student questions to measured student conceptual learning gains.
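What such a mapping might look like as code, assuming a plain linear (OLS) model: the activity features match the abstract, but the data, coefficients, and function name here are synthetic and illustrative, not the authors' code or numbers.

```python
import numpy as np

# Illustrative sketch of the mapping: fraction of class time on activities
# -> conceptual learning gain (effect size). All data are synthetic; the
# coefficients below are hypothetical, not the paper's fitted values.
rng = np.random.default_rng(0)
n = 69  # one row per observed class

# Predictors: % time on group worksheets, % on group clicker questions,
# student questions per hour.
X = np.column_stack([
    rng.uniform(0, 40, n),   # worksheet %
    rng.uniform(0, 50, n),   # clicker %
    rng.uniform(0, 4, n),    # student questions / hour
])

# Synthetic outcome: effect size rises with all three inputs, plus noise.
y = 0.5 + 0.04 * X[:, 0] + 0.02 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.3, n)

# Ordinary least squares: beta = argmin ||A @ beta - y||^2, with intercept.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_gain(worksheet_pct, clicker_pct, questions_per_hr):
    """Predicted effect size for a candidate time allocation (illustrative)."""
    return beta @ np.array([1.0, worksheet_pct, clicker_pct, questions_per_hr])
```

On this synthetic data the fitted model simply recovers the positive trend that was built in; the paper's claim is that an analogous fit on the 69 real classes is predictive of actual gains.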

If this is right

  • Allocating 10-20% of class time to group worksheets is associated with substantially higher learning gains.
  • Combining that with 20-40% group clicker questions and at least two student questions per hour produces effect sizes exceeding 2.
  • Classes that omit group worksheets achieve learning gains no better than those of traditional lecture courses.
  • The model provides specific targets that instructors can use to design more effective active learning sessions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Controlled experiments could test whether adopting this exact activity balance causes the high gains or if other factors are at play.
  • This pattern might apply to active learning in non-science disciplines if similar mechanisms hold.
  • Instructors could monitor and adjust activity times in real time to approach the identified optimal ranges.
  • The emphasis on student questions suggests that fostering student voice is key to maximizing gains.

Load-bearing premise

That the associations between specific activity combinations and learning gains observed across the 69 classes are due to the activities themselves rather than other differences like instructor skill or student preparation.

What would settle it

A randomized trial assigning classes to the identified activity mix versus other combinations, and finding no significant advantage in learning gains (no effect sizes beyond 2), would falsify the predictive association.

read the original abstract

Though a large body of research has shown that active learning is more effective than traditional lecture in undergraduate science courses, little research has examined which types and combinations of active learning strategies are most effective. In this study, we use a multi-field, multi-institutional dataset of 69 undergraduate science classes to create a predictive model that maps time spent on different classroom activities to student conceptual learning. We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour. We also find that classes without group worksheets show learning gains comparable to lecture-only courses. These results offer testable recommendations for future controlled studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript develops a predictive model from an observational dataset of 69 multi-field, multi-institutional undergraduate science classes that maps time allocations across classroom activities (group worksheets, group clicker questions, student questions per hour) to student conceptual learning gains. It reports identifying one specific activity combination—10-20% group worksheets, 20-40% group clicker questions, and at least two student questions per hour—that yields exceptional gains (effect sizes >2), while classes without group worksheets show gains comparable to lecture-only courses, and offers these as testable recommendations for future studies.

Significance. If the reported associations prove robust after proper validation and confounder control, the work would supply concrete, actionable guidance for optimizing active-learning time allocations in science classrooms and could stimulate targeted experimental tests of the identified activity mix.

major comments (3)
  1. [Abstract] No details are supplied on the predictive model form, fitting procedure, cross-validation, error estimation, or out-of-sample performance, so the reliability of the reported thresholds and effect-size claim cannot be evaluated.
  2. [Abstract] The high-impact class type is defined by thresholds extracted from the same fitted model on the 69-class dataset, creating a circularity risk in which the reported combination may simply recover the parameters that best fit the observed data rather than an independently validated pattern.
  3. [Abstract] The observational design is used to claim that the activity combination 'produces' exceptional gains, yet no controls, fixed effects, matching, or other identification strategy for instructor skill, student preparation, or institutional differences are described, leaving the causal interpretation unsupported.
minor comments (1)
  1. [Abstract] The 69-class sample size is stated without a breakdown by discipline or institution, which would help readers assess the scope of generalizability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important issues of transparency, validation, and interpretation in our observational study. We have revised the abstract and expanded relevant sections of the manuscript to address these points directly while preserving the core findings from the 69-class dataset.

read point-by-point responses
  1. Referee: [Abstract] No details are supplied on the predictive model form, fitting procedure, cross-validation, error estimation, or out-of-sample performance, so the reliability of the reported thresholds and effect-size claim cannot be evaluated.

    Authors: We agree that the abstract omitted key methodological details. The full manuscript specifies a multiple linear regression model fitted by ordinary least squares, with 5-fold cross-validation used to evaluate out-of-sample performance and estimate prediction error. We will revise the abstract to include a concise statement of the model form, fitting procedure, and performance metrics (including cross-validated R-squared) so readers can immediately assess reliability. revision: yes

  2. Referee: [Abstract] The high-impact class type is defined by thresholds extracted from the same fitted model on the 69-class dataset, creating a circularity risk in which the reported combination may simply recover the parameters that best fit the observed data rather than an independently validated pattern.

    Authors: This concern is well-founded. The reported activity thresholds (10-20% group worksheets, 20-40% group clicker questions, and at least two student questions per hour) are indeed derived from the fitted model on the same 69-class sample and represent the profile that maximizes predicted gains within our data. We already frame the results as generating testable hypotheses for future controlled studies rather than as independently validated patterns. We will add explicit language in the abstract and discussion to emphasize the data-driven, exploratory nature of these thresholds and the need for out-of-sample confirmation. revision: yes

  3. Referee: [Abstract] The observational design is used to claim that the activity combination 'produces' exceptional gains, yet no controls, fixed effects, matching, or other identification strategy for instructor skill, student preparation, or institutional differences are described, leaving the causal interpretation unsupported.

    Authors: We agree that the observational design does not support causal claims and that the word 'produces' overstates the evidence. We will replace 'produces' with 'is associated with' in the abstract. The manuscript already includes basic controls for course level and broad institutional type; we will expand the methods and limitations sections to describe these controls explicitly and to acknowledge the absence of direct measures or fixed effects for instructor skill and student preparation, which remain potential confounders. This revision will make the correlational character of the findings clear. revision: yes
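The fitting-and-validation procedure described in the first response (OLS evaluated by 5-fold cross-validation with a cross-validated R-squared) can be sketched generically. The data below are synthetic and the helper is illustrative, not the authors' pipeline.

```python
import numpy as np

def cv_r_squared(X, y, k=5, seed=0):
    """Cross-validated R^2 for an OLS fit: train on k-1 folds, score the held-out fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    ss_res = ss_tot = 0.0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        A_tr = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        A_te = np.column_stack([np.ones(len(test)), X[test]])
        pred = A_te @ beta
        ss_res += np.sum((y[test] - pred) ** 2)          # out-of-sample error
        ss_tot += np.sum((y[test] - y[train].mean()) ** 2)  # baseline: train mean
    return 1.0 - ss_res / ss_tot

# Synthetic demo: 69 classes, 3 activity predictors carrying a real linear signal.
rng = np.random.default_rng(1)
X = rng.uniform(0, 40, size=(69, 3))
y = 0.5 + 0.05 * X[:, 0] + 0.02 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(0, 0.3, 69)
score = cv_r_squared(X, y)
print(f"cross-validated R^2: {score:.2f}")
```

Because every prediction is scored on a fold the model never saw, this statistic is the kind of out-of-sample evidence the referee asks for; reporting it alongside the thresholds would let readers gauge reliability directly.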

Circularity Check

1 step flagged

Fitted model on observational data identifies high-impact activity mix post-hoc

specific steps
  1. fitted input called prediction [Abstract]
    "we use a multi-field, multi-institutional dataset of 69 undergraduate science classes to create a predictive model that maps time spent on different classroom activities to student conceptual learning. We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour."

    The identification of the precise activity combination and its claimed exceptional gains is performed by fitting the model to the full dataset and then highlighting the subset of activity proportions that exhibit effect sizes >2 within that fit; the reported 'predictive' result is therefore a post-hoc description of the fitted parameters rather than an independent out-of-sample prediction.

full rationale

The paper constructs a predictive model by fitting to the same 69-class observational dataset used to identify the specific activity thresholds (10-20% worksheets, 20-40% clickers, >=2 questions/hour) that yield effect sizes >2. This matches the 'fitted input called prediction' pattern at a minor level because the reported exceptional class type is extracted from the fitted associations rather than tested on held-out data or external benchmarks. No self-citation chain, self-definition, or ansatz smuggling reduces the central claim to its inputs by construction; the derivation is a standard regression-style mapping of observed activity proportions to measured gains and remains self-contained against external benchmarks. The causal language ('produces') raises a separate validity concern but does not create circularity.
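The flagged pattern can be illustrated with a small simulation: even when the true gain does not depend on the activity mix at all, reading the "exceptional" mix off the fitted model yields an inflated in-sample prediction. Everything here is synthetic and hypothetical, not the paper's model or numbers.

```python
import numpy as np

rng = np.random.default_rng(2)

def selection_optimism(n=69, noise=0.5):
    """Fit OLS, select the class the fit rates highest, return its optimism."""
    x = rng.uniform(0, 1, size=(n, 2))            # activity-time fractions
    mu = np.full(n, 1.2)                          # true gain: flat, mix-independent
    y = mu + rng.normal(0.0, noise, n)            # observed gains
    A = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # fitted mapping: mix -> gain
    fitted = A @ beta
    best = np.argmax(fitted)                      # "exceptional" mix, chosen post hoc
    return fitted[best] - mu[best]                # in-sample prediction minus truth

# Average over many replications: the post-hoc selected mix looks better
# in-sample than it truly is, which is exactly the circularity risk flagged.
optimism = float(np.mean([selection_optimism() for _ in range(200)]))
print(f"mean optimism of the post-hoc selected mix: {optimism:.3f}")
```

Held-out data or an external replication removes this optimism, which is why the audit asks for out-of-sample confirmation rather than treating the fitted thresholds as a prediction.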

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that observational data from 69 classes can be used to identify causal combinations of classroom activities; the time-percentage ranges are outputs of a fitted predictive model.

free parameters (1)
  • activity time percentages = 10-20% group worksheets, 20-40% group clicker questions
    The 10-20% and 20-40% ranges are identified by the predictive model fitted to the 69-class dataset.
axioms (1)
  • domain assumption: The multi-institutional dataset of 69 classes is representative and free of major selection bias for building a predictive model of learning gains.
    The abstract treats the collected classes as sufficient to map activity times to learning outcomes without further qualification.

pith-pipeline@v0.9.0 · 5422 in / 1475 out tokens · 90519 ms · 2026-05-15T11:01:03.879280+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  1. [1] Classes that use no group worksheets have consistently small effect sizes, even when other active learning strategies (i.e., group clicker questions) are used (first panel of Fig. 1)

  2. [2] In classes spending less than 30% of class time on group worksheets, frequent student questions (≈2 or more per hour) are necessary to achieve large effect sizes (first five panels of Fig. 1)

  3. [3] Classes that spend 10-20% of class time on group worksheets, 20-40% of class time on group clicker questions, and more than 10% of class time on student questions (≈2 student questions per hour) consistently have large effect sizes (third, fourth, and fifth panels of Fig. 1)

  4. [4] M. Dancy, et al., Physics Instructors’ Knowledge and Use of Active Learning Has Increased Over the Last Decade but Most Still Lecture Too Much. Physical Review Physics Education Research 20(1), 010119 (2024), doi:10.1103/PhysRevPhysEducRes.20.010119

  5. [5] M. Stains, et al., Anatomy of STEM Teaching in American Universities: A Snapshot from a Large-Scale Observation Study. Science 359(6383), 1468–1470 (2018), doi:10.1126/science.aap8892

  6. [6] S. Freeman, et al., Active Learning Increases Student Performance in Science, Engineering, and Mathematics. Proceedings of the National Academy of Sciences 111(23), 8410–8415 (2014), doi:10.1073/pnas.1319030111

  7. [7] E. E. Prather, A. L. Rudolph, G. Brissenden, W. M. Schlingman, A National Study Assessing the Teaching and Learning of Introductory Astronomy. Part I. The Effect of Interactive Instruction. American Journal of Physics 77(4), 320–330 (2009), doi:10.1119/1.3065023

  8. [8] E. J. Theobald, et al., Active learning narrows achievement gaps for underrepresented students in undergraduate science, technology, engineering, and math. Proceedings of the National Academy of Sciences 117(12), 6476–6483 (2020), doi:10.1073/pnas.1916903117

  9. [9] M. K. Smith, F. H. M. Jones, S. L. Gilbert, C. E. Wieman, The Classroom Observation Protocol for Undergraduate STEM (COPUS): A New Instrument to Characterize University STEM Classroom Practices. CBE–Life Sciences Education 12(4), 618–627 (2013), doi:10.1187/cbe.13-08-0154

  10. [10] S. Olson, D. G. Riordan, Engage to Excel: Producing One Million Additional College Graduates with Degrees in Science, Technology, Engineering, and Mathematics (2012)

  11. [11] D. Lombardi, T. F. Shipley, The Curious Construct of Active Learning. Psychological Science in the Public Interest 22(1), 8–43 (2021), doi:10.1177/1529100620973974

  12. [12] M. Sundstrom, J. Gambrell, C. Green, A. L. Traxler, E. Brewe, Relative Benefits of Different Active Learning Methods to Conceptual Physics Learning (2025), doi:10.48550/arXiv.2505.04577

  13. [13] R. J. Beichner, et al., The Student-Centered Activities for Large Enrollment Undergraduate Programs (SCALE-UP) Project. Reviews in Physics Education Research 1(1), 1–42 (2007), doi:10.1119/RevPERv1.1.4

  14. [14] L. K. Weir, et al., Small Changes, Big Gains: A Curriculum-Wide Study of Teaching Practices and Student Learning in Undergraduate Biology. PLOS ONE 14(8), e0220900 (2019), doi:10.1371/journal.pone.0220900

  15. [15] M. T. H. Chi, Active-Constructive-Interactive: A Conceptual Framework for Differentiating Learning Activities. Topics in Cognitive Science 1(1), 73–105 (2009), doi:10.1111/j.1756-8765.2008.01005.x

  16. [16] M. T. H. Chi, R. Wylie, The ICAP Framework: Linking Cognitive Engagement to Active Learning Outcomes. Educational Psychologist 49(4), 219–243 (2014), doi:10.1080/00461520.2014.965823

  17. [17] G. L. Connell, D. A. Donovan, T. G. Chambers, Increasing the Use of Student-Centered Pedagogies from Moderate to High Improves Student Learning and Attitudes About Biology. CBE–Life Sciences Education 15(1), ar3 (2016), doi:10.1187/cbe.15-03-0062

  18. [18] T. Bazett, C. L. Clough, Course Coordination as an Avenue to Departmental Culture Change. PRIMUS 31(3-5), 467–482 (2021), doi:10.1080/10511970.2020.1793853

  19. [19] K. Commeford, E. Brewe, A. Traxler, Characterizing active learning environments in physics using latent profile analysis. Physical Review Physics Education Research 18(1), 010113 (2022), doi:10.1103/PhysRevPhysEducRes.18.010113

  20. [20] E. Burkholder, C. Walsh, N. G. Holmes, Examination of Quantitative Methods for Analyzing Data from Concept Inventories. Physical Review Physics Education Research 16(1), 010141 (2020), doi:10.1103/PhysRevPhysEducRes.16.010141

  21. [21] J. M. Nissen, R. M. Talbot, A. Nasim Thompson, B. Van Dusen, Comparison of Normalized Gain and Cohen’s d for Analyzing Gains on Concept Inventories. Physical Review Physics Education Research 14(1), 010115 (2018), doi:10.1103/PhysRevPhysEducRes.14.010115

  22. [22] G. Shmueli, To Explain or to Predict? Statistical Science 25(3), 289–310 (2010), doi:10.1214/10-STS330

  23. [23] J. M. Aiken, R. De Bin, H. J. Lewandowski, M. D. Caballero, Framework for Evaluating Statistical Models in Physics Education Research. Physical Review Physics Education Research 17(2), 020104 (2021), doi:10.1103/PhysRevPhysEducRes.17.020104

  24. [24] C. Chin, J. Osborne, Students’ Questions: A Potential Resource for Teaching and Learning Science. Studies in Science Education 44(1), 1–39 (2008), doi:10.1080/03057260701828101

  25. [25] M. B. Rowe, Wait Time: Slowing Down May Be a Way of Speeding Up! Journal of Teacher Education 37(1), 43–50 (1986), doi:10.1177/002248718603700110

  26. [26] Materials and methods are available as supplementary material

  27. [27] C. H. Crouch, E. Mazur, Peer Instruction: Ten Years of Experience and Results. American Journal of Physics 69(9), 970–977 (2001), doi:10.1119/1.1374249

  28. [28] E. Bojinova, J. Oigara, Teaching and Learning with Clickers in Higher Education. International Journal of Teaching and Learning in Higher Education 25(2), 154–165 (2013)

  29. [29] T. J. Lund, et al., The Best of Both Worlds: Building on the COPUS and RTOP Observation Protocols to Easily and Reliably Measure Various Levels of Reformed Instructional Practice. CBE–Life Sciences Education 14(2), ar18 (2015), doi:10.1187/cbe.14-10-0168

  30. [30] T. M. Andrews, M. J. Leonard, C. A. Colgrove, S. T. Kalinowski, Active Learning Not Associated with Student Learning in a Random Sample of College Biology Courses. CBE–Life Sciences Education 10(4), 394–405 (2011), doi:10.1187/cbe.11-07-0061

  31. [31] J. Nissen, R. Donatello, B. Van Dusen, Missing Data and Bias in Physics Education Research: A Case for Using Multiple Imputation. Physical Review Physics Education Research 15(2), 020106 (2019), doi:10.1103/PhysRevPhysEducRes.15.020106

  32. [32] S. Seabold, J. Perktold, Statsmodels: Econometric and Statistical Modeling with Python. SciPy Conference Proceedings (2010), doi:10.25080/Majora-92bf1922-011