Pith · machine review for the scientific record

arxiv: 2603.14335 · v2 · submitted 2026-03-15 · ⚛️ physics.ed-ph

Recognition: 2 theorem links · Lean Theorem

Predictive Modeling for High Impact Active Learning Classrooms

Authors on Pith · no claims yet

Pith reviewed 2026-05-15 11:01 UTC · model grok-4.3

classification ⚛️ physics.ed-ph
keywords active learning · predictive model · group worksheets · clicker questions · student questions · learning gains · undergraduate science

The pith

A specific combination of group worksheets, clicker questions, and student questions produces exceptional learning gains with effect sizes greater than 2.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Using data from 69 undergraduate science classes across multiple fields and institutions, the authors create a predictive model linking time spent on classroom activities to student conceptual learning gains. They identify a particular mix—10 to 20 percent of class time on group worksheets, 20 to 40 percent on group clicker questions, plus at least two student questions per hour—that yields effect sizes over 2, much larger than the gains typical of active learning. Classes lacking group worksheets perform no better than traditional lectures. These findings translate observational patterns into concrete, testable guidance for improving active-learning effectiveness in science courses.

Core claim

We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour. We also find that classes without group worksheets show learning gains comparable to lecture-only courses.

What carries the argument

A predictive model that maps the percentages of time spent on different active learning activities and the frequency of student questions to measured student conceptual learning gains.
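What such a mapping might look like as code, assuming a plain linear (OLS) model: the activity features match the abstract, but the data, coefficients, and function name here are synthetic and illustrative, not the authors' code or numbers.

```python
import numpy as np

# Illustrative sketch of the mapping: fraction of class time on activities
# -> conceptual learning gain (effect size). All data are synthetic; the
# coefficients below are hypothetical, not the paper's fitted values.
rng = np.random.default_rng(0)
n = 69  # one row per observed class

# Predictors: % time on group worksheets, % on group clicker questions,
# student questions per hour.
X = np.column_stack([
    rng.uniform(0, 40, n),   # worksheet %
    rng.uniform(0, 50, n),   # clicker %
    rng.uniform(0, 4, n),    # student questions / hour
])

# Synthetic outcome: effect size rises with all three inputs, plus noise.
y = 0.5 + 0.04 * X[:, 0] + 0.02 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.3, n)

# Ordinary least squares: beta = argmin ||A @ beta - y||^2, with intercept.
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)

def predict_gain(worksheet_pct, clicker_pct, questions_per_hr):
    """Predicted effect size for a candidate time allocation (illustrative)."""
    return beta @ np.array([1.0, worksheet_pct, clicker_pct, questions_per_hr])
```

On this synthetic data the fitted model simply recovers the positive trend that was built in; the paper's claim is that an analogous fit on the 69 real classes is predictive of actual gains.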

If this is right

  • Allocating 10-20% of class time to group worksheets is associated with substantially higher learning gains.
  • Combining that with 20-40% group clicker questions and at least two student questions per hour produces effect sizes exceeding 2.
  • Classes that omit group worksheets achieve learning gains no better than those of traditional lecture courses.
  • The model provides specific targets that instructors can use to design more effective active learning sessions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Controlled experiments could test whether adopting this exact activity balance causes the high gains or if other factors are at play.
  • This pattern might apply to active learning in non-science disciplines if similar mechanisms hold.
  • Instructors could monitor and adjust activity times in real time to approach the identified optimal ranges.
  • The emphasis on student questions suggests that fostering student voice is key to maximizing gains.

Load-bearing premise

That the associations between specific activity combinations and learning gains observed across the 69 classes are due to the activities themselves rather than other differences like instructor skill or student preparation.

What would settle it

A randomized trial assigning classes to the identified activity mix versus other combinations, and finding no significant advantage in learning gains (no effect sizes beyond 2), would falsify the predictive association.

read the original abstract

Though a large body of research has shown that active learning is more effective than traditional lecture in undergraduate science courses, little research has examined which types and combinations of active learning strategies are most effective. In this study, we use a multi-field, multi-institutional dataset of 69 undergraduate science classes to create a predictive model that maps time spent on different classroom activities to student conceptual learning. We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour. We also find that classes without group worksheets show learning gains comparable to lecture-only courses. These results offer testable recommendations for future controlled studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript develops a predictive model from an observational dataset of 69 multi-field, multi-institutional undergraduate science classes that maps time allocations across classroom activities (group worksheets, group clicker questions, student questions per hour) to student conceptual learning gains. It reports identifying one specific activity combination—10-20% group worksheets, 20-40% group clicker questions, and at least two student questions per hour—that yields exceptional gains (effect sizes >2), while classes without group worksheets show gains comparable to lecture-only courses, and offers these as testable recommendations for future studies.

Significance. If the reported associations prove robust after proper validation and confounder control, the work would supply concrete, actionable guidance for optimizing active-learning time allocations in science classrooms and could stimulate targeted experimental tests of the identified activity mix.

major comments (3)
  1. [Abstract] No details are supplied on the predictive model form, fitting procedure, cross-validation, error estimation, or out-of-sample performance, so the reliability of the reported thresholds and effect-size claim cannot be evaluated.
  2. [Abstract] The high-impact class type is defined by thresholds extracted from the same fitted model on the 69-class dataset, creating a circularity risk in which the reported combination may simply recover the parameters that best fit the observed data rather than an independently validated pattern.
  3. [Abstract] The observational design is used to claim that the activity combination 'produces' exceptional gains, yet no controls, fixed effects, matching, or other identification strategy for instructor skill, student preparation, or institutional differences are described, leaving the causal interpretation unsupported.
minor comments (1)
  1. [Abstract] The 69-class sample size is stated without a breakdown by discipline or institution, which would help readers assess the scope of generalizability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight important issues of transparency, validation, and interpretation in our observational study. We have revised the abstract and expanded relevant sections of the manuscript to address these points directly while preserving the core findings from the 69-class dataset.

read point-by-point responses
  1. Referee: [Abstract] No details are supplied on the predictive model form, fitting procedure, cross-validation, error estimation, or out-of-sample performance, so the reliability of the reported thresholds and effect-size claim cannot be evaluated.

    Authors: We agree that the abstract omitted key methodological details. The full manuscript specifies a multiple linear regression model fitted by ordinary least squares, with 5-fold cross-validation used to evaluate out-of-sample performance and estimate prediction error. We will revise the abstract to include a concise statement of the model form, fitting procedure, and performance metrics (including cross-validated R-squared) so readers can immediately assess reliability. revision: yes

  2. Referee: [Abstract] The high-impact class type is defined by thresholds extracted from the same fitted model on the 69-class dataset, creating a circularity risk in which the reported combination may simply recover the parameters that best fit the observed data rather than an independently validated pattern.

    Authors: This concern is well-founded. The reported activity thresholds (10-20% group worksheets, 20-40% group clicker questions, and at least two student questions per hour) are indeed derived from the fitted model on the same 69-class sample and represent the profile that maximizes predicted gains within our data. We already frame the results as generating testable hypotheses for future controlled studies rather than as independently validated patterns. We will add explicit language in the abstract and discussion to emphasize the data-driven, exploratory nature of these thresholds and the need for out-of-sample confirmation. revision: yes

  3. Referee: [Abstract] The observational design is used to claim that the activity combination 'produces' exceptional gains, yet no controls, fixed effects, matching, or other identification strategy for instructor skill, student preparation, or institutional differences are described, leaving the causal interpretation unsupported.

    Authors: We agree that the observational design does not support causal claims and that the word 'produces' overstates the evidence. We will replace 'produces' with 'is associated with' in the abstract. The manuscript already includes basic controls for course level and broad institutional type; we will expand the methods and limitations sections to describe these controls explicitly and to acknowledge the absence of direct measures or fixed effects for instructor skill and student preparation, which remain potential confounders. This revision will make the correlational character of the findings clear. revision: yes
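The fitting-and-validation procedure described in the first response (OLS evaluated by 5-fold cross-validation with a cross-validated R-squared) can be sketched generically. The data below are synthetic and the helper is illustrative, not the authors' pipeline.

```python
import numpy as np

def cv_r_squared(X, y, k=5, seed=0):
    """Cross-validated R^2 for an OLS fit: train on k-1 folds, score the held-out fold."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    ss_res = ss_tot = 0.0
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        A_tr = np.column_stack([np.ones(len(train)), X[train]])
        beta, *_ = np.linalg.lstsq(A_tr, y[train], rcond=None)
        A_te = np.column_stack([np.ones(len(test)), X[test]])
        pred = A_te @ beta
        ss_res += np.sum((y[test] - pred) ** 2)          # out-of-sample error
        ss_tot += np.sum((y[test] - y[train].mean()) ** 2)  # baseline: train mean
    return 1.0 - ss_res / ss_tot

# Synthetic demo: 69 classes, 3 activity predictors carrying a real linear signal.
rng = np.random.default_rng(1)
X = rng.uniform(0, 40, size=(69, 3))
y = 0.5 + 0.05 * X[:, 0] + 0.02 * X[:, 1] + 0.01 * X[:, 2] + rng.normal(0, 0.3, 69)
score = cv_r_squared(X, y)
print(f"cross-validated R^2: {score:.2f}")
```

Because every prediction is scored on a fold the model never saw, this statistic is the kind of out-of-sample evidence the referee asks for; reporting it alongside the thresholds would let readers gauge reliability directly.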

Circularity Check

1 step flagged

Fitted model on observational data identifies high-impact activity mix post-hoc

specific steps
  1. fitted input called prediction [Abstract]
    "we use a multi-field, multi-institutional dataset of 69 undergraduate science classes to create a predictive model that maps time spent on different classroom activities to student conceptual learning. We identify one type of class that produces exceptional learning gains (effect sizes > 2): 10-20% of time on group worksheets, 20-40% on group clicker questions, and two or more student questions per hour."

    The identification of the precise activity combination and its claimed exceptional gains is performed by fitting the model to the full dataset and then highlighting the subset of activity proportions that exhibit effect sizes >2 within that fit; the reported 'predictive' result is therefore a post-hoc description of the fitted parameters rather than an independent out-of-sample prediction.

full rationale

The paper constructs a predictive model by fitting to the same 69-class observational dataset used to identify the specific activity thresholds (10-20% worksheets, 20-40% clickers, >=2 questions/hour) that yield effect sizes >2. This matches the 'fitted input called prediction' pattern at a minor level because the reported exceptional class type is extracted from the fitted associations rather than tested on held-out data or external benchmarks. No self-citation chain, self-definition, or ansatz smuggling reduces the central claim to its inputs by construction; the derivation is a standard regression-style mapping of observed activity proportions to measured gains and remains self-contained against external benchmarks. The causal language ('produces') raises a separate validity concern but does not create circularity.
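The flagged pattern can be illustrated with a small simulation: even when the true gain does not depend on the activity mix at all, reading the "exceptional" mix off the fitted model yields an inflated in-sample prediction. Everything here is synthetic and hypothetical, not the paper's model or numbers.

```python
import numpy as np

rng = np.random.default_rng(2)

def selection_optimism(n=69, noise=0.5):
    """Fit OLS, select the class the fit rates highest, return its optimism."""
    x = rng.uniform(0, 1, size=(n, 2))            # activity-time fractions
    mu = np.full(n, 1.2)                          # true gain: flat, mix-independent
    y = mu + rng.normal(0.0, noise, n)            # observed gains
    A = np.column_stack([np.ones(n), x])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)  # fitted mapping: mix -> gain
    fitted = A @ beta
    best = np.argmax(fitted)                      # "exceptional" mix, chosen post hoc
    return fitted[best] - mu[best]                # in-sample prediction minus truth

# Average over many replications: the post-hoc selected mix looks better
# in-sample than it truly is, which is exactly the circularity risk flagged.
optimism = float(np.mean([selection_optimism() for _ in range(200)]))
print(f"mean optimism of the post-hoc selected mix: {optimism:.3f}")
```

Held-out data or an external replication removes this optimism, which is why the audit asks for out-of-sample confirmation rather than treating the fitted thresholds as a prediction.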

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the assumption that observational data from 69 classes can be used to identify causal combinations of classroom activities; the time-percentage ranges are outputs of a fitted predictive model.

free parameters (1)
  • activity time percentages = 10-20% group worksheets, 20-40% group clicker questions
    The 10-20% and 20-40% ranges are identified by the predictive model fitted to the 69-class dataset.
axioms (1)
  • domain assumption: The multi-institutional dataset of 69 classes is representative and free of major selection bias for building a predictive model of learning gains.
    The abstract treats the collected classes as sufficient to map activity times to learning outcomes without further qualification.

pith-pipeline@v0.9.0 · 5422 in / 1475 out tokens · 90519 ms · 2026-05-15T11:01:03.879280+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  1. [1] Classes that use no group worksheets have consistently small effect sizes, even when other active learning strategies (i.e., group clicker questions) are used (first panel of Fig. 1)

  2. [2] In classes spending less than 30% of class time on group worksheets, frequent student questions (≈2 or more per hour) are necessary to achieve large effect sizes (first five panels of Fig. 1)

  3. [3] Classes that spend 10-20% of class time on group worksheets, 20-40% of class time on group clicker questions, and more than 10% of class time on student questions (≈2 student questions per hour) consistently have large effect sizes (third, fourth, and fifth panels of Fig. 1)

  4. [4] M. Dancy, et al., Physics Instructors’ Knowledge and Use of Active Learning Has Increased Over the Last Decade but Most Still Lecture Too Much. Physical Review Physics Education Research 20(1), 010119 (2024), doi:10.1103/PhysRevPhysEducRes.20.010119

  5. [5] M. Stains, et al., Anatomy of STEM Teaching in American Universities: A Snapshot from a Large-Scale Observation Study. Science 359(6383), 1468–1470 (2018), doi:10.1126/science.aap8892

  6. [6] S. Freeman, et al., Active Learning Increases Student Performance in Science, Engineering, and Mathematics. Proceedings of the National Academy of Sciences 111(23), 8410–8415 (2014), doi:10.1073/pnas.1319030111

  7. [7] E. E. Prather, A. L. Rudolph, G. Brissenden, W. M. Schlingman, A National Study Assessing the Teaching and Learning of Introductory Astronomy. Part I. The Effect of Interactive Instruction. American Journal of Physics 77(4), 320–330 (2009), doi:10.1119/1.3065023

  8. [8] E. J. Theobald, et al., Active learning narrows achievement gaps for underrepresented students in undergraduate science, technology, engineering, and math. Proceedings of the National Academy of Sciences 117(12), 6476–6483 (2020), doi:10.1073/pnas.1916903117

  9. [9] M. K. Smith, F. H. M. Jones, S. L. Gilbert, C. E. Wieman, The Classroom Observation Protocol for Undergraduate STEM (COPUS): A New Instrument to Characterize University STEM Classroom Practices. CBE–Life Sciences Education 12(4), 618–627 (2013), doi:10.1187/cbe.13-08-0154

  10. [10] S. Olson, D. G. Riordan, Engage to Excel: Producing One Million Additional College Graduates with Degrees in Science, Technology, Engineering, and Mathematics (2012)

  11. [11] D. Lombardi, T. F. Shipley, The Curious Construct of Active Learning. Psychological Science in the Public Interest 22(1), 8–43 (2021), doi:10.1177/1529100620973974

  12. [12] M. Sundstrom, J. Gambrell, C. Green, A. L. Traxler, E. Brewe, Relative Benefits of Different Active Learning Methods to Conceptual Physics Learning (2025), doi:10.48550/arXiv.2505.04577

  13. [13] R. J. Beichner, et al., The Student-Centered Activities for Large Enrollment Undergraduate Programs (SCALE-UP) Project. Reviews in Physics Education Research 1(1), 1–42 (2007), doi:10.1119/RevPERv1.1.4

  14. [14] L. K. Weir, et al., Small Changes, Big Gains: A Curriculum-Wide Study of Teaching Practices and Student Learning in Undergraduate Biology. PLOS ONE 14(8), e0220900 (2019), doi:10.1371/journal.pone.0220900

  15. [15] M. T. H. Chi, Active-Constructive-Interactive: A Conceptual Framework for Differentiating Learning Activities. Topics in Cognitive Science 1(1), 73–105 (2009), doi:10.1111/j.1756-8765.2008.01005.x

  16. [16] M. T. H. Chi, R. Wylie, The ICAP Framework: Linking Cognitive Engagement to Active Learning Outcomes. Educational Psychologist 49(4), 219–243 (2014), doi:10.1080/00461520.2014.965823

  17. [17] G. L. Connell, D. A. Donovan, T. G. Chambers, Increasing the Use of Student-Centered Pedagogies from Moderate to High Improves Student Learning and Attitudes About Biology. CBE–Life Sciences Education 15(1), ar3 (2016), doi:10.1187/cbe.15-03-0062

  18. [18] T. Bazett, C. L. Clough, Course Coordination as an Avenue to Departmental Culture Change. PRIMUS 31(3-5), 467–482 (2021), doi:10.1080/10511970.2020.1793853

  19. [19] K. Commeford, E. Brewe, A. Traxler, Characterizing active learning environments in physics using latent profile analysis. Physical Review Physics Education Research 18(1), 010113 (2022), doi:10.1103/PhysRevPhysEducRes.18.010113

  20. [20] E. Burkholder, C. Walsh, N. G. Holmes, Examination of Quantitative Methods for Analyzing Data from Concept Inventories. Physical Review Physics Education Research 16(1), 010141 (2020), doi:10.1103/PhysRevPhysEducRes.16.010141

  21. [21] J. M. Nissen, R. M. Talbot, A. Nasim Thompson, B. Van Dusen, Comparison of Normalized Gain and Cohen’s d for Analyzing Gains on Concept Inventories. Physical Review Physics Education Research 14(1), 010115 (2018), doi:10.1103/PhysRevPhysEducRes.14.010115

  22. [22] G. Shmueli, To Explain or to Predict? Statistical Science 25(3), 289–310 (2010), doi:10.1214/10-STS330

  23. [23] J. M. Aiken, R. De Bin, H. J. Lewandowski, M. D. Caballero, Framework for Evaluating Statistical Models in Physics Education Research. Physical Review Physics Education Research 17(2), 020104 (2021), doi:10.1103/PhysRevPhysEducRes.17.020104

  24. [24] C. Chin, J. Osborne, Students’ Questions: A Potential Resource for Teaching and Learning Science. Studies in Science Education 44(1), 1–39 (2008), doi:10.1080/03057260701828101

  25. [25] M. B. Rowe, Wait Time: Slowing Down May Be a Way of Speeding Up! Journal of Teacher Education 37(1), 43–50 (1986), doi:10.1177/002248718603700110

  26. [26] Materials and methods are available as supplementary material

  27. [27] C. H. Crouch, E. Mazur, Peer Instruction: Ten Years of Experience and Results. American Journal of Physics 69(9), 970–977 (2001), doi:10.1119/1.1374249

  28. [28] E. Bojinova, J. Oigara, Teaching and Learning with Clickers in Higher Education. International Journal of Teaching and Learning in Higher Education 25(2), 154–165 (2013)

  29. [29] T. J. Lund, et al., The Best of Both Worlds: Building on the COPUS and RTOP Observation Protocols to Easily and Reliably Measure Various Levels of Reformed Instructional Practice. CBE–Life Sciences Education 14(2), ar18 (2015), doi:10.1187/cbe.14-10-0168

  30. [30] T. M. Andrews, M. J. Leonard, C. A. Colgrove, S. T. Kalinowski, Active Learning Not Associated with Student Learning in a Random Sample of College Biology Courses. CBE–Life Sciences Education 10(4), 394–405 (2011), doi:10.1187/cbe.11-07-0061

  31. [31] J. Nissen, R. Donatello, B. Van Dusen, Missing Data and Bias in Physics Education Research: A Case for Using Multiple Imputation. Physical Review Physics Education Research 15(2), 020106 (2019), doi:10.1103/PhysRevPhysEducRes.15.020106

  32. [32] S. Seabold, J. Perktold, Statsmodels: Econometric and Statistical Modeling with Python. SciPy Conference Proceedings (2010), doi:10.25080/Majora-92bf1922-011