Do Gains from Generative AI-Enabled Adaptive Pretesting Persist? Evidence from a Retention Study
Pith reviewed 2026-06-26 10:03 UTC · model grok-4.3
The pith
Adaptive pretesting with generative AI improves initial understanding, but seven-week retention requires structured retrieval practice over learner-directed study.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Adaptive pretesting can elevate initial understanding, but sustained learning over a seven-week retention period depends on how subsequent AI-supported practice is structured, with adaptive and fixed spaced retrieval practice outperforming learner-directed study on posttest performance.
What carries the argument
The three-arm randomized comparison of adaptive spaced retrieval practice, fixed spaced retrieval practice, and learner-directed AI-supported study after the shared adaptive pretesting phase.
If this is right
- Retrieval-based practice conditions produce higher posttest performance than learner-directed AI-supported study.
- Adaptive pretesting raises baseline understanding before the practice phase begins.
- Condition affects both observed practice effort and final retention scores.
- The structure of AI-supported practice, not pretesting alone, determines whether initial gains persist.
- Multivariate analyses confirm a significant effect of practice condition on retention outcomes.
Where Pith is reading between the lines
- AI learning platforms could default to enforcing spaced retrieval after any pretesting step rather than offering fully open-ended study options.
- The seven-week durability finding suggests testing whether the same pattern holds at three-month or semester-long intervals.
- Combining pretesting with fixed rather than adaptive retrieval might deliver most benefits at lower implementation cost.
- Domain-specific follow-ups could check whether the pattern is stronger in reasoning-heavy subjects than in factual ones.
Load-bearing premise
Random assignment after the pretesting phase balanced all relevant individual differences across conditions and the seven-week posttest measured retention without unmeasured outside influences.
What would settle it
A replication finding no posttest difference between the retrieval conditions and the learner-directed condition, or equal retention decay across all three groups after a longer interval.
Original abstract
Pretesting - attempting problems before instruction - supports learning by activating prior knowledge and sharpening attention to subsequent instruction. Recent work suggests that adaptive AI-assisted pretesting can yield further advantages, particularly for tasks requiring higher-order reasoning, yet it remains unclear whether these gains persist over time. This study examines the durability of learning gains following GenAI-enabled adaptive pretesting over a seven-week retention period. Undergraduate participants completed an adaptive AI-assisted pretesting session, received instruction, and took a baseline assessment, then were randomly assigned to adaptive spaced retrieval practice, fixed spaced retrieval practice, or learner-directed AI-supported study. Multivariate analyses revealed a significant effect of condition on posttest performance and observed practice effort, with retrieval-based conditions outperforming learner-directed study. Findings indicate that adaptive pretesting can elevate initial understanding, but sustained learning depends on how subsequent AI-supported practice is structured.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that generative AI-enabled adaptive pretesting elevates initial understanding, but that gains over a seven-week retention interval depend on the structure of subsequent AI-supported practice. After a common adaptive pretesting + instruction sequence, participants were randomly assigned to adaptive spaced retrieval practice, fixed spaced retrieval practice, or learner-directed AI-supported study; multivariate analyses indicated that the two retrieval conditions produced higher posttest performance than learner-directed study.
Significance. If the causal attribution holds, the work supplies actionable guidance for sequencing AI tools in education: pretesting gains require structured retrieval practice to persist. The post-pretesting randomization is a design strength that isolates practice-type effects from pretesting itself.
major comments (2)
- [Methods] Methods (randomization and retention procedures): the manuscript reports random assignment after pretesting but supplies no post-randomization baseline equivalence checks, attrition rates, or measures of external study/motivation during the seven-week interval. These omissions are load-bearing for the claim that observed posttest differences are caused by the assigned practice conditions rather than time-varying confounders.
- [Results] Results (multivariate analyses): the abstract states a significant effect of condition on posttest performance without reporting sample size, effect sizes, power, or exclusion criteria. This information is required to assess whether the reported superiority of retrieval conditions is statistically and practically meaningful.
minor comments (1)
- [Abstract] Abstract: include at least the total N and a brief statement of effect magnitude so readers can gauge the strength of the retention finding without consulting the full text.
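The equivalence and attrition checks requested in the first major comment can be sketched in a few lines. This is a minimal illustration, not the authors' analysis: the records, the condition labels, and the `summarize` helper are all hypothetical, and a real check would use the study's own baseline assessment scores.

```python
from statistics import mean, stdev

# Hypothetical records: (condition, baseline_score, completed_posttest).
# None of these values come from the paper.
records = [
    ("adaptive_retrieval", 62, True),
    ("adaptive_retrieval", 70, True),
    ("adaptive_retrieval", 58, False),
    ("fixed_retrieval", 64, True),
    ("fixed_retrieval", 68, True),
    ("fixed_retrieval", 60, True),
    ("learner_directed", 61, True),
    ("learner_directed", 71, False),
    ("learner_directed", 66, False),
]


def summarize(records):
    """Per-condition baseline mean/SD and attrition over the retention interval."""
    by_condition = {}
    for condition, score, completed in records:
        by_condition.setdefault(condition, []).append((score, completed))

    summary = {}
    for condition, rows in by_condition.items():
        scores = [score for score, _ in rows]
        n_completed = sum(1 for _, done in rows if done)
        summary[condition] = {
            "n_randomized": len(rows),
            "baseline_mean": mean(scores),
            "baseline_sd": stdev(scores),
            # Share of randomized participants who did not take the posttest.
            "attrition_rate": 1 - n_completed / len(rows),
        }
    return summary


summary = summarize(records)
for condition, stats in summary.items():
    print(condition, stats)
```

Similar baseline means with unequal attrition across arms would point to differential dropout, which is one of the time-varying confounders the comment is concerned with.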
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of methodological transparency and statistical reporting. We address each major comment below and will incorporate revisions to strengthen the manuscript.
Point-by-point responses
- Referee: [Methods] Methods (randomization and retention procedures): the manuscript reports random assignment after pretesting but supplies no post-randomization baseline equivalence checks, attrition rates, or measures of external study/motivation during the seven-week interval. These omissions are load-bearing for the claim that observed posttest differences are caused by the assigned practice conditions rather than time-varying confounders.
  Authors: We agree these details are essential for supporting causal claims about the practice conditions. In the revised manuscript we will add post-randomization baseline equivalence checks on the initial assessment scores across conditions, report attrition rates over the retention interval, and include any available self-report data on external study or motivation. Where data were not collected we will note the limitation explicitly.
  Revision: yes
- Referee: [Results] Results (multivariate analyses): the abstract states a significant effect of condition on posttest performance without reporting sample size, effect sizes, power, or exclusion criteria. This information is required to assess whether the reported superiority of retrieval conditions is statistically and practically meaningful.
  Authors: We agree that complete reporting is required. We will revise the abstract to include sample size and expand the results section to report effect sizes (partial eta-squared and pairwise Cohen's d), a post-hoc power analysis, and the full exclusion criteria. These additions will allow readers to evaluate both statistical and practical significance.
  Revision: yes
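For readers unfamiliar with the two effect sizes named in the rebuttal, the sketch below shows how they are commonly computed in the simplest univariate case. It is an assumption-laden illustration: the function names are hypothetical, the scores are invented, and the paper's multivariate analysis would derive partial eta-squared from a multivariate test statistic rather than from these sums of squares. In a one-way between-subjects design with a single factor, partial eta-squared coincides with eta-squared, which is what `eta_squared` computes.

```python
from statistics import mean, variance


def cohens_d(a, b):
    """Pairwise Cohen's d: mean difference divided by the pooled standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def eta_squared(groups):
    """Eta-squared for a one-way design: SS_between / (SS_between + SS_within)."""
    all_scores = [x for group in groups for x in group]
    grand_mean = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return ss_between / (ss_between + ss_within)


# Invented posttest scores for three arms; not data from the study.
learner_directed = [1, 2, 3]
fixed_retrieval = [2, 3, 4]
adaptive_retrieval = [3, 4, 5]

d_fixed_vs_learner = cohens_d(fixed_retrieval, learner_directed)
eta_sq = eta_squared([learner_directed, fixed_retrieval, adaptive_retrieval])
print(d_fixed_vs_learner, eta_sq)
```

Reporting both values matters because a condition effect can be statistically significant while the pairwise differences remain small in practical terms.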
Circularity Check
No circularity: empirical RCT with no derivations or fitted predictions
Full rationale
The paper is a randomized controlled experiment reporting observed posttest differences after random assignment to practice conditions. It contains no equations, no parameter fitting, no predictions derived from models, and no derivation chain. Claims rest on multivariate statistical tests of group differences, not on any self-referential construction or self-citation load-bearing for a mathematical result. This is the standard case of an empirical study with no opportunity for the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (2)
- standard math: Standard assumptions underlying multivariate analysis (normality, homogeneity of covariance matrices) hold for the posttest and effort measures.
- domain assumption: Random assignment after the pretesting phase produced equivalent groups on all unmeasured variables that could affect retention.
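The domain assumption above concerns what randomization can and cannot guarantee. The paper does not say how assignment was implemented, so the following is only a sketch of one common scheme, permuted-block randomization, with hypothetical participant IDs, condition labels, and function name. It keeps arm sizes equal, which is a weaker property than equivalence on unmeasured variables.

```python
import random

CONDITIONS = ["adaptive_retrieval", "fixed_retrieval", "learner_directed"]


def block_randomize(participant_ids, conditions=CONDITIONS, seed=0):
    """Assign participants in shuffled blocks so arm sizes stay balanced."""
    rng = random.Random(seed)  # fixed seed so the allocation is reproducible
    assignment = {}
    block = []
    for pid in participant_ids:
        if not block:
            # Start a new block containing each condition exactly once.
            block = list(conditions)
            rng.shuffle(block)
        assignment[pid] = block.pop()
    return assignment


# Hypothetical participant IDs P01..P12.
assignment = block_randomize([f"P{i:02d}" for i in range(1, 13)])
counts = {c: list(assignment.values()).count(c) for c in CONDITIONS}
print(counts)
```

Equal counts per arm follow from the blocking itself; balance on prior knowledge, motivation, or outside study time holds only in expectation, which is why the baseline equivalence checks requested by the referee remain necessary.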
Reference graph
Works this paper leans on
- [1] Akgun, M., Toker, S.: Struggle first, prompt later: How task complexity shapes learning with GenAI-assisted pretesting. arXiv preprint arXiv:2504.10249 (2025)
- [2] Carpenter, S.K., DeLosh, E.L.: Application of the testing and spacing effects to name learning. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition 19(5), 619–636 (2005)
- [3] Carpenter, S.K., Pan, S.C., Butler, A.C.: The science of effective learning with spacing and retrieval practice. Nature Reviews Psychology 1, 496–511 (2022), https://api.semanticscholar.org/CorpusID:251294660
- [4] Ebbinghaus, H.: Memory: A contribution to experimental psychology. Dover Publications, New York (1913), original work published 1885
- [5] Giebl, S., Mena, S., Sandberg, R., Bjork, E.L., Bjork, R.A.: Thinking first versus googling first: Preferences and consequences. Journal of Applied Research in Memory and Cognition 12(3), 431 (2023)
- [6] Giebl, S., Mena, S., Storm, B.C., Bjork, E.L., Bjork, R.A.: Answer first or Google first? Using the internet in ways that enhance, not impair, one’s subsequent retention of needed information. Psychology Learning & Teaching 20(1), 58–75 (2021)
- [7] Hopkins, R.F., Lyle, K.B., Hieb, J.L., Ralston, P.A.: Spaced retrieval practice increases college students’ short- and long-term retention of mathematics knowledge. Educational Psychology Review 28(4), 853–873 (2016)
- [8] Karpicke, J.D., Roediger III, H.L.: Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition 33(4), 704 (2007)
- [9] Kornell, N., Hays, M.J., Bjork, R.A.: Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition 35(4), 989 (2009)
- [10] Latimier, A., Peyre, H., Ramus, F.: A meta-analytic review of the benefit of spacing out retrieval practice episodes on retention. Educational Psychology Review 33(3), 959–987 (2021)
- [11] Pan, S.C., Carpenter, S.K.: Prequestioning and pretesting effects: A review of empirical research, theoretical perspectives, and implications for educational practice. Educational Psychology Review 35(4), 97 (2023)
- [12] Pressley, M., Tanenbaum, R., McDaniel, M.A., Wood, E.: What happens when university students try to answer prequestions that accompany textbook material? Contemporary Educational Psychology 15(1), 27–35 (1990)
- [13] Richland, L.E., Kornell, N., Kao, L.S.: The pretesting effect: Do unsuccessful retrieval attempts enhance learning? Journal of Experimental Psychology: Applied 15(3), 243 (2009)
- [14] Roediger III, H.L., Karpicke, J.D.: Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science 17(3), 249–255 (2006)
- [15] Sparrow, B., Liu, J., Wegner, D.M.: Google effects on memory: Cognitive consequences of having information at our fingertips. Science 333(6043), 776–778 (2011)