Do Gains from Generative AI-Enabled Adaptive Pretesting Persist? Evidence from a Retention Study

Mahir Akgun; Sacip Toker

arxiv: 2606.22328 · v1 · pith:USXHSMYTnew · submitted 2026-06-21 · 💻 cs.CY

Pith reviewed 2026-06-26 10:03 UTC · model grok-4.3

keywords: adaptive pretesting, generative AI, retention, spaced retrieval practice, AI-supported learning, undergraduate education, learning persistence

The pith

Adaptive pretesting with generative AI improves initial understanding, but seven-week retention requires structured retrieval practice over learner-directed study.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The study tests whether advantages from generative AI-enabled adaptive pretesting last beyond the immediate session. Undergraduates completed adaptive pretesting, instruction, and a baseline test, then were randomly assigned to one of three follow-up conditions for seven weeks before a retention posttest. Multivariate results showed retrieval-based conditions produced higher posttest scores and different practice effort patterns than learner-directed AI-supported study. This indicates that pretesting gains are real but fragile, hinging on the design of later practice rather than occurring automatically.

Core claim

Adaptive pretesting can elevate initial understanding, but sustained learning over a seven-week retention period depends on how subsequent AI-supported practice is structured, with adaptive and fixed spaced retrieval practice outperforming learner-directed study on posttest performance.

What carries the argument

The three-arm randomized comparison of adaptive spaced retrieval practice, fixed spaced retrieval practice, and learner-directed AI-supported study after the shared adaptive pretesting phase.
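The contrast between the two retrieval arms can be sketched in code. This is a minimal illustration only: the abstract does not describe how sessions were scheduled, so the even-spacing rule, the doubling factor, and the names `fixed_schedule` and `next_adaptive_gap` are all assumptions made for this example, not the authors' procedure.

```python
# Hypothetical scheduling rules; the paper's actual schedules are not given in the abstract.

def fixed_schedule(total_days, sessions):
    """Place sessions at evenly spaced days inside the retention interval."""
    step = total_days / (sessions + 1)
    return [round(step * (i + 1)) for i in range(sessions)]


def next_adaptive_gap(current_gap, was_correct, factor=2.0, min_gap=1):
    """Assumed adaptive rule: widen the gap after a correct answer, reset after a miss."""
    return current_gap * factor if was_correct else min_gap


# Six evenly spaced sessions across a seven-week (49-day) interval.
print(fixed_schedule(49, 6))

# One simulated learner: the gap grows after successes and resets after an error.
gap = 1
for outcome in (True, True, False, True):
    gap = next_adaptive_gap(gap, outcome)
    print(outcome, gap)
```

A learner-directed arm would have no such rule at all, which is the design difference the comparison isolates.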

If this is right

Retrieval-based practice conditions produce higher posttest performance than learner-directed AI-supported study.
Adaptive pretesting raises baseline understanding before the practice phase begins.
Condition affects both observed practice effort and final retention scores.
The structure of AI-supported practice, not pretesting alone, determines whether initial gains persist.
Multivariate analyses confirm a significant effect of practice condition on retention outcomes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

AI learning platforms could default to enforcing spaced retrieval after any pretesting step rather than offering fully open-ended study options.
The seven-week durability finding suggests testing whether the same pattern holds at three-month or semester-long intervals.
Combining pretesting with fixed rather than adaptive retrieval might deliver most benefits at lower implementation cost.
Domain-specific follow-ups could check whether the pattern is stronger in reasoning-heavy subjects than in factual ones.

Load-bearing premise

Random assignment after the pretesting phase balanced all relevant individual differences across conditions and the seven-week posttest measured retention without unmeasured outside influences.
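What this premise asks for can be made concrete with a small sketch: assign participants to the three arms at random, then compare baseline means across arms. The participant IDs, baseline scores, seeds, and function names below are hypothetical, and the balanced round-robin allocation is an assumption; the abstract does not state how randomization was carried out.

```python
import random
from statistics import mean

CONDITIONS = ("adaptive_retrieval", "fixed_retrieval", "learner_directed")


def assign_conditions(participant_ids, seed=0):
    """Shuffle participants, then deal them round-robin into the three arms."""
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    return {pid: CONDITIONS[i % len(CONDITIONS)] for i, pid in enumerate(shuffled)}


def baseline_means(assignment, baseline_scores):
    """Mean baseline score per arm; large gaps between arms would suggest imbalance."""
    groups = {c: [] for c in CONDITIONS}
    for pid, cond in assignment.items():
        groups[cond].append(baseline_scores[pid])
    return {c: mean(v) for c, v in groups.items()}


# Hypothetical data: 30 participants with made-up baseline scores.
ids = list(range(30))
score_rng = random.Random(1)
scores = {pid: score_rng.uniform(40, 90) for pid in ids}

assignment = assign_conditions(ids, seed=42)
print(baseline_means(assignment, scores))
```

Comparable baseline means say nothing about unmeasured variables or about what happens during the seven-week interval, which is why the premise remains an assumption even when such a check passes.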

What would settle it

A replication finding no posttest difference between the retrieval conditions and the learner-directed condition, or equal retention decay across all three groups after a longer interval.

Original abstract

Pretesting - attempting problems before instruction - supports learning by activating prior knowledge and sharpening attention to subsequent instruction. Recent work suggests that adaptive AI-assisted pretesting can yield further advantages, particularly for tasks requiring higher-order reasoning, yet it remains unclear whether these gains persist over time. This study examines the durability of learning gains following GenAI-enabled adaptive pretesting over a seven-week retention period. Undergraduate participants completed an adaptive AI-assisted pretesting session, received instruction, and took a baseline assessment, then were randomly assigned to adaptive spaced retrieval practice, fixed spaced retrieval practice, or learner-directed AI-supported study. Multivariate analyses revealed a significant effect of condition on posttest performance and observed practice effort, with retrieval-based conditions outperforming learner-directed study. Findings indicate that adaptive pretesting can elevate initial understanding, but sustained learning depends on how subsequent AI-supported practice is structured.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The study finds retrieval practice sustains gains from AI pretesting better than learner-directed study over seven weeks, but the abstract omits sample details and balance checks.

The paper's core result is that after GenAI adaptive pretesting plus instruction, students assigned to either adaptive or fixed spaced retrieval did better on a posttest seven weeks later than those in learner-directed AI study. The randomization to practice condition happened after the pretesting phase, and multivariate tests picked up a condition effect on performance and effort.

It builds on established pretesting and retrieval ideas by moving them into a generative AI setup and adding the retention delay. That combination is the main addition. The design itself is straightforward: same pretesting for everyone, then random split into the three follow-up conditions.

The soft spots sit in the reporting and the long gap. The abstract gives no sample size, effect sizes, power numbers, or exclusion rules. It also does not mention baseline equivalence checks after randomization or any measure of what students did outside the assigned practice during the seven weeks. That leaves open the possibility that motivation differences or extra studying drove the posttest gap rather than the practice structure itself. If the full paper has those checks and reports them clearly, the result strengthens; right now the abstract does not show it.

This is for people working on AI tools in education who want evidence on how to sequence pretesting with later practice. A reader focused on practical learning design would get value from the condition comparison. It deserves a serious referee because the experimental setup is clear and the retention question is useful, even if the current version needs tighter stats and confounder discussion.

Referee Report

2 major / 1 minor

Summary. The paper claims that generative AI-enabled adaptive pretesting elevates initial understanding, but that gains over a seven-week retention interval depend on the structure of subsequent AI-supported practice. After a common adaptive pretesting + instruction sequence, participants were randomly assigned to adaptive spaced retrieval practice, fixed spaced retrieval practice, or learner-directed AI-supported study; multivariate analyses indicated that the two retrieval conditions produced higher posttest performance than learner-directed study.

Significance. If the causal attribution holds, the work supplies actionable guidance for sequencing AI tools in education: pretesting gains require structured retrieval practice to persist. The post-pretesting randomization is a design strength that isolates practice-type effects from pretesting itself.

major comments (2)

[Methods] Methods (randomization and retention procedures): the manuscript reports random assignment after pretesting but supplies no post-randomization baseline equivalence checks, attrition rates, or measures of external study/motivation during the seven-week interval. These omissions are load-bearing for the claim that observed posttest differences are caused by the assigned practice conditions rather than time-varying confounders.
[Results] Results (multivariate analyses): the abstract states a significant effect of condition on posttest performance without reporting sample size, effect sizes, power, or exclusion criteria. This information is required to assess whether the reported superiority of retrieval conditions is statistically and practically meaningful.

minor comments (1)

[Abstract] Abstract: include at least the total N and a brief statement of effect magnitude so readers can gauge the strength of the retention finding without consulting the full text.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which highlight important aspects of methodological transparency and statistical reporting. We address each major comment below and will incorporate revisions to strengthen the manuscript.

Referee: [Methods] Methods (randomization and retention procedures): the manuscript reports random assignment after pretesting but supplies no post-randomization baseline equivalence checks, attrition rates, or measures of external study/motivation during the seven-week interval. These omissions are load-bearing for the claim that observed posttest differences are caused by the assigned practice conditions rather than time-varying confounders.

Authors: We agree these details are essential for supporting causal claims about the practice conditions. In the revised manuscript we will add post-randomization baseline equivalence checks on the initial assessment scores across conditions, report attrition rates over the retention interval, and include any available self-report data on external study or motivation. Where data were not collected we will note the limitation explicitly.
revision: yes

Referee: [Results] Results (multivariate analyses): the abstract states a significant effect of condition on posttest performance without reporting sample size, effect sizes, power, or exclusion criteria. This information is required to assess whether the reported superiority of retrieval conditions is statistically and practically meaningful.

Authors: We agree that complete reporting is required. We will revise the abstract to include sample size and expand the results section to report effect sizes (partial eta-squared and pairwise Cohen's d), a post-hoc power analysis, and the full exclusion criteria. These additions will allow readers to evaluate both statistical and practical significance.
revision: yes
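For readers unfamiliar with the two effect sizes named in the rebuttal, the sketch below shows how each is commonly computed for a single outcome. It assumes a one-factor design, where partial eta-squared equals ordinary eta-squared; the scores are invented for illustration and are not the study's data, and the function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference between two groups, using the pooled SD."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled_var)


def eta_squared(groups):
    """Share of total variance explained by group membership (one-factor design)."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_total = sum((x - grand) ** 2 for x in all_scores)
    return ss_between / ss_total


# Invented posttest scores for the three arms (illustration only).
adaptive = [78, 82, 75, 88, 80]
fixed = [76, 79, 81, 74, 83]
learner_directed = [65, 70, 68, 72, 66]

print(cohens_d(adaptive, learner_directed))
print(eta_squared([adaptive, fixed, learner_directed]))
```

Reporting both matters because eta-squared summarizes the overall condition effect, while pairwise Cohen's d shows which arms differ and by how much.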

Circularity Check

0 steps flagged

No circularity: empirical RCT with no derivations or fitted predictions

The paper is a randomized controlled experiment reporting observed posttest differences after random assignment to practice conditions. It contains no equations, no parameter fitting, no predictions derived from models, and no derivation chain. Claims rest on multivariate statistical tests of group differences, not on any self-referential construction or on a self-citation that carries a mathematical result. This is the standard case of an empirical study with no opportunity for the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Empirical randomized experiment relying on standard assumptions of experimental design and multivariate statistics; no free parameters, invented entities, or ad-hoc axioms introduced by the paper.

axioms (2)

[standard math] Standard assumptions underlying multivariate analysis (normality, homogeneity of covariance matrices) hold for the posttest and effort measures.
The abstract reports multivariate analyses without noting any robustness checks or violations.
[domain assumption] Random assignment after the pretesting phase produced equivalent groups on all unmeasured variables that could affect retention.
The design description in the abstract treats random assignment as sufficient to isolate the effect of practice condition.
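One kind of robustness check that would relax the normality assumption is a permutation test, sketched below. This is a univariate simplification using a one-way F statistic on a single outcome, not the multivariate analysis the paper reports; the data, the number of permutations, and the function names are all assumptions for illustration.

```python
import random
from statistics import mean


def f_statistic(groups):
    """One-way ANOVA F: between-group mean square over within-group mean square."""
    all_scores = [x for g in groups for x in g]
    grand = mean(all_scores)
    k, n = len(groups), len(all_scores)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def permutation_p_value(groups, n_permutations=2000, seed=0):
    """Estimate a p-value by reshuffling scores across groups of the same sizes."""
    rng = random.Random(seed)
    observed = f_statistic(groups)
    pooled = [x for g in groups for x in g]
    sizes = [len(g) for g in groups]
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        shuffled, start = [], 0
        for size in sizes:
            shuffled.append(pooled[start:start + size])
            start += size
        if f_statistic(shuffled) >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_permutations + 1)


# Invented scores for three arms (illustration only).
groups = [[78, 82, 75, 88, 80], [76, 79, 81, 74, 83], [65, 70, 68, 72, 66]]
print(permutation_p_value(groups))
```

The permutation approach still relies on exchangeability under the null hypothesis, which random assignment supports, so it complements the second axiom rather than replacing it.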

pith-pipeline@v0.9.1-grok · 5678 in / 1427 out tokens · 32428 ms · 2026-06-26T10:03:34.717433+00:00 · methodology

Reference graph

Works this paper leans on

15 extracted references

[1] Akgun, M., Toker, S.: Struggle first, prompt later: How task complexity shapes learning with genai-assisted pretesting. arXiv preprint arXiv:2504.10249 (2025)

[2] Carpenter, S.K., DeLosh, E.L.: Application of the testing and spacing effects to name learning. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition 19(5), 619–636 (2005)

[3] Carpenter, S.K., Pan, S.C., Butler, A.C.: The science of effective learning with spacing and retrieval practice. Nature Reviews Psychology 1, 496–511 (2022), https://api.semanticscholar.org/CorpusID:251294660

[4] Ebbinghaus, H.: Memory: A contribution to experimental psychology. Dover Publications, New York (1913), original work published 1885

[5] Giebl, S., Mena, S., Sandberg, R., Bjork, E.L., Bjork, R.A.: Thinking first versus googling first: Preferences and consequences. Journal of Applied Research in Memory and Cognition 12(3), 431 (2023)

[6] Giebl, S., Mena, S., Storm, B.C., Bjork, E.L., Bjork, R.A.: Answer first or google first? Using the internet in ways that enhance, not impair, one's subsequent retention of needed information. Psychology Learning & Teaching 20(1), 58–75 (2021)

[7] Hopkins, R.F., Lyle, K.B., Hieb, J.L., Ralston, P.A.: Spaced retrieval practice increases college students' short- and long-term retention of mathematics knowledge. Educational Psychology Review 28(4), 853–873 (2016)

[8] Karpicke, J.D., Roediger III, H.L.: Expanding retrieval practice promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition 33(4), 704 (2007)

[9] Kornell, N., Hays, M.J., Bjork, R.A.: Unsuccessful retrieval attempts enhance subsequent learning. Journal of Experimental Psychology: Learning, Memory, and Cognition 35(4), 989 (2009)

[10] Latimier, A., Peyre, H., Ramus, F.: A meta-analytic review of the benefit of spacing out retrieval practice episodes on retention. Educational Psychology Review 33(3), 959–987 (2021)

[11] Pan, S.C., Carpenter, S.K.: Prequestioning and pretesting effects: A review of empirical research, theoretical perspectives, and implications for educational practice. Educational Psychology Review 35(4), 97 (2023)

[12] Pressley, M., Tanenbaum, R., McDaniel, M.A., Wood, E.: What happens when university students try to answer prequestions that accompany textbook material? Contemporary Educational Psychology 15(1), 27–35 (1990)

[13] Richland, L.E., Kornell, N., Kao, L.S.: The pretesting effect: Do unsuccessful retrieval attempts enhance learning? Journal of Experimental Psychology: Applied 15(3), 243 (2009)

[14] Roediger III, H.L., Karpicke, J.D.: Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science 17(3), 249–255 (2006)

[15] Sparrow, B., Liu, J., Wegner, D.M.: Google effects on memory: Cognitive consequences of having information at our fingertips. Science 333(6043), 776–778 (2011)
