Faster Completion, Less Learning: Generative AI Reduced Study Time on Math Problems and the Knowledge They Build

Eric Cosyn; Eyad Kurd-Misto; Hasan Uzun; Jeffrey Matayoshi; Sina Rismanchian

arXiv: 2605.21629 · v1 · pith:HQW7QOWOnew · submitted 2026-05-20 · cs.CY · cs.AI · cs.HC


Pith reviewed 2026-05-22 09:10 UTC · model grok-4.3

classification: cs.CY · cs.AI · cs.HC

keywords: generative AI · mathematics education · study time · knowledge retention · proctored assessment · quasi-experimental design · cognitive offloading · learning outcomes


The pith

Generative AI reduces time spent on AI-susceptible math problems by up to 31 percent and lowers the odds of correct retention by 25 percent on proctored tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tracks how student learning behavior changed after the release of ChatGPT by comparing problems that can be copied into an AI prompt with problems that require interactive work on the platform, within the same curriculum. Time on the easier-to-offload problems falls steadily for college and high-school students, adding up to roughly a quarter to a third less study time over nearly three years, while middle-school effects are smaller and fifth-graders show none. On proctored placement assessments that block AI use, the time divergence disappears for college students, and performance on randomly assigned retention items declines sharply. The pattern is inconsistent with simple platform improvements or cohort shifts, pointing instead to widespread substitution of AI for personal problem-solving.

Core claim

After ChatGPT's release, time on AI-susceptible text problems declines 2.8 percent per quarter among college students, reaching a 26.9 percent cumulative drop over eleven quarters; high-school students show 31.3 percent, middle-school students 9.0 percent, and fifth-graders no change. Under proctoring, the time divergence vanishes for college students. Logistic fixed-effects models on randomly assigned proctored retention items show a 25 percent cumulative decline in the odds of a correct response, while the identical estimator applied to non-proctored assessments shows a large increase of the opposite sign.
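A 25 percent decline in odds is not a 25 percent decline in the probability of a correct answer. The sketch below converts the cumulative odds ratio into a per-quarter log-odds coefficient, assuming the effect accumulates linearly in log-odds over the eleven post-ChatGPT quarters, and shows the implied accuracy change for a hypothetical 60 percent baseline. The baseline and the function names are illustrative assumptions, not values or code from the paper.

```python
import math

CUMULATIVE_ODDS_RATIO = 0.75   # a 25% cumulative decline in the odds of a correct response
N_QUARTERS = 11                # post-ChatGPT quarters covered by the cumulative effect


def per_quarter_log_odds(cumulative_or: float, n_quarters: int) -> float:
    """Per-quarter logit coefficient, assuming the effect accumulates linearly in log-odds."""
    return math.log(cumulative_or) / n_quarters


def prob_after_odds_ratio(p0: float, odds_ratio: float) -> float:
    """Probability of a correct response after scaling the baseline odds by odds_ratio."""
    odds = p0 / (1.0 - p0) * odds_ratio
    return odds / (1.0 + odds)


beta = per_quarter_log_odds(CUMULATIVE_ODDS_RATIO, N_QUARTERS)
p0 = 0.60  # hypothetical baseline accuracy, for illustration only
p1 = prob_after_odds_ratio(p0, CUMULATIVE_ODDS_RATIO)
print(f"per-quarter log-odds coefficient: {beta:.4f}")
print(f"accuracy {p0:.0%} -> {p1:.1%}")
```

With a 60 percent baseline, a 25 percent drop in odds lowers accuracy to about 52.9 percent, roughly 7 percentage points; the size of the probability change depends on the baseline accuracy, which this summary does not report.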

What carries the argument

Quasi-experimental contrast between text-based word problems (transcribable into AI prompts) and graph-based problems (requiring live platform manipulation) within the same curriculum sequence.
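This contrast can be read as a difference-in-differences design with a post-release ramp: treated (text-based) items receive an effect that grows by a fixed amount each quarter after release, while item and quarter fixed effects absorb item difficulty and shocks common to both problem types. The sketch below simulates a hypothetical item-by-quarter panel of mean log learning times with a known ramp of -0.0284 per quarter (the College baseline β in the SI's window-sensitivity table) and recovers it by least squares on dummy variables. Panel sizes, noise, and ramp timing are assumptions; the authors' exact specification may differ.

```python
import numpy as np


def simulate_panel(n_items=40, pre=8, post=11, beta=-0.0284, noise=0.05, seed=0):
    """Hypothetical item x quarter panel of mean log learning time.

    The first half of the items are 'treated' (text-based, AI-susceptible);
    the rest are comparison (graph-based) items. Event time 0 is the first
    post-release quarter, and the ramp takes values 1..post from then on.
    """
    rng = np.random.default_rng(seed)
    treated = np.arange(n_items) < n_items // 2
    event_time = np.arange(-pre, post)
    item_fe = rng.normal(4.0, 0.3, n_items)              # item difficulty (log seconds)
    quarter_fe = rng.normal(0.0, 0.05, event_time.size)  # shocks common to all items
    rows = []
    for i in range(n_items):
        for q, t in enumerate(event_time):
            ramp = t + 1 if t >= 0 else 0
            x = float(treated[i]) * ramp                 # treated x quarters since release
            y = item_fe[i] + quarter_fe[q] + beta * x + rng.normal(0.0, noise)
            rows.append((i, q, x, y))
    return np.array(rows), event_time.size


def estimate_ramp(panel, n_items, n_quarters):
    """OLS of y on item dummies, quarter dummies (first dropped), and the ramp."""
    item = panel[:, 0].astype(int)
    quarter = panel[:, 1].astype(int)
    X = np.column_stack([
        np.eye(n_items)[item],
        np.eye(n_quarters)[quarter][:, 1:],
        panel[:, 2],
    ])
    coef, *_ = np.linalg.lstsq(X, panel[:, 3], rcond=None)
    return coef[-1]


panel, n_quarters = simulate_panel()
beta_hat = estimate_ramp(panel, n_items=40, n_quarters=n_quarters)
print(f"estimated ramp: {beta_hat:.4f} per quarter")
print(f"implied 11-quarter change: {np.expm1(11 * beta_hat):.1%}")
```

Dropping the first quarter dummy avoids collinearity with the full set of item dummies; the ramp stays identified because it varies across items within the same quarter.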

If this is right

Proctored assessments become necessary to measure actual knowledge rather than AI-assisted performance.
Placement and progress tests that rely on unproctored results will overstate student readiness.
Curriculum sequences built on cumulative mastery may need redesign if earlier topics are skipped via AI.
Policy discussions about AI in education must weigh measurable losses in long-term retention against short-term efficiency gains.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Curricula could deliberately increase the share of interactive, non-transcribable tasks to limit offloading.
Longer-term studies could test whether the retention gap closes when AI access is later restricted.
Similar designs could be applied to other subjects to check whether the pattern is specific to mathematics problem solving.

Load-bearing premise

That any post-ChatGPT divergence between the two problem types is caused by AI use rather than by unmeasured shifts in teaching, student habits, or platform design.

What would settle it

If a new cohort shows no decline in proctored retention odds on text-based items relative to graph-based items after the same time period, the claim of reduced durable learning from AI substitution would not hold.

Figures

Figures reproduced from arXiv: 2605.21629 by Eric Cosyn, Eyad Kurd-Misto, Hasan Uzun, Jeffrey Matayoshi, Sina Rismanchian.

**Figure 1.** Event-study estimates of differential learning-time trend by age group. Pre-ChatGPT coefficients are small in magnitude across all age groups; College and High School exhibit a small positive pre-trend (relative slowing of AI-susceptible items) opposite in sign to the post-ChatGPT effect, which makes the unadjusted estimates potentially conservative (SI Appendix, Section G). After ChatGPT’s release, a mon…

**Figure 2.** Event-study estimates by proctoring condition (ALEKS PPL response time). In non-proctored assessments, response times for AI-susceptible topics decline monotonically after ChatGPT’s release; in proctored assessments, the post-ChatGPT trend is flat and statistically indistinguishable from zero under every inference method applied. Error bars are 95% cluster-robust confidence intervals. Crucially, learning t…

**Figure 3.** Examples of AI-susceptible and AI-resistant ALEKS items. AI-susceptible items (a & b) are text-based word problems whose full informational content can be transcribed into a natural-language prompt and solved by a large language model in seconds. AI-resistant items (c & d) require visual interpretation of graphical displays and interactive manipulation of plot widgets embedded in the ALEKS interface, makin…
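Figures 1 and 2 plot event-study coefficients. In the standard specification (treated-by-quarter interactions plus item and quarter fixed effects, with one pre-release quarter omitted as the reference), a balanced panel gives coefficients equal to the treated-minus-comparison gap in each quarter, net of the gap in the reference quarter. The sketch below computes them that way on a hypothetical item-by-quarter matrix; the layout and reference quarter are assumptions, and the paper's weighting and clustered inference are not reproduced.

```python
import numpy as np


def event_study(y, treated, ref):
    """Event-study coefficients for a balanced items x quarters panel.

    y       : (n_items, n_quarters) outcomes, e.g. mean log learning time
    treated : boolean array of length n_items (True = AI-susceptible item)
    ref     : index of the omitted reference quarter
    Returns gamma with gamma[ref] == 0.
    """
    gap = y[treated].mean(axis=0) - y[~treated].mean(axis=0)
    return gap - gap[ref]


# Hypothetical panel: 6 pre-release and 11 post-release quarters.
rng = np.random.default_rng(1)
n_items, pre, post = 30, 6, 11
n_q = pre + post
treated = np.arange(n_items) < 15
quarters = np.arange(n_q)
true_gamma = np.where(quarters >= pre, -0.0284 * (quarters - pre + 1), 0.0)
y = (rng.normal(4.0, 0.3, (n_items, 1))      # item effects
     + rng.normal(0.0, 0.05, (1, n_q))       # quarter effects
     + np.outer(treated, true_gamma))        # dynamic effect on treated items only
gamma_hat = event_study(y, treated, ref=pre - 1)
print(np.round(gamma_hat, 4))
```

Because the item and quarter effects cancel in the gap-minus-reference difference, the noiseless example recovers the true path exactly; a flat pre-period and a falling post-period is the pattern the figures describe.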

Original abstract

How much have students' ordinary learning processes shifted in response to generative AI, and how does that affect their durable learning outcomes? Self-report surveys show little change, while small-scale behavioral studies report widespread AI use without the scale or duration to measure learning consequences. We address both questions using a ten-year panel of 3.2 million ALEKS learning interactions for the time-on-task analysis, complemented by ALEKS PPL placement-assessment data for the proctoring and retention analyses, with a quasi-experimental design exploiting within-curriculum variation in AI susceptibility: text-based word problems transcribable into AI prompts serve as the treated group; graph-based problems requiring interactive platform manipulation as the comparison. Learning time on AI-susceptible problems declines 2.8% per quarter among college students after ChatGPT's release, cumulating to 26.9% over eleven quarters; high-schoolers show 31.3%, middle-schoolers 9.0%, and Grade 5 students no detectable change. The divergence vanishes entirely under proctoring for college students, making general efficiency gains unlikely. Logistic fixed-effects models on randomly assigned proctored retention items yield a 25% cumulative decline in odds of correct response; the same estimator on non-proctored assessment produces a large opposite-signed increase, inconsistent with any platform, cohort, or curriculum explanation. These results are among the first large-scale behavioral and outcome evidence that generative AI has altered how students study and the knowledge they build: the population-level indicator of *cognitive surrender*, with direct implications for educational research, assessment governance, and AI policy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows post-ChatGPT drops in study time on text math problems and lower proctored retention using a large ALEKS panel and a text-versus-graph split, but the results rest on that split cleanly capturing AI effects.


The main thing to know is that this paper reports clear drops in time spent on text-based word problems after ChatGPT launched, along with lower odds of correct answers on proctored retention items, drawn from millions of ALEKS interactions over a decade. The design splits problems inside the same curriculum, treating text problems as easy to paste into an AI tool and graph problems as harder to offload because they require interaction with the platform. For college students, time on task for the text problems falls 2.8 percent per quarter and totals about 27 percent over eleven quarters; the effect is larger in high school (31.3 percent), smaller in middle school (9.0 percent), and absent in grade 5. The gap disappears under proctoring, which undercuts simple stories about platform-wide efficiency gains. The retention analysis uses logistic fixed-effects models on randomly assigned proctored items and finds a 25 percent cumulative drop in the odds of a correct answer, while the same estimator on non-proctored assessments shows an increase of the opposite sign. That contrast helps rule out broad changes in test difficulty or student cohorts.

What the paper does well is bring population-scale behavioral data to a question that has mostly been studied with small surveys or self-reports. The proctoring check is a practical way to test alternatives, and the within-curriculum contrast avoids some of the usual selection problems in AI studies. The soft spot is the assumption that text and graph problems would have moved in parallel without AI. If curriculum tweaks, platform scoring updates, or shifts in student engagement hit the two groups differently around late 2022, the divergence could reflect those factors instead. The abstract gives effect sizes but omits the exact model equations, sample cuts, and full robustness checks, so those details will decide how much weight the numbers can carry.

This work is for education researchers and policy people who need large-scale evidence on how AI changes study habits and durable knowledge. Readers who care about assessment design or real-world AI effects will get value from the scale and the proctoring contrast. It deserves a serious referee because the data volume and the attempt at a clean comparison give it traction on a live question, even if revisions are needed to tighten the identification. I would send it out for review.

Referee Report

3 major / 2 minor

Summary. The paper uses a ten-year panel of 3.2 million ALEKS interactions and placement-assessment data in a quasi-experimental design that treats text-based word problems as AI-susceptible and graph-based problems as non-susceptible. It reports post-ChatGPT declines in time-on-task (2.8% per quarter for college students, cumulating to 26.9%) that vanish under proctoring, and a 25% cumulative decline in odds of correct response on randomly assigned proctored retention items via logistic fixed-effects models, while non-proctored assessment shows an opposite increase; the authors interpret this as evidence of cognitive surrender induced by generative AI.

Significance. If the identification holds, the study supplies large-scale behavioral and outcome evidence on how generative AI alters study time and durable knowledge in mathematics, with direct implications for assessment design, curriculum policy, and regulation of AI tools in education. The proctoring contrast and within-curriculum variation are strengths that help rule out some platform-wide confounds.

major comments (3)

The identification rests on the assumption that text-based and graph-based problems would have followed parallel trends absent ChatGPT and that no other post-2022 shocks (curriculum changes, platform scoring updates, or differential engagement) affect the two groups differently. The manuscript does not report explicit pre-trend tests, placebo periods, or robustness checks that reclassify problems or interact with other time-varying covariates; without these, the 25% retention decline and time reductions cannot be cleanly attributed to AI use rather than correlated unobservables.
The logistic fixed-effects models are described only at a high level in the abstract and results; the manuscript provides neither the exact specification (e.g., the form of the fixed effects, clustering, or handling of multiple observations per student), nor robustness tables showing sensitivity to alternative estimators or sample restrictions. This omission makes it impossible to evaluate whether the reported odds ratio is load-bearing or sensitive to modeling choices.
The claim that the proctoring contrast rules out general efficiency gains is plausible but incomplete: the manuscript does not show whether the proctored subsample is representative of the full population or whether proctoring itself interacts with problem type in ways that could mechanically alter time or retention independent of AI.
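The second major comment concerns the logistic fixed-effects specification. As a point of reference, the sketch below fits a logit with item and quarter fixed effects and a treated-by-quarters-since-release ramp to hypothetical grouped retention data (correct answers out of attempts per item-quarter cell), using Newton-Raphson. The grouping, the absence of student fixed effects, the simulated sizes, and the true odds ratio of 0.75 are all assumptions chosen for illustration; this is not the authors' model.

```python
import numpy as np


def fit_grouped_logit(X, successes, trials, n_iter=50, tol=1e-10):
    """Logit MLE for grouped binomial data (successes out of trials per row), via Newton-Raphson."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        grad = X.T @ (successes - trials * p)                  # score vector
        info = X.T @ (X * (trials * p * (1.0 - p))[:, None])   # Fisher information
        step = np.linalg.solve(info, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


# Hypothetical item x quarter cells of proctored retention responses.
rng = np.random.default_rng(3)
n_items, pre, post, n_per_cell = 30, 8, 11, 300
n_q = pre + post
treated = np.arange(n_items) < n_items // 2
true_beta = np.log(0.75) / post                   # 25% cumulative odds decline, log-linear
item = np.repeat(np.arange(n_items), n_q)
quarter = np.tile(np.arange(n_q), n_items)
ramp = np.clip(quarter - pre + 1, 0, None) * treated[item]
eta = (rng.normal(0.5, 0.5, n_items)[item]        # item difficulty
       + rng.normal(0.0, 0.1, n_q)[quarter]       # common quarter shocks
       + true_beta * ramp)
trials = np.full(item.size, n_per_cell)
successes = rng.binomial(trials, 1.0 / (1.0 + np.exp(-eta)))

# Item dummies, quarter dummies (first quarter dropped), and the treated ramp.
X = np.column_stack([np.eye(n_items)[item], np.eye(n_q)[quarter][:, 1:], ramp])
beta_hat = fit_grouped_logit(X, successes, trials)[-1]
print(f"per-quarter log-odds: {beta_hat:.4f}")
print(f"implied 11-quarter odds ratio: {np.exp(post * beta_hat):.3f}")
```

A full specification would also need to state how repeated responses by the same student are handled, which this sketch ignores.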

minor comments (2)

The abstract and results would benefit from a table or figure that directly displays the quarterly time-on-task coefficients by grade band and problem type, with confidence intervals and the exact number of observations per cell.
Notation for the cumulative decline (26.9% over eleven quarters) should be tied explicitly to the quarterly rate and the functional form used (e.g., whether it is a linear trend or exponential).
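The reported figures can be checked against the per-quarter coefficients in the SI's window-sensitivity table (College β = -0.0284, High School β = -0.0341). Under the assumption of a log-linear ramp, the cumulative change after eleven quarters is exp(11β) - 1; the sketch below computes it, together with a delta-method confidence interval. The standard error in the usage line is hypothetical, since none is given in this summary.

```python
import math


def cumulative_change(beta: float, quarters: int = 11) -> float:
    """Cumulative proportional change implied by a log-linear per-quarter ramp."""
    return math.expm1(quarters * beta)


def delta_method_ci(beta: float, se: float, quarters: int = 11, z: float = 1.96):
    """Approximate 95% CI for the cumulative change.

    The derivative of exp(q * beta) - 1 with respect to beta is q * exp(q * beta).
    """
    est = cumulative_change(beta, quarters)
    se_cum = quarters * math.exp(quarters * beta) * se
    return est - z * se_cum, est + z * se_cum


for label, beta in [("College", -0.0284), ("High school", -0.0341)]:
    print(f"{label}: per quarter {math.expm1(beta):.1%}, "
          f"11 quarters {cumulative_change(beta):.1%}")

print(delta_method_ci(-0.0284, se=0.004))  # se = 0.004 is a hypothetical value
```

This gives about 26.8 percent for College and 31.3 percent for High School. The small gap from the reported 26.9 percent is consistent with rounding of β to four decimals. The close match suggests, but does not confirm, an exponential (log-linear) rather than linear functional form.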

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments on our manuscript. We address each major comment below, providing clarifications on our identification strategy and committing to revisions that enhance transparency without altering the core findings.


Referee: The identification rests on the assumption that text-based and graph-based problems would have followed parallel trends absent ChatGPT and that no other post-2022 shocks (curriculum changes, platform scoring updates, or differential engagement) affect the two groups differently. The manuscript does not report explicit pre-trend tests, placebo periods, or robustness checks that reclassify problems or interact with other time-varying covariates; without these, the 25% retention decline and time reductions cannot be cleanly attributed to AI use rather than correlated unobservables.

Authors: We agree that explicit documentation of pre-trends would strengthen the parallel trends assumption. Our design already incorporates within-curriculum variation and the sharp proctoring contrast (where time reductions vanish and retention effects reverse in non-proctored settings) to address many alternative explanations such as platform-wide changes. Nevertheless, we will add formal pre-trend tests, placebo analyses on pre-ChatGPT periods, and robustness checks with alternative problem reclassifications and time-varying covariates in the revised manuscript. revision: yes
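The placebo analyses promised here (and reported in the SI at break years 2018 to 2021) re-estimate the ramp at fake release dates inside the pre-ChatGPT period. The sketch below does this on a hypothetical balanced panel with no treatment effect. It uses the two-way within transformation, which for a balanced panel gives the same slope as regressing on item and quarter dummies. Panel sizes, noise, and break quarters are assumptions.

```python
import numpy as np


def twoway_demean(m):
    """Within transformation for a balanced items x quarters panel (removes item and quarter means)."""
    return m - m.mean(axis=1, keepdims=True) - m.mean(axis=0, keepdims=True) + m.mean()


def ramp_coefficient(y, treated, break_q):
    """Two-way fixed-effects slope on treated x quarters-since-break_q (1, 2, ... from break_q on)."""
    ramp = np.clip(np.arange(y.shape[1]) - break_q + 1, 0, None)
    x = np.outer(treated.astype(float), ramp)
    xt, yt = twoway_demean(x), twoway_demean(y)
    return float((xt * yt).sum() / (xt ** 2).sum())


# Hypothetical pre-release panel (40 items x 16 quarters) with no treatment effect at all.
rng = np.random.default_rng(2)
n_items, n_quarters = 40, 16
treated = np.arange(n_items) < n_items // 2
y_pre = (rng.normal(4.0, 0.3, (n_items, 1))               # item effects
         + rng.normal(0.0, 0.05, (1, n_quarters))         # quarter effects
         + rng.normal(0.0, 0.03, (n_items, n_quarters)))  # idiosyncratic noise

placebos = {b: ramp_coefficient(y_pre, treated, b) for b in (4, 8, 12)}
for b, coef in placebos.items():
    print(f"placebo break at quarter {b:2d}: {coef:+.4f}")
```

A genuine effect should appear only at the real release date. Placebo coefficients that are consistently non-zero would signal a pre-existing divergence, as with the small positive pre-period estimates the SI reports for the College and High School learning-time subsets.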
Referee: The logistic fixed-effects models are described only at a high level in the abstract and results; the manuscript provides neither the exact specification (e.g., the form of the fixed effects, clustering, or handling of multiple observations per student), nor robustness tables showing sensitivity to alternative estimators or sample restrictions. This omission makes it impossible to evaluate whether the reported odds ratio is load-bearing or sensitive to modeling choices.

Authors: We acknowledge that the current manuscript describes the logistic fixed-effects models at a high level. The specification includes student fixed effects, quarter fixed effects, and problem fixed effects, with standard errors clustered at the student level to account for multiple observations per student. We will expand the methods section with the precise equation, variable definitions, and additional robustness tables (including alternative estimators and sample restrictions) in the revision. revision: yes
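Figure 2 reports 95% cluster-robust confidence intervals, and this response describes clustering at the student level, a detail from the simulated rebuttal rather than the paper. The sketch below shows the CR1 sandwich estimator for OLS on hypothetical data in which each student contributes ten responses that share a student-level shock. The logit models would use the analogous sandwich built from score contributions.

```python
import numpy as np


def cluster_robust_cov(X, resid, clusters):
    """CR1 cluster-robust (sandwich) covariance of OLS coefficients."""
    n, k = X.shape
    bread = np.linalg.inv(X.T @ X)
    labels = np.unique(clusters)
    meat = np.zeros((k, k))
    for g in labels:
        in_g = clusters == g
        score_g = X[in_g].T @ resid[in_g]      # X_g' u_g, summed within the cluster
        meat += np.outer(score_g, score_g)
    G = labels.size
    c = G / (G - 1) * (n - 1) / (n - k)        # small-sample adjustment (CR1)
    return c * bread @ meat @ bread


# Hypothetical data: 200 students x 10 responses, with a shared student-level shock.
rng = np.random.default_rng(4)
student = np.repeat(np.arange(200), 10)
x = rng.normal(size=student.size)
y = 1.0 + 0.5 * x + rng.normal(size=200)[student] + rng.normal(size=student.size)
X = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

se_naive = np.sqrt(np.diag(np.linalg.inv(X.T @ X)) * resid.var(ddof=2))
se_cluster = np.sqrt(np.diag(cluster_robust_cov(X, resid, student)))
print("naive SE:  ", np.round(se_naive, 4))
print("cluster SE:", np.round(se_cluster, 4))
```

Here the intercept's clustered standard error exceeds the naive one, because responses from the same student are not independent. The slope on x, which varies within students, is much less affected.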
Referee: The claim that the proctoring contrast rules out general efficiency gains is plausible but incomplete: the manuscript does not show whether the proctored subsample is representative of the full population or whether proctoring itself interacts with problem type in ways that could mechanically alter time or retention independent of AI.

Authors: Retention items within the proctored ALEKS PPL assessments are randomly assigned, which protects the retention comparison from item-selection effects, and the reversal of effects in non-proctored settings is inconsistent with mechanical proctoring interactions. We agree that explicit checks would be valuable and will add balance tables comparing proctored and non-proctored samples on observables, as well as tests for proctoring-by-problem-type interactions, in the revised manuscript. revision: yes
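A balance table of the kind promised here usually reports standardized mean differences: the difference in means between the proctored and non-proctored samples, divided by the pooled standard deviation. The sketch below computes them for two hypothetical covariates. The covariate names, sample sizes, and the 0.1 flag (a common rule of thumb) are assumptions, not the paper's choices.

```python
import numpy as np


def standardized_mean_difference(a, b):
    """Difference in means scaled by the pooled standard deviation (sample variances)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    pooled_sd = np.sqrt((a.var(ddof=1) + b.var(ddof=1)) / 2.0)
    return (a.mean() - b.mean()) / pooled_sd


# Hypothetical covariates for proctored vs. non-proctored test takers.
rng = np.random.default_rng(5)
covariates = {
    "prior_score": (rng.normal(50, 10, 800), rng.normal(51, 10, 1200)),
    "age": (rng.normal(19, 2, 800), rng.normal(19, 2, 1200)),
}
for name, (proctored, unproctored) in covariates.items():
    smd = standardized_mean_difference(proctored, unproctored)
    flag = "check" if abs(smd) > 0.1 else "ok"
    print(f"{name:12s} SMD = {smd:+.3f} ({flag})")
```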

Circularity Check

0 steps flagged

No significant circularity in quasi-experimental design

full rationale

The paper's derivation relies on an external timing shock (ChatGPT release) and a within-curriculum classification of problems into AI-susceptible (text-based) versus non-susceptible (graph-based) groups, with logistic fixed-effects models applied to randomly assigned proctored retention items and a proctoring check that eliminates the divergence. No equations reduce reported declines or odds ratios to quantities defined by the same fitted parameters or outcomes; the analysis does not invoke self-citations for uniqueness, smuggle in ansatzes, or rename known results as new derivations. The central estimates are therefore anchored to external benchmarks such as the release date and proctoring status.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claims rest on the assumption that differences between text-based and graph-based problems isolate AI use, plus the timing of ChatGPT release as an exogenous shock. No new entities are postulated.

free parameters (1)

quarterly decline rate
The 2.8% per quarter decline for college students is estimated from the post-ChatGPT time series.

axioms (1)

domain assumption: Text-based word problems can be directly transcribed into AI prompts, while graph-based problems cannot
This distinction defines the treated and comparison groups in the quasi-experimental design.

pith-pipeline@v0.9.0 · 5856 in / 1309 out tokens · 59617 ms · 2026-05-22T09:10:32.268779+00:00 · methodology


Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages · 1 internal anchor

[1]

Generative AI Without Guardrails Can Harm Learning: Evidence from High School Mathematics

doi: 10.1073/pnas.2422633122. 12 Preprint. Under review. John Bound, Charles Brown, and Nancy Mathiowetz. Measurement error in survey data. In Handbook of Econometrics, volume 5, pp. 3705–3843. Elsevier,

[2]

Sparks of Artificial General Intelligence: Early experiments with GPT-4

URL https://arxiv.org/abs/2303.12712. A Colin Cameron and Douglas L Miller. A practitioner’s guide to cluster-robust inference. Journal of Human Resources, 50(2):317–372,

[3]

Ruishi Chen, Victor R Lee, Annie Camey Kuo, Denise Clark Pope, and Sarah Miles

doi: 10.1162/rest.90.3.414. Ruishi Chen, Victor R Lee, Annie Camey Kuo, Denise Clark Pope, and Sarah Miles. Cheating in the second year of generative AI chatbots: a follow-up study on high school student cheating behaviors. Educational technology research and development, 74:649–667,

[4]

Michelene TH Chi and Ruth Wylie

doi: 10.1007/s11423-026-10587-1. Michelene TH Chi and Ruth Wylie. The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4):219–243,

[5]

Masset, R

doi: 10.1016/j.jmp.2021.102512. Christopher Doble, Jeffrey Matayoshi, Eric Cosyn, Hasan Uzun, and Arash Karami. A data-based simulation study of reliability for an adaptive assessment based on knowledge space theory. International Journal of Artificial Intelligence in Education, 29(2):258–282,

[6]

Manu Kapur

doi: 10.1007/978-3-642-58625-5. Manu Kapur. Productive failure in mathematical problem solving. Instructional Science, 38(6):523–550,

[7]

Grace Liu, Brian Christian, Tsvetomira Dumbalska, Michiel A Bakker, and Rachit Dubey

doi: 10.1016/j.caeai.2024.100253. Grace Liu, Brian Christian, Tsvetomira Dumbalska, Michiel A Bakker, and Rachit Dubey. AI assistance reduces persistence and hurts independent performance

[8]

Intelligent tutoring systems and learning outcomes: A meta-analysis,

doi: 10.1037/a0037123. James G. MacKinnon and Matthew D. Webb. Wild bootstrap inference for wildly different cluster sizes. Journal of Applied Econometrics, 32(2):233–254,

[9]

Donald L

doi: 10.1002/jae.2508. Donald L. McCabe and Linda Klebe Trevino. Academic dishonesty: Honor codes and other contextual influences. The Journal of Higher Education, 64(5):522–538,

[10]

Donald L

doi: 10.1080/00221546.1993.11778446. Donald L. McCabe and Linda Klebe Trevino. Individual and contextual influences on academic dishonesty: A multicampus investigation. Research in Higher Education, 38(3):379–396,

[11]

Donald L

doi: 10.1023/A:1024954224675. Donald L. McCabe, Kenneth D. Butterfield, and Linda K. Treviño. Cheating in College: Why Students Do It and What Educators Can Do about It. The Johns Hopkins University Press, Baltimore,

[12]

Duncan Pritchard

doi: 10.1007/s10639-024-12495-4. Duncan Pritchard. Why technology doesn’t normally make you dumber, but agentic AI will. International Journal of Human–Computer Interaction, 0(0):1–11,

[13]

URL https://doi.org/10.1080/10447318.2026.2631678

doi: 10.1080/10447318.2026.2631678. URL https://doi.org/10.1080/10447318.2026.2631678. Justin Reich and Jesse Dukes. The future of education technology after the arrival of ChatGPT. Phi Delta Kappan, 107(3-4):19–23,

[14]

Leonhard Reiter, Moritz Joerling, Christoph Fuchs, and Robert Böhm

doi: 10.1177/00317217251405516. Leonhard Reiter, Moritz Joerling, Christoph Fuchs, and Robert Böhm. Student (Mis)Use of generative AI tools for university-related tasks. International Journal of Human–Computer Interaction, 41(19):12390–12403,

[15]

doi: 10.1080/10447318.2025.2462083. Evan F. Risko and Sam J. Gilbert. Cognitive offloading. Trends in Cognitive Sciences, 20(9):676–688,

[16]

Sina Rismanchian, Peter Liu, Gabe Avakian Orona, Duncan Pritchard, and Shayan Doroudi

doi: 10.1016/j.tics.2016.07.002. Sina Rismanchian, Peter Liu, Gabe Avakian Orona, Duncan Pritchard, and Shayan Doroudi. Artificial integrity: Concerning patterns of AI usage among undergraduate students. EdArXiv preprint,

[17]

Everett M

doi: 10.35542/osf.io/exm5a v2. Everett M. Rogers. Diffusion of Innovations. Free Press, New York, NY, 5th edition,

[18]

doi: 10.31234/osf.io/yk25n v1. Supplementary Information. This appendix reports the battery of ten robustness analyses (R1–R10) applied to each primary specification in the main text, along with two post-hoc sensitivity analyses (R9, R10) and supporting details on sample construction. The ten analyses are: (R1) a functional-form horse race comparing step...

[19]

Because this drift is opposite in sign to the post-ChatGPT effect, a trend-adjusted specification would yield a larger negative estimate, not a smaller one

The College and High School learning-time subsets exhibit a small positive pre-trend across all four placebo windows: AI-susceptible word problems were becoming relatively slower than AI-resistant graph problems in the pre-ChatGPT era. Because this drift is opposite in sign to the post-ChatGPT effect, a trend-adjusted specification would yield a larger neg...

[20]

Items in strata

Retention and proctored PPL subsets yield null placebos at every break date; College and High School learning-time subsets yield small positive placebos opposite in sign to the post-ChatGPT effect (see R4). Placebo coefficients by break year (2018 / 2019 / 2020 / 2021): LearningTime College +0.0071*** / +0.0063** / +0.0055* / +0.0048; LearningTime HS +0.0046** / +0.0041** / +0.0036* / +0.0031; PPLTime non...

[21]

The randomly assigned retention subsample produces a bootstrap CI that excludes the null (0.56–0.98)

Delta-method and bootstrap CIs agree closely in every subset. The randomly assigned retention subsample produces a bootstrap CI that excludes the null (0.56–0.98). Table 9: R6 Cumulative-effect 95% confidence intervals. Cumulative effects over eleven post-ChatGPT quarters, with both delta-method and cluster-bootstrap 95% CIs. For ...

[22]

Estimates are stable in sign and magnitude across every cut in every subset

Table 10: R7 Window-sensitivity cuts. Per-quarter ramp coefficient β under three window-sensitivity cuts. Estimates are stable in sign and magnitude across every cut in every subset. Columns: baseline β / drop COVID / drop last quarter / floor = 100. LearningTime College −0.0284 / −0.0277 / −0.0280 / −0.0291; LearningTime HS −0.0341 / −0.0334 / −0.0338 / −0.0348; PPLTime nonproc −0.0112 / −0.010...
