Little Impact of ChatGPT Availability on High School Student Test Score Performance

Nick Huntington-Klein

arxiv: 2605.08812 · v2 · pith:PBAR5VJOnew · submitted 2026-05-09 · econ.GN · q-fin.EC

Pith reviewed 2026-05-19 15:18 UTC · model grok-4.3

classification: econ.GN · q-fin.EC

keywords: ChatGPT · AI in education · high school test scores · causal identification · summer activity drop · educational technology · student achievement

The pith

ChatGPT availability had no meaningful impact on high school test scores.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper identifies areas with heavy educational use of AI by tracking the drop in ChatGPT activity during non-school summer months. It then compares high school test score averages across areas that differ in this identified AI use intensity. The result shows no detectable shift in scores in either direction. This matters because it examines how students actually use AI outside of lab settings, rather than in controlled experiments with special tools.

Core claim

The dropoff in ChatGPT activity during summer months serves to mark regions where students rely on the tool for schoolwork, and variation in that dropoff produces no meaningful difference in average high school test scores.

What carries the argument

Summer dropoff in ChatGPT activity used to identify intensity of educational AI use across geographic areas.

If this is right

Any student use of AI to skip learning either has little effect on tested performance or is offset by beneficial uses in the aggregate.
Aggregate high school test outcomes remain largely unchanged despite widespread AI availability.
Net effects of real-world AI adoption on measured achievement appear small for this population and outcome.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same summer-variation approach could be applied to other outcomes such as course grades or college entry metrics.
If the result holds, school policies focused on AI restrictions may have limited leverage on standardized test performance.
Subject-by-subject or demographic breakdowns could reveal effects masked in the overall averages.

Load-bearing premise

The summer drop in ChatGPT activity marks places with heavy educational AI use and is not driven by other time-varying factors that also affect test scores.

What would settle it

A detectable rise or fall in test scores in areas showing larger summer drops in ChatGPT activity compared with areas showing smaller drops would contradict the no-impact finding.
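The falsification condition above can be sketched as a bivariate regression of area test scores on the dropoff measure, checking whether the slope's interval excludes zero. This is a minimal illustration, not the paper's specification: the helper names `ols_slope` and `detectable_shift`, the normal-approximation interval, and all numbers are assumptions.

```python
from math import sqrt
from statistics import mean


def ols_slope(x, y):
    """Bivariate OLS of y on x: return (slope, standard error)."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need equal-length inputs with at least 3 points")
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = sqrt(rss / (n - 2) / sxx)  # homoskedastic standard error
    return slope, se


def detectable_shift(slope, se, z=1.96):
    """True if the normal-approximation 95% interval excludes zero."""
    return abs(slope) > z * se


# Made-up numbers (not real data): scores bounce around 20 regardless of dropoff.
dropoff = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30]
scores = [20.1, 19.9, 20.1, 19.9, 20.1, 19.9]
slope, se = ols_slope(dropoff, scores)
shift_found = detectable_shift(slope, se)
```

With so few points the normal approximation is crude, and a credible null also needs a narrow interval, not only one that covers zero.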

Original abstract

In educational settings, AI can be used as a learning aid, but can also be used to avoid schoolwork, thereby passing classes while learning little. Many existing studies on the impact of AI on education focus on AI use in controlled settings or with specialized tools. In this paper, the dropoff in ChatGPT activity during non-school summer months in 2023 and 2024 is used to identify areas with heavy educational AI use and thus estimate the educational impact of AI as it is actually used. I find no meaningful impact of AI usage on high school test score averages in either direction. These results imply that, to the extent that high school students use AI to avoid learning, it either does not matter much for their test performance or is cancelled out by positive uses of AI in the aggregate.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper uses the dropoff in ChatGPT query activity during non-school summer months in 2023 and 2024 to identify geographic areas with heavy educational AI use by high school students. It then estimates the effect of this usage intensity on average high school test scores and reports no meaningful impact in either direction, concluding that positive and negative uses of AI largely cancel out or are too small to affect aggregate performance.

Significance. If the identification holds, the null result would indicate that real-world student access to generative AI has had little net effect on standardized test outcomes. This provides field evidence complementary to lab studies and could inform education policy by suggesting that fears of widespread learning displacement via AI may be overstated at the aggregate level.

major comments (1)

The central identification strategy treats the summer decline in ChatGPT activity as a valid proxy for cross-area differences in educational (versus recreational or professional) AI use. This requires that any non-educational seasonal patterns are either uniform across regions or uncorrelated with test-score determinants after controls. Without additional tests or evidence ruling out confounders such as regional differences in family travel, leisure patterns, or summer school resources that could also affect test preparation, the zero-effect finding on high-school test averages remains vulnerable to omitted-variable bias.
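The referee's concern can be written out with the textbook omitted-variable-bias formula. The notation is assumed for illustration and does not come from the paper: $S_a$ is the average test score in area $a$, $D_a$ is its summer dropoff, and $Z_a$ is an unobserved seasonal factor such as family travel.

```latex
% Assumed "long" model, with \varepsilon_a uncorrelated with D_a and Z_a:
\[
  S_a = \alpha + \beta D_a + \gamma Z_a + \varepsilon_a
\]
% Regressing S_a on D_a alone (omitting Z_a) converges to:
\[
  \hat{\beta}_{\text{short}} \;\xrightarrow{p}\;
  \beta + \gamma \, \frac{\operatorname{Cov}(D_a, Z_a)}{\operatorname{Var}(D_a)}
\]
```

The bias term vanishes only if the omitted factor does not affect scores ($\gamma = 0$) or is uncorrelated with the dropoff measure. Otherwise an estimate near zero could reflect a nonzero $\beta$ offset by the bias, which is why the comment asks for evidence on such confounders.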

minor comments (2)

The manuscript should report exact sample sizes, data sources for both ChatGPT usage and test-score outcomes, and the precise regression specification (including all controls and fixed effects) to allow readers to assess the power and robustness of the null result.
Adding a table or appendix with robustness checks for alternative summer-period definitions or geographic subsamples would help address concerns about post-hoc period or geography selection.
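The power concern in the first minor comment can be made concrete with a standard minimum-detectable-effect calculation, which converts a reported standard error into the smallest true effect a two-sided test would reliably catch. The function name and the example standard error are assumptions for illustration; no standard errors are reported on this page.

```python
from statistics import NormalDist


def minimum_detectable_effect(se, alpha=0.05, power=0.80):
    """Smallest true effect detected with the given power in a two-sided z-test."""
    if se <= 0:
        raise ValueError("standard error must be positive")
    z = NormalDist()
    # Sum of the critical value and the power quantile, scaled by the standard error.
    return (z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)) * se


# Hypothetical standard error of 0.5 score points (not from the paper).
mde = minimum_detectable_effect(se=0.5)
```

At the conventional 5% level and 80% power the multiplier is roughly 2.8, so a null result is informative only about effects larger than about 2.8 standard errors.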

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive comments on our manuscript. We address the major concern regarding the identification strategy below.

Point-by-point responses

Referee: The central identification strategy treats the summer decline in ChatGPT activity as a valid proxy for cross-area differences in educational (versus recreational or professional) AI use. This requires that any non-educational seasonal patterns are either uniform across regions or uncorrelated with test-score determinants after controls. Without additional tests or evidence ruling out confounders such as regional differences in family travel, leisure patterns, or summer school resources that could also affect test preparation, the zero-effect finding on high-school test averages remains vulnerable to omitted-variable bias.

Authors: We agree that the validity of our proxy depends on non-educational seasonal patterns being either uniform or uncorrelated with test-score determinants after controls. The manuscript already incorporates area-level controls for income, education, population density, and urban status, which proxy for many differences in leisure, travel, and summer activities. We have also verified that the summer usage dropoff shows limited correlation with available proxies for tourism and mobility. That said, we acknowledge that these controls may not capture all unobserved regional heterogeneity. In the revision we will add a dedicated robustness subsection with further specifications incorporating state fixed effects, additional economic activity measures, and explicit discussion of summer school and travel patterns as potential confounders.

Revision: partial
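The state fixed effects mentioned in the rebuttal can be illustrated with the within transformation: subtracting state means from both variables so that only within-state variation identifies the slope. This is a sketch under assumed names (`demean_by_group`, `within_slope`) and made-up data, not the authors' specification, and it omits the other controls they list.

```python
from collections import defaultdict
from statistics import mean


def demean_by_group(values, groups):
    """Subtract each group's mean from its members (the within transformation)."""
    by_group = defaultdict(list)
    for v, g in zip(values, groups):
        by_group[g].append(v)
    group_mean = {g: mean(vs) for g, vs in by_group.items()}
    return [v - group_mean[g] for v, g in zip(values, groups)]


def within_slope(x, y, groups):
    """Slope of y on x using only variation within groups (group fixed effects)."""
    xd = demean_by_group(x, groups)
    yd = demean_by_group(y, groups)
    sxx = sum(v * v for v in xd)
    if sxx == 0:
        raise ValueError("no within-group variation in x")
    return sum(a * b for a, b in zip(xd, yd)) / sxx


# Made-up example: state "B" has both higher dropoff and higher scores,
# but within each state the score does not move with dropoff.
states = ["A", "A", "A", "B", "B", "B"]
dropoff = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
scores = [18.0, 18.0, 18.0, 22.0, 22.0, 22.0]

pooled_slope = within_slope(dropoff, scores, ["all"] * len(states))
fe_slope = within_slope(dropoff, scores, states)
```

In this constructed case the pooled slope is positive purely because of between-state differences, while the within-state slope is zero. Fixed effects remove any confounder that is constant within a state, though not ones that vary across areas inside it.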

Circularity Check

0 steps flagged

No circularity: identification uses independent seasonal usage data

Full rationale

The paper's core strategy relies on observed summer dropoffs in ChatGPT queries as a proxy for cross-area differences in educational AI intensity. This proxy is constructed from usage data alone and is not fitted to, defined in terms of, or derived from the test-score outcomes. No equations reduce the impact estimate to a tautology or self-citation chain; the variation is treated as exogenous after controls. The analysis is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The design rests on one core domain assumption about what the summer usage drop measures and on standard econometric assumptions about parallel trends or no confounding.

free parameters (1)

regression coefficients on AI-use proxy
Estimated in the main specification relating summer dropoff intensity to test-score outcomes.

axioms (1)

domain assumption: Summer dropoff in ChatGPT activity primarily reflects reduced educational use rather than other seasonal factors.
Invoked to interpret cross-area variation in dropoff as variation in school-year AI use.

pith-pipeline@v0.9.0 · 5658 in / 1140 out tokens · 60389 ms · 2026-05-19T15:18:04.911567+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

the dropoff in ChatGPT activity during non-school summer months in 2023 and 2024 is used to identify areas with heavy educational AI use and thus estimate the educational impact of AI as it is actually used. I find no meaningful impact of AI usage on high school test score averages in either direction.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
