Little Impact of ChatGPT Availability on High School Student Test Score Performance
Pith reviewed 2026-05-19 15:18 UTC · model grok-4.3
The pith
ChatGPT availability had no meaningful impact on high school test scores.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The dropoff in ChatGPT activity during summer months serves to mark regions where students rely on the tool for schoolwork, and variation in that dropoff produces no meaningful difference in average high school test scores.
What carries the argument
Summer dropoff in ChatGPT activity used to identify intensity of educational AI use across geographic areas.
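A minimal sketch of how such a dropoff measure could be computed for one area. The review does not state the paper's exact formula, so the definition here (one minus the ratio of mean summer activity to mean school-year activity), the June to August window, the function name, and all numbers are illustrative assumptions.

```python
from statistics import mean

# Assumption: June, July, and August count as the non-school summer months.
SUMMER_MONTHS = {6, 7, 8}

def summer_dropoff(monthly_activity):
    """Relative fall in activity from school-year months to summer months.

    monthly_activity maps a month number (1-12) to an activity level.
    Returns 1 - summer_mean / school_mean, so 0.0 means no summer drop.
    """
    summer = [v for m, v in monthly_activity.items() if m in SUMMER_MONTHS]
    school = [v for m, v in monthly_activity.items() if m not in SUMMER_MONTHS]
    school_mean = mean(school)
    if school_mean == 0:
        raise ValueError("school-year activity is zero; dropoff is undefined")
    return 1.0 - mean(summer) / school_mean

# Hypothetical areas: activity of 100 in every school-year month.
region_a = {m: 100.0 for m in range(1, 13)}
region_a.update({6: 50.0, 7: 50.0, 8: 50.0})   # activity halves in summer
region_b = {m: 100.0 for m in range(1, 13)}
region_b.update({6: 90.0, 7: 90.0, 8: 90.0})   # activity barely changes

dropoff_a = summer_dropoff(region_a)
dropoff_b = summer_dropoff(region_b)
```

Under this definition, region_a would be read as the area with heavier school-related use, because its activity falls much more when school is out.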
If this is right
- Any student use of AI to skip learning either has little effect on tested performance or is offset by beneficial uses in the aggregate.
- Aggregate high school test outcomes remain largely unchanged despite widespread AI availability.
- Net effects of real-world AI adoption on measured achievement appear small for this population and outcome.
Where Pith is reading between the lines
- The same summer-variation approach could be applied to other outcomes such as course grades or college entry metrics.
- If the result holds, school policies focused on AI restrictions may have limited leverage on standardized test performance.
- Subject-by-subject or demographic breakdowns could reveal effects masked in the overall averages.
Load-bearing premise
The summer drop in ChatGPT activity marks places with heavy educational AI use and is not driven by other time-varying factors that also affect test scores.
What would settle it
A detectable rise or fall in test scores in areas showing larger summer drops in ChatGPT activity compared with areas showing smaller drops would contradict the no-impact finding.
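The comparison described above can be sketched as a slope estimate: regress area test-score averages on the dropoff measure and check whether the interval around the slope excludes zero. This is a bivariate illustration only; the paper's actual specification and controls are not given in this review, and the data below are invented.

```python
import math

def ols_slope(x, y):
    """Slope and standard error from a simple regression of y on x."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least three paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares, then the usual slope standard error.
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)
    return slope, se

# Hypothetical areas: summer dropoff and average test score.
dropoffs = [0.1, 0.2, 0.3, 0.4, 0.5]
scores = [20.0, 21.0, 20.0, 21.0, 20.0]

slope, se = ols_slope(dropoffs, scores)
# Rough interval using the large-sample normal approximation.
low, high = slope - 1.96 * se, slope + 1.96 * se
```

An interval that sits clearly above or below zero would be the kind of evidence that contradicts the no-impact finding. An interval that contains zero is consistent with a null only if it is also narrow enough to exclude effects of practical size.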
Original abstract
In educational settings, AI can be used as a learning aid, but can also be used to avoid schoolwork, thereby passing classes while learning little. Many existing studies on the impact of AI on education focus on AI use in controlled settings or with specialized tools. In this paper, the dropoff in ChatGPT activity during non-school summer months in 2023 and 2024 is used to identify areas with heavy educational AI use and thus estimate the educational impact of AI as it is actually used. I find no meaningful impact of AI usage on high school test score averages in either direction. These results imply that, to the extent that high school students use AI to avoid learning, it either does not matter much for their test performance or is cancelled out by positive uses of AI in the aggregate.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper uses the dropoff in ChatGPT query activity during non-school summer months in 2023 and 2024 to identify geographic areas with heavy educational AI use by high school students. It then estimates the effect of this usage intensity on average high school test scores and reports no meaningful impact in either direction, concluding that positive and negative uses of AI largely cancel out or are too small to affect aggregate performance.
Significance. If the identification holds, the null result would indicate that real-world student access to generative AI has had little net effect on standardized test outcomes. This provides field evidence complementary to lab studies and could inform education policy by suggesting that fears of widespread learning displacement via AI may be overstated at the aggregate level.
major comments (1)
- The central identification strategy treats the summer decline in ChatGPT activity as a valid proxy for cross-area differences in educational (versus recreational or professional) AI use. This requires that any non-educational seasonal patterns are either uniform across regions or uncorrelated with test-score determinants after controls. Without additional tests or evidence ruling out confounders such as regional differences in family travel, leisure patterns, or summer school resources that could also affect test preparation, the zero-effect finding on high-school test averages remains vulnerable to omitted-variable bias.
minor comments (2)
- The manuscript should report exact sample sizes, data sources for both ChatGPT usage and test-score outcomes, and the precise regression specification (including all controls and fixed effects) to allow readers to assess the power and robustness of the null result.
- Adding a table or appendix with robustness checks for alternative summer-period definitions or geographic subsamples would help address concerns about post-hoc period or geography selection.
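One way the robustness check on summer-period definitions could look, as a sketch: recompute the dropoff under several candidate windows and see whether areas keep the same ordering. The window choices, region names, dropoff formula, and activity values are all hypothetical.

```python
from statistics import mean

def dropoff(monthly, summer_months):
    """1 - (mean summer activity / mean activity in all other months)."""
    summer = [monthly[m] for m in summer_months]
    school = [v for m, v in monthly.items() if m not in summer_months]
    return 1.0 - mean(summer) / mean(school)

def ranking(regions, summer_months):
    """Region names ordered from largest to smallest dropoff."""
    return sorted(regions,
                  key=lambda name: dropoff(regions[name], summer_months),
                  reverse=True)

# Assumed alternative definitions of the summer period.
WINDOWS = {"jun_aug": (6, 7, 8), "jul_aug": (7, 8), "jun_jul": (6, 7)}

# Invented monthly activity: 100 in school months, lower in summer.
regions = {
    "north": {**{m: 100.0 for m in range(1, 13)}, 6: 60.0, 7: 40.0, 8: 50.0},
    "south": {**{m: 100.0 for m in range(1, 13)}, 6: 95.0, 7: 85.0, 8: 90.0},
}

rankings = {name: ranking(regions, months) for name, months in WINDOWS.items()}
# True when every window definition orders the regions identically.
stable = len({tuple(order) for order in rankings.values()}) == 1
```

If the ordering changed materially across windows, the choice of summer period would itself be a source of variation worth reporting.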
Simulated Author's Rebuttal
We thank the referee for their constructive comments on our manuscript. We address the major concern regarding the identification strategy below.
Point-by-point responses
Referee: The central identification strategy treats the summer decline in ChatGPT activity as a valid proxy for cross-area differences in educational (versus recreational or professional) AI use. This requires that any non-educational seasonal patterns are either uniform across regions or uncorrelated with test-score determinants after controls. Without additional tests or evidence ruling out confounders such as regional differences in family travel, leisure patterns, or summer school resources that could also affect test preparation, the zero-effect finding on high-school test averages remains vulnerable to omitted-variable bias.
Authors: We agree that the validity of our proxy depends on non-educational seasonal patterns being either uniform or uncorrelated with test-score determinants after controls. The manuscript already incorporates area-level controls for income, education, population density, and urban status, which proxy for many differences in leisure, travel, and summer activities. We have also verified that the summer usage dropoff shows limited correlation with available proxies for tourism and mobility. That said, we acknowledge that these controls may not capture all unobserved regional heterogeneity. In the revision we will add a dedicated robustness subsection with further specifications incorporating state fixed effects, additional economic activity measures, and explicit discussion of summer school and travel patterns as potential confounders.
Revision: partial
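A sketch of what adding state fixed effects does in the simplest case. With a single regressor, including a dummy for each state is equivalent to subtracting state means from both variables before regressing, so only within-state variation identifies the slope. The state labels and numbers are made up, and the rebuttal's actual specification would also include the other controls.

```python
from collections import defaultdict

def demean_within(groups, values):
    """Subtract each group's mean from its members' values."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, v in zip(groups, values):
        totals[g] += v
        counts[g] += 1
    return [v - totals[g] / counts[g] for g, v in zip(groups, values)]

def slope(x, y):
    """Simple regression slope of y on x, with an intercept."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variation")
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx

def within_slope(groups, x, y):
    """Slope after removing group (state) means from both variables."""
    return slope(demean_within(groups, x), demean_within(groups, y))

# Hypothetical data: state B has higher dropoff and higher scores overall,
# but scores do not vary with dropoff inside either state.
states = ["A", "A", "B", "B"]
dropoffs = [0.1, 0.3, 0.5, 0.7]
scores = [20.0, 20.0, 24.0, 24.0]

pooled = slope(dropoffs, scores)                 # picks up the state gap
within = within_slope(states, dropoffs, scores)  # uses within-state variation
```

In this toy case the pooled slope is positive only because the two states differ in level, while the within-state slope is zero. That gap is the kind of regional confounding the referee's comment is concerned with.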
Circularity Check
No circularity: identification uses independent seasonal usage data
Full rationale
The paper's core strategy relies on observed summer dropoffs in ChatGPT queries as a proxy for cross-area differences in educational AI intensity. This proxy is constructed from usage data alone and is not fitted to, defined in terms of, or derived from the test-score outcomes. No equations reduce the impact estimate to a tautology or self-citation chain; the variation is treated as exogenous after controls. The analysis is therefore self-contained against external benchmarks and does not exhibit any of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
free parameters (1)
- regression coefficients on AI-use proxy
axioms (1)
- Domain assumption: summer dropoff in ChatGPT activity primarily reflects reduced educational use rather than other seasonal factors.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "the dropoff in ChatGPT activity during non-school summer months in 2023 and 2024 is used to identify areas with heavy educational AI use and thus estimate the educational impact of AI as it is actually used. I find no meaningful impact of AI usage on high school test score averages in either direction."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.