
arxiv: 2605.16284 · v1 · pith:UF2KVEH4new · submitted 2026-04-12 · 💻 cs.CY · cs.AI

Measuring Changes in Instructor Class Design and Student Learning After the Release of Large Language Models (LLMs)

Pith reviewed 2026-05-21 00:59 UTC · model grok-4.3

classification 💻 cs.CY cs.AI
keywords generative AI · large language models · higher education · student learning · faculty perceptions · grade data · mixed methods · learning achievement

The pith

This mixed-methods study documents patterns in student and faculty perceptions of LLMs as learning tools by combining surveys with pre- and post-release grade data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines shifts in higher education triggered by widespread student use of generative AI products in classwork. It collects instructor surveys, anonymous student surveys, and historical registrar grade data at one New England university to identify recurring patterns in how LLMs are experienced inside and outside class. A sympathetic reader would care because the results are meant to guide professors and institutions trying to shape policy that still maximizes actual learning. The study treats the grade records as one data stream that can be triangulated with self-reported experiences. It is presented as a pilot that other schools could replicate.

Core claim

The authors establish that student use of GenAI has produced substantial shifts in higher education. They also establish that a mixed-methods design, pairing retrospective quantitative analysis of grades with thematic analysis of faculty and student survey responses, can identify and document the resulting patterns in perceptions, study methods, course development, and learning achievement across the pre- and post-LLM eras.

What carries the argument

Triangulation of historical grade data reported to the registrar with thematic analysis of instructor and anonymous student surveys to capture LLM use as a learning tool.

If this is right

  • Documented patterns can directly inform GenAI policies that professors and universities adopt.
  • Insights into altered study methods and class design can help institutions maximize student learning with AI tools available.
  • The approach supplies a replicable template that other universities can use to examine the same phenomenon.
  • Triangulated data offers one way to assess whether overall learning achievement has changed since LLMs appeared.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Departments may need to redesign assessments so that learning gains remain visible even when students have routine access to LLMs.
  • Similar studies at institutions with different student populations or disciplinary mixes could reveal whether the observed patterns generalize.
  • Longer-term tracking of the same cohorts could separate temporary adjustment effects from lasting changes in how students approach coursework.

Load-bearing premise

That changes visible in retrospective grade data can be attributed to LLM use rather than to unrelated shifts in grading policies, course difficulty, or other external factors.

What would settle it

Grade distributions that change at the same time as documented policy or difficulty adjustments but show no corresponding difference between courses where surveys report high versus low LLM use.
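
The check described above amounts to a difference-in-differences comparison. The sketch below shows one way it could be set up. The per-course grade points, the high/low LLM-use labels, and the function names are all hypothetical illustrations; none of them come from the paper, and the paper does not report running this analysis.

```python
from statistics import mean

# Hypothetical records, NOT data from the paper:
# (course_id, era, llm_use_group, mean_grade_points)
# era is "pre" or "post"; llm_use_group is "high" or "low",
# assumed to be derived from survey responses.
RECORDS = [
    ("C1", "pre", "high", 3.10),
    ("C1", "post", "high", 3.40),
    ("C2", "pre", "high", 3.00),
    ("C2", "post", "high", 3.35),
    ("C3", "pre", "low", 3.05),
    ("C3", "post", "low", 3.15),
    ("C4", "pre", "low", 3.20),
    ("C4", "post", "low", 3.25),
]


def group_mean(rows, era, group):
    """Average grade points for one era and one LLM-use group."""
    values = [g for (_, e, u, g) in rows if e == era and u == group]
    if not values:
        raise ValueError(f"no rows for era={era!r}, group={group!r}")
    return mean(values)


def diff_in_diff(rows):
    """Return (estimate, high_change, low_change).

    estimate = (post - pre change in high-use courses)
             - (post - pre change in low-use courses)
    """
    high_change = group_mean(rows, "post", "high") - group_mean(rows, "pre", "high")
    low_change = group_mean(rows, "post", "low") - group_mean(rows, "pre", "low")
    return high_change - low_change, high_change, low_change


if __name__ == "__main__":
    estimate, high_change, low_change = diff_in_diff(RECORDS)
    print(f"high-use change: {high_change:.3f}")
    print(f"low-use change:  {low_change:.3f}")
    print(f"difference-in-differences: {estimate:.3f}")
```

An estimate near zero would mean grades moved the same way regardless of reported LLM use, which is the pattern that would undercut attributing the shift to LLMs. A real analysis would also need uncertainty estimates and a check that the two groups followed similar trends before the release.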

Figures

Figures reproduced from arXiv: 2605.16284 by Amanda Potasznik, Daniel Haehn.

Figure 1: Instructor Modification of Courses Due to GenAI. A large majority of instructors (thirteen of fifteen respondents, 86.7%) reported modifying elements of their course in response to the availability of generative AI tools, while only two instructors reported making no changes.
Figure 2: Estimated proportion of assignments changed due to student use of GenAI.
Figure 3: Type of course changes made due to GenAI.
Figure 4: Assignment redesign severity across semesters, averaged for all instructors.
Figure 5: Student LLM-usage underreporting index. The remainder of student response analysis plots survey responses in relation to the conceptualization framework for composite categories established in …
Figure 6: Individual instructor grade trends over time. Spring 2023, the first full semester in which students had access to GenAI products, is classified in this study as the first semester in the Post-LLM period and is indicated with a dotted vertical line in the above graphs. Comparing the Pre- and Post-LLM periods through the lens of student grades reported to the registrar by participating faculty, the following …
Original abstract

Student use of Generative AI (GenAI) products in completing their classwork, with or without their professors' knowledge and/or approval, has resulted in substantial shifts in higher education. While GenAI use is widespread, its impact on student study methods, faculty course development, grade reporting, and overall learning is not well documented. This is a mixed-methods, multi-course study using retrospective quantitative analysis, instructor surveys, and anonymous student surveys at a university in the New England region of the United States. This research seeks to identify and document patterns in student and faculty perceptions of, and experiences in, the use of LLMs as a learning tool inside and outside of the university classroom. Alongside quantitative and thematic analysis of both faculty and student survey responses, historical grade data as reported to the university registrar is used to triangulate the phenomenon of learning achievement in pre- and post-LLM eras. It is hoped that this research can serve as a pilot study for a broader set of institutions. Results from this study can inform GenAI policy for professors, universities, and other educational institutions that are trying to maximize student learning in the age of AI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. This mixed-methods study examines the impact of LLMs on higher education by analyzing patterns in student and faculty perceptions and experiences with LLMs as learning tools. It combines retrospective quantitative analysis of historical registrar grade data (pre- and post-2022 LLM release), instructor surveys, and anonymous student surveys at a single New England university, with thematic coding to document changes in class design, study methods, and learning achievement. The work positions itself as a pilot study to inform GenAI policies for maximizing student learning.

Significance. If the central claims hold after addressing controls for confounders, the study would provide useful pilot empirical data on LLM effects in real classroom settings, triangulating survey perceptions with grade trends to highlight shifts in educational practices. The mixed-methods design and focus on a timely topic offer potential value for policy-oriented research in computer science and education.

major comments (1)
  1. [Quantitative Analysis / Historical Grade Data] The section on historical grade data analysis (described in the abstract and methods) uses retrospective registrar records to triangulate learning achievement pre- and post-LLM but provides no indication of controls, matching, or documentation for concurrent changes in grading policies, assignment types, course difficulty, or instructor leniency around the 2022-2023 release window. This leaves open the possibility that observed grade shifts are driven by factors unrelated to LLM use, weakening the attribution central to the triangulation claim.
minor comments (2)
  1. [Abstract] The abstract states that results include quantitative and thematic analysis but does not report sample sizes, statistical methods, error bars, or response rates; adding these details would strengthen the presentation of findings.
  2. [Methods] Clarify the exact definitions and boundaries of the 'pre-LLM' and 'post-LLM' eras used for grade comparisons, including any criteria for selecting courses or instructors.
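
The era boundary raised in the second minor comment could be stated explicitly in code. The sketch below assumes the cutoff given in the Figure 6 caption (Spring 2023 as the first Post-LLM semester). The ordering of terms within a calendar year and the function names are assumptions made for illustration, not details taken from the paper.

```python
# Assumed ordering of academic terms within one calendar year.
SEASON_ORDER = {"Spring": 0, "Summer": 1, "Fall": 2}

# First Post-LLM semester, per the Figure 6 caption.
FIRST_POST_LLM_TERM = ("Spring", 2023)


def term_key(season, year):
    """Sortable key so that terms compare chronologically."""
    if season not in SEASON_ORDER:
        raise ValueError(f"unknown season: {season!r}")
    return (year, SEASON_ORDER[season])


def classify_era(season, year):
    """Label a term 'pre' or 'post' relative to the cutoff term."""
    cutoff = term_key(*FIRST_POST_LLM_TERM)
    return "post" if term_key(season, year) >= cutoff else "pre"


if __name__ == "__main__":
    for term in [("Fall", 2021), ("Fall", 2022), ("Spring", 2023), ("Fall", 2023)]:
        print(term, classify_era(*term))
```

Writing the boundary down as a single constant makes it easy to report and to vary in a sensitivity check, for example by moving the cutoff one term in either direction.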

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our pilot study manuscript. We appreciate the emphasis on strengthening the quantitative analysis section and will address the concerns regarding potential confounders in the historical grade data.

  1. Referee: The section on historical grade data analysis (described in the abstract and methods) uses retrospective registrar records to triangulate learning achievement pre- and post-LLM but provides no indication of controls, matching, or documentation for concurrent changes in grading policies, assignment types, course difficulty, or instructor leniency around the 2022-2023 release window. This leaves open the possibility that observed grade shifts are driven by factors unrelated to LLM use, weakening the attribution central to the triangulation claim.

    Authors: We agree with the referee that the current manuscript lacks explicit discussion of controls or documentation for potential concurrent changes in the educational environment. As this is a pilot study at a single institution using available retrospective registrar data, we did not collect or have access to detailed information on changes in grading policies, assignment types, or instructor leniency during the study period. In the revised manuscript, we will update the methods and limitations sections to clearly state these constraints and reframe the grade data analysis as providing observational trends that complement the survey findings, rather than establishing direct causal links to LLM adoption. This revision will better contextualize the triangulation approach and highlight the exploratory nature of the work.

    Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical mixed-methods study relies on external survey and registrar data without self-referential derivations or fitted predictions


The paper is a mixed-methods empirical study that collects primary survey responses from instructors and students and analyzes historical registrar grade records to compare pre- and post-LLM periods. No mathematical derivations, model fitting, or parameter estimation steps are described that could reduce to the inputs by construction. The central triangulation uses independently reported external data sources rather than any self-citation chain, ansatz smuggling, or renaming of known results. The analysis remains self-contained against external benchmarks because the quantitative leg draws directly from registrar archives and the qualitative legs from new anonymous surveys, with no load-bearing premise justified solely by prior work from the same authors.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the assumption that the chosen university and its registrar data are representative enough for a pilot and that survey responses reflect genuine perceptions rather than social desirability bias. No free parameters or invented entities are introduced in the abstract.

axioms (1)
  • Domain assumption: Historical grade data reported to the registrar can be used to triangulate learning achievement changes attributable to LLM availability.
    Invoked in the abstract paragraph on quantitative analysis and grade data.


