
arxiv: 2606.23315 · v1 · pith:5G6YCJOUnew · submitted 2026-06-22 · 💻 cs.CY

Test-Driven, AI-Assisted Learning: Replacing Lectures with Weekly Closed-Book Tests

Pith reviewed 2026-06-26 05:53 UTC · model grok-4.3

classification 💻 cs.CY
keywords test-driven learning · AI-assisted education · closed-book testing · course redesign · self-directed learning · theory of computation · educational technology

The pith

Weekly closed-book tests paired with AI-assisted self-study can replace lectures while keeping students accountable in an upper-level theory course.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper reports on a 13-week redesign of a theory of computation course that eliminated lectures and substituted self-directed study supported by AI tools plus weekly independently completed closed-book tests. The authors argue that two supporting conditions made the strict testing gate workable: students received aligned learning sheets and practice materials so the tests felt fair, and the instructor used an AI-assisted harness to scale the weekly cycle of drafting, reviewing, and grading. Survey responses from 18 students and records from the course git history indicate that participants treated the tests as useful accountability measures rather than arbitrary hurdles. The paper presents the overall structure as a reusable design pattern for other instructors and releases the harness as a public template.

Core claim

In DSAA 3071, replacing lectures with self-directed AI-assisted learning and frequent closed-book tests produced a workable high-frequency quality gate; students viewed the tests as useful accountability when given a visible preparation path, and an AI-assisted materials harness made the weekly production and grading cycle operationally feasible for the instructor.

What carries the argument

The AI-assisted materials harness, a version-controlled agent workspace that supports drafting, review, test production, and grading with human oversight.
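
One way to picture the human-oversight requirement is as a small stage machine in which a week's materials cannot move from drafting to review, test production, or grading without a named human sign-off. The sketch below is illustrative only: `Stage`, `WeeklyItem`, and `advance` are hypothetical names, and nothing here is taken from the released harness.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    # The four activities named in the summary, plus a terminal state.
    DRAFT = 1
    REVIEW = 2
    TEST_PRODUCTION = 3
    GRADING = 4
    DONE = 5


@dataclass
class WeeklyItem:
    """Hypothetical record for one week's materials moving through the cycle."""
    week: int
    stage: Stage = Stage.DRAFT
    log: list = field(default_factory=list)

    def advance(self, approved_by: str) -> None:
        """Move to the next stage only when a human reviewer signs off."""
        if not approved_by:
            raise ValueError("a human sign-off is required to advance")
        if self.stage is Stage.DONE:
            raise ValueError("this week's cycle is already complete")
        # Record who approved the stage that is being closed.
        self.log.append((self.stage.name, approved_by))
        self.stage = Stage(self.stage.value + 1)


item = WeeklyItem(week=1)
item.advance("instructor")   # DRAFT -> REVIEW
item.advance("instructor")   # REVIEW -> TEST_PRODUCTION
print(item.stage.name, item.log)
```

Keeping the sign-off log next to the stage mirrors the idea of a version-controlled workspace, where each step leaves a reviewable record.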

Load-bearing premise

That results observed in one small course without a control group show the design pattern is reusable and effective for other instructors and different courses.

What would settle it

A controlled comparison in a second course where students using the TDAA pattern show no measurable gain in accountability or learning outcomes relative to a lecture-based section would undermine the reusability claim.

Figures

Figures reproduced from arXiv: 2606.23315 by Jin-Guo Liu, Long-Li Zheng, Shang-Qi Lu, Wei Wang, Xin-Ran Shi.

Figure 1: The TDAA learning model. Panel A: the weekly classroom cycle, where guided preparation feeds a strict closed-book …
Figure 2: Student survey overview (N = 18). The first four pies use the same seven-point agreement scale (1 = strongly disagree, 7 = strongly agree); the last two show the deeper-vs-faster comparison with a traditional lecture (Q12) and perceived test difficulty (Q26). Percentages are rounded from counts out of 18.
Figure 3: Weekly test score distribution (raw points out of 130; dashed line marks the 100-point credit cap; weekly …
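
The arithmetic implied by the captions of Figures 2 and 3 can be sketched in a few lines of Python. The function names are hypothetical, and reading the 100-point credit cap as `min(raw, 100)` is an assumption, since the Figure 3 caption is truncated in the source.

```python
RAW_MAX = 130      # raw points available per weekly test (Figure 3 caption)
CREDIT_CAP = 100   # credit cap marked by the dashed line (Figure 3 caption)


def credited_score(raw: float) -> float:
    """Assumed reading of the cap: raw points above 100 earn no extra credit."""
    if not 0 <= raw <= RAW_MAX:
        raise ValueError(f"raw score must lie in [0, {RAW_MAX}]")
    return min(raw, CREDIT_CAP)


def survey_percent(count: int, n: int = 18) -> int:
    """Percentage of respondents, rounded from a count out of n (Figure 2 caption)."""
    if not 0 <= count <= n:
        raise ValueError(f"count must lie in [0, {n}]")
    return round(100 * count / n)


print(credited_score(120), credited_score(85))
print(survey_percent(5), survey_percent(9))
```

With N = 18, each respondent accounts for about 5.6 percentage points, so the rounded percentages in a pie need not sum to exactly 100.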
Original abstract

This paper is an experience report on a 13-week Test-Driven, AI-Assisted (TDAA) redesign of DSAA 3071, Theory of Computation, an upper-level course at the Hong Kong University of Science and Technology (Guangzhou). The design is simple: the course replaces lectures with self-directed, AI-assisted learning, and frequent, independently completed tests create a high-frequency quality gate. AI agents help the instructor prepare the learning path, course website, tests, grading workflow, and repairs. Two conditions made this strict gate workable. Students needed a visible preparation path of learning sheets and aligned validation practice, so the closed-book tests felt fair rather than arbitrary. The instructor needed an AI-assisted materials harness, a version-controlled agent workspace, so that weekly drafting, review, test production, and grading could scale with human oversight. Evidence from a student survey ($N=18$), weekly scores, and the project's git history suggests that students treated the tests as useful accountability and that the harness made frequent closed-book testing operational. The evidence is limited to one small, proof-heavy course without a control group. The contribution is therefore a reusable design pattern: high-frequency tests preserve individual accountability, while AI agents make material production and marking scalable. We release the harness as a public starter template so that other instructors can reproduce it.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript is an experience report on the 13-week redesign of DSAA 3071 (Theory of Computation) at HKUST Guangzhou. Lectures are replaced by self-directed, AI-assisted learning with weekly closed-book tests as high-frequency quality gates. An AI-assisted materials harness supports scalable preparation, test production, grading, and repairs. Evidence from an N=18 student survey, weekly scores, and git history is cited to indicate that students viewed the tests as useful accountability mechanisms and that the harness made frequent testing operational. The paper frames the contribution as a reusable design pattern and releases the harness publicly as a starter template.

Significance. If the TDAA pattern can be shown to transfer, it would provide a concrete, scalable model for preserving individual accountability in self-paced courses while using AI to manage material production and assessment workload. The explicit public release of the harness as a version-controlled starter template is a concrete strength that enables direct replication attempts and incremental refinement by other instructors.

major comments (1)
  1. [Abstract] The headline claim that the TDAA approach supplies a 'reusable design pattern' that 'other instructors can adopt' is load-bearing for the contribution, yet rests exclusively on observational data from one 13-week course with N=18 students, no control group, and no second implementation. The manuscript itself notes the single-instance limitation, but the transferability assertion therefore lacks direct empirical support.
minor comments (1)
  1. [Abstract] A one-sentence parenthetical gloss on the components of the 'AI-assisted materials harness' would improve immediate readability for readers outside the immediate subfield.

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for highlighting the need to calibrate the abstract's framing of transferability. We address the comment below and propose a targeted revision.

Point-by-point responses
  1. Referee: [Abstract] The headline claim that the TDAA approach supplies a 'reusable design pattern' that 'other instructors can adopt' is load-bearing for the contribution, yet rests exclusively on observational data from one 13-week course with N=18 students, no control group, and no second implementation. The manuscript itself notes the single-instance limitation, but the transferability assertion therefore lacks direct empirical support.

    Authors: We agree that the paper is a single-instance experience report and provides no direct empirical evidence of transfer to other courses or instructors. The abstract already states the limitation explicitly ('The evidence is limited to one small, proof-heavy course without a control group'). We will revise the abstract to describe the TDAA approach as a proposed design pattern derived from this implementation, with the public harness release intended to enable other instructors to test and refine it, rather than asserting that reusability has been empirically demonstrated.

    revision: yes

Circularity Check

0 steps flagged

No circularity: experience report with direct empirical artifacts, no derivations or self-referential reductions

Full rationale

The paper is a qualitative experience report on a single course redesign. It presents student survey responses (N=18), weekly scores, and git history as direct evidence. No equations, fitted parameters, model predictions, or mathematical derivations appear. No self-citations are invoked as load-bearing premises, and the contribution is framed explicitly as a reusable pattern supported by the reported single-instance data rather than reduced to prior author work or definitional equivalence. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The report depends on the domain assumption that self-directed learning with aligned practice materials is viable for theory of computation content, and introduces the harness as a practical innovation whose effectiveness is shown only within this single instance.

axioms (1)
  • Domain assumption: Students can effectively learn theory of computation content through self-directed study supported by AI assistance and aligned validation practice without traditional lectures.
    This premise is required for the decision to replace lectures entirely while maintaining learning outcomes.
invented entities (1)
  • AI-assisted materials harness (no independent evidence)
    Purpose: To scale weekly drafting, test production, grading, and repairs with version control and human oversight.
    The harness is presented as a new practical tool developed for this course to make frequent testing operational.


