Learning Critical Testing Literacy Through Puzzles: an Experience Report
Pith reviewed 2026-06-26 16:27 UTC · model grok-4.3
The pith
The full sequence of solving puzzles, debriefing, and reflecting teaches critical testing literacy rather than the puzzles alone.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Puzzles are not the intervention: the entire sequence of solving, debriefing, and reflecting is. This was observed in workshops where participants framed problems, navigated dead ends, and shifted strategies, captured under the theme of sensemaking and reflection-in-action.
What carries the argument
The sequence of solving, debriefing, and reflecting in puzzle-based learning activities for critical testing literacy.
If this is right
- Workshops need deliberate design of the full sequence including debriefing and reflection.
- Think-aloud protocols provide better data on immediate reasoning than written reflections alone.
- Students and professionals show different patterns of exploration during puzzle solving.
- The approach can be supported by custom web applications with analytics for workshop customization.
Where Pith is reading between the lines
- Extending the sequence design could improve teaching of other technical skills that require critical thinking.
- Collecting more data through the analytics app might reveal patterns in how different groups develop testing literacy.
- Visible emotions during sessions suggest that incorporating affective feedback could enhance the reflection process.
Load-bearing premise
The semi-structured observations, think-aloud protocols, and written reflections from the workshops reliably show development of critical testing literacy without bias from who chose to participate or the role of the facilitators.
What would settle it
A study that measures critical testing literacy after puzzle solving alone versus after the full sequence of solving, debriefing, and reflecting, and finds no difference between the two, would falsify the claim.
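As a rough illustration of how such a two-condition comparison could be analysed, the sketch below runs a two-sided permutation test on the difference in mean scores. The function name `permutation_test`, the scoring scale, and the scores themselves are assumptions made for illustration; the paper reports no such measurement. A large p-value would only fail to show a difference, so a claim of "no difference" would additionally need an equivalence-style analysis.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test on the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Randomly reassign scores to the two conditions.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value above zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical literacy scores (placeholders, NOT data from the paper).
puzzles_only = [3, 4, 2, 5, 4, 3]
full_sequence = [4, 5, 3, 5, 4, 4]

p_value = permutation_test(puzzles_only, full_sequence)
print(f"estimated p-value: {p_value:.3f}")
```

A permutation test is used here only because it makes few distributional assumptions for small workshop-sized groups; other analyses would be equally defensible.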
Original abstract
In this paper, we report our experiences and takeaways from workshops using puzzles to learn CTL. Background: Software testing is important yet difficult to teach. We introduced a BoK of puzzle-based learning activities to teach CTL, based on a model of critical tester's cognition, leading to the pedagogical framework P4TEST. We conducted thirteen workshops with students, testers, teachers, and primary school pupils to assess puzzle-based teaching of critical testing literacy. Experience: Across eleven workshops, we used a semi-structured approach, varying puzzles, materials, and timing. In two additional workshops, we introduced workbooks and think-aloud sessions to gather more data on the learning experience. Observations: Participants consistently perceived themselves as experimenting while solving puzzles. Students tended to converge on solutions, while professionals continued exploring. Emotions were visible in behaviour but hard to surface through written reflection alone. Think-aloud sessions revealed immediate reasoning; written reflections elicited more meta-cognitive reflection. The theme Sensemaking / reflection-in-action captured how participants framed problems, navigated dead ends, and shifted strategies. Reflections: Puzzles are not the intervention: the entire sequence of solving, debriefing, and reflecting is. Designing that sequence more deliberately is the work ahead. We also developed an open-source web application with built-in analytics to customise workshops.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper is an experience report on 13 workshops that use puzzles to teach critical testing literacy (CTL) via the P4TEST pedagogical framework derived from a model of critical tester cognition. It describes a semi-structured approach across eleven workshops plus two with workbooks and think-aloud protocols, reports observations on participant behaviors (e.g., convergence vs. exploration, sensemaking/reflection-in-action), and concludes that the full sequence of solving, debriefing, and reflecting—not puzzles alone—is the intervention; an open-source web app with analytics is also presented.
Significance. If the methodological gaps are closed, the report could usefully illustrate how structured reflection sequences support CTL development in software engineering education and provide a reusable open-source tool for practitioners. The distinction between puzzle-solving and the full pedagogical sequence is a potentially actionable insight for experiential teaching designs.
major comments (2)
- [Observations] Observations section: the themes (Sensemaking / reflection-in-action) are presented as emerging from semi-structured observations, think-aloud protocols, and written reflections, yet no description is given of the analysis process, coding scheme, inter-rater reliability, or steps taken to address observer/facilitator bias.
- [Reflections] Reflections section: the central claim that 'Puzzles are not the intervention: the entire sequence of solving, debriefing, and reflecting is' is load-bearing for the paper's takeaway, but rests on volunteer participants without reported comparison conditions, blinding, or controls that would separate sequence effects from self-selection or cuing during debriefing.
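For readers unfamiliar with the inter-rater reliability mentioned in the first major comment, one common measure is Cohen's kappa, which corrects the raw agreement between two coders for the agreement expected by chance. The sketch below is a minimal illustration: the function name `cohens_kappa` and the code labels are hypothetical and do not come from the paper, which applied no formal coding scheme.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned to four reflection excerpts (illustrative only).
coder_1 = ["sensemaking", "sensemaking", "emotion", "emotion"]
coder_2 = ["sensemaking", "emotion", "emotion", "emotion"]

kappa = cohens_kappa(coder_1, coder_2)
print(kappa)  # 0.5
```

Here the coders agree on three of four excerpts (0.75), chance agreement is 0.5, and kappa is therefore (0.75 - 0.5) / (1 - 0.5) = 0.5.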
minor comments (2)
- [Experience] Experience section: the exact participant counts, demographics, and precise variations in puzzle timing/materials across the thirteen workshops are not quantified, limiting reproducibility of the reported approach.
- [Background] The abstract and text use 'CTL' and 'critical testing literacy' without an explicit early definition or reference to the underlying cognition model, which would aid readers unfamiliar with the framework.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback. We address each major comment below, indicating revisions where appropriate to strengthen the experience report while respecting its observational nature.
- Referee: Observations section: the themes (Sensemaking / reflection-in-action) are presented as emerging from semi-structured observations, think-aloud protocols, and written reflections, yet no description is given of the analysis process, coding scheme, inter-rater reliability, or steps taken to address observer/facilitator bias.
  Authors: We agree that the analysis process requires explicit description. In the revised version, we will insert a dedicated paragraph under the Observations section explaining that themes were identified through iterative author discussion of facilitator notes, think-aloud transcripts, and written reflections; no formal coding scheme or inter-rater reliability metrics were applied because this is an experience report rather than a systematic qualitative study. We will also note the use of multiple data sources and post-session debriefs among facilitators as steps taken to reduce individual bias.
  Revision: yes
- Referee: Reflections section: the central claim that 'Puzzles are not the intervention: the entire sequence of solving, debriefing, and reflecting is' is load-bearing for the paper's takeaway, but rests on volunteer participants without reported comparison conditions, blinding, or controls that would separate sequence effects from self-selection or cuing during debriefing.
  Authors: We accept that the claim cannot be supported by controlled evidence and will revise the Reflections section to present it explicitly as an observational takeaway drawn from consistent patterns across the 13 workshops rather than a causal finding. The revision will add a limitations paragraph acknowledging the volunteer sample, lack of comparison conditions, and potential cuing effects, while preserving the practical insight that the full sequence appeared necessary in our sessions. We will also suggest controlled follow-up studies.
  Revision: partial
Circularity Check
No circularity: qualitative experience report with no derivations or fitted predictions
The paper is an experience report on 13 workshops using puzzles for critical testing literacy. It presents no equations, parameters, predictions, or first-principles derivations. The central takeaway—that the full sequence of solving + debriefing + reflecting constitutes the intervention—is stated as an observation drawn from semi-structured data collection, not derived from or equivalent to any input by construction. No self-citation load-bearing steps, uniqueness theorems, or ansatzes appear. The report is self-contained against external benchmarks as direct qualitative description.