Stepping OCaml
Pith reviewed 2026-05-25 14:21 UTC · model grok-4.3
The pith
A stepper for practical OCaml can display every reduction step by tracking evaluation contexts even when exceptions and printing occur.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present a stepper for a practical fragment of OCaml. Similarly to the DrRacket stepper, we keep track of evaluation contexts in order to reconstruct the whole program at each reduction step. The difference is that we support effectful constructs, such as exception handling and printing primitives, allowing the stepper to assist a wider range of users. In this paper, we describe the implementation of the stepper, share the feedback from our students, and show an attempt at assessing the educational impact of our stepper.
What carries the argument
Evaluation contexts that are tracked during reduction and used to reconstruct the complete program state at every step, extended to preserve information about exceptions and side-effecting primitives.
If this is right
- The stepper works on a practical fragment of OCaml that includes common effectful features.
- Full program states can be shown at each reduction step without losing information about pending handlers or output.
- The tool can be presented to students who are learning a real language rather than only a pedagogical subset.
- Student feedback and an initial assessment of learning impact can be collected and reported.
Where Pith is reading between the lines
- The same context-tracking technique might be reused when building steppers for other languages that mix pure reduction with effects.
- Wider availability of such steppers could change how novices first encounter control flow in languages they will actually use.
- The reported student feedback could be used to design a controlled study that measures changes in error rates or time to understand small programs.
Load-bearing premise
That maintaining evaluation contexts is enough to rebuild correct full-program states even when exceptions are raised or printing occurs.
What would settle it
Run the stepper on a program that raises an exception after a print statement and check whether the displayed state after the exception matches the actual remaining program text and environment.
Figures
read the original abstract
Steppers, which display all the reduction steps of a given program, are a novice-friendly tool for understanding program behavior. Unfortunately, steppers are not as popular as they ought to be; indeed, the tool is only available in the pedagogical languages of the DrRacket programming environment. We present a stepper for a practical fragment of OCaml. Similarly to the DrRacket stepper, we keep track of evaluation contexts in order to reconstruct the whole program at each reduction step. The difference is that we support effectful constructs, such as exception handling and printing primitives, allowing the stepper to assist a wider range of users. In this paper, we describe the implementation of the stepper, share the feedback from our students, and show an attempt at assessing the educational impact of our stepper.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents a stepper for a practical fragment of OCaml that tracks evaluation contexts to reconstruct the full program state at each reduction step. Unlike the DrRacket stepper, it supports effectful constructs including exception handling and printing primitives. The manuscript describes the implementation, reports student feedback, and includes an attempt to assess educational impact.
Significance. If the implementation correctly reconstructs program state for effects, the work would extend stepper tools beyond purely functional pedagogical languages to a practical fragment of OCaml, potentially broadening their use in education. The student feedback offers anecdotal support, but the absence of formal semantics, machine-checked proofs, or quantitative assessment data limits the demonstrated impact.
major comments (2)
- [Implementation] Implementation description (no section number given in abstract): the claim that context tracking suffices for effectful constructs (exceptions, printing) is not supported by a formal small-step semantics for the supported fragment or a machine-checked correspondence to OCaml's actual reduction behavior, leaving handler search, exception values, and side-effect reconstruction unverified.
- [Student feedback and assessment] Assessment of educational impact (no section number given): the reported student feedback and 'attempt at assessing' supply no quantitative metrics, controlled experiments, or statistical results, so the claim of educational value rests on unquantified anecdotes.
minor comments (1)
- The abstract states the approach but supplies no implementation details, correctness arguments, or quantitative results, making the central claim difficult to evaluate from the provided text.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on our manuscript describing the OCaml stepper. We respond to each major comment below, clarifying the scope of our implementation-focused contribution and the qualitative nature of the reported feedback.
read point-by-point responses
-
Referee: [Implementation] Implementation description (no section number given in abstract): the claim that context tracking suffices for effectful constructs (exceptions, printing) is not supported by a formal small-step semantics for the supported fragment or a machine-checked correspondence to OCaml's actual reduction behavior, leaving handler search, exception values, and side-effect reconstruction unverified.
Authors: We agree that the manuscript provides no formal small-step semantics or machine-checked proofs. The implementation extends context tracking (as in DrRacket) with explicit rules for effect reconstruction: exceptions are handled by searching the context for matching handlers and unwinding the stack accordingly, while printing primitives are stepped by recording output in a separate log that is replayed during reconstruction. These mechanisms were validated through extensive testing against the OCaml interpreter on programs exercising exceptions, handlers, and side effects. The paper's focus is the practical tool and its educational use rather than formal verification; adding a full semantics would be a substantial separate effort outside the current scope. revision: no
-
Referee: [Student feedback and assessment] Assessment of educational impact (no section number given): the reported student feedback and 'attempt at assessing' supply no quantitative metrics, controlled experiments, or statistical results, so the claim of educational value rests on unquantified anecdotes.
Authors: We concur that the assessment consists of qualitative student feedback and an initial attempt without quantitative metrics, controlled experiments, or statistical analysis. The reported comments illustrate how the stepper helped students visualize evaluation order and effects in OCaml, which is the extent of the evidence presented. A rigorous quantitative study lies beyond the resources available for this work and is noted as future research; the manuscript shares the tool and observed educational experiences rather than claiming statistically validated impact. revision: no
Circularity Check
No circularity: implementation description without derivational claims
full rationale
The paper describes the implementation of an OCaml stepper that tracks evaluation contexts to support effectful constructs. No equations, fitted parameters, predictions of related quantities, or load-bearing self-citations appear in the provided text. The central contribution is the existence of the described tool and student feedback, which does not reduce to any input by construction. This is a standard self-contained engineering paper; the derivation chain is absent, so no circularity patterns apply.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption OCaml reduction semantics can be reconstructed from evaluation contexts for both pure and effectful constructs
Reference graph
Works this paper leans on
-
[1]
In: European symposium on programming , Springer, pp
John Clements, Matthew Flatt & Matthias Felleisen (2001 ): Modeling an algebraic stepper . In: European symposium on programming , Springer, pp. 320–334, doi: 10.1007/3-540-45309-1_21
-
[2]
In: Proceedings of the 1990 ACM Conference on Lisp and Functional Programming , LFP ’90, pp
Olivier Danvy & Andrzej Filinski (1990): Abstracting Control. In: Proceedings of the 1990 ACM Conference on Lisp and Functional Programming , LFP ’90, pp. 151–160, doi: 10.1145/91556.91622
- [3]
-
[4]
Friedman (1986): Control operators, the SECD-machine, and the λ -calculus
Matthias Felleisen & Daniel P . Friedman (1986): Control operators, the SECD-machine, and the λ -calculus. In M. Wirsing, editor: Formal Description of Programming Concepts III , Elsevier, pp. 193–219. Available at https://cs.indiana.edu/ftp/techreports/TR197.pdf
work page 1986
-
[5]
Casey Klein, John Clements, Christos Dimoulas, Carl Eas tlund, Matthias Felleisen, Matthew Flatt, Jay A. McCarthy, Jon Rafkind, Sam Tobin-Hochstadt & Robert Bruce F indler (2012): Run Y our Research: On the Effectiveness of Lightweight Mechanization . In: Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages , ...
-
[6]
Preston Tunnell Wilson, Kathi Fisler & Shriram Krishnam urthi (2018): Evaluating the Tracing of Recursion in the Substitution Notional Machine . In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education, SIGCSE ’18, ACM, pp. 1023–1028, doi: 10.1145/3159450.3159479. A Appendix 2017 2018 week all step. try mod. print ref all step. tr...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.