Stepping OCaml

Kenichi Asai (Ochanomizu University); Tsukino Furukawa (Ochanomizu University); Youyou Cong (Ochanomizu University)

arxiv: 1906.11422 · v1 · pith:EAZIKBRXnew · submitted 2019-06-27 · 💻 cs.PL

Stepping OCaml

Tsukino Furukawa (Ochanomizu University) , Youyou Cong (Ochanomizu University) , Kenichi Asai (Ochanomizu University) This is my paper

Pith reviewed 2026-05-25 14:21 UTC · model grok-4.3

classification 💻 cs.PL

keywords stepperOCamlevaluation contextseffectful constructsexception handlingprogramming educationreduction stepsnovice tools

0 comments

The pith

A stepper for practical OCaml can display every reduction step by tracking evaluation contexts even when exceptions and printing occur.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a stepper for a practical fragment of OCaml that shows all reduction steps of a program. It tracks evaluation contexts to rebuild the full program state after each step, extending an earlier approach so that effectful operations such as exception handling and printing primitives are supported. The authors describe how the implementation works, report feedback collected from students who used the tool, and describe an attempt to measure its effect on learning.

Core claim

We present a stepper for a practical fragment of OCaml. Similarly to the DrRacket stepper, we keep track of evaluation contexts in order to reconstruct the whole program at each reduction step. The difference is that we support effectful constructs, such as exception handling and printing primitives, allowing the stepper to assist a wider range of users. In this paper, we describe the implementation of the stepper, share the feedback from our students, and show an attempt at assessing the educational impact of our stepper.

What carries the argument

Evaluation contexts that are tracked during reduction and used to reconstruct the complete program state at every step, extended to preserve information about exceptions and side-effecting primitives.

If this is right

The stepper works on a practical fragment of OCaml that includes common effectful features.
Full program states can be shown at each reduction step without losing information about pending handlers or output.
The tool can be presented to students who are learning a real language rather than only a pedagogical subset.
Student feedback and an initial assessment of learning impact can be collected and reported.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same context-tracking technique might be reused when building steppers for other languages that mix pure reduction with effects.
Wider availability of such steppers could change how novices first encounter control flow in languages they will actually use.
The reported student feedback could be used to design a controlled study that measures changes in error rates or time to understand small programs.

Load-bearing premise

That maintaining evaluation contexts is enough to rebuild correct full-program states even when exceptions are raised or printing occurs.

What would settle it

Run the stepper on a program that raises an exception after a print statement and check whether the displayed state after the exception matches the actual remaining program text and environment.

Figures

Figures reproduced from arXiv: 1906.11422 by Kenichi Asai (Ochanomizu University), Tsukino Furukawa (Ochanomizu University), Youyou Cong (Ochanomizu University).

**Figure 2.** Figure 2: Stepping factorial in OCaml the one provided in DrRacket, as shown in [PITH_FULL_IMAGE:figures/full_fig_p002_2.png] view at source ↗

**Figure 3.** Figure 3: Syntax 2.1 Building an Interpreter In Figures 3 and 4, we define the object language as well as a big-step interpreter. The eval function evaluates a given expression following OCaml’s call-by-value, right-to-left strategy. For instance, when given an application e1 e2, it first evaluates the argument e2, then evaluates the function e1. Once the application has been turned into a redex, we perform β-reduct… view at source ↗

**Figure 4.** Figure 4: Big-step interpreter [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: Contexts and reconstruction function; first attem [PITH_FULL_IMAGE:figures/full_fig_p005_5.png] view at source ↗

**Figure 6.** Figure 6: Contexts and reconstruction function; final versi [PITH_FULL_IMAGE:figures/full_fig_p006_6.png] view at source ↗

**Figure 7.** Figure 7: Stepping evaluator [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 8.** Figure 8: memo and main functions 2.3 The Actual Stepper In [PITH_FULL_IMAGE:figures/full_fig_p009_8.png] view at source ↗

**Figure 9.** Figure 9: Evaluating programs using the actual stepper [PITH_FULL_IMAGE:figures/full_fig_p010_9.png] view at source ↗

**Figure 10.** Figure 10: Skipping evaluation of the factorial function [PITH_FULL_IMAGE:figures/full_fig_p010_10.png] view at source ↗

**Figure 11.** Figure 11: Skipping application [PITH_FULL_IMAGE:figures/full_fig_p010_11.png] view at source ↗

**Figure 12.** Figure 12: Stepping application 3 Stepping OCaml in the Classroom Since 2016, we have been using (earlier versions of) our stepper in an introductory OCaml course called “Functional Programming”, taught by the third author at Ochanomizu University. 3.1 The OCaml Course The “Functional Programming” course teaches how to program with functions and types, covering basic topics such as recursion, datatypes, effects, and… view at source ↗

**Figure 13.** Figure 13: Frequency of standard execution (light-colored [PITH_FULL_IMAGE:figures/full_fig_p012_13.png] view at source ↗

**Figure 14.** Figure 14: Number of times the stepper was used to evaluate a p [PITH_FULL_IMAGE:figures/full_fig_p013_14.png] view at source ↗

**Figure 15.** Figure 15: Number of questions where students arrived at a co [PITH_FULL_IMAGE:figures/full_fig_p014_15.png] view at source ↗

read the original abstract

Steppers, which display all the reduction steps of a given program, are a novice-friendly tool for understanding program behavior. Unfortunately, steppers are not as popular as they ought to be; indeed, the tool is only available in the pedagogical languages of the DrRacket programming environment. We present a stepper for a practical fragment of OCaml. Similarly to the DrRacket stepper, we keep track of evaluation contexts in order to reconstruct the whole program at each reduction step. The difference is that we support effectful constructs, such as exception handling and printing primitives, allowing the stepper to assist a wider range of users. In this paper, we describe the implementation of the stepper, share the feedback from our students, and show an attempt at assessing the educational impact of our stepper.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper delivers a practical OCaml stepper that handles exceptions and printing via context tracking, but correctness rests on an unverified implementation description.

read the letter

This paper gives you a stepper for a practical fragment of OCaml that can handle effectful code like exceptions and print statements. The approach is to keep track of evaluation contexts so the tool can show the entire program state after each reduction, which is the key to making it work with effects. They build on the DrRacket stepper idea but make it apply to OCaml. The implementation description covers how they manage the contexts for these constructs, and they include feedback from students who tried it out. That feedback and the assessment attempt are useful additions for a tools paper. The implementation seems solid enough for practical use based on what they report. Extending context tracking to effects is a reasonable way to broaden the tool's applicability. Where it falls short is in verification. There is no formal small-step semantics provided for the effectful parts, and no correspondence proof or test suite results to show that the stepper accurately models OCaml's behavior for exception handling and I/O. The educational assessment is described as an attempt, so it probably does not include strong data on learning outcomes. Readers who teach functional programming with OCaml or who develop educational programming tools will get value from this. It is a concrete example of bringing step-by-step visualization to a mainstream language. The paper shows clear thinking about the problem and honest reporting of what they achieved. It deserves a serious referee in a conference or journal that covers programming tools or computer science education. I recommend sending it to peer review rather than a desk reject.

Referee Report

2 major / 1 minor

Summary. The paper presents a stepper for a practical fragment of OCaml that tracks evaluation contexts to reconstruct the full program state at each reduction step. Unlike the DrRacket stepper, it supports effectful constructs including exception handling and printing primitives. The manuscript describes the implementation, reports student feedback, and includes an attempt to assess educational impact.

Significance. If the implementation correctly reconstructs program state for effects, the work would extend stepper tools beyond purely functional pedagogical languages to a practical fragment of OCaml, potentially broadening their use in education. The student feedback offers anecdotal support, but the absence of formal semantics, machine-checked proofs, or quantitative assessment data limits the demonstrated impact.

major comments (2)

[Implementation] Implementation description (no section number given in abstract): the claim that context tracking suffices for effectful constructs (exceptions, printing) is not supported by a formal small-step semantics for the supported fragment or a machine-checked correspondence to OCaml's actual reduction behavior, leaving handler search, exception values, and side-effect reconstruction unverified.
[Student feedback and assessment] Assessment of educational impact (no section number given): the reported student feedback and 'attempt at assessing' supply no quantitative metrics, controlled experiments, or statistical results, so the claim of educational value rests on unquantified anecdotes.

minor comments (1)

The abstract states the approach but supplies no implementation details, correctness arguments, or quantitative results, making the central claim difficult to evaluate from the provided text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript describing the OCaml stepper. We respond to each major comment below, clarifying the scope of our implementation-focused contribution and the qualitative nature of the reported feedback.

read point-by-point responses

Referee: [Implementation] Implementation description (no section number given in abstract): the claim that context tracking suffices for effectful constructs (exceptions, printing) is not supported by a formal small-step semantics for the supported fragment or a machine-checked correspondence to OCaml's actual reduction behavior, leaving handler search, exception values, and side-effect reconstruction unverified.

Authors: We agree that the manuscript provides no formal small-step semantics or machine-checked proofs. The implementation extends context tracking (as in DrRacket) with explicit rules for effect reconstruction: exceptions are handled by searching the context for matching handlers and unwinding the stack accordingly, while printing primitives are stepped by recording output in a separate log that is replayed during reconstruction. These mechanisms were validated through extensive testing against the OCaml interpreter on programs exercising exceptions, handlers, and side effects. The paper's focus is the practical tool and its educational use rather than formal verification; adding a full semantics would be a substantial separate effort outside the current scope. revision: no
Referee: [Student feedback and assessment] Assessment of educational impact (no section number given): the reported student feedback and 'attempt at assessing' supply no quantitative metrics, controlled experiments, or statistical results, so the claim of educational value rests on unquantified anecdotes.

Authors: We concur that the assessment consists of qualitative student feedback and an initial attempt without quantitative metrics, controlled experiments, or statistical analysis. The reported comments illustrate how the stepper helped students visualize evaluation order and effects in OCaml, which is the extent of the evidence presented. A rigorous quantitative study lies beyond the resources available for this work and is noted as future research; the manuscript shares the tool and observed educational experiences rather than claiming statistically validated impact. revision: no

Circularity Check

0 steps flagged

No circularity: implementation description without derivational claims

full rationale

The paper describes the implementation of an OCaml stepper that tracks evaluation contexts to support effectful constructs. No equations, fitted parameters, predictions of related quantities, or load-bearing self-citations appear in the provided text. The central contribution is the existence of the described tool and student feedback, which does not reduce to any input by construction. This is a standard self-contained engineering paper; the derivation chain is absent, so no circularity patterns apply.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The paper rests on the domain assumption that OCaml reduction can be faithfully captured by evaluation contexts even in the presence of effects; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption OCaml reduction semantics can be reconstructed from evaluation contexts for both pure and effectful constructs
Invoked when the stepper is said to reconstruct the whole program at each step.

pith-pipeline@v0.9.0 · 5671 in / 1194 out tokens · 33858 ms · 2026-05-25T14:21:20.561822+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

6 extracted references · 6 canonical work pages

[1]

In: European symposium on programming , Springer, pp

John Clements, Matthew Flatt & Matthias Felleisen (2001 ): Modeling an algebraic stepper . In: European symposium on programming , Springer, pp. 320–334, doi: 10.1007/3-540-45309-1_21

work page doi:10.1007/3-540-45309-1_21 2001
[2]

In: Proceedings of the 1990 ACM Conference on Lisp and Functional Programming , LFP ’90, pp

Olivier Danvy & Andrzej Filinski (1990): Abstracting Control. In: Proceedings of the 1990 ACM Conference on Lisp and Functional Programming , LFP ’90, pp. 151–160, doi: 10.1145/91556.91622

work page doi:10.1145/91556.91622 1990
[3]

MIT Press

Matthias Felleisen, Robert Bruce Findler & Matthew Flat t (2009): Semantics engineering with PLT Redex . MIT Press

work page 2009
[4]

Friedman (1986): Control operators, the SECD-machine, and the λ -calculus

Matthias Felleisen & Daniel P . Friedman (1986): Control operators, the SECD-machine, and the λ -calculus. In M. Wirsing, editor: Formal Description of Programming Concepts III , Elsevier, pp. 193–219. Available at https://cs.indiana.edu/ftp/techreports/TR197.pdf

work page 1986
[5]

McCarthy, Jon Rafkind, Sam Tobin-Hochstadt & Robert Bruce F indler (2012): Run Y our Research: On the Effectiveness of Lightweight Mechanization

Casey Klein, John Clements, Christos Dimoulas, Carl Eas tlund, Matthias Felleisen, Matthew Flatt, Jay A. McCarthy, Jon Rafkind, Sam Tobin-Hochstadt & Robert Bruce F indler (2012): Run Y our Research: On the Effectiveness of Lightweight Mechanization . In: Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages , ...

work page doi:10.1145/2103656.2103691 2012
[6]

In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education, SIGCSE ’18, ACM, pp

Preston Tunnell Wilson, Kathi Fisler & Shriram Krishnam urthi (2018): Evaluating the Tracing of Recursion in the Substitution Notional Machine . In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education, SIGCSE ’18, ACM, pp. 1023–1028, doi: 10.1145/3159450.3159479. A Appendix 2017 2018 week all step. try mod. print ref all step. tr...

work page doi:10.1145/3159450.3159479 2018

[1] [1]

In: European symposium on programming , Springer, pp

John Clements, Matthew Flatt & Matthias Felleisen (2001 ): Modeling an algebraic stepper . In: European symposium on programming , Springer, pp. 320–334, doi: 10.1007/3-540-45309-1_21

work page doi:10.1007/3-540-45309-1_21 2001

[2] [2]

In: Proceedings of the 1990 ACM Conference on Lisp and Functional Programming , LFP ’90, pp

Olivier Danvy & Andrzej Filinski (1990): Abstracting Control. In: Proceedings of the 1990 ACM Conference on Lisp and Functional Programming , LFP ’90, pp. 151–160, doi: 10.1145/91556.91622

work page doi:10.1145/91556.91622 1990

[3] [3]

MIT Press

Matthias Felleisen, Robert Bruce Findler & Matthew Flat t (2009): Semantics engineering with PLT Redex . MIT Press

work page 2009

[4] [4]

Friedman (1986): Control operators, the SECD-machine, and the λ -calculus

Matthias Felleisen & Daniel P . Friedman (1986): Control operators, the SECD-machine, and the λ -calculus. In M. Wirsing, editor: Formal Description of Programming Concepts III , Elsevier, pp. 193–219. Available at https://cs.indiana.edu/ftp/techreports/TR197.pdf

work page 1986

[5] [5]

McCarthy, Jon Rafkind, Sam Tobin-Hochstadt & Robert Bruce F indler (2012): Run Y our Research: On the Effectiveness of Lightweight Mechanization

Casey Klein, John Clements, Christos Dimoulas, Carl Eas tlund, Matthias Felleisen, Matthew Flatt, Jay A. McCarthy, Jon Rafkind, Sam Tobin-Hochstadt & Robert Bruce F indler (2012): Run Y our Research: On the Effectiveness of Lightweight Mechanization . In: Proceedings of the 39th Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages , ...

work page doi:10.1145/2103656.2103691 2012

[6] [6]

In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education, SIGCSE ’18, ACM, pp

Preston Tunnell Wilson, Kathi Fisler & Shriram Krishnam urthi (2018): Evaluating the Tracing of Recursion in the Substitution Notional Machine . In: Proceedings of the 49th ACM Technical Symposium on Computer Science Education, SIGCSE ’18, ACM, pp. 1023–1028, doi: 10.1145/3159450.3159479. A Appendix 2017 2018 week all step. try mod. print ref all step. tr...

work page doi:10.1145/3159450.3159479 2018