pith. sign in

arxiv: 1907.03356 · v1 · pith:JLDGG6X7new · submitted 2019-07-07 · 💻 cs.OS

Reproducible Execution of POSIX Programs with DiOS

Pith reviewed 2026-05-25 01:37 UTC · model grok-4.3

classification 💻 cs.OS
keywords reproducible executionPOSIX APIsmodel operating systemDiOSthread schedulingdeterminismDiVMKLEE
0
0 comments X

The pith

DiOS makes executions of POSIX programs fully reproducible, producing identical instruction traces even for multi-threaded code.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents DiOS as a lightweight model operating system that executes programs relying on POSIX APIs with guaranteed reproducibility. The same program and inputs always yield the exact same sequence of instructions, regardless of threading. DiOS is written in portable C and C++, supports multiple backends including DiVM for verification and KLEE for symbolic execution, and can also run as a user-mode kernel. Its modular design allows extending it with additional system call handlers as needed.

Core claim

DiOS achieves fully reproducible execution of POSIX programs by serving as a model operating system that controls thread scheduling and API calls to eliminate non-determinism, ensuring that repeated runs with identical inputs produce identical instruction traces.

What carries the argument

DiOS, the modular lightweight model operating system that emulates POSIX APIs in a deterministic manner.

If this is right

  • Multi-threaded programs can be executed reproducibly for verification purposes.
  • DiOS can be ported to different execution platforms like symbolic executors.
  • New POSIX APIs can be supported by adding modular components.
  • The system can function as a user-mode kernel when compiled to machine code.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such reproducibility could simplify debugging of concurrent software by allowing exact replay of executions.
  • This model OS approach might extend to other APIs beyond POSIX for deterministic testing.
  • Integration with verification tools could lead to more reliable analysis of parallel programs.

Load-bearing premise

The model operating system can emulate POSIX APIs and thread scheduling without introducing any new non-determinism.

What would settle it

Running a multi-threaded POSIX program twice under DiOS with the same inputs and observing different instruction traces.

Figures

Figures reproduced from arXiv: 1907.03356 by Jan Mr\'azek, Ji\v{r}\'i Barnat, Katar\'ina Kejstov\'a, Petr Ro\v{c}kai, Zuzana Baranov\'a.

Figure 1
Figure 1. Figure 1: Architecture of the native execution platform. Like in KLEE, the native port of DiOS does not have access to in-place resizing of memory objects, but it can be emulated slightly more efficiently using the mmap host system call. The native port, however, does not suffer from the single stack limitations that KLEE does: new stacks can be created using mmap calls and stack switching can be implemented using h… view at source ↗
Figure 2
Figure 2. Figure 2: The architecture of DiOS. 4.1 Kernel Components The decomposition of the kernel to a number of components serves multiple goals: first is resource conservation – some components have non-negligible mem￾ory overhead even when they are not actively used. This may be because they need to store auxiliary data along with each thread or process, and the under￾lying verification tool then needs to track this data… view at source ↗
Figure 3
Figure 3. Figure 3: Building verified executables with DiOS. 5 Evaluation We have tested DiOS in a number of scenarios, to ensure that it meets the goals that we describe in Section 1.2. The first goal – modularity – is hard to quantify in isolation, but it was of considerable help in adapting DiOS for different use cases. We have used DiOS with success in explicit-state model checking of parallel 9 This extraction is perform… view at source ↗
read the original abstract

In this paper, we describe DiOS, a lightweight model operating system which can be used to execute programs that make use of POSIX APIs. Such executions are fully reproducible: running the same program with the same inputs twice will result in two exactly identical instruction traces, even if the program uses threads for parallelism. DiOS is implemented almost entirely in portable C and C++: although its primary platform is DiVM, a verification-oriented virtual machine, it can be configured to also run in KLEE, a symbolic executor. Finally, it can be compiled into machine code to serve as a user-mode kernel. Additionally, DiOS is modular and extensible. Its various components can be combined to match both the capabilities of the underlying platform and to provide services required by a particular program. New components can be added to cover additional system calls or APIs. The experimental evaluation has two parts. DiOS is first evaluated as a component of a program verification platform based on DiVM. In the second part, we consider its portability and modularity by combining it with the symbolic executor KLEE.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript describes DiOS, a lightweight model operating system implemented almost entirely in portable C and C++ that enables fully reproducible execution of POSIX programs. The central claim is that running the same program with the same inputs twice produces exactly identical instruction traces, even for programs using threads for parallelism. DiOS targets DiVM as its primary platform but is configurable for KLEE and can be compiled to native machine code as a user-mode kernel. It is modular and extensible, with components combinable to match platform capabilities and program requirements. The paper reports an experimental evaluation in two parts: first as a component of a DiVM-based verification platform, and second demonstrating portability and modularity with the KLEE symbolic executor.

Significance. If the reproducibility claim holds, the work would be significant for program verification and symbolic execution communities by providing a deterministic environment for POSIX APIs and concurrent programs. The portability across DiVM, KLEE, and native compilation, along with modularity, is a practical strength that could enable broader adoption in verification tools. The two-part evaluation helps ground the claims in concrete use cases.

major comments (2)
  1. [Abstract] Abstract: the claim that executions produce 'exactly identical instruction traces, even if the program uses threads for parallelism' is the central contribution, yet the description provides no explicit accounting of how every source of non-determinism (thread scheduling decisions, signal delivery, internal scheduler state, or platform-specific timing) is eliminated from program inputs alone. This assumption is load-bearing for the reproducibility guarantee across all claimed platforms.
  2. [Experimental evaluation] Experimental evaluation (second part): the portability and modularity claims with KLEE are asserted, but without reported metrics on trace identity, comparison against baseline non-deterministic executions, or coverage of scheduling behaviors, it is not possible to assess whether the determinism holds in practice.
minor comments (2)
  1. [Abstract] The abstract mentions 'new components can be added to cover additional system calls or APIs' but does not list the POSIX APIs currently supported; a table or enumerated list would improve clarity.
  2. Notation for 'DiVM' and 'DiOS' is introduced without an initial expansion or reference to prior work on DiVM; adding this would aid readers unfamiliar with the verification VM.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the reproducibility claims and the evaluation. We address the two major comments point by point below.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that executions produce 'exactly identical instruction traces, even if the program uses threads for parallelism' is the central contribution, yet the description provides no explicit accounting of how every source of non-determinism (thread scheduling decisions, signal delivery, internal scheduler state, or platform-specific timing) is eliminated from program inputs alone. This assumption is load-bearing for the reproducibility guarantee across all claimed platforms.

    Authors: The abstract is a concise summary of the contribution. The manuscript provides a detailed accounting of non-determinism elimination in Sections 3 (architecture of DiOS, including the deterministic scheduler and signal handling) and 4 (reproducibility guarantees, covering thread scheduling, internal state, and platform independence). These sections explain how DiOS controls all listed sources of non-determinism through its model OS design rather than relying on program inputs alone. The abstract does not repeat this level of detail, consistent with typical abstract length constraints. revision: no

  2. Referee: [Experimental evaluation] Experimental evaluation (second part): the portability and modularity claims with KLEE are asserted, but without reported metrics on trace identity, comparison against baseline non-deterministic executions, or coverage of scheduling behaviors, it is not possible to assess whether the determinism holds in practice.

    Authors: The second part of the evaluation demonstrates portability by integrating the same DiOS implementation with KLEE, inheriting the determinism guarantees from the core design (detailed in Sections 3-4). Trace identity holds by construction across platforms because DiOS replaces non-deterministic OS services. We agree that explicit metrics for the KLEE configuration would strengthen the presentation and will add a short subsection with trace comparison results and scheduling coverage for the KLEE experiments in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: system description with no equations or load-bearing self-citations

full rationale

The paper is a system description of DiOS, claiming reproducibility by design through control of scheduling and POSIX APIs in a model OS. No equations, fitted parameters, or mathematical derivations exist. The central claim does not reduce to any self-citation chain, ansatz, or renaming of prior results; it is presented as an engineering property of the new implementation. Any self-citations to DiVM or related tools are not invoked as uniqueness theorems to force the result. The contribution is self-contained against external benchmarks of reproducibility.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entities

The central claim rests on the existence and correctness of the DiOS implementation itself; no free parameters, mathematical axioms, or externally invented entities are described in the abstract.

invented entities (1)
  • DiOS no independent evidence
    purpose: lightweight model operating system providing reproducible POSIX execution
    DiOS is the primary contribution introduced by the paper; no independent evidence outside the paper is mentioned.

pith-pipeline@v0.9.0 · 5745 in / 1171 out tokens · 19304 ms · 2026-05-25T01:37:39.403510+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages

  1. [1]

    Model checking of C and C++ with DIVINE 4

    Zuzana Baranov´ a, Jiˇ r´ ı Barnat, Katar´ ına Kejstov´ a, Tade´ aˇ s Kuˇ cera, Henrich Lauko, Jan Mr´ azek, Petr Roˇ ckai, and Vladim´ ırˇStill. Model checking of C and C++ with DIVINE 4. 2017

  2. [2]

    Reliable and reproducible competition results with BenchExec and witnesses report on SV-COMP 2016

    Dirk Beyer. Reliable and reproducible competition results with BenchExec and witnesses report on SV-COMP 2016. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS) , pages 887–904. Springer,

  3. [3]

    doi: 10.1007/978-3-662-49674-9 55

    ISBN 978-3-662-49673-2. doi: 10.1007/978-3-662-49674-9 55

  4. [4]

    Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems pro- grams. In Operating Systems Design and Implementation , pages 209–224. USENIX Association, 2008

  5. [5]

    Reprozip: Using provenance to support computational reproducibility

    Fernando Chirigati, Dennis Shasha, and Juliana Freire. Reprozip: Using provenance to support computational reproducibility. In Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance , TaPP ’13, pages 1:1–1:4, Berkeley, CA, USA, 2013. USENIX Association. URL http://dl.acm.org/citation.cfm?id=2482949.2482951

  6. [6]

    Automatic capture and reconstruction of computational provenance

    James Frew, Dominic Metzger, and Peter Slaughter. Automatic capture and reconstruction of computational provenance. Concurr. Comput. : Pract. Exper., 20(5):485–496, April 2008. ISSN 1532-0626. doi: 10.1002/cpe.v20:5. URL http://dx.doi.org/10.1002/cpe.v20:5

  7. [7]

    Inverso, T

    O. Inverso, T. L. Nguyen, B. Fischer, S. L. Torre, and G. Parlato. Lazy-CSeq: A context-bounded model checking tool for multi-threaded C-programs. In 2015 30th IEEE/ACM International Conference on Au- tomated Software Engineering (ASE) , pages 807–812, Nov 2015. doi: 10.1109/ASE.2015.108

  8. [8]

    Scarpe: A technique and tool for selective capture and replay of program executions

    Shrinivas Joshi and Alessandro Orso. Scarpe: A technique and tool for selective capture and replay of program executions. pages 234 – 243, 11

  9. [9]

    doi: 10.1109/ICSM.2007.4362636

    ISBN 978-1-4244-1256-3. doi: 10.1109/ICSM.2007.4362636

  10. [10]

    Model checking with system call traces

    Katar´ ına Kejstov´ a. Model checking with system call traces. Master’s thesis, Masarykova univerzita, Fakulta informatiky, Brno, 2019. URL http://is. muni.cz/th/tukvk/

  11. [11]

    From model checking to runtime verification and back

    Katar´ ına Kejstov´ a, Petr Roˇ ckai, and Jiˇ r´ ı Barnat. From model checking to runtime verification and back. In Runtime Verification, volume 10548 of LNCS, pages 225–240. Springer, 2017. doi: 10.1007/978-3-319-67531-2 14

  12. [12]

    Automated testing of environment-dependent programs - a case study of modeling the file system for pex

    Soonho Kong, Nikolai Tillmann, and Jonathan de Halleux. Automated testing of environment-dependent programs - a case study of modeling the file system for pex. pages 758–762, 01 2009. doi: 10.1109/ITNG.2009.80

  13. [13]

    pytest 4.5, 2004

    Holger Krekel, Bruno Oliveira, Ronny Pfannschmidt, Floris Bruynooghe, Brianna Laugher, and Florian Bruhin. pytest 4.5, 2004. URL https:// github.com/pytest-dev/pytest

  14. [14]

    Extending DIVINE with symbolic verification using SMT

    Henrich Lauko, Vladim´ ır ˇStill, Petr Roˇ ckai, and Jiˇ r´ ı Barnat. Extending DIVINE with symbolic verification using SMT. In Dirk Beyer, Marieke Huisman, Fabrice Kordon, and Bernhard Steffen, editors, Tools and Algo- rithms for the Construction and Analysis of Systems , pages 204–208, Cham,

  15. [15]

    ISBN 978-3-030-17502-3

    Springer International Publishing. ISBN 978-3-030-17502-3. 16

  16. [16]

    Extreme programming examined

    Tim Mackinnon, Steve Freeman, and Philip Craig. Extreme programming examined. chapter Endo-testing: Unit Testing with Mock Objects, pages 287–301. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA,

  17. [17]

    URL http://dl.acm.org/citation.cfm?id= 377517.377534

    ISBN 0-201-71040-4. URL http://dl.acm.org/citation.cfm?id= 377517.377534

  18. [18]

    Mostafa and X

    S. Mostafa and X. Wang. An empirical study on the usage of mocking frameworks in software testing. In 2014 14th International Conference on Quality Software, pages 127–132, Oct 2014. doi: 10.1109/QSIC.2014.19

  19. [19]

    Finding and reproducing heisenbugs in concurrent programs

    Madan Musuvathi, Shaz Qadeer, Tom Ball, Gerard Basler, Pira- manayakam Arumuga Nainar, and Iulian Neamtiu. Finding and reproducing heisenbugs in concurrent programs. In Symposium on Operating Systems Design and Implementation . USENIX, December 2008

  20. [20]

    DiVM: Model checking with LLVM and graph memory

    Petr Roˇ ckai, Vladim´ ırˇStill, Ivana ˇCern´ a, and Jiˇ r´ ı Barnat. DiVM: Model checking with LLVM and graph memory. Journal of Systems and Software , 143:1 – 13, 2018. ISSN 0164-1212. doi: https://doi.org/10.1016/j.jss.2018. 04.026

  21. [21]

    Verifying multi- threaded software with impact

    Bj¨ orn Wachter, Daniel Kroening, and Joel Ouaknine. Verifying multi- threaded software with impact. In Formal Methods in Computer-Aided Design, pages 210–217. IEEE, 10 2013. doi: 10.1109/FMCAD.2013.6679412

  22. [22]

    Inspect: A runtime model checker for multithreaded c programs

    Yu Yang, Xiaofang Chen, and Ganesh Gopalakrishnan. Inspect: A runtime model checker for multithreaded c programs. Technical report, 2008

  23. [23]

    Using off-the-shelf exception support components in C++ verification

    Vladim´ ırˇStill, Petr Roˇ ckai, and Jiˇ r´ ı Barnat. Using off-the-shelf exception support components in C++ verification. In Software Quality, Reliability and Security (QRS) , pages 54–64, 2017. 17