Reproducible Execution of POSIX Programs with DiOS
Pith reviewed 2026-05-25 01:37 UTC · model grok-4.3
The pith
DiOS makes executions of POSIX programs fully reproducible, producing identical instruction traces even for multi-threaded code.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DiOS achieves fully reproducible execution of POSIX programs by serving as a model operating system that controls thread scheduling and API calls to eliminate non-determinism, ensuring that repeated runs with identical inputs produce identical instruction traces.
What carries the argument
DiOS, the modular lightweight model operating system that emulates POSIX APIs in a deterministic manner.
If this is right
- Multi-threaded programs can be executed reproducibly for verification purposes.
- DiOS can be ported to different execution platforms like symbolic executors.
- New POSIX APIs can be supported by adding modular components.
- The system can function as a user-mode kernel when compiled to machine code.
Where Pith is reading between the lines
- Such reproducibility could simplify debugging of concurrent software by allowing exact replay of executions.
- This model OS approach might extend to other APIs beyond POSIX for deterministic testing.
- Integration with verification tools could lead to more reliable analysis of parallel programs.
Load-bearing premise
The model operating system can emulate POSIX APIs and thread scheduling without introducing any new non-determinism.
What would settle it
Running a multi-threaded POSIX program twice under DiOS with the same inputs and observing different instruction traces.
Figures
read the original abstract
In this paper, we describe DiOS, a lightweight model operating system which can be used to execute programs that make use of POSIX APIs. Such executions are fully reproducible: running the same program with the same inputs twice will result in two exactly identical instruction traces, even if the program uses threads for parallelism. DiOS is implemented almost entirely in portable C and C++: although its primary platform is DiVM, a verification-oriented virtual machine, it can be configured to also run in KLEE, a symbolic executor. Finally, it can be compiled into machine code to serve as a user-mode kernel. Additionally, DiOS is modular and extensible. Its various components can be combined to match both the capabilities of the underlying platform and to provide services required by a particular program. New components can be added to cover additional system calls or APIs. The experimental evaluation has two parts. DiOS is first evaluated as a component of a program verification platform based on DiVM. In the second part, we consider its portability and modularity by combining it with the symbolic executor KLEE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes DiOS, a lightweight model operating system implemented almost entirely in portable C and C++ that enables fully reproducible execution of POSIX programs. The central claim is that running the same program with the same inputs twice produces exactly identical instruction traces, even for programs using threads for parallelism. DiOS targets DiVM as its primary platform but is configurable for KLEE and can be compiled to native machine code as a user-mode kernel. It is modular and extensible, with components combinable to match platform capabilities and program requirements. The paper reports an experimental evaluation in two parts: first as a component of a DiVM-based verification platform, and second demonstrating portability and modularity with the KLEE symbolic executor.
Significance. If the reproducibility claim holds, the work would be significant for program verification and symbolic execution communities by providing a deterministic environment for POSIX APIs and concurrent programs. The portability across DiVM, KLEE, and native compilation, along with modularity, is a practical strength that could enable broader adoption in verification tools. The two-part evaluation helps ground the claims in concrete use cases.
major comments (2)
- [Abstract] Abstract: the claim that executions produce 'exactly identical instruction traces, even if the program uses threads for parallelism' is the central contribution, yet the description provides no explicit accounting of how every source of non-determinism (thread scheduling decisions, signal delivery, internal scheduler state, or platform-specific timing) is eliminated from program inputs alone. This assumption is load-bearing for the reproducibility guarantee across all claimed platforms.
- [Experimental evaluation] Experimental evaluation (second part): the portability and modularity claims with KLEE are asserted, but without reported metrics on trace identity, comparison against baseline non-deterministic executions, or coverage of scheduling behaviors, it is not possible to assess whether the determinism holds in practice.
minor comments (2)
- [Abstract] The abstract mentions 'new components can be added to cover additional system calls or APIs' but does not list the POSIX APIs currently supported; a table or enumerated list would improve clarity.
- Notation for 'DiVM' and 'DiOS' is introduced without an initial expansion or reference to prior work on DiVM; adding this would aid readers unfamiliar with the verification VM.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on the reproducibility claims and the evaluation. We address the two major comments point by point below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the claim that executions produce 'exactly identical instruction traces, even if the program uses threads for parallelism' is the central contribution, yet the description provides no explicit accounting of how every source of non-determinism (thread scheduling decisions, signal delivery, internal scheduler state, or platform-specific timing) is eliminated from program inputs alone. This assumption is load-bearing for the reproducibility guarantee across all claimed platforms.
Authors: The abstract is a concise summary of the contribution. The manuscript provides a detailed accounting of non-determinism elimination in Sections 3 (architecture of DiOS, including the deterministic scheduler and signal handling) and 4 (reproducibility guarantees, covering thread scheduling, internal state, and platform independence). These sections explain how DiOS controls all listed sources of non-determinism through its model OS design rather than relying on program inputs alone. The abstract does not repeat this level of detail, consistent with typical abstract length constraints. revision: no
-
Referee: [Experimental evaluation] Experimental evaluation (second part): the portability and modularity claims with KLEE are asserted, but without reported metrics on trace identity, comparison against baseline non-deterministic executions, or coverage of scheduling behaviors, it is not possible to assess whether the determinism holds in practice.
Authors: The second part of the evaluation demonstrates portability by integrating the same DiOS implementation with KLEE, inheriting the determinism guarantees from the core design (detailed in Sections 3-4). Trace identity holds by construction across platforms because DiOS replaces non-deterministic OS services. We agree that explicit metrics for the KLEE configuration would strengthen the presentation and will add a short subsection with trace comparison results and scheduling coverage for the KLEE experiments in the revised manuscript. revision: yes
Circularity Check
No circularity: system description with no equations or load-bearing self-citations
full rationale
The paper is a system description of DiOS, claiming reproducibility by design through control of scheduling and POSIX APIs in a model OS. No equations, fitted parameters, or mathematical derivations exist. The central claim does not reduce to any self-citation chain, ansatz, or renaming of prior results; it is presented as an engineering property of the new implementation. Any self-citations to DiVM or related tools are not invoked as uniqueness theorems to force the result. The contribution is self-contained against external benchmarks of reproducibility.
Axiom & Free-Parameter Ledger
invented entities (1)
-
DiOS
no independent evidence
Reference graph
Works this paper leans on
-
[1]
Model checking of C and C++ with DIVINE 4
Zuzana Baranov´ a, Jiˇ r´ ı Barnat, Katar´ ına Kejstov´ a, Tade´ aˇ s Kuˇ cera, Henrich Lauko, Jan Mr´ azek, Petr Roˇ ckai, and Vladim´ ırˇStill. Model checking of C and C++ with DIVINE 4. 2017
work page 2017
-
[2]
Reliable and reproducible competition results with BenchExec and witnesses report on SV-COMP 2016
Dirk Beyer. Reliable and reproducible competition results with BenchExec and witnesses report on SV-COMP 2016. In Tools and Algorithms for the Construction and Analysis of Systems (TACAS) , pages 887–904. Springer,
work page 2016
-
[3]
doi: 10.1007/978-3-662-49674-9 55
ISBN 978-3-662-49673-2. doi: 10.1007/978-3-662-49674-9 55
-
[4]
Cristian Cadar, Daniel Dunbar, and Dawson R. Engler. KLEE: Unassisted and automatic generation of high-coverage tests for complex systems pro- grams. In Operating Systems Design and Implementation , pages 209–224. USENIX Association, 2008
work page 2008
-
[5]
Reprozip: Using provenance to support computational reproducibility
Fernando Chirigati, Dennis Shasha, and Juliana Freire. Reprozip: Using provenance to support computational reproducibility. In Proceedings of the 5th USENIX Workshop on the Theory and Practice of Provenance , TaPP ’13, pages 1:1–1:4, Berkeley, CA, USA, 2013. USENIX Association. URL http://dl.acm.org/citation.cfm?id=2482949.2482951
-
[6]
Automatic capture and reconstruction of computational provenance
James Frew, Dominic Metzger, and Peter Slaughter. Automatic capture and reconstruction of computational provenance. Concurr. Comput. : Pract. Exper., 20(5):485–496, April 2008. ISSN 1532-0626. doi: 10.1002/cpe.v20:5. URL http://dx.doi.org/10.1002/cpe.v20:5
-
[7]
O. Inverso, T. L. Nguyen, B. Fischer, S. L. Torre, and G. Parlato. Lazy-CSeq: A context-bounded model checking tool for multi-threaded C-programs. In 2015 30th IEEE/ACM International Conference on Au- tomated Software Engineering (ASE) , pages 807–812, Nov 2015. doi: 10.1109/ASE.2015.108
-
[8]
Scarpe: A technique and tool for selective capture and replay of program executions
Shrinivas Joshi and Alessandro Orso. Scarpe: A technique and tool for selective capture and replay of program executions. pages 234 – 243, 11
-
[9]
doi: 10.1109/ICSM.2007.4362636
ISBN 978-1-4244-1256-3. doi: 10.1109/ICSM.2007.4362636
-
[10]
Model checking with system call traces
Katar´ ına Kejstov´ a. Model checking with system call traces. Master’s thesis, Masarykova univerzita, Fakulta informatiky, Brno, 2019. URL http://is. muni.cz/th/tukvk/
work page 2019
-
[11]
From model checking to runtime verification and back
Katar´ ına Kejstov´ a, Petr Roˇ ckai, and Jiˇ r´ ı Barnat. From model checking to runtime verification and back. In Runtime Verification, volume 10548 of LNCS, pages 225–240. Springer, 2017. doi: 10.1007/978-3-319-67531-2 14
-
[12]
Soonho Kong, Nikolai Tillmann, and Jonathan de Halleux. Automated testing of environment-dependent programs - a case study of modeling the file system for pex. pages 758–762, 01 2009. doi: 10.1109/ITNG.2009.80
-
[13]
Holger Krekel, Bruno Oliveira, Ronny Pfannschmidt, Floris Bruynooghe, Brianna Laugher, and Florian Bruhin. pytest 4.5, 2004. URL https:// github.com/pytest-dev/pytest
work page 2004
-
[14]
Extending DIVINE with symbolic verification using SMT
Henrich Lauko, Vladim´ ır ˇStill, Petr Roˇ ckai, and Jiˇ r´ ı Barnat. Extending DIVINE with symbolic verification using SMT. In Dirk Beyer, Marieke Huisman, Fabrice Kordon, and Bernhard Steffen, editors, Tools and Algo- rithms for the Construction and Analysis of Systems , pages 204–208, Cham,
- [15]
-
[16]
Tim Mackinnon, Steve Freeman, and Philip Craig. Extreme programming examined. chapter Endo-testing: Unit Testing with Mock Objects, pages 287–301. Addison-Wesley Longman Publishing Co., Inc., Boston, MA, USA,
-
[17]
URL http://dl.acm.org/citation.cfm?id= 377517.377534
ISBN 0-201-71040-4. URL http://dl.acm.org/citation.cfm?id= 377517.377534
-
[18]
S. Mostafa and X. Wang. An empirical study on the usage of mocking frameworks in software testing. In 2014 14th International Conference on Quality Software, pages 127–132, Oct 2014. doi: 10.1109/QSIC.2014.19
-
[19]
Finding and reproducing heisenbugs in concurrent programs
Madan Musuvathi, Shaz Qadeer, Tom Ball, Gerard Basler, Pira- manayakam Arumuga Nainar, and Iulian Neamtiu. Finding and reproducing heisenbugs in concurrent programs. In Symposium on Operating Systems Design and Implementation . USENIX, December 2008
work page 2008
-
[20]
DiVM: Model checking with LLVM and graph memory
Petr Roˇ ckai, Vladim´ ırˇStill, Ivana ˇCern´ a, and Jiˇ r´ ı Barnat. DiVM: Model checking with LLVM and graph memory. Journal of Systems and Software , 143:1 – 13, 2018. ISSN 0164-1212. doi: https://doi.org/10.1016/j.jss.2018. 04.026
-
[21]
Verifying multi- threaded software with impact
Bj¨ orn Wachter, Daniel Kroening, and Joel Ouaknine. Verifying multi- threaded software with impact. In Formal Methods in Computer-Aided Design, pages 210–217. IEEE, 10 2013. doi: 10.1109/FMCAD.2013.6679412
-
[22]
Inspect: A runtime model checker for multithreaded c programs
Yu Yang, Xiaofang Chen, and Ganesh Gopalakrishnan. Inspect: A runtime model checker for multithreaded c programs. Technical report, 2008
work page 2008
-
[23]
Using off-the-shelf exception support components in C++ verification
Vladim´ ırˇStill, Petr Roˇ ckai, and Jiˇ r´ ı Barnat. Using off-the-shelf exception support components in C++ verification. In Software Quality, Reliability and Security (QRS) , pages 54–64, 2017. 17
work page 2017
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.