pith. sign in

arxiv: 2604.06273 · v2 · submitted 2026-04-07 · 💻 cs.DB · cs.PL· cs.SE

CobbleDB: Modelling Levelled Storage by Composition

Pith reviewed 2026-05-10 18:52 UTC · model grok-4.3

classification 💻 cs.DB cs.PLcs.SE
keywords database storagelevelled compactionstore compositioncorrect-by-constructionequivalence proofsbacking storeRocksDB
0
0 comments X

The pith

Equivalence between store specifications allows composing performance features into a correct database backing store.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that database backing stores can be built by composing components whose behaviors were previously specified and proven equivalent. This approach ensures the resulting store is correct by construction because the composition preserves the verified properties. The authors implement this in Java to create CobbleDB, which replicates the levelled storage of RocksDB. A reader would care because it offers a way to add performance features to storage engines while avoiding the bugs that typically appear in ad-hoc coding.

Core claim

We present a composition-based approach to building correct-by-construction database backing stores. In previous work, we specified the behaviour of several store variants and proved their correctness and equivalence. Here, we derive a Java implementation: the simplicity of the specification makes manual construction straightforward. We leverage spec-guaranteed store equivalence to compose performance features, then demonstrate practical value with CobbleDB, a reimplementation of RocksDB's levelled storage.

What carries the argument

Spec-guaranteed store equivalence, used to compose performance features from prior verified variants while preserving overall correctness.

If this is right

  • Database stores can be constructed correctly by composing previously verified variants.
  • Performance features can be added modularly without re-verifying the entire system.
  • A reimplementation of RocksDB's levelled compaction becomes possible with formal guarantees.
  • Implementation effort reduces to faithful translation from the specification.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This composition method could apply to other storage architectures such as LSM trees.
  • It opens a route to libraries of reusable, verified store components for custom designs.
  • Developers could prototype and combine new performance features more rapidly than before.

Load-bearing premise

The manual Java implementation faithfully matches the specification and the prior equivalence proofs apply directly to the composed features.

What would settle it

Running CobbleDB against RocksDB on the same sequence of operations and observing different results on reads or writes, or finding a counterexample where the composed store violates one of the original specifications.

Figures

Figures reproduced from arXiv: 2604.06273 by Annette Bieniusa (RPTU), Ayush Pandey (TSP), Emilie Ma (UBC), Marc Shapiro (DELYS).

Figure 1
Figure 1. Figure 1: Basic stores class hierarchy diagram. The map store variant is designed for read-heavy workloads. It maps a key to a list of effects, with metadata to distinguish concur￾rent effects. Uncommitted writes remain transaction-private, i.e., doUpdate is a no-op; doCommit pushes all its buffered effects in bulk. Map.Lookup searches for the most recent assignment(s) up to a specified snapshot timestamp, and appli… view at source ↗
Figure 3
Figure 3. Figure 3: Latency vs. throughput for CobbleDB and basic [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 5
Figure 5. Figure 5: Latency vs. throughput for different CobbleDB con [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
read the original abstract

We present a composition-based approach to building correctby-construction database backing stores. In previous work, we specified the behaviour of several store variants and proved their correctness and equivalence. Here, we derive a Java implementation: the simplicity of the specification makes manual construction straightforward. We leverage spec-guaranteed store equivalence to compose performance features, then demonstrate practical value with CobbleDB, a reimplementation of RocksDB's levelled storage.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper presents a composition-based approach to building correct-by-construction database backing stores. Building on prior formal specifications and equivalence proofs for store variants, the authors manually derive a Java implementation of CobbleDB that reimplements RocksDB's levelled storage. They claim that spec simplicity makes the translation straightforward and leverage the proven equivalences to compose performance features while preserving correctness.

Significance. If the implementation faithfully realizes the specifications, the work offers a method for constructing reliable storage engines by composing proven components, which could reduce bugs in production systems like RocksDB. The practical demonstration strengthens the case for applying formal methods to database storage, though its impact depends on verification of the hand-written code.

major comments (2)
  1. [Implementation section] Implementation section (around the description of manual Java construction from the spec): The paper states that 'the simplicity of the specification makes manual construction straightforward' and that equivalences transfer, but provides no mechanical verification (e.g., refinement proof, model checking, or differential testing) that the composed levelled-storage operations in CobbleDB match the spec on key behaviors. This is load-bearing for the central claim, as any divergence would invalidate the performance-feature composition.
  2. [Composition and demonstration section] Section on leveraging spec-guaranteed equivalence for composition: The approach relies on prior separate proofs applying directly to the composed features in the Java code, but without evidence that the manual translation preserves the equivalence relations for the specific levelled-storage operations (e.g., compaction or read paths), the practical value demonstration rests on an unverified assumption.
minor comments (2)
  1. [Abstract and Introduction] The abstract and introduction could more explicitly distinguish between the prior proofs and the new implementation steps to clarify what is novel versus reused.
  2. [Figures and pseudocode] Figure captions or pseudocode (if present) for the levelled storage composition would benefit from explicit mapping to the formal spec equations to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on verification of the manual implementation. We respond to each major comment below, clarifying the paper's scope while agreeing to strengthen the presentation where appropriate.

read point-by-point responses
  1. Referee: [Implementation section] Implementation section (around the description of manual Java construction from the spec): The paper states that 'the simplicity of the specification makes manual construction straightforward' and that equivalences transfer, but provides no mechanical verification (e.g., refinement proof, model checking, or differential testing) that the composed levelled-storage operations in CobbleDB match the spec on key behaviors. This is load-bearing for the central claim, as any divergence would invalidate the performance-feature composition.

    Authors: The paper's primary contribution is composition of performance features at the specification level, where equivalences have already been formally proven. The Java implementation is presented as a direct manual derivation enabled by spec simplicity, with the implementation section describing the mapping of abstract operations to concrete Java methods. We agree that explicit mechanical verification (such as refinement proofs or differential testing) is absent and would further strengthen the demonstration. We will revise the implementation section to include additional concrete examples of operation mappings (e.g., for get/put and compaction) and a brief discussion of the manual derivation process and its limitations. revision: partial

  2. Referee: [Composition and demonstration section] Section on leveraging spec-guaranteed equivalence for composition: The approach relies on prior separate proofs applying directly to the composed features in the Java code, but without evidence that the manual translation preserves the equivalence relations for the specific levelled-storage operations (e.g., compaction or read paths), the practical value demonstration rests on an unverified assumption.

    Authors: Equivalences are established between specifications; the Java code composes the corresponding implementations following the same structure as the composed specs. For levelled-storage operations such as compaction and read paths, the code implements the behaviors defined by the composed specifications. We will revise the composition section to add explicit examples illustrating how the direct mapping from spec to code preserves the equivalence relations for these operations, making the transfer explicit rather than assumed. revision: partial

Circularity Check

0 steps flagged

No significant circularity; relies on prior separate proofs with manual implementation as explicit assumption

full rationale

The paper's derivation starts from prior specifications and equivalence proofs (cited as previous work), manually translates to Java citing spec simplicity, then composes features using those equivalences and demonstrates with CobbleDB. This chain does not reduce any claim to its inputs by construction, nor does it involve fitted parameters renamed as predictions, self-definitional loops, or a load-bearing self-citation chain that collapses without external verification. The manual translation step is presented as an assumption rather than a derived result, placing any mismatch risk under correctness rather than circularity. Prior proofs are treated as independent support.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the correctness of prior specifications and proofs of equivalence, which are treated as given inputs from previous work.

axioms (1)
  • domain assumption Prior specifications of store variants are correct and their equivalence has been proven.
    The paper explicitly builds on previous work that specified behaviours and proved correctness and equivalence.

pith-pipeline@v0.9.0 · 5372 in / 1109 out tokens · 51132 ms · 2026-05-10T18:52:18.300651+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

  1. [1]

    Tomsic, Manuel Bravo, Zhongmiao Li, Tyler Crain, Annette Bieniusa, Nuno Preguiça, and Marc Shapiro

    Deepthi Devaki Akkoorath, Alejandro Z. Tomsic, Manuel Bravo, Zhongmiao Li, Tyler Crain, Annette Bieniusa, Nuno Preguiça, and Marc Shapiro. 2016. Cure: Strong semantics meets high availability and low latency. In Int. Conf. on Distributed Comp. Sys. (ICDCS) . IEEE Comp. Society, Nara, Japan, 405–414. doi:10.1109/ICDCS.2016.98

  2. [2]

    Andrea Cerone, Giovanni Bernardi, and Alexey Gotsman. 2015. A Framework for Transactional Consistency Models with Atomic Visibility. In Int. Conf. on Concurrency Theory (CONCUR) (Leibniz Int. Proc. in Informatics (LIPIcs), Vol. 42) , Luca Aceto and David de Frutos Escrig (Eds.). Schloss Dagstuhl–Leibniz-Zentrum für Informatik, Madrid, Spain, 58–71. doi:10...

  3. [3]

    CLBG Contributors. [n. d.]. Measured: Which programming language is fastest? Benchmarks Game. https://benchmarksgame-team.pages.debian.net/ benchmarksgame/index.html Consulted 17 January 2026

  4. [4]

    AntidoteDB Contributors. 2026. https://github.com/AntidoteDB/antidote

  5. [5]

    RocksDB Contributors. 2026. Compaction. https://github.com/facebook/rocksdb/ wiki/Leveled-Compaction

  6. [6]

    RocksDB Contributors. 2026. MANIFEST. https://github.com/facebook/rocksdb/ wiki/MANIFEST

  7. [7]

    RocksDB Contributors. 2026. Write Stalls. https://github.com/facebook/rocksdb/ wiki/Write-Stalls

  8. [8]

    Brian Cooper and YCSB Contributors. 2026. brianfrankcooper/YCSB. https: //github.com/brianfrankcooper/YCSB

  9. [9]

    Shabnam Ghasemirad, Si Liu, Christoph Sprenger, Luca Multazzu, and David A. Basin. 2024. VerIso: Verifiable Isolation Guarantees for Database Transactions. Proc. VLDB Endownment 18, 5 (Oct. 2024), 1362–1375. https://www.vldb.org/ pvldb/vol18/p1362-ghasemirad.pdf

  10. [10]

    Saalik Hatia. 2023. Leveraging formal specification to implement a database back- end. Ph. D. Dissertation. Sorbonne Université. https://theses.hal.science/tel- 04291337

  11. [11]

    Haonan Lu, Shuai Mu, Siddhartha Sen, and Wyatt Lloyd. 2023. NCC: Natu- ral Concurrency Control for Strictly Serializable Datastores by Avoiding the Timestamp-Inversion Pitfall. In Symp. on Op. Sys. Design and Implementation (OSDI), Roxana Geambasu and Ed Nightingale (Eds.). Usenix, Boston, MA, USA, 305–323. https://www.usenix.org/conference/osdi23/present...

  12. [12]

    Emilie Ma, Ayush Pandey, Jaurel Fosset, and Saalik Hatia. [n. d.]. CobbleDB source code. https://gitlab.inria.fr/shapiro/CobbleDB. Commit 77ab1b9c

  13. [13]

    Gregory Malecha, Greg Morrisett, Avraham Shinnar, and Ryan Wisnesky. 2010. Toward a Verified Relational Database Management System. InSymp. on Principles of Prog. Lang. (POPL) (POPL ’10). Assoc. for Computing Machinery, Madrid, Spain, 237–248. doi:10.1145/1706299.1706329

  14. [14]

    Nisarg Patel, Siddharth Krishna, Dennis Shasha, and Thomas Wies. 2021. Veri- fying concurrent multicopy search structures. In Conf. on Object-Oriented Prog. Sys., Lang. and Applications (OOPSLA) (Proc. ACM Program. Lang., Vol. 5) . Assoc. for Computing Machinery Special Interest Group on Pg. Lang. (SIGPLAN), Assoc. for Computing Machinery, Chicago, Ill., ...

  15. [15]

    Redis. 2026. Redis. https://github.com/redis/redis

  16. [16]

    RocksDB Contributors. 2026. RocksDB: A Persistent Key-Value Store for Flash and RAM Storage. https://github.com/facebook/rocksdb

  17. [17]

    Edgard Schiebelbein, Saalik Hatia, Annette Bieniusa, Gustavo Petri, Carla Fer- reira, and Marc Shapiro. 2024. Models for Storage in Database Backends. In W. on Principles and Practice of Consistency for Distr. Data (PaPoC) , Assoc. for Computing Machinery (Ed.). Euro. Conf. on Comp. Sys. (EuroSys), Assoc. for Computing Machinery, Athens, Greece, 58–66. do...

  18. [18]

    Marc Shapiro, Nuno Preguiça, Carlos Baquero, and Marek Zawirski. 2011. Conflict-free Replicated Data Types. In Int. Symp. on Stabilization, Safety, and Security of Dist. Sys. (SSS) (Lecture Notes in Comp. Sc. (LNCS), Vol. 6976) , Xavier Défago, Franck Petit, and V. Villain (Eds.). Springer-Verlag, Grenoble, France, 386–400. doi:10.1007/978-3-642-24550-3_29