Dynamic read & write optimization with TurtleKV

Darby Huye; Johes Bater; Lan Liu; Raja R. Sambasivan; Tony Astolfi; Vidya Silai

arxiv: 2509.10714 · v3 · submitted 2025-09-12 · 💻 cs.DB

Dynamic read & write optimization with TurtleKV

Tony Astolfi , Vidya Silai , Darby Huye , Lan Liu , Raja R. Sambasivan , Johes Bater This is my paper

Pith reviewed 2026-05-18 16:53 UTC · model grok-4.3

classification 💻 cs.DB

keywords turtlekvperformancetrade-offdynamicfasterhighworkloadsdata

0 comments

The pith

TurtleKV achieves balanced high performance in dynamic key-value workloads using a new TurtleTree structure and adjustable memory knobs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to show that current key-value store designs force difficult trade-offs between read and write performance, especially when workloads change over time. By introducing the TurtleTree as a balanced on-disk structure and separate tuning knobs for read and write memory use, TurtleKV aims to deliver strong performance without frequent expensive data restructuring. This matters because many applications rely on these stores and face varying access patterns that current systems handle poorly. The prototype avoids common in-memory bottlenecks, allowing effective operation across a wide range of settings.

Core claim

The central discovery is that a read-/write-balanced on-disk structure called the TurtleTree, paired with flexible read-memory and write-memory tuning knobs, supports high performance for mixed and dynamic workloads in a key-value store. The design enables matching state-of-the-art performance on inserts while providing speedups in mixed, point-query, and scan operations compared to existing systems.

What carries the argument

The TurtleTree, a novel read-/write-balanced on-disk structure that supports the dynamic optimization through separate read-memory and write-memory tuning knobs.

Load-bearing premise

The YCSB workload mixes and tuning settings represent real dynamic scenarios without introducing unaccounted bottlenecks or correctness problems in the TurtleTree design.

What would settle it

Running the same YCSB benchmarks on different hardware or with workload mixes that stress memory differently would show if the reported speedups hold or if hidden limitations appear.

Figures

Figures reproduced from arXiv: 2509.10714 by Darby Huye, Johes Bater, Lan Liu, Raja R. Sambasivan, Tony Astolfi, Vidya Silai.

**Figure 2.** Figure 2: B𝜖 -tree → B 𝜖+ -tree structural transformation allowing all but the bottom layer of the tree to be cached in main memory so that queries can be answered with a single read. State-of-the-art approaches to reducing the write cost of updates in B-trees employ the use of a write-ahead log (WAL) to ensure durability of updates, plus a scheme for applying updates to the tree in memory. We refer to these as Log … view at source ↗

**Figure 3.** Figure 3: The TurtleTree data structure. easily cached; likewise, queries benefit from reduced tree height as a result of a higher branching factor, since the portion of a node devoted to pivots can be larger without sacrificing buffer capacity. In terms of write efficiency, being able to buffer more data per node allows updates to be moved down the tree in larger batches, allowing more work to be achieved per wri… view at source ↗

**Figure 4.** Figure 4: Details of the internal structure of TurtleKV’s [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗

**Figure 6.** Figure 6: The effect of checkpoint distance (𝜒) on write amplification. (Pages shown are from the example in [PITH_FULL_IMAGE:figures/full_fig_p007_6.png] view at source ↗

**Figure 7.** Figure 7: shows TurtleKVs high-level architecture. It is similar to systems based on log-structured B-trees and on LSM-trees ( [PITH_FULL_IMAGE:figures/full_fig_p008_7.png] view at source ↗

**Figure 8.** Figure 8: YCSB Results (higher is better): TurtleKV matches or exceeds performance of other engines on small records, and is [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗

**Figure 10.** Figure 10: TurtleKV’s balanced compaction policy allows it [PITH_FULL_IMAGE:figures/full_fig_p011_10.png] view at source ↗

**Figure 9.** Figure 9: Write & Space Amplification (lower is better): [PITH_FULL_IMAGE:figures/full_fig_p011_9.png] view at source ↗

**Figure 11.** Figure 11: Checkpoint distance (𝜒) tuning reduces writes and put latencies (a, e), increases upsert throughput, and increases memory usage (b, f). Adjusting 𝜒 at runtime is transparent (c, g) and cheap, costing only a few seconds to flush the current checkpoint (d, h). compaction policy from lazy at lower levels to greedy at higher ones; Wacky [23], which introduces adaptive level sizing and finergrained control ov… view at source ↗

read the original abstract

High read and write performance is important for generic key-value stores, which are foundational to modern applications and databases. Yet, achieving high performance for mixed and dynamic workloads is challenging due to fundamental trade-offs between memory use and I/O for retrieval and updates. Past work emphasizes the trade-off between read- and write-optimization as expressed through primary data structure, in combination with read-memory trade-off mechanisms like caching and filtering. This raises re-tuning costs as optimal trade-off targets change, due to restructuring of stored data. We show that write-memory trade-off mechanisms are under-developed in current designs, and propose a new approach to dynamic key-value store optimization using a novel read-/write-balanced on-disk structure, the TurtleTree, and flexible read-memory & write-memory tuning knobs. We describe how the design of TurtleKV, our prototype, avoids in-memory bottlenecks to achieve high performance across a wide range of tuning parameters. When evaluated using YCSB, TurtleKV matches state-of-the-art SplinterDB for inserts, and is 5x/12x faster than RockDB/WiredTiger. In mixed workloads, TurtleKV is 16-25% faster than SplinterDB, >4x RocksDB, and 3-6x WiredTiger. TurtleKV is 2-9x faster than the others for point-query workloads, and has the best scan performance of the write-optimized systems tested.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TurtleKV's TurtleTree plus separate read/write memory knobs improve fixed-workload numbers over SplinterDB and others, but the evaluations skip the dynamic shifts that are the main selling point.

read the letter

TurtleKV introduces a TurtleTree on-disk structure and explicit knobs for read-memory and write-memory to target under-developed write-side trade-offs in key-value stores. The idea is to cut re-tuning costs when workloads move between read-heavy and write-heavy without restructuring data each time. On YCSB the prototype matches SplinterDB on inserts and beats RocksDB and WiredTiger by large factors, with 16-25% gains in mixed workloads and strong scan and point-query results among the write-optimized systems tested. That is concrete and worth noting for anyone who maintains these stores in practice. The design also claims to sidestep in-memory bottlenecks across a wide range of knob settings, which is a useful engineering detail if it holds up in the full implementation. The main gap is that every reported run uses a fixed workload mix. The paper never shows a single experiment where the read/write ratio or access pattern changes during the run and then measures whether the knobs or the TurtleTree structure adapt without extra overhead or correctness problems. That is exactly the scenario the dynamic-optimization claim is built around, so the current numbers leave the central promise untested. Experimental controls, error bars, and how the tuning parameters were chosen are also light on detail, which makes it harder to judge how sensitive the gains are to workload specifics. This work is aimed at systems researchers and engineers who tune or extend key-value stores for mixed or shifting application loads. A reader who cares about practical KV-store internals would get value from the mechanism and the head-to-head numbers. It is solid enough on the fixed-workload side and addresses a real gap in the literature to deserve a serious referee, though the dynamic tests would need to be added or clarified before it could be considered fully convincing.

Referee Report

1 major / 2 minor

Summary. The paper introduces TurtleKV, a key-value store that uses a novel TurtleTree on-disk structure together with flexible read-memory and write-memory tuning knobs to support dynamic read/write optimization. The central claim is that this design avoids re-tuning costs and hidden bottlenecks associated with data restructuring in prior systems, delivering high performance across workloads as shown by YCSB benchmarks where TurtleKV matches SplinterDB on inserts, outperforms RocksDB and WiredTiger by large factors, and leads in mixed, point-query, and scan workloads.

Significance. If the empirical results hold under dynamic conditions, the work addresses an under-explored aspect of write-memory trade-offs in KV stores and could enable more adaptive systems for changing workloads. The reported throughput advantages constitute a concrete contribution, though the absence of tests with mid-experiment workload shifts limits verification of the dynamic-adaptation aspect.

major comments (1)

[Evaluation] Evaluation section: The YCSB results cover separate fixed workloads (insert-only, mixed, point queries, scans) but present no experiments in which read/write ratios or access patterns change during a run. This directly undercuts the central claim that the TurtleTree and memory knobs enable dynamic optimization without re-tuning costs or hidden bottlenecks, as any restructuring overhead or correctness issues would only surface under such transitions.

minor comments (2)

[Abstract] Abstract: 'RockDB' is likely a typographical error for 'RocksDB'.
[Abstract] The abstract and results summary provide limited detail on experimental controls, error bars, data-exclusion rules, or how the read/write-memory tuning parameters were selected for each workload.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for their careful reading and constructive feedback. We address the major comment on the evaluation below.

read point-by-point responses

Referee: The YCSB results cover separate fixed workloads (insert-only, mixed, point queries, scans) but present no experiments in which read/write ratios or access patterns change during a run. This directly undercuts the central claim that the TurtleTree and memory knobs enable dynamic optimization without re-tuning costs or hidden bottlenecks, as any restructuring overhead or correctness issues would only surface under such transitions.

Authors: We agree that experiments with mid-run workload transitions would provide additional support for the dynamic-optimization claim. The central contribution of TurtleKV is that the TurtleTree is a single balanced on-disk structure that does not require data restructuring when read/write optimization targets change; the read-memory and write-memory knobs simply adjust allocation ratios at runtime. Because no stored data is reorganized, the only cost of a tuning change is the (low) overhead of the knob adjustment itself. The YCSB results demonstrate that high performance is achievable across a wide range of fixed tuning points without hidden bottlenecks, which is the necessary precondition for low-cost dynamic adaptation. We will revise the manuscript to make this design distinction and its implications for dynamic use more explicit. If space and time permit, we are also willing to add a modest dynamic-workload experiment in the revision. revision: partial

Circularity Check

0 steps flagged

No derivation chain; empirical YCSB results are self-contained

full rationale

The paper describes a new on-disk structure (TurtleTree) and memory knobs for dynamic read/write optimization in TurtleKV, then reports direct throughput measurements from YCSB workloads (insert-only, mixed, point queries, scans). No equations, fitted parameters, or predictions appear in the abstract or described claims; performance numbers are presented as observed results rather than quantities derived from inputs by construction. Self-citations, if present, are not load-bearing for any central result. The evaluation is therefore independent of the patterns that would produce circularity scores above 2.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The design rests on the domain assumption of inherent memory-I/O trade-offs and introduces the TurtleTree as a new structure plus tunable knobs as the main engineering contributions.

free parameters (1)

read-memory and write-memory tuning knobs
Flexible parameters that control memory allocation for read and write helpers; their specific values are chosen per workload.

axioms (1)

domain assumption Fundamental trade-offs between memory use and I/O for retrieval and updates exist and must be managed in key-value store design.
Invoked in the opening paragraphs to motivate the need for a new balanced structure.

invented entities (1)

TurtleTree no independent evidence
purpose: A novel read-/write-balanced on-disk structure intended to reduce restructuring costs when workloads change.
New data structure introduced by the paper; no independent evidence outside the prototype evaluation is provided.

pith-pipeline@v0.9.0 · 5795 in / 1534 out tokens · 53364 ms · 2026-05-18T16:53:06.603299+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

TurtleTrees are a fusion of B+-trees and LSM-trees... checkpoint distance (χ) controls the portion of TurtleKV’s total memory devoted to improving write performance
IndisputableMonolith/Foundation/DimensionForcing.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We identify TurtleTrees as belonging to a previously unidentified category of data structures, which we name Bε+-trees

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages · 1 internal anchor

[1]

Alok Aggarwal and Jeffrey Vitter, S. 1988. The Input/Output Complexity of Sorting and Related Problems.Commun. ACM31, 9 (Sept. 1988), 1116–1127. https://doi.org/10.1145/48529.48535

work page doi:10.1145/48529.48535 1988
[2]

Apache. 2025. Apache Cassandra. https://cassandra.apache.org/_/index.html. Retrieved August, 2025

work page 2025
[3]

Tony Astolfi, Vidya Silai, and Bhaskar Bora. 2024. Mathworks/Llfs. MathWorks

work page 2024
[4]

Manos Athanassoulis, Michael S Kester, Lukas M Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. Designing Access Meth- ods: The RUM Conjecture.EDBT(2016)

work page 2016
[5]

Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara McAllister, Nirjhar Mukherjee, Prashant Pandey, Donald E

Michael A. Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara McAllister, Nirjhar Mukherjee, Prashant Pandey, Donald E. Porter, Jun Yuan, and Yang Zhan. 2019. Small Refinements to the DAM Can Have Big Consequences for Data-Structure Design. InThe 31st ACM Symposium on Parallelism in Algorithms and Archit...

work page doi:10.1145/3323165.3323210 2019
[6]

Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara Mcallister, Nirjhar Mukherjee, Prashant Pandey, Donald E

Michael A. Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara Mcallister, Nirjhar Mukherjee, Prashant Pandey, Donald E. Porter, Jun Yuan, and Yang Zhan. 2021. External-Memory Dictionaries in the Affine and PDAM Models.ACM Transactions on Parallel Computing8, 3 (Sept. 2021), 1–20. https://doi.org/10.1145/3470635

work page doi:10.1145/3470635 2021
[7]

Bender, Martin Farach-Colton, Jeremy T

Michael A. Bender, Martin Farach-Colton, Jeremy T. Fineman, Yonatan R. Fogel, Bradley C. Kuszmaul, and Jelani Nelson. 2007. Cache-Oblivious Streaming B-trees. InProceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures. ACM, San Diego California USA, 81–92. https: //doi.org/10.1145/1248377.1248393

work page doi:10.1145/1248377.1248393 2007
[8]

Michael A Bender, Martin Farach-Colton, William Jannen, Rob Johnson, Bradley C Kuszmaul, Donald E Porter, Jun Yuan, and Yang Zhan. 2015. An Introduction to B𝜀-trees and Write-Optimization.login; magazine40, 5 (Sept. 2015)

work page 2015
[9]

Burton H. Bloom. 1970. Space/Time Trade-Offs in Hash Coding with Allowable Errors.Commun. ACM13, 7 (July 1970), 422–426. https://doi.org/10.1145/362686. 362692

work page doi:10.1145/362686 1970
[10]

Gerth Stølting Brodal and Rolf Fagerberg. 2003. Lower Bounds for External Memory Dictionaries.ACM-SIAM Symposium on Discrete Algorithms (SODA)3 (2003)

work page 2003
[11]

Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, and Mike Barnett. 2018. FASTER: A Concurrent Key-Value Store with In-Place Updates. InProceedings of the 2018 International Conference on Management of Data. ACM, Houston TX USA, 275–290. https://doi.org/10.1145/ 3183713.3196898

work page arXiv 2018
[12]

Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, and Mike Barnett. 2018. FASTER: An Embedded Concurrent Key-Value Store for State Management.Proceedings of the VLDB Endowment11, 12 (Aug. 2018), 1930–1933. https://doi.org/10.14778/3229863.3236227

work page doi:10.14778/3229863.3236227 2018
[13]

Subarna Chatterjee, Meena Jagadeesan, Wilson Qin, and Stratos Idreos. 2021. Cosine: A Cloud-Cost Optimized Self-Designing Key-Value Storage Engine.Pro- ceedings of the VLDB Endowment15, 1 (Sept. 2021), 112–126. https://doi.org/10. 14778/3485450.3485461

work page arXiv 2021
[14]

Douglas Comer. 1979. Ubiquitous B-Tree.Comput. Surveys11, 2 (June 1979), 121–137. https://doi.org/10.1145/356770.356776

work page doi:10.1145/356770.356776 1979
[15]

Alexander Conway, Vijay Chidambaram, Martin Farach-Colton, and Rob Johnson

work page
[16]

SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores.2020 USENIX Annual Technical Conference (USENIX ATC 20)(2020)

work page 2020
[17]

Alex Conway, Martín Farach-Colton, and Rob Johnson. 2023. SplinterDB and Maplets: Improving the Tradeoffs in Key-Value Store Compaction Pol- icy.Proceedings of the ACM on Management of Data1, 1 (May 2023), 1–27. https://doi.org/10.1145/3588726

work page doi:10.1145/3588726 2023
[18]

Alex Conway, Martin Farach-Colton, and Philip Shilane. 2018. Optimal Hashing in External Memory. https://doi.org/10.48550/arXiv.1805.09423 arXiv:1805.09423 [cs]

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1805.09423 2018
[19]

Benchmarking cloud serving systems with ycsb,

Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. InProceedings of the 1st ACM Symposium on Cloud Computing(Indianapolis, Indiana, USA) (SoCC ’10). Association for Computing Machinery, New York, NY, USA, 143–154. https://doi.org/10.1145/1807128.1807152

work page doi:10.1145/1807128.1807152 2010
[20]

Databricks. [n.d.]. Extract Transform Load (ETL). https://www.databricks.com/ discover/etl. Retrieved August, 2025

work page 2025
[21]

Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal Navigable Key-Value Store. InProceedings of the 2017 ACM International Con- ference on Management of Data. ACM, Chicago Illinois USA, 79–94. https: //doi.org/10.1145/3035918.3064054

work page doi:10.1145/3035918.3064054 2017
[22]

Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2018. Optimal Bloom Filters and Adaptive Merging for LSM-Trees.ACM Transactions on Database Systems43, 4 (Dec. 2018), 1–48. https://doi.org/10.1145/3276980

work page doi:10.1145/3276980 2018
[23]

Niv Dayan and Stratos Idreos. 2018. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. InProceedings of the 2018 International Conference on Management of Data. ACM, Houston TX USA, 505–520. https://doi.org/10.1145/3183713.3196927

work page doi:10.1145/3183713.3196927 2018
[24]

Niv Dayan and Stratos Idreos. 2019. The Log-Structured Merge-Bush & the Wacky Continuum. InProceedings of the 2019 International Conference on Management of Data. ACM, Amsterdam Netherlands, 449–466. https://doi.org/10.1145/3299869. 3319903

work page doi:10.1145/3299869 2019
[25]

Google. [n.d.]. LevelDB. https://github.com/google/leveldb. Retrieved August, 2025

work page 2025
[26]

Xiangpeng Hao and Badrish Chandramouli. 2024. Bf-Tree: A Modern Read- Write-Optimized Concurrent Larger-Than-Memory Range Index.Proc. VLDB Endow.17, 11 (July 2024), 3442–3455. https://doi.org/10.14778/3681954.3682012

work page doi:10.14778/3681954.3682012 2024
[27]

TGL: a general framework for temporal GNN training on billion-scale graphs

Andy Huynh, Harshal A. Chaudhari, Evimaria Terzi, and Manos Athanas- soulis. 2022. Endure: a robust tuning paradigm for LSM trees under work- load uncertainty.Proc. VLDB Endow.15, 8 (April 2022), 1605–1618. https: //doi.org/10.14778/3529337.3529345

work page doi:10.14778/3529337.3529345 2022
[28]

Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, and Zichen Zhu. 2019. Design Continuums and the Path Toward Self-Designing Key-Value Stores That Know and Learn.Biennial Conference on Innovative Data Systems Research (CIDR) (2019)

work page 2019
[29]

Song Jiang, Feng Chen, and Xiaodong Zhang. 2005. CLOCK-Pro: An Effective Improvement of the CLOCK Replacement. (2005)

work page 2005
[30]

Bo-kyeong Kim and Dong-Ho Lee. 2015. LSB-Tree: A Log-Structured B-Tree Index Structure for NAND Flash SSDs.Design Automation for Embedded Systems 19, 1-2 (March 2015), 77–100. https://doi.org/10.1007/s10617-014-9139-4

work page doi:10.1007/s10617-014-9139-4 2015
[31]

Leis, Alfons Kemper, and T

V. Leis, Alfons Kemper, and T. Neumann. 2013. The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. In2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, Brisbane, QLD, 38–49. https://doi.org/10. 1109/ICDE.2013.6544812

work page arXiv 2013
[32]

Viktor Leis, Florian Scheibner, Alfons Kemper, and Thomas Neumann. 2016. The ART of Practical Synchronization. InProceedings of the 12th International Workshop on Data Management on New Hardware. ACM, San Francisco California, 1–8. https://doi.org/10.1145/2933349.2933352

work page doi:10.1145/2933349.2933352 2016
[33]

Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. 2019. KVell: The Design and Implementation of a Fast Persistent Key-Value Store. InPro- ceedings of the 27th ACM Symposium on Operating Systems Principles. ACM, Huntsville Ontario Canada, 447–461. https://doi.org/10.1145/3341301.3359628

work page doi:10.1145/3341301.3359628 2019
[34]

Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. 2020. KVell+: Snapshot Isolation without Snapshots.14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)(2020)

work page 2020
[35]

Justin Levandoski and Sudipta Sengupta. [n.d.]. The BW-Tree: A Latch-Free B-Tree for Log-Structured Flash Storage. ([n. d.])

work page
[36]

Levandoski, David B

Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: A B-tree for new hardware platforms. In2013 IEEE 29th International Conference on Data Engineering (ICDE). 302–313. https://doi.org/10.1109/ICDE.2013.6544834

work page doi:10.1109/icde.2013.6544834 2013
[37]

J. J. Levandoski, D. B. Lomet, and S. Sengupta. 2013. The Bw-Tree: A B-tree for New Hardware Platforms. In2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, Brisbane, QLD, 302–313. https://doi.org/10.1109/ICDE. 2013.6544834

work page doi:10.1109/icde 2013
[38]

Meta Platforms, Inc. 2022. RocksDB. https://rocksdb.org. Retrieved July, 2025

work page 2022
[39]

Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. 1996. The Log-Structured Merge-Tree (LSM-tree).Acta Informatica33, 4 (June 1996), 351–

work page 1996
[40]

https://doi.org/10.1007/s002360050048

work page doi:10.1007/s002360050048
[41]

Patterson

David A. Patterson. 2004. Latency Lags Bandwith.Commun. ACM47, 10 (Oct. 2004), 71–75. https://doi.org/10.1145/1022594.1022596

work page doi:10.1145/1022594.1022596 2004
[42]

Lomet, Xin Liu, Panfeng Zhou, Yongxiang Chen, David Zhang, Jingren Zhou, and Jiesheng Wu

Rui Wang, Xinjun Yang, Feifei Li, David B. Lomet, Xin Liu, Panfeng Zhou, Yongxiang Chen, David Zhang, Jingren Zhou, and Jiesheng Wu. 2024. Bw e -Tree: An Evolution of Bw-tree on Fast Storage. In2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, Utrecht, Netherlands, 5266–5279. https://doi.org/10.1109/ICDE60146.2024.00396

work page doi:10.1109/icde60146.2024.00396 2024
[43]

Andersen

Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G. Andersen. 2018. Building a Bw-Tree Takes More Than Just Buzz Words. InProceedings of the 2018 International Con- ference on Management of Data. ACM, Houston TX USA, 473–488. https: //doi.org/10.1145/3183713.3196895

work page doi:10.1145/3183713.3196895 2018
[44]

Ke Yi. 2012. Dynamic Indexability and the Optimality of B-Trees.J. ACM59, 4 (Aug. 2012), 1–19. https://doi.org/10.1145/2339123.2339129

work page doi:10.1145/2339123.2339129 2012
[45]

Yu, Markos Markakis, Andreas Kipf, Per-Åke Larson, Umar Farooq Minhas, and Tim Kraska

Geoffrey X. Yu, Markos Markakis, Andreas Kipf, Per-Åke Larson, Umar Farooq Minhas, and Tim Kraska. 2022. TreeLine: an update-in-place key-value store for modern storage.Proc. VLDB Endow.16, 1 (Sept. 2022), 99–112. https://doi.org/ 10.14778/3561261.3561270 13

work page doi:10.14778/3561261.3561270 2022

[1] [1]

Alok Aggarwal and Jeffrey Vitter, S. 1988. The Input/Output Complexity of Sorting and Related Problems.Commun. ACM31, 9 (Sept. 1988), 1116–1127. https://doi.org/10.1145/48529.48535

work page doi:10.1145/48529.48535 1988

[2] [2]

Apache. 2025. Apache Cassandra. https://cassandra.apache.org/_/index.html. Retrieved August, 2025

work page 2025

[3] [3]

Tony Astolfi, Vidya Silai, and Bhaskar Bora. 2024. Mathworks/Llfs. MathWorks

work page 2024

[4] [4]

Manos Athanassoulis, Michael S Kester, Lukas M Maas, Radu Stoica, Stratos Idreos, Anastasia Ailamaki, and Mark Callaghan. 2016. Designing Access Meth- ods: The RUM Conjecture.EDBT(2016)

work page 2016

[5] [5]

Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara McAllister, Nirjhar Mukherjee, Prashant Pandey, Donald E

Michael A. Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara McAllister, Nirjhar Mukherjee, Prashant Pandey, Donald E. Porter, Jun Yuan, and Yang Zhan. 2019. Small Refinements to the DAM Can Have Big Consequences for Data-Structure Design. InThe 31st ACM Symposium on Parallelism in Algorithms and Archit...

work page doi:10.1145/3323165.3323210 2019

[6] [6]

Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara Mcallister, Nirjhar Mukherjee, Prashant Pandey, Donald E

Michael A. Bender, Alex Conway, Martín Farach-Colton, William Jannen, Yizheng Jiao, Rob Johnson, Eric Knorr, Sara Mcallister, Nirjhar Mukherjee, Prashant Pandey, Donald E. Porter, Jun Yuan, and Yang Zhan. 2021. External-Memory Dictionaries in the Affine and PDAM Models.ACM Transactions on Parallel Computing8, 3 (Sept. 2021), 1–20. https://doi.org/10.1145/3470635

work page doi:10.1145/3470635 2021

[7] [7]

Bender, Martin Farach-Colton, Jeremy T

Michael A. Bender, Martin Farach-Colton, Jeremy T. Fineman, Yonatan R. Fogel, Bradley C. Kuszmaul, and Jelani Nelson. 2007. Cache-Oblivious Streaming B-trees. InProceedings of the Nineteenth Annual ACM Symposium on Parallel Algorithms and Architectures. ACM, San Diego California USA, 81–92. https: //doi.org/10.1145/1248377.1248393

work page doi:10.1145/1248377.1248393 2007

[8] [8]

Michael A Bender, Martin Farach-Colton, William Jannen, Rob Johnson, Bradley C Kuszmaul, Donald E Porter, Jun Yuan, and Yang Zhan. 2015. An Introduction to B𝜀-trees and Write-Optimization.login; magazine40, 5 (Sept. 2015)

work page 2015

[9] [9]

Burton H. Bloom. 1970. Space/Time Trade-Offs in Hash Coding with Allowable Errors.Commun. ACM13, 7 (July 1970), 422–426. https://doi.org/10.1145/362686. 362692

work page doi:10.1145/362686 1970

[10] [10]

Gerth Stølting Brodal and Rolf Fagerberg. 2003. Lower Bounds for External Memory Dictionaries.ACM-SIAM Symposium on Discrete Algorithms (SODA)3 (2003)

work page 2003

[11] [11]

Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, and Mike Barnett. 2018. FASTER: A Concurrent Key-Value Store with In-Place Updates. InProceedings of the 2018 International Conference on Management of Data. ACM, Houston TX USA, 275–290. https://doi.org/10.1145/ 3183713.3196898

work page arXiv 2018

[12] [12]

Badrish Chandramouli, Guna Prasaad, Donald Kossmann, Justin Levandoski, James Hunter, and Mike Barnett. 2018. FASTER: An Embedded Concurrent Key-Value Store for State Management.Proceedings of the VLDB Endowment11, 12 (Aug. 2018), 1930–1933. https://doi.org/10.14778/3229863.3236227

work page doi:10.14778/3229863.3236227 2018

[13] [13]

Subarna Chatterjee, Meena Jagadeesan, Wilson Qin, and Stratos Idreos. 2021. Cosine: A Cloud-Cost Optimized Self-Designing Key-Value Storage Engine.Pro- ceedings of the VLDB Endowment15, 1 (Sept. 2021), 112–126. https://doi.org/10. 14778/3485450.3485461

work page arXiv 2021

[14] [14]

Douglas Comer. 1979. Ubiquitous B-Tree.Comput. Surveys11, 2 (June 1979), 121–137. https://doi.org/10.1145/356770.356776

work page doi:10.1145/356770.356776 1979

[15] [15]

Alexander Conway, Vijay Chidambaram, Martin Farach-Colton, and Rob Johnson

work page

[16] [16]

SplinterDB: Closing the Bandwidth Gap for NVMe Key-Value Stores.2020 USENIX Annual Technical Conference (USENIX ATC 20)(2020)

work page 2020

[17] [17]

Alex Conway, Martín Farach-Colton, and Rob Johnson. 2023. SplinterDB and Maplets: Improving the Tradeoffs in Key-Value Store Compaction Pol- icy.Proceedings of the ACM on Management of Data1, 1 (May 2023), 1–27. https://doi.org/10.1145/3588726

work page doi:10.1145/3588726 2023

[18] [18]

Alex Conway, Martin Farach-Colton, and Philip Shilane. 2018. Optimal Hashing in External Memory. https://doi.org/10.48550/arXiv.1805.09423 arXiv:1805.09423 [cs]

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1805.09423 2018

[19] [19]

Benchmarking cloud serving systems with ycsb,

Brian F. Cooper, Adam Silberstein, Erwin Tam, Raghu Ramakrishnan, and Russell Sears. 2010. Benchmarking cloud serving systems with YCSB. InProceedings of the 1st ACM Symposium on Cloud Computing(Indianapolis, Indiana, USA) (SoCC ’10). Association for Computing Machinery, New York, NY, USA, 143–154. https://doi.org/10.1145/1807128.1807152

work page doi:10.1145/1807128.1807152 2010

[20] [20]

Databricks. [n.d.]. Extract Transform Load (ETL). https://www.databricks.com/ discover/etl. Retrieved August, 2025

work page 2025

[21] [21]

Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2017. Monkey: Optimal Navigable Key-Value Store. InProceedings of the 2017 ACM International Con- ference on Management of Data. ACM, Chicago Illinois USA, 79–94. https: //doi.org/10.1145/3035918.3064054

work page doi:10.1145/3035918.3064054 2017

[22] [22]

Niv Dayan, Manos Athanassoulis, and Stratos Idreos. 2018. Optimal Bloom Filters and Adaptive Merging for LSM-Trees.ACM Transactions on Database Systems43, 4 (Dec. 2018), 1–48. https://doi.org/10.1145/3276980

work page doi:10.1145/3276980 2018

[23] [23]

Niv Dayan and Stratos Idreos. 2018. Dostoevsky: Better Space-Time Trade-Offs for LSM-Tree Based Key-Value Stores via Adaptive Removal of Superfluous Merging. InProceedings of the 2018 International Conference on Management of Data. ACM, Houston TX USA, 505–520. https://doi.org/10.1145/3183713.3196927

work page doi:10.1145/3183713.3196927 2018

[24] [24]

Niv Dayan and Stratos Idreos. 2019. The Log-Structured Merge-Bush & the Wacky Continuum. InProceedings of the 2019 International Conference on Management of Data. ACM, Amsterdam Netherlands, 449–466. https://doi.org/10.1145/3299869. 3319903

work page doi:10.1145/3299869 2019

[25] [25]

Google. [n.d.]. LevelDB. https://github.com/google/leveldb. Retrieved August, 2025

work page 2025

[26] [26]

Xiangpeng Hao and Badrish Chandramouli. 2024. Bf-Tree: A Modern Read- Write-Optimized Concurrent Larger-Than-Memory Range Index.Proc. VLDB Endow.17, 11 (July 2024), 3442–3455. https://doi.org/10.14778/3681954.3682012

work page doi:10.14778/3681954.3682012 2024

[27] [27]

TGL: a general framework for temporal GNN training on billion-scale graphs

Andy Huynh, Harshal A. Chaudhari, Evimaria Terzi, and Manos Athanas- soulis. 2022. Endure: a robust tuning paradigm for LSM trees under work- load uncertainty.Proc. VLDB Endow.15, 8 (April 2022), 1605–1618. https: //doi.org/10.14778/3529337.3529345

work page doi:10.14778/3529337.3529345 2022

[28] [28]

Stratos Idreos, Niv Dayan, Wilson Qin, Mali Akmanalp, Sophie Hilgard, Andrew Ross, James Lennon, Varun Jain, Harshita Gupta, David Li, and Zichen Zhu. 2019. Design Continuums and the Path Toward Self-Designing Key-Value Stores That Know and Learn.Biennial Conference on Innovative Data Systems Research (CIDR) (2019)

work page 2019

[29] [29]

Song Jiang, Feng Chen, and Xiaodong Zhang. 2005. CLOCK-Pro: An Effective Improvement of the CLOCK Replacement. (2005)

work page 2005

[30] [30]

Bo-kyeong Kim and Dong-Ho Lee. 2015. LSB-Tree: A Log-Structured B-Tree Index Structure for NAND Flash SSDs.Design Automation for Embedded Systems 19, 1-2 (March 2015), 77–100. https://doi.org/10.1007/s10617-014-9139-4

work page doi:10.1007/s10617-014-9139-4 2015

[31] [31]

Leis, Alfons Kemper, and T

V. Leis, Alfons Kemper, and T. Neumann. 2013. The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases. In2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, Brisbane, QLD, 38–49. https://doi.org/10. 1109/ICDE.2013.6544812

work page arXiv 2013

[32] [32]

Viktor Leis, Florian Scheibner, Alfons Kemper, and Thomas Neumann. 2016. The ART of Practical Synchronization. InProceedings of the 12th International Workshop on Data Management on New Hardware. ACM, San Francisco California, 1–8. https://doi.org/10.1145/2933349.2933352

work page doi:10.1145/2933349.2933352 2016

[33] [33]

Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. 2019. KVell: The Design and Implementation of a Fast Persistent Key-Value Store. InPro- ceedings of the 27th ACM Symposium on Operating Systems Principles. ACM, Huntsville Ontario Canada, 447–461. https://doi.org/10.1145/3341301.3359628

work page doi:10.1145/3341301.3359628 2019

[34] [34]

Baptiste Lepers, Oana Balmau, Karan Gupta, and Willy Zwaenepoel. 2020. KVell+: Snapshot Isolation without Snapshots.14th USENIX Symposium on Operating Systems Design and Implementation (OSDI 20)(2020)

work page 2020

[35] [35]

Justin Levandoski and Sudipta Sengupta. [n.d.]. The BW-Tree: A Latch-Free B-Tree for Log-Structured Flash Storage. ([n. d.])

work page

[36] [36]

Levandoski, David B

Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: A B-tree for new hardware platforms. In2013 IEEE 29th International Conference on Data Engineering (ICDE). 302–313. https://doi.org/10.1109/ICDE.2013.6544834

work page doi:10.1109/icde.2013.6544834 2013

[37] [37]

J. J. Levandoski, D. B. Lomet, and S. Sengupta. 2013. The Bw-Tree: A B-tree for New Hardware Platforms. In2013 IEEE 29th International Conference on Data Engineering (ICDE). IEEE, Brisbane, QLD, 302–313. https://doi.org/10.1109/ICDE. 2013.6544834

work page doi:10.1109/icde 2013

[38] [38]

Meta Platforms, Inc. 2022. RocksDB. https://rocksdb.org. Retrieved July, 2025

work page 2022

[39] [39]

Patrick O’Neil, Edward Cheng, Dieter Gawlick, and Elizabeth O’Neil. 1996. The Log-Structured Merge-Tree (LSM-tree).Acta Informatica33, 4 (June 1996), 351–

work page 1996

[40] [40]

https://doi.org/10.1007/s002360050048

work page doi:10.1007/s002360050048

[41] [41]

Patterson

David A. Patterson. 2004. Latency Lags Bandwith.Commun. ACM47, 10 (Oct. 2004), 71–75. https://doi.org/10.1145/1022594.1022596

work page doi:10.1145/1022594.1022596 2004

[42] [42]

Lomet, Xin Liu, Panfeng Zhou, Yongxiang Chen, David Zhang, Jingren Zhou, and Jiesheng Wu

Rui Wang, Xinjun Yang, Feifei Li, David B. Lomet, Xin Liu, Panfeng Zhou, Yongxiang Chen, David Zhang, Jingren Zhou, and Jiesheng Wu. 2024. Bw e -Tree: An Evolution of Bw-tree on Fast Storage. In2024 IEEE 40th International Conference on Data Engineering (ICDE). IEEE, Utrecht, Netherlands, 5266–5279. https://doi.org/10.1109/ICDE60146.2024.00396

work page doi:10.1109/icde60146.2024.00396 2024

[43] [43]

Andersen

Ziqi Wang, Andrew Pavlo, Hyeontaek Lim, Viktor Leis, Huanchen Zhang, Michael Kaminsky, and David G. Andersen. 2018. Building a Bw-Tree Takes More Than Just Buzz Words. InProceedings of the 2018 International Con- ference on Management of Data. ACM, Houston TX USA, 473–488. https: //doi.org/10.1145/3183713.3196895

work page doi:10.1145/3183713.3196895 2018

[44] [44]

Ke Yi. 2012. Dynamic Indexability and the Optimality of B-Trees.J. ACM59, 4 (Aug. 2012), 1–19. https://doi.org/10.1145/2339123.2339129

work page doi:10.1145/2339123.2339129 2012

[45] [45]

Yu, Markos Markakis, Andreas Kipf, Per-Åke Larson, Umar Farooq Minhas, and Tim Kraska

Geoffrey X. Yu, Markos Markakis, Andreas Kipf, Per-Åke Larson, Umar Farooq Minhas, and Tim Kraska. 2022. TreeLine: an update-in-place key-value store for modern storage.Proc. VLDB Endow.16, 1 (Sept. 2022), 99–112. https://doi.org/ 10.14778/3561261.3561270 13

work page doi:10.14778/3561261.3561270 2022