One Ring to Shuffle Them All: Scalable Intra-Process Data Redistribution with Ring-Buffer Shuffle in Redpanda Oxla

Adam Szymański; Tyler Akidau

arxiv: 2605.29099 · v1 · pith:GLVK4KGMnew · submitted 2026-05-27 · 💻 cs.DB

Pith reviewed 2026-06-29 08:59 UTC · model grok-4.3

classification 💻 cs.DB

keywords: ring buffer, data redistribution, shuffle, query engine, parallelism, lock-free, many-core, intra-process

The pith

A ring-buffer shuffle design scales intra-process data redistribution to hundreds of cores with amortized O(1) synchronization per batch.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Existing intra-process shuffle methods either materialize all data behind a global barrier (batch partitioning) or incur per-channel synchronization that scales poorly with core count (channel-based streaming). The paper introduces a ring-buffer streaming shuffle in which producers acquire slots in fixed-size batch groups through lock-free atomic operations, giving amortized O(1) synchronization per batch and O(M) memory independent of input size. The design has run in production in Redpanda's Oxla query engine for two years. Benchmarks report the ring buffer outperforming channel streaming by over 100% and batch partitioning by over 300% at 192 cores, while on chiplet hardware channel streaming remains competitive and end-to-end results depend on workload shape. If the approach holds, parallel query engines can better exploit modern server CPUs for partitioned operators such as hash joins and aggregations.
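For contrast with the ring buffer, the batch-partitioning baseline mentioned above can be pictured as two phases separated by a barrier. The following is a minimal single-threaded sketch, not code from the paper: the function names `partition_phase`, `consume_phase` and `batch_shuffle`, and the use of `hash` as the routing function, are assumptions made for illustration.

```python
from typing import Callable, Hashable


def partition_phase(rows: list, num_consumers: int,
                    h: Callable[[Hashable], int]) -> list[list]:
    """Phase 1: one producer buckets all of its rows locally, with no sharing."""
    buckets = [[] for _ in range(num_consumers)]
    for row in rows:
        buckets[h(row) % num_consumers].append(row)
    return buckets


def consume_phase(all_buckets: list[list[list]], consumer: int) -> list:
    """Phase 2: a consumer gathers its own bucket from every producer."""
    out = []
    for producer_buckets in all_buckets:
        out.extend(producer_buckets[consumer])
    return out


def batch_shuffle(inputs: list[list], num_consumers: int,
                  h: Callable[[Hashable], int] = hash) -> list[list]:
    # All producers finish before any consumer starts (the global barrier),
    # so every intermediate row is materialized at the same time.
    all_buckets = [partition_phase(rows, num_consumers, h) for rows in inputs]
    return [consume_phase(all_buckets, c) for c in range(num_consumers)]
```

The hot loop in `partition_phase` touches no shared state, which matches the abstract's description of the baseline, but memory grows with the input and consumers cannot start until every producer has finished.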

Core claim

Ring-buffer streaming shuffle addresses scaling failures in data redistribution by using lock-free atomic slot acquisition into fixed-size batch groups, achieving amortized O(1) synchronization cost per batch and O(M) memory independent of input size, with measured gains on 72-core and 192-core systems.
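To make the "amortized O(1) synchronization cost per batch" claim concrete, here is a small cost model of batched slot acquisition. It is a sketch under stated assumptions, not the paper's algorithm (the pseudocode of Figure 4 is not reproduced on this page): `SlotCounter` is a hypothetical stand-in for a shared atomic counter, the code is single-threaded, and consumers, back-pressure on wrap-around, and memory ordering are all omitted.

```python
import math
from dataclasses import dataclass


@dataclass
class SlotCounter:
    """Hypothetical stand-in for a shared atomic counter."""
    next_slot: int = 0
    ops: int = 0  # number of synchronizing operations performed so far

    def fetch_add(self, n: int) -> int:
        # A real engine would issue one hardware atomic fetch-add here.
        start = self.next_slot
        self.next_slot += n
        self.ops += 1
        return start


def produce(rows: list, counter: SlotCounter, ring: list, batch_size: int) -> None:
    """Claim up to `batch_size` consecutive slots per counter operation, then fill them."""
    capacity = len(ring)
    for i in range(0, len(rows), batch_size):
        batch = rows[i:i + batch_size]
        start = counter.fetch_add(len(batch))   # one synchronization per batch
        for offset, row in enumerate(batch):    # plain writes into slots this producer owns
            ring[(start + offset) % capacity] = row


counter = SlotCounter()
ring = [None] * 1024
produce(list(range(1000)), counter, ring, batch_size=64)
print(counter.ops, math.ceil(1000 / 64))
```

With 1000 rows and a batch size of 64, the model performs 16 counter operations instead of 1000, so the shared counter is touched once per batch rather than once per row. Whether that single shared counter stays cheap across dies is exactly the premise the review flags as load-bearing.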

What carries the argument

Ring-buffer streaming shuffle design with lock-free atomic slot acquisition into fixed-size batch groups.

If this is right

On 72-core single-socket systems, ring buffer outperforms channel streaming by up to 44% and batch partitioning by up to 79%.
At 192 cores, the advantage grows to over 100% over channel streaming and over 300% versus batch partitioning.
The design uses O(M) memory independent of input size.
It has been implemented and used in production in Redpanda's Oxla query engine for two years.
On chiplet architectures, the shared atomic counter can become a bottleneck, making channel streaming competitive.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach may extend to other many-core systems where synchronization overhead limits parallel query performance.
Adaptive selection between ring-buffer and channel methods based on detected architecture could optimize across workloads.
Improvements in chip interconnects could further enhance the ring-buffer's advantages on future hardware.

Load-bearing premise

The shared atomic counter for slot acquisition maintains amortized O(1) cost without becoming a cross-die bottleneck on target architectures.

What would settle it

Performance measurements on a high-core-count non-chiplet system where the ring buffer does not show the reported speedups, or direct profiling showing the atomic counter as a dominant cost on chiplet designs.

Figures

Figures reproduced from arXiv: 2605.29099 by Adam Szymański, Tyler Akidau.

**Figure 2.** Channel-based streaming with 𝑀=3 producers and 𝑁=2 consumers. Each producer routes rows by ℎ and pushes to the corresponding channel. Consumers pull from their dedicated channel. Synchronization occurs on every push and pull.

**Figure 3.** Ring-buffer streaming. Producers acquire slots in …

**Figure 4.** Ring-buffer pseudocode. Left: per-batch producer …

**Figure 5.** Throughput scaling with thread count on a 192-core …

**Figure 6.** Throughput vs. batch size at 192 cores (Graviton4) …

**Figure 7.** Ring-buffer throughput vs. batch size at full core …

**Figure 8.** Per-query log2(𝑡channel/𝑡ring) on Graviton4 (192 cores), sorted. Positive bars: ring-buffer faster. TPC-H bars stay within ±0.55 on the log axis (max 1.47× either way) and the bulk are positive. ClickBench is bimodal: a long tail of small ring-buffer wins and ties on the right, plus four queries on the left where the ring-buffer loses by 2–5×.
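The log-ratio axis of Figure 8 converts back to speedup factors by exponentiation. The helpers below are an illustrative sketch (the names `log_ratio` and `factor` are not from the paper) showing how the quoted bar heights and factors relate.

```python
import math


def log_ratio(t_channel: float, t_ring: float) -> float:
    """log2(t_channel / t_ring): positive when the ring buffer is faster."""
    return math.log2(t_channel / t_ring)


def factor(log2_value: float) -> float:
    """Size of the speedup or slowdown behind a bar of the given height."""
    return 2.0 ** abs(log2_value)


print(round(factor(0.55), 2))          # 1.46: the TPC-H bound on the log axis
print(round(log_ratio(1.47, 1.0), 2))  # 0.56: the stated 1.47x maximum
print(round(log_ratio(1.0, 5.0), 2))   # -2.32: a 5x ring-buffer loss
```

The ±0.55 bound and the 1.47× maximum agree to within rounding, and the ClickBench losses of 2–5× correspond to bars between roughly -1 and -2.3 on the same axis.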

Original abstract

As server CPUs scale to dozens and now hundreds of cores per socket, parallel query engines must rethink how they redistribute data between threads. Partitioned operators such as hash joins and aggregations require frequent data redistribution across threads, yet existing intra-process shuffle designs fundamentally fail to scale with core count: batch partitioning avoids cross-thread synchronization in the hot path but materializes all intermediate data, introduces a global producer/consumer barrier, and requires a consumption approach with low cache locality, while channel-based streaming avoids materialization but incurs per-channel synchronization that scales poorly with core count. As core counts rise, these architectural tradeoffs increasingly prevent engines from fully utilizing modern hardware. We present a ring-buffer streaming shuffle design that addresses these shortcomings through lock-free atomic slot acquisition into fixed-size batch groups, achieving amortized O(1) synchronization cost per batch and O(M) memory independent of input size. Ring-buffer shuffle has been implemented in Redpanda's Oxla query engine for two years, where it currently powers production queries for Redpanda SQL users. We evaluate all three approaches on a 72-core NVIDIA GraceHopper, a 192-core dual-socket AWS Graviton4, and a 96-core (192-thread) AMD EPYC. On a 72-core single-socket system the ring buffer outperforms channel streaming by up to 44% and batch partitioning by up to 79%; at 192 cores the advantage over channel grows to over 100% and over 300% versus batch partitioning. Even so, on chiplet architectures with many partitioned L3 caches, the shared atomic counter becomes a cross-die bottleneck and channel-based streaming remains competitive. End-to-end Graviton4 evaluation on TPC-H (21 queries) and ClickBench (43 queries) shows the advantage is workload-shape-dependent.
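The channel-based baseline in the abstract can be sketched with one thread-safe queue per consumer. This is a minimal illustration and not the paper's implementation: `channel_shuffle` and the sentinel-based shutdown are assumptions, and Python's `queue.Queue` (which locks internally) stands in for whatever channel primitive an engine would use.

```python
import queue
import threading


def channel_shuffle(inputs, num_consumers, h=hash):
    """Route rows from M producers to N consumers over one channel per consumer."""
    channels = [queue.Queue() for _ in range(num_consumers)]
    results = [[] for _ in range(num_consumers)]
    done = object()  # sentinel marking end of stream

    def producer(rows):
        for row in rows:
            # Each put() takes the queue's internal lock: one synchronization per row.
            channels[h(row) % num_consumers].put(row)

    def consumer(idx):
        while True:
            item = channels[idx].get()  # each get() synchronizes as well
            if item is done:
                return
            results[idx].append(item)

    consumers = [threading.Thread(target=consumer, args=(i,))
                 for i in range(num_consumers)]
    producers = [threading.Thread(target=producer, args=(rows,))
                 for rows in inputs]
    for t in consumers + producers:
        t.start()
    for t in producers:
        t.join()
    for ch in channels:
        ch.put(done)  # close each channel once every producer has finished
    for t in consumers:
        t.join()
    return results
```

Consumers start draining while producers are still running, so nothing is fully materialized, but every row pays for a synchronized push and a synchronized pull. That per-row cost is what batched slot acquisition is meant to amortize.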

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Ring-buffer shuffle gives real speedups on single-socket high-core CPUs with honest notes on chiplet limits.

The main thing to know is that this ring-buffer design for intra-process shuffle beats channel streaming and batch partitioning on single-socket high-core machines, with gains up to 44% and 79% on 72 cores, while the paper itself flags that the shared atomic counter becomes a cross-die bottleneck on chiplets and channel streaming stays competitive there.

What is new is the concrete lock-free atomic slot acquisition into fixed-size batches inside a ring buffer, which they claim delivers amortized O(1) sync per batch and O(M) memory. It has been running in production inside Redpanda Oxla for two years, which counts as actual deployment evidence rather than just a sketch. The paper compares it directly to the two standard approaches and runs the numbers on GraceHopper, Graviton4, and EPYC, plus end-to-end TPC-H and ClickBench results that show the advantage depends on workload shape.

The work is solid on the engineering side: the authors acknowledge the chiplet limitation themselves, so that concern is already surfaced in the abstract. The production use and multi-platform benchmarks are the strongest parts.

Soft spots are mostly about missing detail. The abstract states the performance claims without benchmark methodology, error bars, or variance numbers, so the 100%+ and 300% gains at 192 cores are hard to evaluate on their own. If the full paper supplies the implementation and raw data, that gap closes; otherwise the evidence remains stated rather than fully inspectable. No load-bearing fitting or circular claims appear.

This is for database systems people who need to move data between threads on 64-plus core sockets. A practitioner scaling parallel operators would find the design and the platform-specific results useful. It deserves a serious referee because it is a deployed solution to a practical scaling limit with direct comparisons and self-reported caveats.

Referee Report

1 major / 1 minor

Summary. The paper claims that existing intra-process shuffle methods (batch partitioning and channel streaming) fail to scale with rising core counts in parallel query engines, and introduces a ring-buffer shuffle design using lock-free atomic slot acquisition into fixed-size batches. This achieves amortized O(1) synchronization cost per batch and O(M) memory independent of input size. The design has been implemented and used in production for two years in Redpanda's Oxla engine. Evaluations on 72-core GraceHopper, 192-core Graviton4, and 96-core EPYC systems report speedups of up to 44%/79% on 72 cores and >100%/>300% at 192 cores versus the baselines, with end-to-end TPC-H (21 queries) and ClickBench (43 queries) results on Graviton4 showing workload-shape dependence; the abstract notes that the shared atomic counter becomes a cross-die bottleneck on chiplet architectures.

Significance. If the central claims hold after addressing the noted limitations, this represents a practical engineering contribution to scaling intra-process data movement in many-core database systems. Strengths include the production deployment, the explicit O(1) sync and O(M) memory bounds, and the multi-architecture evaluation with real workloads. The architecture-specific caveat on chiplets is a positive sign of realism, though it tempers the universality of the scalability claims.

major comments (1)

[Abstract] The performance claims for the 192-core Graviton4 (>100% over channel, >300% over batch) and 96-core EPYC systems rest on the amortized O(1) cost of the shared atomic counter for slot acquisition. However, the abstract states that this counter becomes a cross-die bottleneck on chiplet architectures with partitioned L3 caches (directly applicable to Graviton4 and EPYC), making channel streaming competitive. This tension is load-bearing for the scalability claims and requires explicit resolution in the evaluation, such as per-architecture breakdowns or conditions under which the O(1) bound holds.

minor comments (1)

[Abstract] The reported speedups lack any mention of error bars, number of repetitions, or benchmark methodology details (e.g., input cardinalities, data distributions, or thread pinning), which should be provided in the evaluation section to support reproducibility and assessment of the results.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for identifying the tension between the reported speedups on 192-core Graviton4 and 96-core EPYC systems and the abstract's own caveat about the shared atomic counter becoming a cross-die bottleneck on chiplet architectures. We agree this requires explicit clarification to avoid overstatement of universality. We will revise the manuscript accordingly.

Referee: [Abstract] The performance claims for the 192-core Graviton4 (>100% over channel, >300% over batch) and 96-core EPYC systems rest on the amortized O(1) cost of the shared atomic counter for slot acquisition. However, the abstract states that this counter becomes a cross-die bottleneck on chiplet architectures with partitioned L3 caches (directly applicable to Graviton4 and EPYC), making channel streaming competitive. This tension is load-bearing for the scalability claims and requires explicit resolution in the evaluation, such as per-architecture breakdowns or conditions under which the O(1) bound holds.

Authors: We acknowledge the valid point. The abstract already notes the chiplet caveat, but the presentation of the >100%/>300% figures for the 192-core Graviton4 (a dual-socket chiplet design) and 96-core EPYC without accompanying per-architecture qualification creates the tension identified. In revision we will (1) add a dedicated subsection in the evaluation that reports per-architecture microbenchmark results with explicit call-outs for when the ring-buffer's O(1) amortized bound holds versus when cross-die traffic makes channel streaming competitive, and (2) qualify the abstract numbers with a parenthetical reference to the conditions (single-socket vs. multi-socket chiplet) under which each speedup was measured. This directly resolves the load-bearing issue for the scalability claims.

Revision: yes

Circularity Check

0 steps flagged

No circularity: engineering design with external benchmarks and acknowledged limitations

The paper presents a ring-buffer shuffle design and reports empirical speedups on specific hardware. No derivation chain, fitted parameters, predictions, or first-principles results exist that could reduce to inputs by construction. The O(1) amortized synchronization claim is an engineering assertion evaluated via benchmarks; the abstract explicitly flags its failure mode on chiplet architectures rather than smuggling it in via self-citation or definition. No self-citation is load-bearing for the central claim, and no ansatz or renaming of known results occurs. This is a standard non-circular engineering contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or background assumptions; insufficient information to populate free_parameters, axioms, or invented_entities.
