pith. sign in

arxiv: 2604.14530 · v1 · submitted 2026-04-16 · 💻 cs.DS · cs.DC

Fast Concurrent Primitives Despite Contention

Pith reviewed 2026-05-10 10:37 UTC · model grok-4.3

classification 💻 cs.DS cs.DC
keywords concurrent primitivescontention resolutionread/write registersCAS registersshared memorylatencystochastic scheduleradaptive adversary
0
0 comments X

The pith

Concurrent read/write and CAS registers achieve O(log P) latency with high probability under contention using constant hardware.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs read/write registers and CAS registers that maintain low latency even when many processes access shared memory simultaneously. These objects use only a constant number of underlying hardware registers yet deliver O(log P) latency with high probability. The constructions work against an adaptive adversary under a roughly synchronous scheduler where processes advance at similar rates. They compose to build other objects such as counters and fetch-and-increment primitives. A lower bound shows that logarithmic dependence on the number of processes is required for any bounded-space, bounded-latency solution.

Core claim

We give contention-resolution algorithms for several basic primitives, and analyze them under a relaxed, roughly-synchronous stochastic scheduler, where processes run at roughly the same rate up to a constant factor with high probability. Specifically, we construct read/write registers and CAS registers that have latency O(log P) w.h.p. under our scheduler model, using O(1) hardware read/write registers and, in the case of our CAS construction, one hardware CAS register. Our algorithms guarantee performance even when their operations are invoked by an adaptive adversary that is able to see the entire history of operations so far, including their timing and return values. This allows them to

What carries the argument

Contention-resolution algorithms that convert hardware primitives subject to write contention into versions with graceful contention handling, under the roughly-synchronous stochastic scheduler.

Load-bearing premise

Processes run at roughly the same rate up to a constant factor with high probability, even against an adaptive adversary.

What would settle it

An execution under the roughly-synchronous scheduler in which some operation takes more than O(log P) time with probability that does not go to zero as P grows.

Figures

Figures reproduced from arXiv: 2604.14530 by Guy E. Blelloch, Martin Farach-Colton, Michael A. Bender, Renfei Zhou, Rob Johnson, Rotem Oshman, Yang Hu.

Figure 1
Figure 1. Figure 1: An experiment to justify our model. Running times for [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Pseudocode for the write operation of our read/write register, and its representation as a state machine. The comments “S”, “R” and “W” in the pseudocode indicate the beginning of states S, R, W (resp.). In state S, the process reads the register. In state R it loops and re-reads the register, until either detecting a change and returning, or transitioning to state W, where it invokes a storeRand and retur… view at source ↗
Figure 3
Figure 3. Figure 3: A busy interval [t0, t1). An arrow from R to W represents a timestep when the operation applies a load and transitions from state R to state W. Only operations that transition from R to W before time t0 can invoke store instructions inside the busy interval, and there must be at least t1 − t0 such operations. inside the busy interval, at which point it invokes a store instruction that is added to the queue… view at source ↗
read the original abstract

We study the problem of constructing concurrent objects in a setting where $P$ processes run in parallel and interact through a shared memory that is subject to write contention. Our goal is to transform hardware primitives that are subject to write contention into ones that handle contention gracefully. We give contention-resolution algorithms for several basic primitives, and analyze them under a relaxed, roughly-synchronous stochastic scheduler, where processes run at roughly the same rate up to a constant factor with high probability. Specifically, we construct read/write registers and CAS registers that have latency $O(\log P)$ w.h.p. under our scheduler model, using $O(1)$ hardware read/write registers and, in the case of our CAS construction, one hardware CAS register. Our algorithms guarantee performance even when their operations are invoked by an adaptive adversary that is able to see the entire history of operations so far, including their timing and return values. This allows them to be used as building blocks inside larger programs; using this compositionality property, we obtain several other constructions (LL/SC, fetch-and-increment, bounded max registers, and counters). To complement our constructions, we give a trade-off showing that even under a perfectly synchronous schedule and even if each process only executes one operation, any algorithm that implements any of the primitives that we consider, uses space $M$, and has latency at most $L$ with high probability must have expected latency at least $\Omega(\log_{ML} P)$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper constructs read/write registers and CAS registers achieving O(log P) latency w.h.p. under a relaxed roughly-synchronous stochastic scheduler (processes run at rates within a constant factor w.h.p., even against an adaptive adversary seeing full history), using O(1) hardware read/write registers (plus one hardware CAS for the CAS case). These are compositional, yielding LL/SC, fetch-and-increment, bounded max registers, and counters. A complementary lower bound states that any implementation of the considered primitives with space M and latency at most L w.h.p. must have expected latency Omega(log_{M L} P) even under perfectly synchronous schedules where each process executes only one operation.

Significance. If the proofs and analyses hold, the work supplies practical contention-resolution primitives with logarithmic latency and constant space in a scheduler model that is weaker than full synchrony yet stronger than pure asynchrony. The compositionality property and explicit adaptive-adversary guarantee are strengths, as is the lower-bound trade-off that rules out sub-logarithmic latency without increasing space or latency parameters. These results could inform efficient implementations of concurrent objects in shared-memory systems subject to write contention.

major comments (2)
  1. [§3] §3 (Scheduler Model): The O(log P) w.h.p. latency for the constructions is load-bearing on the assumption that the stochastic scheduler forces all processes to execute steps at rates differing by at most a constant factor w.h.p. The analysis of the O(1)-space contention-resolution schemes (e.g., exponential backoff) relies on this to bound collisions per step; if the model permitted larger rate deviations while remaining 'roughly synchronous,' the w.h.p. bound would fail and the constructions would reduce to standard asynchronous primitives with linear latency.
  2. [Lower-bound section] Lower-bound section (abstract and §6): The Omega(log_{M L} P) expected-latency lower bound is proved only for perfectly synchronous schedules with one operation per process. It therefore does not certify tightness of the O(log P) upper bound under the paper's stochastic scheduler; an extension or separate argument showing that the same lower bound holds (or nearly holds) in the relaxed model would be needed to close the gap.
minor comments (2)
  1. [Abstract] The abstract states that the constructions 'obtain several other constructions (LL/SC, fetch-and-increment, bounded max registers, and counters)' but does not list their achieved latencies; these should be stated explicitly in the introduction or a summary table.
  2. [§3] Notation for the stochastic scheduler parameters (the constant factor on rates, the high-probability bound) should be introduced once and used consistently in all theorem statements.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the detailed and constructive review of our manuscript. We appreciate the positive evaluation of the significance of our results on contention-resilient primitives. We address each major comment below, indicating where revisions will be made.

read point-by-point responses
  1. Referee: [§3] §3 (Scheduler Model): The O(log P) w.h.p. latency for the constructions is load-bearing on the assumption that the stochastic scheduler forces all processes to execute steps at rates differing by at most a constant factor w.h.p. The analysis of the O(1)-space contention-resolution schemes (e.g., exponential backoff) relies on this to bound collisions per step; if the model permitted larger rate deviations while remaining 'roughly synchronous,' the w.h.p. bound would fail and the constructions would reduce to standard asynchronous primitives with linear latency.

    Authors: We agree that the O(log P) w.h.p. latency guarantees depend critically on the roughly-synchronous rate bound in our scheduler model. Section 3 explicitly defines the model as one in which, with high probability, all processes execute steps at rates differing by at most a constant factor, even against an adaptive adversary with full history. The analyses of exponential backoff and related techniques use this property to bound the number of collisions per step and derive the logarithmic latency. In a model allowing arbitrary rate deviations, the constructions would indeed fall back to standard asynchronous behavior with linear latency. We will revise §3 to add an explicit discussion of this dependency, including a contrast with the fully asynchronous case, to make the scope of the guarantees clearer. revision: yes

  2. Referee: [Lower-bound section] Lower-bound section (abstract and §6): The Omega(log_{M L} P) expected-latency lower bound is proved only for perfectly synchronous schedules with one operation per process. It therefore does not certify tightness of the O(log P) upper bound under the paper's stochastic scheduler; an extension or separate argument showing that the same lower bound holds (or nearly holds) in the relaxed model would be needed to close the gap.

    Authors: The lower bound is proved under perfect synchrony (with one operation per process) because this yields the strongest possible impossibility statement: sub-logarithmic latency is ruled out even when the scheduler is most favorable to the algorithm. Synchronous executions occur with positive probability under our stochastic scheduler, so any algorithm claiming good performance in the stochastic model must also succeed on these synchronous schedules; thus the lower bound applies. Nevertheless, we acknowledge that a direct lower-bound argument specialized to the stochastic scheduler would more tightly match the upper-bound setting. We will add a clarifying paragraph in §6 explaining this relationship and noting that a full extension to the stochastic model is left for future work, as it would require additional technical machinery. revision: partial

Circularity Check

0 steps flagged

No circularity: constructions and analysis are self-contained under stated scheduler model

full rationale

The paper defines a relaxed roughly-synchronous stochastic scheduler as an explicit modeling assumption, then proves that its O(1)-space contention-resolution algorithms achieve O(log P) latency w.h.p. under that assumption (including against adaptive adversaries). The lower-bound trade-off is proved separately under perfectly synchronous schedules and does not feed back into the upper-bound constructions. No equations reduce the claimed latency to fitted parameters, self-referential definitions, or load-bearing self-citations; the scheduler model is an input, not derived from the algorithms. The derivation chain is therefore independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on a specific stochastic scheduler model and an adaptive adversary that observes history; these are domain assumptions rather than derived quantities.

axioms (2)
  • domain assumption Processes run at roughly the same rate up to a constant factor with high probability under a relaxed roughly-synchronous stochastic scheduler.
    Invoked to obtain the O(log P) latency bound w.h.p. for the register and CAS constructions.
  • domain assumption The adversary is adaptive and can see the entire history of operations including timing and return values.
    Used to guarantee that the algorithms remain correct and fast even when composed inside larger programs.

pith-pipeline@v0.9.0 · 5581 in / 1358 out tokens · 29265 ms · 2026-05-10T10:37:23.874558+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages

  1. [1]

    Max registers, counters, and monotone circuits

    James Aspnes, Hagit Attiya, and Keren Censor. Max registers, counters, and monotone circuits. In Proc. 28th ACM Symposium on Principles of Distributed Computing (PODC) , pages 36--45, 2009

  2. [2]

    Time-adaptive algorithms for synchronization

    Rajeev Alur, Hagit Attiya, and Gadi Taubenfeld. Time-adaptive algorithms for synchronization. In Proc. 26th ACM Symposium on Theory of Computing (STOC) , pages 800--809, 1994

  3. [3]

    Acar, Naama Ben-David , and Mike Rainey

    Umut A. Acar, Naama Ben-David , and Mike Rainey. Contention in structured concurrency: Provably efficient dynamic non-zero indicators for nested parallelism. ACM SIGPLAN Notices , 52(8):75--88, October 2017

  4. [4]

    Lower bounds for restricted-use objects

    James Aspnes, Keren Censor-Hillel , Hagit Attiya, and Danny Hendler. Lower bounds for restricted-use objects. SIAM Journal on Computing , 45(3):734--762, 2016

  5. [5]

    Are lock-free concurrent algorithms practically wait-free? Journal of the ACM , 63(4), September 2016

    Dan Alistarh, Keren Censor-Hillel , and Nir Shavit. Are lock-free concurrent algorithms practically wait-free? Journal of the ACM , 63(4), September 2016

  6. [6]

    Bounds on the time to reach agreement in the presence of timing uncertainty

    Hagit Attiya, Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Bounds on the time to reach agreement in the presence of timing uncertainty. Journal of the ACM , 41(1):122--152, January 1994

  7. [7]

    Anderson and Yong-Jik Kim

    James H. Anderson and Yong-Jik Kim. An improved lower bound for the time complexity of mutual exclusion. In Proc. 20th ACM Symposium on Principles of Distributed Computing (PODC) , pages 90--99, 2001

  8. [8]

    Attiya and M

    H. Attiya and M. Mavronicolas. Efficiency of semisynchronous versus asynchronous networks. Mathematical Systems Theory , 27(6):547--571, November 1994

  9. [9]

    Asbell and Eric Ruppert

    Shalom M. Asbell and Eric Ruppert. A wait-free deque with polylogarithmic step complexity. In Proc. 27th International Conference on Principles of Distributed Systems (OPODIS) , pages 17:1--17:22, 2023

  10. [10]

    Analyzing the performance of lock-free data structures: A conflict-based model

    Aras Atalar, Paul Renaud-Goud , and Philippas Tsigas. Analyzing the performance of lock-free data structures: A conflict-based model. In Proc. 29th International Symposium on Distributed Computing (DISC) , pages 341--355, 2015

  11. [11]

    How lock-free data structures perform in dynamic environments: Models and analyses

    Aras Atalar, Paul Renaud-Goud , and Philippas Tsigas. How lock-free data structures perform in dynamic environments: Models and analyses. In Proc. International Conference on Principles of Distributed Systems (OPODIS) , 2016

  12. [12]

    Lock-free algorithms under stochastic schedulers

    Dan Alistarh, Thomas Sauerwald, and Milan Vojnovi\' c . Lock-free algorithms under stochastic schedulers. In Proc. ACM Symposium on Principles of Distributed Computing (PODC) , pages 251--260, 2015

  13. [13]

    Blelloch

    Naama Ben-David and Guy E. Blelloch. Analyzing contention and backoff in asynchronous shared memory. In Proc. ACM Symposium on Principles of Distributed Computing (PODC) , pages 53--62, 2017

  14. [14]

    Bender, Mart \'i n Farach-Colton , Simai He, Bradley C

    Michael A. Bender, Mart \'i n Farach-Colton , Simai He, Bradley C. Kuszmaul, and Charles E. Leiserson. Adversarial contention resolution for simple channels. In Proc. 17th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) , pages 325--332, 2005

  15. [15]

    On the time complexity of broadcast in radio networks: an exponential gap between determinism and randomization

    Reuven Bar-Yehuda , Oded Goldreich, and Alon Itai. On the time complexity of broadcast in radio networks: an exponential gap between determinism and randomization. In Proc. 6th ACM Symposium on Principles of Distributed Computing (PODC) , pages 98--108, 1987

  16. [16]

    Blelloch, Phillip B

    Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, and Marco Zagha. Accounting for memory bank contention and delay in high-bandwidth multiprocessors. IEEE Transactions on Parallel and Distributed Systems , 8(9):943--958, 1997

  17. [17]

    Contention in counting networks

    Costas Busch, Nikos Hardavellas, and Marios Mavronicolas. Contention in counting networks. In Proc. 13th ACM Symposium on Principles of Distributed Computing (PODC) , page 404, 1994

  18. [18]

    An efficient counting network

    Costas Busch and Marios Mavronicolas. An efficient counting network. Theoretical Computer Science , 411(34):3001--3030, 2010

  19. [19]

    Blelloch and Yuanhao Wei

    Guy E. Blelloch and Yuanhao Wei. LL/SC and atomic copy: Constant time, space efficient implementations using only pointer-width CAS . In Proc. 34th International Symposium on Distributed Computing (DISC) , pages 5:1--5:17, 2020

  20. [20]

    The communication requirements of mutual exclusion

    Robert Cypher. The communication requirements of mutual exclusion. In Proc. 7th ACM Symposium on Parallel Algorithms and Architectures (SPAA) , pages 147--156, 1995

  21. [21]

    Contention in shared memory algorithms

    Cynthia Dwork, Maurice Herlihy, and Orli Waarts. Contention in shared memory algorithms. Journal of the ACM , 44(6):779--805, November 1997

  22. [22]

    Consensus in the presence of partial synchrony

    Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM , 35(2):288--323, April 1988

  23. [23]

    On the inherent sequentiality of concurrent objects

    Faith Ellen, Danny Hendler, and Nir Shavit. On the inherent sequentiality of concurrent objects. SIAM Journal on Computing , 41(3):519--536, 2012

  24. [24]

    Efficient fetch-and-increment

    Faith Ellen, Vijaya Ramachandran, and Philipp Woelfel. Efficient fetch-and-increment. In Proc. 26th International Symposium on Distributed Computing (DISC) , pages 16--30, 2012

  25. [25]

    An optimal implementation of fetch-and-increment

    Faith Ellen and Philipp Woelfel. An optimal implementation of fetch-and-increment. In Proc. 27th International Symposium on Distributed Computing (DISC) , pages 284--298, 2013

  26. [26]

    Obstruction-free algorithms can be practically wait-free

    Faith Ellen Fich, Victor Luchangco, Mark Moir, and Nir Shavit. Obstruction-free algorithms can be practically wait-free. In Proc. 19th International Symposium on Distributed Computing (DISC) , pages 78--92, 2005

  27. [27]

    Analysis of timing-based mutual exclusion with random times

    Eli Gafni and Michael Mitzenmacher. Analysis of timing-based mutual exclusion with random times. SIAM Journal on Computing , 31(3):816--837, 2001

  28. [28]

    Gibbons, Yossi Matias, and Vijaya Ramachandran

    Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. Efficient low-contention parallel algorithms. Journal of Computer and System Sciences , 53(3):417--442, 1996

  29. [29]

    Gibbons, Yossi Matias, and Vijaya Ramachandran

    Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. The queue-read queue-write asynchronous PRAM model. Theoretical Computer Science , 196(1-2):3--29, 1998

  30. [30]

    Gibbons, Yossi Matias, and Vijaya Ramachandran

    Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. The queue-read queue-write PRAM model: Accounting for contention in parallel algorithms. SIAM Journal on Computing , 28(2):733--769, 1998

  31. [31]

    A tight RMR lower bound for randomized mutual exclusion

    George Giakkoupis and Philipp Woelfel. A tight RMR lower bound for randomized mutual exclusion. In Proc. 44th ACM Symposium on Theory of Computing (STOC) , pages 983--1002, 2012

  32. [32]

    Randomized mutual exclusion with constant amortized RMR complexity on the DSM

    George Giakkoupis and Philipp Woelfel. Randomized mutual exclusion with constant amortized RMR complexity on the DSM . In Proc. 55th IEEE Symposium on Foundations of Computer Science (FOCS) , pages 504--513, 2014

  33. [33]

    Constructing shared objects that are both robust and high-throughput

    Danny Hendler and Shay Kutten. Constructing shared objects that are both robust and high-throughput. In Proc. 20th International Symposium on Distributed Computing (DISC) , pages 428--442, 2006

  34. [34]

    Operation-valency and the cost of coordination

    Danny Hendler and Nir Shavit. Operation-valency and the cost of coordination. In Proc. 22nd ACM Symposium on Principles of Distributed Computing (PODC) , pages 84--91, 2003

  35. [35]

    The Art of Multiprocessor Programming, Revised Reprint

    Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming, Revised Reprint . Morgan Kaufmann Publishers Inc., 1st edition, 2012

  36. [36]

    Herlihy and Jeannette M

    Maurice P. Herlihy and Jeannette M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems , 12(3):463--492, July 1990

  37. [37]

    Randomized mutual exclusion with sub-logarithmic RMR -complexity

    Danny Hendler and Philipp Woelfel. Randomized mutual exclusion with sub-logarithmic RMR -complexity. Distributed Computing , 24(1):3--19, 2011

  38. [38]

    Jayanti and Robert E

    Siddhartha V. Jayanti and Robert E. Tarjan. Concurrent disjoint set union. Distributed Computing , 34(6):413--436, December 2021

  39. [39]

    Kuszmaul and Q

    William Kuszmaul and Qi Qi. The multiplicative version of Azuma's inequality, with an application to contention analysis. Preprint arXiv:2102.05077, January 2025

  40. [40]

    Blelloch, Jeremy T

    Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, and Phillip B. Gibbons. Reducing contention through priority updates. In Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) , pages 299--300, 2013

  41. [41]

    Blelloch, Jeremy T

    Julian Shun, Yan Gu, Guy E. Blelloch, Jeremy T. Fineman, and Phillip B. Gibbons. Sequential random permutation, list contraction and tree contraction are highly parallel. In Proc. 26th ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages 431--448, 2015

  42. [42]

    Efficient transformations of obstruction-free algorithms into non-blocking algorithms

    Gadi Taubenfeld. Efficient transformations of obstruction-free algorithms into non-blocking algorithms. In Proc. 21st International Symposium on Distributed Computing (DISC) , pages 450--464, 2007

  43. [43]

    Anderson

    Jae-Heon Yang and Jams H. Anderson. A fast, scalable mutual exclusion algorithm. Distributed Computing , 9(1):51--60, Mar 1995