Fast Concurrent Primitives Despite Contention
Pith reviewed 2026-05-10 10:37 UTC · model grok-4.3
The pith
Concurrent read/write and CAS registers achieve O(log P) latency with high probability under contention using constant hardware.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We give contention-resolution algorithms for several basic primitives, and analyze them under a relaxed, roughly-synchronous stochastic scheduler, where processes run at roughly the same rate up to a constant factor with high probability. Specifically, we construct read/write registers and CAS registers that have latency O(log P) w.h.p. under our scheduler model, using O(1) hardware read/write registers and, in the case of our CAS construction, one hardware CAS register. Our algorithms guarantee performance even when their operations are invoked by an adaptive adversary that is able to see the entire history of operations so far, including their timing and return values. This allows them to
What carries the argument
Contention-resolution algorithms that convert hardware primitives subject to write contention into versions with graceful contention handling, under the roughly-synchronous stochastic scheduler.
Load-bearing premise
Processes run at roughly the same rate up to a constant factor with high probability, even against an adaptive adversary.
What would settle it
An execution under the roughly-synchronous scheduler in which some operation takes more than O(log P) time with probability that does not go to zero as P grows.
Figures
read the original abstract
We study the problem of constructing concurrent objects in a setting where $P$ processes run in parallel and interact through a shared memory that is subject to write contention. Our goal is to transform hardware primitives that are subject to write contention into ones that handle contention gracefully. We give contention-resolution algorithms for several basic primitives, and analyze them under a relaxed, roughly-synchronous stochastic scheduler, where processes run at roughly the same rate up to a constant factor with high probability. Specifically, we construct read/write registers and CAS registers that have latency $O(\log P)$ w.h.p. under our scheduler model, using $O(1)$ hardware read/write registers and, in the case of our CAS construction, one hardware CAS register. Our algorithms guarantee performance even when their operations are invoked by an adaptive adversary that is able to see the entire history of operations so far, including their timing and return values. This allows them to be used as building blocks inside larger programs; using this compositionality property, we obtain several other constructions (LL/SC, fetch-and-increment, bounded max registers, and counters). To complement our constructions, we give a trade-off showing that even under a perfectly synchronous schedule and even if each process only executes one operation, any algorithm that implements any of the primitives that we consider, uses space $M$, and has latency at most $L$ with high probability must have expected latency at least $\Omega(\log_{ML} P)$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper constructs read/write registers and CAS registers achieving O(log P) latency w.h.p. under a relaxed roughly-synchronous stochastic scheduler (processes run at rates within a constant factor w.h.p., even against an adaptive adversary seeing full history), using O(1) hardware read/write registers (plus one hardware CAS for the CAS case). These are compositional, yielding LL/SC, fetch-and-increment, bounded max registers, and counters. A complementary lower bound states that any implementation of the considered primitives with space M and latency at most L w.h.p. must have expected latency Omega(log_{M L} P) even under perfectly synchronous schedules where each process executes only one operation.
Significance. If the proofs and analyses hold, the work supplies practical contention-resolution primitives with logarithmic latency and constant space in a scheduler model that is weaker than full synchrony yet stronger than pure asynchrony. The compositionality property and explicit adaptive-adversary guarantee are strengths, as is the lower-bound trade-off that rules out sub-logarithmic latency without increasing space or latency parameters. These results could inform efficient implementations of concurrent objects in shared-memory systems subject to write contention.
major comments (2)
- [§3] §3 (Scheduler Model): The O(log P) w.h.p. latency for the constructions is load-bearing on the assumption that the stochastic scheduler forces all processes to execute steps at rates differing by at most a constant factor w.h.p. The analysis of the O(1)-space contention-resolution schemes (e.g., exponential backoff) relies on this to bound collisions per step; if the model permitted larger rate deviations while remaining 'roughly synchronous,' the w.h.p. bound would fail and the constructions would reduce to standard asynchronous primitives with linear latency.
- [Lower-bound section] Lower-bound section (abstract and §6): The Omega(log_{M L} P) expected-latency lower bound is proved only for perfectly synchronous schedules with one operation per process. It therefore does not certify tightness of the O(log P) upper bound under the paper's stochastic scheduler; an extension or separate argument showing that the same lower bound holds (or nearly holds) in the relaxed model would be needed to close the gap.
minor comments (2)
- [Abstract] The abstract states that the constructions 'obtain several other constructions (LL/SC, fetch-and-increment, bounded max registers, and counters)' but does not list their achieved latencies; these should be stated explicitly in the introduction or a summary table.
- [§3] Notation for the stochastic scheduler parameters (the constant factor on rates, the high-probability bound) should be introduced once and used consistently in all theorem statements.
Simulated Author's Rebuttal
Thank you for the detailed and constructive review of our manuscript. We appreciate the positive evaluation of the significance of our results on contention-resilient primitives. We address each major comment below, indicating where revisions will be made.
read point-by-point responses
-
Referee: [§3] §3 (Scheduler Model): The O(log P) w.h.p. latency for the constructions is load-bearing on the assumption that the stochastic scheduler forces all processes to execute steps at rates differing by at most a constant factor w.h.p. The analysis of the O(1)-space contention-resolution schemes (e.g., exponential backoff) relies on this to bound collisions per step; if the model permitted larger rate deviations while remaining 'roughly synchronous,' the w.h.p. bound would fail and the constructions would reduce to standard asynchronous primitives with linear latency.
Authors: We agree that the O(log P) w.h.p. latency guarantees depend critically on the roughly-synchronous rate bound in our scheduler model. Section 3 explicitly defines the model as one in which, with high probability, all processes execute steps at rates differing by at most a constant factor, even against an adaptive adversary with full history. The analyses of exponential backoff and related techniques use this property to bound the number of collisions per step and derive the logarithmic latency. In a model allowing arbitrary rate deviations, the constructions would indeed fall back to standard asynchronous behavior with linear latency. We will revise §3 to add an explicit discussion of this dependency, including a contrast with the fully asynchronous case, to make the scope of the guarantees clearer. revision: yes
-
Referee: [Lower-bound section] Lower-bound section (abstract and §6): The Omega(log_{M L} P) expected-latency lower bound is proved only for perfectly synchronous schedules with one operation per process. It therefore does not certify tightness of the O(log P) upper bound under the paper's stochastic scheduler; an extension or separate argument showing that the same lower bound holds (or nearly holds) in the relaxed model would be needed to close the gap.
Authors: The lower bound is proved under perfect synchrony (with one operation per process) because this yields the strongest possible impossibility statement: sub-logarithmic latency is ruled out even when the scheduler is most favorable to the algorithm. Synchronous executions occur with positive probability under our stochastic scheduler, so any algorithm claiming good performance in the stochastic model must also succeed on these synchronous schedules; thus the lower bound applies. Nevertheless, we acknowledge that a direct lower-bound argument specialized to the stochastic scheduler would more tightly match the upper-bound setting. We will add a clarifying paragraph in §6 explaining this relationship and noting that a full extension to the stochastic model is left for future work, as it would require additional technical machinery. revision: partial
Circularity Check
No circularity: constructions and analysis are self-contained under stated scheduler model
full rationale
The paper defines a relaxed roughly-synchronous stochastic scheduler as an explicit modeling assumption, then proves that its O(1)-space contention-resolution algorithms achieve O(log P) latency w.h.p. under that assumption (including against adaptive adversaries). The lower-bound trade-off is proved separately under perfectly synchronous schedules and does not feed back into the upper-bound constructions. No equations reduce the claimed latency to fitted parameters, self-referential definitions, or load-bearing self-citations; the scheduler model is an input, not derived from the algorithms. The derivation chain is therefore independent of its own outputs.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Processes run at roughly the same rate up to a constant factor with high probability under a relaxed roughly-synchronous stochastic scheduler.
- domain assumption The adversary is adaptive and can see the entire history of operations including timing and return values.
Reference graph
Works this paper leans on
-
[1]
Max registers, counters, and monotone circuits
James Aspnes, Hagit Attiya, and Keren Censor. Max registers, counters, and monotone circuits. In Proc. 28th ACM Symposium on Principles of Distributed Computing (PODC) , pages 36--45, 2009
work page 2009
-
[2]
Time-adaptive algorithms for synchronization
Rajeev Alur, Hagit Attiya, and Gadi Taubenfeld. Time-adaptive algorithms for synchronization. In Proc. 26th ACM Symposium on Theory of Computing (STOC) , pages 800--809, 1994
work page 1994
-
[3]
Acar, Naama Ben-David , and Mike Rainey
Umut A. Acar, Naama Ben-David , and Mike Rainey. Contention in structured concurrency: Provably efficient dynamic non-zero indicators for nested parallelism. ACM SIGPLAN Notices , 52(8):75--88, October 2017
work page 2017
-
[4]
Lower bounds for restricted-use objects
James Aspnes, Keren Censor-Hillel , Hagit Attiya, and Danny Hendler. Lower bounds for restricted-use objects. SIAM Journal on Computing , 45(3):734--762, 2016
work page 2016
-
[5]
Dan Alistarh, Keren Censor-Hillel , and Nir Shavit. Are lock-free concurrent algorithms practically wait-free? Journal of the ACM , 63(4), September 2016
work page 2016
-
[6]
Bounds on the time to reach agreement in the presence of timing uncertainty
Hagit Attiya, Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Bounds on the time to reach agreement in the presence of timing uncertainty. Journal of the ACM , 41(1):122--152, January 1994
work page 1994
-
[7]
James H. Anderson and Yong-Jik Kim. An improved lower bound for the time complexity of mutual exclusion. In Proc. 20th ACM Symposium on Principles of Distributed Computing (PODC) , pages 90--99, 2001
work page 2001
-
[8]
H. Attiya and M. Mavronicolas. Efficiency of semisynchronous versus asynchronous networks. Mathematical Systems Theory , 27(6):547--571, November 1994
work page 1994
-
[9]
Shalom M. Asbell and Eric Ruppert. A wait-free deque with polylogarithmic step complexity. In Proc. 27th International Conference on Principles of Distributed Systems (OPODIS) , pages 17:1--17:22, 2023
work page 2023
-
[10]
Analyzing the performance of lock-free data structures: A conflict-based model
Aras Atalar, Paul Renaud-Goud , and Philippas Tsigas. Analyzing the performance of lock-free data structures: A conflict-based model. In Proc. 29th International Symposium on Distributed Computing (DISC) , pages 341--355, 2015
work page 2015
-
[11]
How lock-free data structures perform in dynamic environments: Models and analyses
Aras Atalar, Paul Renaud-Goud , and Philippas Tsigas. How lock-free data structures perform in dynamic environments: Models and analyses. In Proc. International Conference on Principles of Distributed Systems (OPODIS) , 2016
work page 2016
-
[12]
Lock-free algorithms under stochastic schedulers
Dan Alistarh, Thomas Sauerwald, and Milan Vojnovi\' c . Lock-free algorithms under stochastic schedulers. In Proc. ACM Symposium on Principles of Distributed Computing (PODC) , pages 251--260, 2015
work page 2015
- [13]
-
[14]
Bender, Mart \'i n Farach-Colton , Simai He, Bradley C
Michael A. Bender, Mart \'i n Farach-Colton , Simai He, Bradley C. Kuszmaul, and Charles E. Leiserson. Adversarial contention resolution for simple channels. In Proc. 17th ACM Symposium on Parallelism in Algorithms and Architectures (SPAA) , pages 325--332, 2005
work page 2005
-
[15]
Reuven Bar-Yehuda , Oded Goldreich, and Alon Itai. On the time complexity of broadcast in radio networks: an exponential gap between determinism and randomization. In Proc. 6th ACM Symposium on Principles of Distributed Computing (PODC) , pages 98--108, 1987
work page 1987
-
[16]
Guy E. Blelloch, Phillip B. Gibbons, Yossi Matias, and Marco Zagha. Accounting for memory bank contention and delay in high-bandwidth multiprocessors. IEEE Transactions on Parallel and Distributed Systems , 8(9):943--958, 1997
work page 1997
-
[17]
Contention in counting networks
Costas Busch, Nikos Hardavellas, and Marios Mavronicolas. Contention in counting networks. In Proc. 13th ACM Symposium on Principles of Distributed Computing (PODC) , page 404, 1994
work page 1994
-
[18]
Costas Busch and Marios Mavronicolas. An efficient counting network. Theoretical Computer Science , 411(34):3001--3030, 2010
work page 2010
-
[19]
Guy E. Blelloch and Yuanhao Wei. LL/SC and atomic copy: Constant time, space efficient implementations using only pointer-width CAS . In Proc. 34th International Symposium on Distributed Computing (DISC) , pages 5:1--5:17, 2020
work page 2020
-
[20]
The communication requirements of mutual exclusion
Robert Cypher. The communication requirements of mutual exclusion. In Proc. 7th ACM Symposium on Parallel Algorithms and Architectures (SPAA) , pages 147--156, 1995
work page 1995
-
[21]
Contention in shared memory algorithms
Cynthia Dwork, Maurice Herlihy, and Orli Waarts. Contention in shared memory algorithms. Journal of the ACM , 44(6):779--805, November 1997
work page 1997
-
[22]
Consensus in the presence of partial synchrony
Cynthia Dwork, Nancy Lynch, and Larry Stockmeyer. Consensus in the presence of partial synchrony. Journal of the ACM , 35(2):288--323, April 1988
work page 1988
-
[23]
On the inherent sequentiality of concurrent objects
Faith Ellen, Danny Hendler, and Nir Shavit. On the inherent sequentiality of concurrent objects. SIAM Journal on Computing , 41(3):519--536, 2012
work page 2012
-
[24]
Faith Ellen, Vijaya Ramachandran, and Philipp Woelfel. Efficient fetch-and-increment. In Proc. 26th International Symposium on Distributed Computing (DISC) , pages 16--30, 2012
work page 2012
-
[25]
An optimal implementation of fetch-and-increment
Faith Ellen and Philipp Woelfel. An optimal implementation of fetch-and-increment. In Proc. 27th International Symposium on Distributed Computing (DISC) , pages 284--298, 2013
work page 2013
-
[26]
Obstruction-free algorithms can be practically wait-free
Faith Ellen Fich, Victor Luchangco, Mark Moir, and Nir Shavit. Obstruction-free algorithms can be practically wait-free. In Proc. 19th International Symposium on Distributed Computing (DISC) , pages 78--92, 2005
work page 2005
-
[27]
Analysis of timing-based mutual exclusion with random times
Eli Gafni and Michael Mitzenmacher. Analysis of timing-based mutual exclusion with random times. SIAM Journal on Computing , 31(3):816--837, 2001
work page 2001
-
[28]
Gibbons, Yossi Matias, and Vijaya Ramachandran
Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. Efficient low-contention parallel algorithms. Journal of Computer and System Sciences , 53(3):417--442, 1996
work page 1996
-
[29]
Gibbons, Yossi Matias, and Vijaya Ramachandran
Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. The queue-read queue-write asynchronous PRAM model. Theoretical Computer Science , 196(1-2):3--29, 1998
work page 1998
-
[30]
Gibbons, Yossi Matias, and Vijaya Ramachandran
Phillip B. Gibbons, Yossi Matias, and Vijaya Ramachandran. The queue-read queue-write PRAM model: Accounting for contention in parallel algorithms. SIAM Journal on Computing , 28(2):733--769, 1998
work page 1998
-
[31]
A tight RMR lower bound for randomized mutual exclusion
George Giakkoupis and Philipp Woelfel. A tight RMR lower bound for randomized mutual exclusion. In Proc. 44th ACM Symposium on Theory of Computing (STOC) , pages 983--1002, 2012
work page 2012
-
[32]
Randomized mutual exclusion with constant amortized RMR complexity on the DSM
George Giakkoupis and Philipp Woelfel. Randomized mutual exclusion with constant amortized RMR complexity on the DSM . In Proc. 55th IEEE Symposium on Foundations of Computer Science (FOCS) , pages 504--513, 2014
work page 2014
-
[33]
Constructing shared objects that are both robust and high-throughput
Danny Hendler and Shay Kutten. Constructing shared objects that are both robust and high-throughput. In Proc. 20th International Symposium on Distributed Computing (DISC) , pages 428--442, 2006
work page 2006
-
[34]
Operation-valency and the cost of coordination
Danny Hendler and Nir Shavit. Operation-valency and the cost of coordination. In Proc. 22nd ACM Symposium on Principles of Distributed Computing (PODC) , pages 84--91, 2003
work page 2003
-
[35]
The Art of Multiprocessor Programming, Revised Reprint
Maurice Herlihy and Nir Shavit. The Art of Multiprocessor Programming, Revised Reprint . Morgan Kaufmann Publishers Inc., 1st edition, 2012
work page 2012
-
[36]
Maurice P. Herlihy and Jeannette M. Wing. Linearizability: a correctness condition for concurrent objects. ACM Transactions on Programming Languages and Systems , 12(3):463--492, July 1990
work page 1990
-
[37]
Randomized mutual exclusion with sub-logarithmic RMR -complexity
Danny Hendler and Philipp Woelfel. Randomized mutual exclusion with sub-logarithmic RMR -complexity. Distributed Computing , 24(1):3--19, 2011
work page 2011
-
[38]
Siddhartha V. Jayanti and Robert E. Tarjan. Concurrent disjoint set union. Distributed Computing , 34(6):413--436, December 2021
work page 2021
-
[39]
William Kuszmaul and Qi Qi. The multiplicative version of Azuma's inequality, with an application to contention analysis. Preprint arXiv:2102.05077, January 2025
-
[40]
Julian Shun, Guy E. Blelloch, Jeremy T. Fineman, and Phillip B. Gibbons. Reducing contention through priority updates. In Proc. ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP) , pages 299--300, 2013
work page 2013
-
[41]
Julian Shun, Yan Gu, Guy E. Blelloch, Jeremy T. Fineman, and Phillip B. Gibbons. Sequential random permutation, list contraction and tree contraction are highly parallel. In Proc. 26th ACM-SIAM Symposium on Discrete Algorithms (SODA) , pages 431--448, 2015
work page 2015
-
[42]
Efficient transformations of obstruction-free algorithms into non-blocking algorithms
Gadi Taubenfeld. Efficient transformations of obstruction-free algorithms into non-blocking algorithms. In Proc. 21st International Symposium on Distributed Computing (DISC) , pages 450--464, 2007
work page 2007
- [43]
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.