pith. sign in

arxiv: 1907.02669 · v1 · pith:OGD3KVOInew · submitted 2019-07-05 · 💻 cs.DC

On the Cost of Concurrency in Hybrid Transactional Memory

Pith reviewed 2026-05-25 02:26 UTC · model grok-4.3

classification 💻 cs.DC
keywords hybrid transactional memoryincremental validationprogressivenesssoftware transactional memoryhardware transactional memoryinstrumentationconcurrency
0
0 comments X

The pith

Software transactions in progressive hybrid transactional memories must perform incremental validation, unlike in pure software versions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that mixing hardware and software transactions under progressiveness creates a constraint absent from pure software transactional memory. Software transactions cannot skip re-reading previously accessed data items to ensure consistency. This requirement remains even when hardware transactions read metadata without speculation. The authors then give opaque algorithms that achieve progressiveness for a subset of transactions while minimizing hardware instrumentation overhead.

Core claim

Unlike progressive STMs, progressive HyTMs require software transactions to perform incremental validation. This result holds even if hardware transactions can read metadata non-speculatively. The paper presents opaque HyTM algorithms providing progressiveness for a subset of transactions that are optimal in terms of hardware instrumentation and explores the resulting concurrency, instrumentation, and validation trade-offs.

What carries the argument

Instrumentation of hardware transactions to detect contention with software transactions, under progressiveness defined as allowing aborts only due to data conflicts.

Load-bearing premise

Hardware transactions must be instrumented to detect contention with software transactions and progressiveness is defined strictly as allowing aborts only due to data conflicts.

What would settle it

A progressive HyTM construction in which software transactions avoid incremental validation while hardware transactions remain instrumented to detect contention.

Figures

Figures reproduced from arXiv: 1907.02669 by Srivatsan Ravi, Trevor Brown.

Figure 1
Figure 1. Figure 1: Proof steps for Theorem 3 progressiveness: a transaction may abort only if there is a prefix in which it conflicts with another transaction and both are t-incomplete. TL2 on the other hand allows a transaction to abort due to a concurrent conflicting transaction. Implications for disjoint-access parallelism in HyTM. The property of disjoint-access parallelism (DAP), in its weakest form, ensures that two tr… view at source ↗
Figure 2
Figure 2. Figure 2: Table summarizing complexities of HyTM implementations [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Results for a BST microbenchmark. The x-axis represents the number of concurrent threads. The y-axis represents operations per microsecond. with high probability. Thus, in this type of workload, almost all transactions succeed in hardware, and the slow-path is almost never used. To study performance when transactions regularly run on slow-path, we introduced an operation called a RangeIncrement that often … view at source ↗
read the original abstract

State-of-the-art \emph{software transactional memory (STM)} implementations achieve good performance by carefully avoiding the overhead of \emph{incremental validation} (i.e., re-reading previously read data items to avoid inconsistency) while still providing \emph{progressiveness} (allowing transactional aborts only due to \emph{data conflicts}). Hardware transactional memory (HTM) implementations promise even better performance, but offer no progress guarantees. Thus, they must be combined with STMs, leading to \emph{hybrid} TMs (HyTMs) in which hardware transactions must be \emph{instrumented} (i.e., access metadata) to detect contention with software transactions. We show that, unlike in progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation. In fact, this result holds even if hardware transactions can \emph{read} metadata \emph{non-speculatively}. We then present \emph{opaque} HyTM algorithms providing \emph{progressiveness for a subset of transactions} that are optimal in terms of hardware instrumentation. We explore the concurrency vs. hardware instrumentation vs. software validation trade-offs for these algorithms. Our experiments with Intel and IBM POWER8 HTMs seem to suggest that (i) the \emph{cost of concurrency} also exists in practice, (ii) it is important to implement HyTMs that provide progressiveness for a maximal set of transactions without incurring high hardware instrumentation overhead or using global contending bottlenecks and (iii) there is no easy way to derive more efficient HyTMs by taking advantage of non-speculative accesses within hardware.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper proves that, unlike progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation even when hardware transactions are permitted non-speculative metadata reads. It presents opaque HyTM algorithms that achieve progressiveness for a maximal subset of transactions while minimizing hardware instrumentation, analyzes the resulting concurrency-instrumentation-validation trade-offs, and reports experiments on Intel and IBM POWER8 HTMs suggesting the theoretical cost appears in practice.

Significance. If the lower bound holds under the stated model, it identifies a fundamental cost of combining HTM and STM for progressiveness that pure STMs avoid, directly informing HyTM design. The algorithms are credited for being optimal in instrumentation for the progressiveness subset they support, and the real-hardware experiments provide concrete evidence on the practical relevance of the concurrency cost.

major comments (3)
  1. [§2–3] §2–3 (model): The impossibility result is derived under the axiom that every hardware transaction must be explicitly instrumented to detect contention with software transactions and that progressiveness permits aborts only on data conflicts. The manuscript should explicitly state whether these are derived from weaker primitives or taken as axioms; if the latter, a brief argument is needed showing that relaxing either (e.g., an HTM that can observe software metadata without explicit instrumentation while preserving progress) would not falsify the necessity of incremental validation.
  2. [Theorem 1] Theorem 1 (impossibility): The proof sketch in the abstract claims the result survives non-speculative metadata reads by HTM, but the full argument must be checked for whether it still forces incremental validation when hardware transactions can read metadata without instrumentation while still guaranteeing the progressiveness definition; a concrete counter-example construction under a relaxed model would strengthen the claim.
  3. [§5] §5 (algorithms): The optimality claim for hardware instrumentation in the presented opaque HyTM algorithms is load-bearing for the practical contribution; the manuscript should include a matching lower bound showing that any algorithm providing the same progressiveness subset must incur at least as much instrumentation on the hardware path.
minor comments (2)
  1. [Experiments] The experimental section would benefit from explicit reporting of the number of runs, variance, and the precise instrumentation points measured on each platform (Intel vs. POWER8) to allow reproduction of the claimed practical cost of concurrency.
  2. [§2] Notation for the progressiveness subset (e.g., which transactions receive the guarantee) should be introduced once in §2 and used consistently in the algorithm descriptions and theorems.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments on our manuscript. We address each major comment below and indicate where revisions will be made to clarify the model, strengthen the proof presentation, and support the optimality claims.

read point-by-point responses
  1. Referee: [§2–3] §2–3 (model): The impossibility result is derived under the axiom that every hardware transaction must be explicitly instrumented to detect contention with software transactions and that progressiveness permits aborts only on data conflicts. The manuscript should explicitly state whether these are derived from weaker primitives or taken as axioms; if the latter, a brief argument is needed showing that relaxing either (e.g., an HTM that can observe software metadata without explicit instrumentation while preserving progress) would not falsify the necessity of incremental validation.

    Authors: These assumptions are axioms of the standard HyTM model used throughout the literature: HTM transactions have no automatic visibility into STM metadata and must explicitly access it to detect contention, while progressiveness is defined to allow aborts solely on data conflicts. We will revise §2 to state this explicitly and add a short paragraph arguing that relaxing the instrumentation axiom would require an HTM with built-in STM awareness, which is outside the scope of current hardware and would constitute a different (non-hybrid) model. The progressiveness definition is standard and relaxing it would weaken the result rather than invalidate it. revision: yes

  2. Referee: [Theorem 1] Theorem 1 (impossibility): The proof sketch in the abstract claims the result survives non-speculative metadata reads by HTM, but the full argument must be checked for whether it still forces incremental validation when hardware transactions can read metadata without instrumentation while still guaranteeing the progressiveness definition; a concrete counter-example construction under a relaxed model would strengthen the claim.

    Authors: The full proof of Theorem 1 (detailed in the appendix) already accounts for non-speculative metadata reads by hardware transactions; the impossibility of avoiding incremental validation on the software path holds because even non-speculative reads cannot eliminate the need for software transactions to re-validate to preserve progressiveness under the stated concurrency model. We will expand the main-text proof sketch to make this case explicit. A separate counter-example under a further-relaxed model is unnecessary, as the current model already incorporates the non-speculative capability the referee describes. revision: partial

  3. Referee: [§5] §5 (algorithms): The optimality claim for hardware instrumentation in the presented opaque HyTM algorithms is load-bearing for the practical contribution; the manuscript should include a matching lower bound showing that any algorithm providing the same progressiveness subset must incur at least as much instrumentation on the hardware path.

    Authors: The algorithms are shown to be optimal by construction: they achieve the maximal progressiveness subset while using the minimal number of metadata accesses on every hardware transaction path, as any fewer accesses would allow a data conflict to go undetected and violate progressiveness. We agree that an explicit matching lower bound would make this clearer and will add a short lower-bound argument in §5 of the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity; impossibility result and algorithms are self-contained derivations from explicit model axioms.

full rationale

The paper derives a lower bound (software transactions in progressive HyTMs require incremental validation) from the stated model definitions of instrumentation, progressiveness (aborts only on data conflicts), and HTM behavior. This is contrasted explicitly with STMs and holds under the additional condition of non-speculative metadata reads. No equations or claims reduce a 'prediction' to a fitted input, no self-citation chain bears the central premise, and no ansatz or uniqueness theorem is imported from prior author work. Algorithm constructions are presented as separate contributions exploring trade-offs. The derivation is therefore independent of its inputs and self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard definitions of progressiveness and the requirement that hardware transactions be instrumented; no free parameters or invented entities are introduced in the abstract.

axioms (2)
  • domain assumption Progressiveness allows transactional aborts only due to data conflicts.
    Invoked in the statement of the main impossibility result.
  • domain assumption Hardware transactions must be instrumented to detect contention with software transactions.
    Stated as the reason hybrid TMs require instrumentation.

pith-pipeline@v0.9.0 · 5822 in / 1251 out tokens · 22639 ms · 2026-05-25T02:26:16.719577+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 16 canonical work pages

  1. [1]

    http:// developer.amd.com/wordpress/media/2013/09/45432-ASF_Spec_2.1.pdf

    Advanced Synchronization Facility Proposed Architectural Specification, March 2009. http:// developer.amd.com/wordpress/media/2013/09/45432-ASF_Spec_2.1.pdf

  2. [2]

    Moll, and Nir Shavit

    Yehuda Afek, Alexander Matveev, Oscar R. Moll, and Nir Shavit. Amalgamated lock-elision. In Proceedings of 29th Int. Sym. on Distributed Computing, DISC ’15 , pages 309–324, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48653-5_21 , doi:10.1007/978-3-662-48653-5_21

  3. [3]

    Inherent limitations of hybrid transactional memory

    Dan Alistarh, Justin Kopinsky, Petr Kuznetsov, Srivatsan Ravi, and Nir Shavit. Inherent limitations of hybrid transactional memory. In Proceedings of 29th Int. Sym. on Distributed Computing, DISC ’15, pages 185–199, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48653-5_13 , doi: 10.1007/978-3-662-48653-5_13

  4. [4]

    Safety of deferred update in transactional memory

    Hagit Attiya, Sandeep Hans, Petr Kuznetsov, and Srivatsan Ravi. Safety of deferred update in transactional memory. In 2013 IEEE 33rd Int. Conf. on Distributed Computing Systems (ICDCS), pages 601–610, Los Alamitos, CA, USA, 2013. IEEE Computer Society. doi:http: //doi.ieeecomputersociety.org/10.1109/ICDCS.2013.57

  5. [5]

    Inherent limitations on disjoint-access parallel implementations of transactional memory

    Hagit Attiya, Eshcar Hillel, and Alessia Milani. Inherent limitations on disjoint-access parallel implementations of transactional memory. Theory of Computing Systems , 49(4):698–719, 2011. URL: http://dx.doi.org/10.1007/s00224-010-9304-5 , doi:10.1007/s00224-010-9304-5

  6. [6]

    Investigating the performance of hardware transactions on a multi-socket machine

    Trevor Brown, Alex Kogan, Yossi Lev, and Victor Luchangco. Investigating the performance of hardware transactions on a multi-socket machine. In Proceedings of 28th ACM Sym. on Parallelism in Algorithms and Architectures, SPAA ’16 , pages 121–132, 2016. URL: http://doi.acm.org/10. 1145/2935764.2935796, doi:10.1145/2935764.2935796

  7. [7]

    Invyswell: a hybrid transactional memory for haswell’s restricted transactional memory

    Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, and Maurice Herlihy. Invyswell: a hybrid transactional memory for haswell’s restricted transactional memory. In Int. Conf. on Par. Arch. and Compilation, PACT ’14 , pages 187–200, 2014. URL: http://doi.acm.org/10.1145/ 2628071.2628086, doi:10.1145/2628071.2628086

  8. [8]

    Scott, and Michael F

    Luke Dalessandro, Francois Carouge, Sean White, Yossi Lev, Mark Moir, Michael L. Scott, and Michael F. Spear. Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional 11 memory. In ASPLOS ’11, pages 39–52. ACM, 2011. URL: http://doi.acm.org/10.1145/1950365. 1950373

  9. [9]

    Spear, and Michael L

    Luke Dalessandro, Michael F. Spear, and Michael L. Scott. Norec: Streamlining stm by abolishing ownership records. SIGPLAN Not. , 45(5):67–78, January 2010. URL: http://doi.acm.org/10. 1145/1837853.1693464, doi:10.1145/1837853.1693464

  10. [10]

    Hybrid transactional memory

    Peter Damron, Alexandra Fedorova, Yossi Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum. Hybrid transactional memory. SIGPLAN Not., 41(11):336–346, October 2006. URL: http://doi. acm.org/10.1145/1168918.1168900, doi:10.1145/1168918.1168900

  11. [11]

    Transactional locking ii

    Dave Dice, Ori Shalev, and Nir Shavit. Transactional locking ii. In Proceedings of the 20th International Conference on Distributed Computing, DISC’06, pages 194–208, Berlin, Heidelberg, 2006. Springer-Verlag. URL: http://dx.doi.org/10.1007/11864219_14, doi:10.1007/11864219_14

  12. [12]

    K. Fraser. Practical lock-freedom. Technical report, Cambridge University Computer Laboratory, 2003

  13. [13]

    Principles of Transactional Memory, Synthesis Lectures on Distributed Computing Theory

    Rachid Guerraoui and Michal Kapalka. Principles of Transactional Memory, Synthesis Lectures on Distributed Computing Theory. Morgan and Claypool, 2010

  14. [14]

    Scherer, III

    Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer, III. Software transactional memory for dynamic-sized data structures. In Proc. of 22nd Int. Sym. on Principles of Distr. Comp. , PODC ’03, pages 92–101, New York, NY, USA, 2003. ACM. URL: http://doi.acm.org/10.1145/ 872035.872048, doi:10.1145/872035.872048

  15. [15]

    Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA, pages 289–300, 1993

  16. [16]

    Hughes, Partha Kundu, and Anthony Nguyen

    Sanjeev Kumar, Michael Chu, Christopher J. Hughes, Partha Kundu, and Anthony Nguyen. Hybrid transactional memory. In Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , PPoPP ’06, pages 209–220, New York, NY, USA, 2006. ACM. URL: http://doi.acm.org/10.1145/1122971.1123003

  17. [17]

    Progressive transactional memory in time and space

    Petr Kuznetsov and Srivatsan Ravi. Progressive transactional memory in time and space. In Proceed- ings of 13th Int. Conf. on Parallel Computing Technologies, PaCT ’15 , pages 410–425, 2015. URL: http://dx.doi.org/10.1007/978-3-319-21909-7_40 , doi:10.1007/978-3-319-21909-7_40

  18. [18]

    Hung Q. Le, G. L. Guthrie, Derek Williams, Maged M. Michael, Brad Frey, William J. Starke, Cathy May, Rei Odaira, and Takuya Nakaike. Transactional memory support in the IBM POWER8 processor. IBM Journal of Research and Development , 59(1), 2015. URL: http://dx.doi.org/10. 1147/JRD.2014.2380199

  19. [19]

    Phtm: Phased transactional memory

    Yossi Lev, Mark Moir, and Dan Nussbaum. Phtm: Phased transactional memory. In In Workshop on Transactional Computing (Transact), 2007

  20. [20]

    Reduced hardware transactions: a new approach to hybrid transactional memory

    Alexander Matveev and Nir Shavit. Reduced hardware transactions: a new approach to hybrid transactional memory. In Proceedings of the 25th ACM symposium on Parallelism in algorithms and architectures, pages 11–22. ACM, 2013

  21. [21]

    Michael, and Hisanobu Tomari

    Takuya Nakaike, Rei Odaira, Matthew Gaudet, Maged M. Michael, and Hisanobu Tomari. Quantita- tive comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. In Proc. of 42nd Int. Sym. on Comp. Arch. , ISCA ’15, pages 144–157, NY, USA,

  22. [22]

    URL: http://doi.acm.org/10.1145/2749469.2750403

  23. [23]

    Andrew T. Nguyen. Investigation of hardware transactional memory. 2015. http://groups.csail.mit.edu/mag/Andrew-Nguyen-Thesis.pdf

  24. [24]

    Ravi Rajwar and James R. Goodman. Speculative lock elision: Enabling highly concurrent mul- tithreaded execution. In Proc. of 34th ACM/IEEE Int. Sym. on Microarchitecture , MICRO ’01, pages 294–305, Washington, DC, USA, 2001. URL: http://dl.acm.org/citation.cfm?id=563998. 564036. 12

  25. [25]

    Transactional Synchronization in Haswell, 2012

    James Reinders. Transactional Synchronization in Haswell, 2012. http://software.intel.com/ en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/

  26. [26]

    Optimizing hybrid transactional memory: The importance of nonspeculative operations

    Torvald Riegel, Patrick Marlier, Martin Nowack, Pascal Felber, and Christof Fetzer. Optimizing hybrid transactional memory: The importance of nonspeculative operations. In Proc. of 23rd ACM Sym. on Parallelism in Algs. and Arch. , pages 53–64. ACM, 2011

  27. [27]

    Software transactional memory

    Nir Shavit and Dan Touitou. Software transactional memory. In Principles of Distributed Computing (PODC), pages 204–213, 1995

  28. [28]

    Yoo, Christopher J

    Richard M. Yoo, Christopher J. Hughes, Konrad Lai, and Ravi Rajwar. Performance evaluation of intel® transactional synchronization extensions for high-performance computing. In Proceedings of Int. Conf. on High Perf. Computing, Networking, Storage and Analysis , SC ’13, pages 19:1– 19:11, New York, NY, USA, 2013. URL: http://doi.acm.org/10.1145/250321...

  29. [29]

    E2) that does not include e1 (and resp

    be the longest prefix of E1 (and resp. E2) that does not include e1 (and resp. e2). Since before accessing b, the execution is step contention-free for T1, E·E′ 1·E′ 2 is an execution ofM. By assumption of lemma, T1 and T2 do not conflict in E′ 1·E′

  30. [30]

    By construction, E1·E′ 2 is indistinguishable to T2 from E′ 1·E′

  31. [31]

    We now consider two cases:

    Hence, T1 and T2 are poised to apply contending events e1 and e2 on b in the execution ˜E =E′ 1·E′ 2. We now consider two cases:

  32. [32]

    Thus, by Remark 1, transaction T1 must return A1 in any extension of E·e1·e2—a contradiction to the assumption that M is progressive

    (e1 is a nontrivial event) After ˜E·e1, b is contained in the tracking set of process p1 in exclusive mode and in the extension ˜E·e1·e2, we have thatτ1 is invalidated. Thus, by Remark 1, transaction T1 must return A1 in any extension of E·e1·e2—a contradiction to the assumption that M is progressive

  33. [33]

    Consider the execution ˜E·e2 following which b is contained in the tracking set of process p2 in exclusive mode

    (e1 is a trivial event) Recall that e1 may be potentially an event involving a direct access. Consider the execution ˜E·e2 following which b is contained in the tracking set of process p2 in exclusive mode. Clearly, we have an extension ˜E·e2·e1 in which τ2 is invalidated. Thus transaction T2 must return A2 in any extension of E·e2·e1—a contradiction to t...

  34. [34]

    readφ(Xi)→Aφ)

    is the complete step contention-free execution fragment of readφ(Xi)→v (and resp. readφ(Xi)→Aφ). Note that in the execution so defined above, we assume that each fast-path transactionsTi andTℓ;ℓ≤ (i−1) are executed by distinct processes. Claim 9. For all i∈{ 2,...,m } and ℓ≤ (i− 1),M has an execution of the form Ei 1ℓ or Ei 2ℓ. Proof. Note that by our assu...