On the Cost of Concurrency in Hybrid Transactional Memory
Pith reviewed 2026-05-25 02:26 UTC · model grok-4.3
The pith
Software transactions in progressive hybrid transactional memories must perform incremental validation, unlike in pure software versions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Unlike progressive STMs, progressive HyTMs require software transactions to perform incremental validation. This result holds even if hardware transactions can read metadata non-speculatively. The paper presents opaque HyTM algorithms providing progressiveness for a subset of transactions that are optimal in terms of hardware instrumentation and explores the resulting concurrency, instrumentation, and validation trade-offs.
What carries the argument
Instrumentation of hardware transactions to detect contention with software transactions, under progressiveness defined as allowing aborts only due to data conflicts.
Load-bearing premise
Hardware transactions must be instrumented to detect contention with software transactions and progressiveness is defined strictly as allowing aborts only due to data conflicts.
What would settle it
A progressive HyTM construction in which software transactions avoid incremental validation while hardware transactions remain instrumented to detect contention.
Figures
read the original abstract
State-of-the-art \emph{software transactional memory (STM)} implementations achieve good performance by carefully avoiding the overhead of \emph{incremental validation} (i.e., re-reading previously read data items to avoid inconsistency) while still providing \emph{progressiveness} (allowing transactional aborts only due to \emph{data conflicts}). Hardware transactional memory (HTM) implementations promise even better performance, but offer no progress guarantees. Thus, they must be combined with STMs, leading to \emph{hybrid} TMs (HyTMs) in which hardware transactions must be \emph{instrumented} (i.e., access metadata) to detect contention with software transactions. We show that, unlike in progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation. In fact, this result holds even if hardware transactions can \emph{read} metadata \emph{non-speculatively}. We then present \emph{opaque} HyTM algorithms providing \emph{progressiveness for a subset of transactions} that are optimal in terms of hardware instrumentation. We explore the concurrency vs. hardware instrumentation vs. software validation trade-offs for these algorithms. Our experiments with Intel and IBM POWER8 HTMs seem to suggest that (i) the \emph{cost of concurrency} also exists in practice, (ii) it is important to implement HyTMs that provide progressiveness for a maximal set of transactions without incurring high hardware instrumentation overhead or using global contending bottlenecks and (iii) there is no easy way to derive more efficient HyTMs by taking advantage of non-speculative accesses within hardware.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proves that, unlike progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation even when hardware transactions are permitted non-speculative metadata reads. It presents opaque HyTM algorithms that achieve progressiveness for a maximal subset of transactions while minimizing hardware instrumentation, analyzes the resulting concurrency-instrumentation-validation trade-offs, and reports experiments on Intel and IBM POWER8 HTMs suggesting the theoretical cost appears in practice.
Significance. If the lower bound holds under the stated model, it identifies a fundamental cost of combining HTM and STM for progressiveness that pure STMs avoid, directly informing HyTM design. The algorithms are credited for being optimal in instrumentation for the progressiveness subset they support, and the real-hardware experiments provide concrete evidence on the practical relevance of the concurrency cost.
major comments (3)
- [§2–3] §2–3 (model): The impossibility result is derived under the axiom that every hardware transaction must be explicitly instrumented to detect contention with software transactions and that progressiveness permits aborts only on data conflicts. The manuscript should explicitly state whether these are derived from weaker primitives or taken as axioms; if the latter, a brief argument is needed showing that relaxing either (e.g., an HTM that can observe software metadata without explicit instrumentation while preserving progress) would not falsify the necessity of incremental validation.
- [Theorem 1] Theorem 1 (impossibility): The proof sketch in the abstract claims the result survives non-speculative metadata reads by HTM, but the full argument must be checked for whether it still forces incremental validation when hardware transactions can read metadata without instrumentation while still guaranteeing the progressiveness definition; a concrete counter-example construction under a relaxed model would strengthen the claim.
- [§5] §5 (algorithms): The optimality claim for hardware instrumentation in the presented opaque HyTM algorithms is load-bearing for the practical contribution; the manuscript should include a matching lower bound showing that any algorithm providing the same progressiveness subset must incur at least as much instrumentation on the hardware path.
minor comments (2)
- [Experiments] The experimental section would benefit from explicit reporting of the number of runs, variance, and the precise instrumentation points measured on each platform (Intel vs. POWER8) to allow reproduction of the claimed practical cost of concurrency.
- [§2] Notation for the progressiveness subset (e.g., which transactions receive the guarantee) should be introduced once in §2 and used consistently in the algorithm descriptions and theorems.
Simulated Author's Rebuttal
We thank the referee for the thorough review and insightful comments on our manuscript. We address each major comment below and indicate where revisions will be made to clarify the model, strengthen the proof presentation, and support the optimality claims.
read point-by-point responses
-
Referee: [§2–3] §2–3 (model): The impossibility result is derived under the axiom that every hardware transaction must be explicitly instrumented to detect contention with software transactions and that progressiveness permits aborts only on data conflicts. The manuscript should explicitly state whether these are derived from weaker primitives or taken as axioms; if the latter, a brief argument is needed showing that relaxing either (e.g., an HTM that can observe software metadata without explicit instrumentation while preserving progress) would not falsify the necessity of incremental validation.
Authors: These assumptions are axioms of the standard HyTM model used throughout the literature: HTM transactions have no automatic visibility into STM metadata and must explicitly access it to detect contention, while progressiveness is defined to allow aborts solely on data conflicts. We will revise §2 to state this explicitly and add a short paragraph arguing that relaxing the instrumentation axiom would require an HTM with built-in STM awareness, which is outside the scope of current hardware and would constitute a different (non-hybrid) model. The progressiveness definition is standard and relaxing it would weaken the result rather than invalidate it. revision: yes
-
Referee: [Theorem 1] Theorem 1 (impossibility): The proof sketch in the abstract claims the result survives non-speculative metadata reads by HTM, but the full argument must be checked for whether it still forces incremental validation when hardware transactions can read metadata without instrumentation while still guaranteeing the progressiveness definition; a concrete counter-example construction under a relaxed model would strengthen the claim.
Authors: The full proof of Theorem 1 (detailed in the appendix) already accounts for non-speculative metadata reads by hardware transactions; the impossibility of avoiding incremental validation on the software path holds because even non-speculative reads cannot eliminate the need for software transactions to re-validate to preserve progressiveness under the stated concurrency model. We will expand the main-text proof sketch to make this case explicit. A separate counter-example under a further-relaxed model is unnecessary, as the current model already incorporates the non-speculative capability the referee describes. revision: partial
-
Referee: [§5] §5 (algorithms): The optimality claim for hardware instrumentation in the presented opaque HyTM algorithms is load-bearing for the practical contribution; the manuscript should include a matching lower bound showing that any algorithm providing the same progressiveness subset must incur at least as much instrumentation on the hardware path.
Authors: The algorithms are shown to be optimal by construction: they achieve the maximal progressiveness subset while using the minimal number of metadata accesses on every hardware transaction path, as any fewer accesses would allow a data conflict to go undetected and violate progressiveness. We agree that an explicit matching lower bound would make this clearer and will add a short lower-bound argument in §5 of the revised version. revision: yes
Circularity Check
No circularity; impossibility result and algorithms are self-contained derivations from explicit model axioms.
full rationale
The paper derives a lower bound (software transactions in progressive HyTMs require incremental validation) from the stated model definitions of instrumentation, progressiveness (aborts only on data conflicts), and HTM behavior. This is contrasted explicitly with STMs and holds under the additional condition of non-speculative metadata reads. No equations or claims reduce a 'prediction' to a fitted input, no self-citation chain bears the central premise, and no ansatz or uniqueness theorem is imported from prior author work. Algorithm constructions are presented as separate contributions exploring trade-offs. The derivation is therefore independent of its inputs and self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Progressiveness allows transactional aborts only due to data conflicts.
- domain assumption Hardware transactions must be instrumented to detect contention with software transactions.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We show that, unlike in progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation... even if hardware transactions can read metadata non-speculatively.
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Theorem 3. ... slow-path read-only transaction Tk ... incurs a time complexity of Ω(m²); m = |Rset(Tk)|.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
http:// developer.amd.com/wordpress/media/2013/09/45432-ASF_Spec_2.1.pdf
Advanced Synchronization Facility Proposed Architectural Specification, March 2009. http:// developer.amd.com/wordpress/media/2013/09/45432-ASF_Spec_2.1.pdf
2009
-
[2]
Yehuda Afek, Alexander Matveev, Oscar R. Moll, and Nir Shavit. Amalgamated lock-elision. In Proceedings of 29th Int. Sym. on Distributed Computing, DISC ’15 , pages 309–324, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48653-5_21 , doi:10.1007/978-3-662-48653-5_21
-
[3]
Inherent limitations of hybrid transactional memory
Dan Alistarh, Justin Kopinsky, Petr Kuznetsov, Srivatsan Ravi, and Nir Shavit. Inherent limitations of hybrid transactional memory. In Proceedings of 29th Int. Sym. on Distributed Computing, DISC ’15, pages 185–199, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48653-5_13 , doi: 10.1007/978-3-662-48653-5_13
-
[4]
Safety of deferred update in transactional memory
Hagit Attiya, Sandeep Hans, Petr Kuznetsov, and Srivatsan Ravi. Safety of deferred update in transactional memory. In 2013 IEEE 33rd Int. Conf. on Distributed Computing Systems (ICDCS), pages 601–610, Los Alamitos, CA, USA, 2013. IEEE Computer Society. doi:http: //doi.ieeecomputersociety.org/10.1109/ICDCS.2013.57
-
[5]
Inherent limitations on disjoint-access parallel implementations of transactional memory
Hagit Attiya, Eshcar Hillel, and Alessia Milani. Inherent limitations on disjoint-access parallel implementations of transactional memory. Theory of Computing Systems , 49(4):698–719, 2011. URL: http://dx.doi.org/10.1007/s00224-010-9304-5 , doi:10.1007/s00224-010-9304-5
-
[6]
Investigating the performance of hardware transactions on a multi-socket machine
Trevor Brown, Alex Kogan, Yossi Lev, and Victor Luchangco. Investigating the performance of hardware transactions on a multi-socket machine. In Proceedings of 28th ACM Sym. on Parallelism in Algorithms and Architectures, SPAA ’16 , pages 121–132, 2016. URL: http://doi.acm.org/10. 1145/2935764.2935796, doi:10.1145/2935764.2935796
-
[7]
Invyswell: a hybrid transactional memory for haswell’s restricted transactional memory
Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, and Maurice Herlihy. Invyswell: a hybrid transactional memory for haswell’s restricted transactional memory. In Int. Conf. on Par. Arch. and Compilation, PACT ’14 , pages 187–200, 2014. URL: http://doi.acm.org/10.1145/ 2628071.2628086, doi:10.1145/2628071.2628086
-
[8]
Luke Dalessandro, Francois Carouge, Sean White, Yossi Lev, Mark Moir, Michael L. Scott, and Michael F. Spear. Hybrid NOrec: a case study in the effectiveness of best effort hardware transactional 11 memory. In ASPLOS ’11, pages 39–52. ACM, 2011. URL: http://doi.acm.org/10.1145/1950365. 1950373
-
[9]
Luke Dalessandro, Michael F. Spear, and Michael L. Scott. Norec: Streamlining stm by abolishing ownership records. SIGPLAN Not. , 45(5):67–78, January 2010. URL: http://doi.acm.org/10. 1145/1837853.1693464, doi:10.1145/1837853.1693464
-
[10]
Peter Damron, Alexandra Fedorova, Yossi Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum. Hybrid transactional memory. SIGPLAN Not., 41(11):336–346, October 2006. URL: http://doi. acm.org/10.1145/1168918.1168900, doi:10.1145/1168918.1168900
-
[11]
Dave Dice, Ori Shalev, and Nir Shavit. Transactional locking ii. In Proceedings of the 20th International Conference on Distributed Computing, DISC’06, pages 194–208, Berlin, Heidelberg, 2006. Springer-Verlag. URL: http://dx.doi.org/10.1007/11864219_14, doi:10.1007/11864219_14
-
[12]
K. Fraser. Practical lock-freedom. Technical report, Cambridge University Computer Laboratory, 2003
2003
-
[13]
Principles of Transactional Memory, Synthesis Lectures on Distributed Computing Theory
Rachid Guerraoui and Michal Kapalka. Principles of Transactional Memory, Synthesis Lectures on Distributed Computing Theory. Morgan and Claypool, 2010
2010
-
[14]
Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer, III. Software transactional memory for dynamic-sized data structures. In Proc. of 22nd Int. Sym. on Principles of Distr. Comp. , PODC ’03, pages 92–101, New York, NY, USA, 2003. ACM. URL: http://doi.acm.org/10.1145/ 872035.872048, doi:10.1145/872035.872048
-
[15]
Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA, pages 289–300, 1993
1993
-
[16]
Hughes, Partha Kundu, and Anthony Nguyen
Sanjeev Kumar, Michael Chu, Christopher J. Hughes, Partha Kundu, and Anthony Nguyen. Hybrid transactional memory. In Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , PPoPP ’06, pages 209–220, New York, NY, USA, 2006. ACM. URL: http://doi.acm.org/10.1145/1122971.1123003
-
[17]
Progressive transactional memory in time and space
Petr Kuznetsov and Srivatsan Ravi. Progressive transactional memory in time and space. In Proceed- ings of 13th Int. Conf. on Parallel Computing Technologies, PaCT ’15 , pages 410–425, 2015. URL: http://dx.doi.org/10.1007/978-3-319-21909-7_40 , doi:10.1007/978-3-319-21909-7_40
-
[18]
Hung Q. Le, G. L. Guthrie, Derek Williams, Maged M. Michael, Brad Frey, William J. Starke, Cathy May, Rei Odaira, and Takuya Nakaike. Transactional memory support in the IBM POWER8 processor. IBM Journal of Research and Development , 59(1), 2015. URL: http://dx.doi.org/10. 1147/JRD.2014.2380199
-
[19]
Phtm: Phased transactional memory
Yossi Lev, Mark Moir, and Dan Nussbaum. Phtm: Phased transactional memory. In In Workshop on Transactional Computing (Transact), 2007
2007
-
[20]
Reduced hardware transactions: a new approach to hybrid transactional memory
Alexander Matveev and Nir Shavit. Reduced hardware transactions: a new approach to hybrid transactional memory. In Proceedings of the 25th ACM symposium on Parallelism in algorithms and architectures, pages 11–22. ACM, 2013
2013
-
[21]
Michael, and Hisanobu Tomari
Takuya Nakaike, Rei Odaira, Matthew Gaudet, Maged M. Michael, and Hisanobu Tomari. Quantita- tive comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. In Proc. of 42nd Int. Sym. on Comp. Arch. , ISCA ’15, pages 144–157, NY, USA,
-
[22]
URL: http://doi.acm.org/10.1145/2749469.2750403
-
[23]
Andrew T. Nguyen. Investigation of hardware transactional memory. 2015. http://groups.csail.mit.edu/mag/Andrew-Nguyen-Thesis.pdf
2015
-
[24]
Ravi Rajwar and James R. Goodman. Speculative lock elision: Enabling highly concurrent mul- tithreaded execution. In Proc. of 34th ACM/IEEE Int. Sym. on Microarchitecture , MICRO ’01, pages 294–305, Washington, DC, USA, 2001. URL: http://dl.acm.org/citation.cfm?id=563998. 564036. 12
2001
-
[25]
Transactional Synchronization in Haswell, 2012
James Reinders. Transactional Synchronization in Haswell, 2012. http://software.intel.com/ en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/
2012
-
[26]
Optimizing hybrid transactional memory: The importance of nonspeculative operations
Torvald Riegel, Patrick Marlier, Martin Nowack, Pascal Felber, and Christof Fetzer. Optimizing hybrid transactional memory: The importance of nonspeculative operations. In Proc. of 23rd ACM Sym. on Parallelism in Algs. and Arch. , pages 53–64. ACM, 2011
2011
-
[27]
Software transactional memory
Nir Shavit and Dan Touitou. Software transactional memory. In Principles of Distributed Computing (PODC), pages 204–213, 1995
1995
-
[28]
Richard M. Yoo, Christopher J. Hughes, Konrad Lai, and Ravi Rajwar. Performance evaluation of intel® transactional synchronization extensions for high-performance computing. In Proceedings of Int. Conf. on High Perf. Computing, Networking, Storage and Analysis , SC ’13, pages 19:1– 19:11, New York, NY, USA, 2013. URL: http://doi.acm.org/10.1145/250321...
-
[29]
E2) that does not include e1 (and resp
be the longest prefix of E1 (and resp. E2) that does not include e1 (and resp. e2). Since before accessing b, the execution is step contention-free for T1, E·E′ 1·E′ 2 is an execution ofM. By assumption of lemma, T1 and T2 do not conflict in E′ 1·E′
-
[30]
By construction, E1·E′ 2 is indistinguishable to T2 from E′ 1·E′
-
[31]
We now consider two cases:
Hence, T1 and T2 are poised to apply contending events e1 and e2 on b in the execution ˜E =E′ 1·E′ 2. We now consider two cases:
-
[32]
Thus, by Remark 1, transaction T1 must return A1 in any extension of E·e1·e2—a contradiction to the assumption that M is progressive
(e1 is a nontrivial event) After ˜E·e1, b is contained in the tracking set of process p1 in exclusive mode and in the extension ˜E·e1·e2, we have thatτ1 is invalidated. Thus, by Remark 1, transaction T1 must return A1 in any extension of E·e1·e2—a contradiction to the assumption that M is progressive
-
[33]
Consider the execution ˜E·e2 following which b is contained in the tracking set of process p2 in exclusive mode
(e1 is a trivial event) Recall that e1 may be potentially an event involving a direct access. Consider the execution ˜E·e2 following which b is contained in the tracking set of process p2 in exclusive mode. Clearly, we have an extension ˜E·e2·e1 in which τ2 is invalidated. Thus transaction T2 must return A2 in any extension of E·e2·e1—a contradiction to t...
-
[34]
readφ(Xi)→Aφ)
is the complete step contention-free execution fragment of readφ(Xi)→v (and resp. readφ(Xi)→Aφ). Note that in the execution so defined above, we assume that each fast-path transactionsTi andTℓ;ℓ≤ (i−1) are executed by distinct processes. Claim 9. For all i∈{ 2,...,m } and ℓ≤ (i− 1),M has an execution of the form Ei 1ℓ or Ei 2ℓ. Proof. Note that by our assu...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.