On the Cost of Concurrency in Hybrid Transactional Memory

Srivatsan Ravi; Trevor Brown

arxiv: 1907.02669 · v1 · pith:OGD3KVOInew · submitted 2019-07-05 · 💻 cs.DC

On the Cost of Concurrency in Hybrid Transactional Memory

Trevor Brown , Srivatsan Ravi This is my paper

Pith reviewed 2026-05-25 02:26 UTC · model grok-4.3

classification 💻 cs.DC

keywords hybrid transactional memoryincremental validationprogressivenesssoftware transactional memoryhardware transactional memoryinstrumentationconcurrency

0 comments

The pith

Software transactions in progressive hybrid transactional memories must perform incremental validation, unlike in pure software versions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that mixing hardware and software transactions under progressiveness creates a constraint absent from pure software transactional memory. Software transactions cannot skip re-reading previously accessed data items to ensure consistency. This requirement remains even when hardware transactions read metadata without speculation. The authors then give opaque algorithms that achieve progressiveness for a subset of transactions while minimizing hardware instrumentation overhead.

Core claim

Unlike progressive STMs, progressive HyTMs require software transactions to perform incremental validation. This result holds even if hardware transactions can read metadata non-speculatively. The paper presents opaque HyTM algorithms providing progressiveness for a subset of transactions that are optimal in terms of hardware instrumentation and explores the resulting concurrency, instrumentation, and validation trade-offs.

What carries the argument

Instrumentation of hardware transactions to detect contention with software transactions, under progressiveness defined as allowing aborts only due to data conflicts.

Load-bearing premise

Hardware transactions must be instrumented to detect contention with software transactions and progressiveness is defined strictly as allowing aborts only due to data conflicts.

What would settle it

A progressive HyTM construction in which software transactions avoid incremental validation while hardware transactions remain instrumented to detect contention.

Figures

Figures reproduced from arXiv: 1907.02669 by Srivatsan Ravi, Trevor Brown.

**Figure 1.** Figure 1: Proof steps for Theorem 3 progressiveness: a transaction may abort only if there is a prefix in which it conflicts with another transaction and both are t-incomplete. TL2 on the other hand allows a transaction to abort due to a concurrent conflicting transaction. Implications for disjoint-access parallelism in HyTM. The property of disjoint-access parallelism (DAP), in its weakest form, ensures that two tr… view at source ↗

**Figure 2.** Figure 2: Table summarizing complexities of HyTM implementations [PITH_FULL_IMAGE:figures/full_fig_p006_2.png] view at source ↗

**Figure 3.** Figure 3: Results for a BST microbenchmark. The x-axis represents the number of concurrent threads. The y-axis represents operations per microsecond. with high probability. Thus, in this type of workload, almost all transactions succeed in hardware, and the slow-path is almost never used. To study performance when transactions regularly run on slow-path, we introduced an operation called a RangeIncrement that often … view at source ↗

read the original abstract

State-of-the-art \emph{software transactional memory (STM)} implementations achieve good performance by carefully avoiding the overhead of \emph{incremental validation} (i.e., re-reading previously read data items to avoid inconsistency) while still providing \emph{progressiveness} (allowing transactional aborts only due to \emph{data conflicts}). Hardware transactional memory (HTM) implementations promise even better performance, but offer no progress guarantees. Thus, they must be combined with STMs, leading to \emph{hybrid} TMs (HyTMs) in which hardware transactions must be \emph{instrumented} (i.e., access metadata) to detect contention with software transactions. We show that, unlike in progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation. In fact, this result holds even if hardware transactions can \emph{read} metadata \emph{non-speculatively}. We then present \emph{opaque} HyTM algorithms providing \emph{progressiveness for a subset of transactions} that are optimal in terms of hardware instrumentation. We explore the concurrency vs. hardware instrumentation vs. software validation trade-offs for these algorithms. Our experiments with Intel and IBM POWER8 HTMs seem to suggest that (i) the \emph{cost of concurrency} also exists in practice, (ii) it is important to implement HyTMs that provide progressiveness for a maximal set of transactions without incurring high hardware instrumentation overhead or using global contending bottlenecks and (iii) there is no easy way to derive more efficient HyTMs by taking advantage of non-speculative accesses within hardware.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Progressive HyTMs require incremental validation for software transactions even with non-speculative hardware reads, with lower bounds, subset-optimal algorithms, and real hardware experiments.

read the letter

The central result is that software transactions in progressive HyTMs must perform incremental validation, unlike in pure STMs, and this holds even when hardware transactions can read metadata non-speculatively. The paper proves the lower bound and then constructs opaque algorithms that give progressiveness to maximal subsets of transactions while minimizing hardware instrumentation overhead. They also map out the resulting trade-offs among concurrency, instrumentation, and validation cost. The experiments on Intel and POWER8 hardware indicate these costs appear in practice and that non-speculative accesses do not yield easy improvements. The work is careful about avoiding global bottlenecks in the designs. The model requires explicit instrumentation on hardware transactions to detect software contention and defines progressiveness strictly via data conflicts. The impossibility is stated to survive non-speculative reads, so the proof apparently incorporates that relaxation. If hardware could observe contention without instrumentation while preserving progress, the necessity claim would not transfer, but the paper positions the result inside the stated model. This is useful for people building or analyzing hybrid transactional systems who need concrete guidance on where validation overhead is unavoidable. The combination of a lower bound, explicit algorithms, and hardware measurements is enough to warrant peer review so that the proofs and experimental details can be checked.

Referee Report

3 major / 2 minor

Summary. The paper proves that, unlike progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation even when hardware transactions are permitted non-speculative metadata reads. It presents opaque HyTM algorithms that achieve progressiveness for a maximal subset of transactions while minimizing hardware instrumentation, analyzes the resulting concurrency-instrumentation-validation trade-offs, and reports experiments on Intel and IBM POWER8 HTMs suggesting the theoretical cost appears in practice.

Significance. If the lower bound holds under the stated model, it identifies a fundamental cost of combining HTM and STM for progressiveness that pure STMs avoid, directly informing HyTM design. The algorithms are credited for being optimal in instrumentation for the progressiveness subset they support, and the real-hardware experiments provide concrete evidence on the practical relevance of the concurrency cost.

major comments (3)

[§2–3] §2–3 (model): The impossibility result is derived under the axiom that every hardware transaction must be explicitly instrumented to detect contention with software transactions and that progressiveness permits aborts only on data conflicts. The manuscript should explicitly state whether these are derived from weaker primitives or taken as axioms; if the latter, a brief argument is needed showing that relaxing either (e.g., an HTM that can observe software metadata without explicit instrumentation while preserving progress) would not falsify the necessity of incremental validation.
[Theorem 1] Theorem 1 (impossibility): The proof sketch in the abstract claims the result survives non-speculative metadata reads by HTM, but the full argument must be checked for whether it still forces incremental validation when hardware transactions can read metadata without instrumentation while still guaranteeing the progressiveness definition; a concrete counter-example construction under a relaxed model would strengthen the claim.
[§5] §5 (algorithms): The optimality claim for hardware instrumentation in the presented opaque HyTM algorithms is load-bearing for the practical contribution; the manuscript should include a matching lower bound showing that any algorithm providing the same progressiveness subset must incur at least as much instrumentation on the hardware path.

minor comments (2)

[Experiments] The experimental section would benefit from explicit reporting of the number of runs, variance, and the precise instrumentation points measured on each platform (Intel vs. POWER8) to allow reproduction of the claimed practical cost of concurrency.
[§2] Notation for the progressiveness subset (e.g., which transactions receive the guarantee) should be introduced once in §2 and used consistently in the algorithm descriptions and theorems.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the thorough review and insightful comments on our manuscript. We address each major comment below and indicate where revisions will be made to clarify the model, strengthen the proof presentation, and support the optimality claims.

read point-by-point responses

Referee: [§2–3] §2–3 (model): The impossibility result is derived under the axiom that every hardware transaction must be explicitly instrumented to detect contention with software transactions and that progressiveness permits aborts only on data conflicts. The manuscript should explicitly state whether these are derived from weaker primitives or taken as axioms; if the latter, a brief argument is needed showing that relaxing either (e.g., an HTM that can observe software metadata without explicit instrumentation while preserving progress) would not falsify the necessity of incremental validation.

Authors: These assumptions are axioms of the standard HyTM model used throughout the literature: HTM transactions have no automatic visibility into STM metadata and must explicitly access it to detect contention, while progressiveness is defined to allow aborts solely on data conflicts. We will revise §2 to state this explicitly and add a short paragraph arguing that relaxing the instrumentation axiom would require an HTM with built-in STM awareness, which is outside the scope of current hardware and would constitute a different (non-hybrid) model. The progressiveness definition is standard and relaxing it would weaken the result rather than invalidate it. revision: yes
Referee: [Theorem 1] Theorem 1 (impossibility): The proof sketch in the abstract claims the result survives non-speculative metadata reads by HTM, but the full argument must be checked for whether it still forces incremental validation when hardware transactions can read metadata without instrumentation while still guaranteeing the progressiveness definition; a concrete counter-example construction under a relaxed model would strengthen the claim.

Authors: The full proof of Theorem 1 (detailed in the appendix) already accounts for non-speculative metadata reads by hardware transactions; the impossibility of avoiding incremental validation on the software path holds because even non-speculative reads cannot eliminate the need for software transactions to re-validate to preserve progressiveness under the stated concurrency model. We will expand the main-text proof sketch to make this case explicit. A separate counter-example under a further-relaxed model is unnecessary, as the current model already incorporates the non-speculative capability the referee describes. revision: partial
Referee: [§5] §5 (algorithms): The optimality claim for hardware instrumentation in the presented opaque HyTM algorithms is load-bearing for the practical contribution; the manuscript should include a matching lower bound showing that any algorithm providing the same progressiveness subset must incur at least as much instrumentation on the hardware path.

Authors: The algorithms are shown to be optimal by construction: they achieve the maximal progressiveness subset while using the minimal number of metadata accesses on every hardware transaction path, as any fewer accesses would allow a data conflict to go undetected and violate progressiveness. We agree that an explicit matching lower bound would make this clearer and will add a short lower-bound argument in §5 of the revised version. revision: yes

Circularity Check

0 steps flagged

No circularity; impossibility result and algorithms are self-contained derivations from explicit model axioms.

full rationale

The paper derives a lower bound (software transactions in progressive HyTMs require incremental validation) from the stated model definitions of instrumentation, progressiveness (aborts only on data conflicts), and HTM behavior. This is contrasted explicitly with STMs and holds under the additional condition of non-speculative metadata reads. No equations or claims reduce a 'prediction' to a fitted input, no self-citation chain bears the central premise, and no ansatz or uniqueness theorem is imported from prior author work. Algorithm constructions are presented as separate contributions exploring trade-offs. The derivation is therefore independent of its inputs and self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The paper rests on standard definitions of progressiveness and the requirement that hardware transactions be instrumented; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Progressiveness allows transactional aborts only due to data conflicts.
Invoked in the statement of the main impossibility result.
domain assumption Hardware transactions must be instrumented to detect contention with software transactions.
Stated as the reason hybrid TMs require instrumentation.

pith-pipeline@v0.9.0 · 5822 in / 1251 out tokens · 22639 ms · 2026-05-25T02:26:16.719577+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We show that, unlike in progressive STMs, software transactions in progressive HyTMs cannot avoid incremental validation... even if hardware transactions can read metadata non-speculatively.
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Theorem 3. ... slow-path read-only transaction Tk ... incurs a time complexity of Ω(m²); m = |Rset(Tk)|.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

34 extracted references · 16 canonical work pages

[1]

http:// developer.amd.com/wordpress/media/2013/09/45432-ASF_Spec_2.1.pdf

Advanced Synchronization Facility Proposed Architectural Speciﬁcation, March 2009. http:// developer.amd.com/wordpress/media/2013/09/45432-ASF_Spec_2.1.pdf

2009
[2]

Moll, and Nir Shavit

Yehuda Afek, Alexander Matveev, Oscar R. Moll, and Nir Shavit. Amalgamated lock-elision. In Proceedings of 29th Int. Sym. on Distributed Computing, DISC ’15 , pages 309–324, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48653-5_21 , doi:10.1007/978-3-662-48653-5_21

work page doi:10.1007/978-3-662-48653-5_21 2015
[3]

Inherent limitations of hybrid transactional memory

Dan Alistarh, Justin Kopinsky, Petr Kuznetsov, Srivatsan Ravi, and Nir Shavit. Inherent limitations of hybrid transactional memory. In Proceedings of 29th Int. Sym. on Distributed Computing, DISC ’15, pages 185–199, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48653-5_13 , doi: 10.1007/978-3-662-48653-5_13

work page doi:10.1007/978-3-662-48653-5_13 2015
[4]

Safety of deferred update in transactional memory

Hagit Attiya, Sandeep Hans, Petr Kuznetsov, and Srivatsan Ravi. Safety of deferred update in transactional memory. In 2013 IEEE 33rd Int. Conf. on Distributed Computing Systems (ICDCS), pages 601–610, Los Alamitos, CA, USA, 2013. IEEE Computer Society. doi:http: //doi.ieeecomputersociety.org/10.1109/ICDCS.2013.57

work page doi:10.1109/icdcs.2013.57 2013
[5]

Inherent limitations on disjoint-access parallel implementations of transactional memory

Hagit Attiya, Eshcar Hillel, and Alessia Milani. Inherent limitations on disjoint-access parallel implementations of transactional memory. Theory of Computing Systems , 49(4):698–719, 2011. URL: http://dx.doi.org/10.1007/s00224-010-9304-5 , doi:10.1007/s00224-010-9304-5

work page doi:10.1007/s00224-010-9304-5 2011
[6]

Investigating the performance of hardware transactions on a multi-socket machine

Trevor Brown, Alex Kogan, Yossi Lev, and Victor Luchangco. Investigating the performance of hardware transactions on a multi-socket machine. In Proceedings of 28th ACM Sym. on Parallelism in Algorithms and Architectures, SPAA ’16 , pages 121–132, 2016. URL: http://doi.acm.org/10. 1145/2935764.2935796, doi:10.1145/2935764.2935796

work page doi:10.1145/2935764.2935796 2016
[7]

Invyswell: a hybrid transactional memory for haswell’s restricted transactional memory

Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, and Maurice Herlihy. Invyswell: a hybrid transactional memory for haswell’s restricted transactional memory. In Int. Conf. on Par. Arch. and Compilation, PACT ’14 , pages 187–200, 2014. URL: http://doi.acm.org/10.1145/ 2628071.2628086, doi:10.1145/2628071.2628086

work page doi:10.1145/2628071.2628086 2014
[8]

Scott, and Michael F

Luke Dalessandro, Francois Carouge, Sean White, Yossi Lev, Mark Moir, Michael L. Scott, and Michael F. Spear. Hybrid NOrec: a case study in the eﬀectiveness of best eﬀort hardware transactional 11 memory. In ASPLOS ’11, pages 39–52. ACM, 2011. URL: http://doi.acm.org/10.1145/1950365. 1950373

work page doi:10.1145/1950365 2011
[9]

Spear, and Michael L

Luke Dalessandro, Michael F. Spear, and Michael L. Scott. Norec: Streamlining stm by abolishing ownership records. SIGPLAN Not. , 45(5):67–78, January 2010. URL: http://doi.acm.org/10. 1145/1837853.1693464, doi:10.1145/1837853.1693464

work page doi:10.1145/1837853.1693464 2010
[10]

Hybrid transactional memory

Peter Damron, Alexandra Fedorova, Yossi Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum. Hybrid transactional memory. SIGPLAN Not., 41(11):336–346, October 2006. URL: http://doi. acm.org/10.1145/1168918.1168900, doi:10.1145/1168918.1168900

work page doi:10.1145/1168918.1168900 2006
[11]

Transactional locking ii

Dave Dice, Ori Shalev, and Nir Shavit. Transactional locking ii. In Proceedings of the 20th International Conference on Distributed Computing, DISC’06, pages 194–208, Berlin, Heidelberg, 2006. Springer-Verlag. URL: http://dx.doi.org/10.1007/11864219_14, doi:10.1007/11864219_14

work page doi:10.1007/11864219_14 2006
[12]

K. Fraser. Practical lock-freedom. Technical report, Cambridge University Computer Laboratory, 2003

2003
[13]

Principles of Transactional Memory, Synthesis Lectures on Distributed Computing Theory

Rachid Guerraoui and Michal Kapalka. Principles of Transactional Memory, Synthesis Lectures on Distributed Computing Theory. Morgan and Claypool, 2010

2010
[14]

Scherer, III

Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer, III. Software transactional memory for dynamic-sized data structures. In Proc. of 22nd Int. Sym. on Principles of Distr. Comp. , PODC ’03, pages 92–101, New York, NY, USA, 2003. ACM. URL: http://doi.acm.org/10.1145/ 872035.872048, doi:10.1145/872035.872048

work page doi:10.1145/872035.872048 2003
[15]

Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA, pages 289–300, 1993

1993
[16]

Hughes, Partha Kundu, and Anthony Nguyen

Sanjeev Kumar, Michael Chu, Christopher J. Hughes, Partha Kundu, and Anthony Nguyen. Hybrid transactional memory. In Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , PPoPP ’06, pages 209–220, New York, NY, USA, 2006. ACM. URL: http://doi.acm.org/10.1145/1122971.1123003

work page doi:10.1145/1122971.1123003 2006
[17]

Progressive transactional memory in time and space

Petr Kuznetsov and Srivatsan Ravi. Progressive transactional memory in time and space. In Proceed- ings of 13th Int. Conf. on Parallel Computing Technologies, PaCT ’15 , pages 410–425, 2015. URL: http://dx.doi.org/10.1007/978-3-319-21909-7_40 , doi:10.1007/978-3-319-21909-7_40

work page doi:10.1007/978-3-319-21909-7_40 2015
[18]

Hung Q. Le, G. L. Guthrie, Derek Williams, Maged M. Michael, Brad Frey, William J. Starke, Cathy May, Rei Odaira, and Takuya Nakaike. Transactional memory support in the IBM POWER8 processor. IBM Journal of Research and Development , 59(1), 2015. URL: http://dx.doi.org/10. 1147/JRD.2014.2380199

work page arXiv 2015
[19]

Phtm: Phased transactional memory

Yossi Lev, Mark Moir, and Dan Nussbaum. Phtm: Phased transactional memory. In In Workshop on Transactional Computing (Transact), 2007

2007
[20]

Reduced hardware transactions: a new approach to hybrid transactional memory

Alexander Matveev and Nir Shavit. Reduced hardware transactions: a new approach to hybrid transactional memory. In Proceedings of the 25th ACM symposium on Parallelism in algorithms and architectures, pages 11–22. ACM, 2013

2013
[21]

Michael, and Hisanobu Tomari

Takuya Nakaike, Rei Odaira, Matthew Gaudet, Maged M. Michael, and Hisanobu Tomari. Quantita- tive comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. In Proc. of 42nd Int. Sym. on Comp. Arch. , ISCA ’15, pages 144–157, NY, USA,
[22]

URL: http://doi.acm.org/10.1145/2749469.2750403

work page doi:10.1145/2749469.2750403
[23]

Andrew T. Nguyen. Investigation of hardware transactional memory. 2015. http://groups.csail.mit.edu/mag/Andrew-Nguyen-Thesis.pdf

2015
[24]

Ravi Rajwar and James R. Goodman. Speculative lock elision: Enabling highly concurrent mul- tithreaded execution. In Proc. of 34th ACM/IEEE Int. Sym. on Microarchitecture , MICRO ’01, pages 294–305, Washington, DC, USA, 2001. URL: http://dl.acm.org/citation.cfm?id=563998. 564036. 12

2001
[25]

Transactional Synchronization in Haswell, 2012

James Reinders. Transactional Synchronization in Haswell, 2012. http://software.intel.com/ en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/

2012
[26]

Optimizing hybrid transactional memory: The importance of nonspeculative operations

Torvald Riegel, Patrick Marlier, Martin Nowack, Pascal Felber, and Christof Fetzer. Optimizing hybrid transactional memory: The importance of nonspeculative operations. In Proc. of 23rd ACM Sym. on Parallelism in Algs. and Arch. , pages 53–64. ACM, 2011

2011
[27]

Software transactional memory

Nir Shavit and Dan Touitou. Software transactional memory. In Principles of Distributed Computing (PODC), pages 204–213, 1995

1995
[28]

Yoo, Christopher J

Richard M. Yoo, Christopher J. Hughes, Konrad Lai, and Ravi Rajwar. Performance evaluation of intel® transactional synchronization extensions for high-performance computing. In Proceedings of Int. Conf. on High Perf. Computing, Networking, Storage and Analysis , SC ’13, pages 19:1– 19:11, New York, NY, USA, 2013. URL: http://doi.acm.org/10.1145/250321...

work page doi:10.1145/2503210.2503232 2013
[29]

E2) that does not include e1 (and resp

be the longest preﬁx of E1 (and resp. E2) that does not include e1 (and resp. e2). Since before accessing b, the execution is step contention-free for T1, E·E′ 1·E′ 2 is an execution ofM. By assumption of lemma, T1 and T2 do not conﬂict in E′ 1·E′
[30]

By construction, E1·E′ 2 is indistinguishable to T2 from E′ 1·E′
[31]

We now consider two cases:

Hence, T1 and T2 are poised to apply contending events e1 and e2 on b in the execution ˜E =E′ 1·E′ 2. We now consider two cases:
[32]

Thus, by Remark 1, transaction T1 must return A1 in any extension of E·e1·e2—a contradiction to the assumption that M is progressive

(e1 is a nontrivial event) After ˜E·e1, b is contained in the tracking set of process p1 in exclusive mode and in the extension ˜E·e1·e2, we have thatτ1 is invalidated. Thus, by Remark 1, transaction T1 must return A1 in any extension of E·e1·e2—a contradiction to the assumption that M is progressive
[33]

Consider the execution ˜E·e2 following which b is contained in the tracking set of process p2 in exclusive mode

(e1 is a trivial event) Recall that e1 may be potentially an event involving a direct access. Consider the execution ˜E·e2 following which b is contained in the tracking set of process p2 in exclusive mode. Clearly, we have an extension ˜E·e2·e1 in which τ2 is invalidated. Thus transaction T2 must return A2 in any extension of E·e2·e1—a contradiction to t...
[34]

readφ(Xi)→Aφ)

is the complete step contention-free execution fragment of readφ(Xi)→v (and resp. readφ(Xi)→Aφ). Note that in the execution so deﬁned above, we assume that each fast-path transactionsTi andTℓ;ℓ≤ (i−1) are executed by distinct processes. Claim 9. For all i∈{ 2,...,m } and ℓ≤ (i− 1),M has an execution of the form Ei 1ℓ or Ei 2ℓ. Proof. Note that by our assu...

[1] [1]

http:// developer.amd.com/wordpress/media/2013/09/45432-ASF_Spec_2.1.pdf

Advanced Synchronization Facility Proposed Architectural Speciﬁcation, March 2009. http:// developer.amd.com/wordpress/media/2013/09/45432-ASF_Spec_2.1.pdf

2009

[2] [2]

Moll, and Nir Shavit

Yehuda Afek, Alexander Matveev, Oscar R. Moll, and Nir Shavit. Amalgamated lock-elision. In Proceedings of 29th Int. Sym. on Distributed Computing, DISC ’15 , pages 309–324, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48653-5_21 , doi:10.1007/978-3-662-48653-5_21

work page doi:10.1007/978-3-662-48653-5_21 2015

[3] [3]

Inherent limitations of hybrid transactional memory

Dan Alistarh, Justin Kopinsky, Petr Kuznetsov, Srivatsan Ravi, and Nir Shavit. Inherent limitations of hybrid transactional memory. In Proceedings of 29th Int. Sym. on Distributed Computing, DISC ’15, pages 185–199, 2015. URL: http://dx.doi.org/10.1007/978-3-662-48653-5_13 , doi: 10.1007/978-3-662-48653-5_13

work page doi:10.1007/978-3-662-48653-5_13 2015

[4] [4]

Safety of deferred update in transactional memory

Hagit Attiya, Sandeep Hans, Petr Kuznetsov, and Srivatsan Ravi. Safety of deferred update in transactional memory. In 2013 IEEE 33rd Int. Conf. on Distributed Computing Systems (ICDCS), pages 601–610, Los Alamitos, CA, USA, 2013. IEEE Computer Society. doi:http: //doi.ieeecomputersociety.org/10.1109/ICDCS.2013.57

work page doi:10.1109/icdcs.2013.57 2013

[5] [5]

Inherent limitations on disjoint-access parallel implementations of transactional memory

Hagit Attiya, Eshcar Hillel, and Alessia Milani. Inherent limitations on disjoint-access parallel implementations of transactional memory. Theory of Computing Systems , 49(4):698–719, 2011. URL: http://dx.doi.org/10.1007/s00224-010-9304-5 , doi:10.1007/s00224-010-9304-5

work page doi:10.1007/s00224-010-9304-5 2011

[6] [6]

Investigating the performance of hardware transactions on a multi-socket machine

Trevor Brown, Alex Kogan, Yossi Lev, and Victor Luchangco. Investigating the performance of hardware transactions on a multi-socket machine. In Proceedings of 28th ACM Sym. on Parallelism in Algorithms and Architectures, SPAA ’16 , pages 121–132, 2016. URL: http://doi.acm.org/10. 1145/2935764.2935796, doi:10.1145/2935764.2935796

work page doi:10.1145/2935764.2935796 2016

[7] [7]

Invyswell: a hybrid transactional memory for haswell’s restricted transactional memory

Irina Calciu, Justin Gottschlich, Tatiana Shpeisman, Gilles Pokam, and Maurice Herlihy. Invyswell: a hybrid transactional memory for haswell’s restricted transactional memory. In Int. Conf. on Par. Arch. and Compilation, PACT ’14 , pages 187–200, 2014. URL: http://doi.acm.org/10.1145/ 2628071.2628086, doi:10.1145/2628071.2628086

work page doi:10.1145/2628071.2628086 2014

[8] [8]

Scott, and Michael F

Luke Dalessandro, Francois Carouge, Sean White, Yossi Lev, Mark Moir, Michael L. Scott, and Michael F. Spear. Hybrid NOrec: a case study in the eﬀectiveness of best eﬀort hardware transactional 11 memory. In ASPLOS ’11, pages 39–52. ACM, 2011. URL: http://doi.acm.org/10.1145/1950365. 1950373

work page doi:10.1145/1950365 2011

[9] [9]

Spear, and Michael L

Luke Dalessandro, Michael F. Spear, and Michael L. Scott. Norec: Streamlining stm by abolishing ownership records. SIGPLAN Not. , 45(5):67–78, January 2010. URL: http://doi.acm.org/10. 1145/1837853.1693464, doi:10.1145/1837853.1693464

work page doi:10.1145/1837853.1693464 2010

[10] [10]

Hybrid transactional memory

Peter Damron, Alexandra Fedorova, Yossi Lev, Victor Luchangco, Mark Moir, and Daniel Nussbaum. Hybrid transactional memory. SIGPLAN Not., 41(11):336–346, October 2006. URL: http://doi. acm.org/10.1145/1168918.1168900, doi:10.1145/1168918.1168900

work page doi:10.1145/1168918.1168900 2006

[11] [11]

Transactional locking ii

Dave Dice, Ori Shalev, and Nir Shavit. Transactional locking ii. In Proceedings of the 20th International Conference on Distributed Computing, DISC’06, pages 194–208, Berlin, Heidelberg, 2006. Springer-Verlag. URL: http://dx.doi.org/10.1007/11864219_14, doi:10.1007/11864219_14

work page doi:10.1007/11864219_14 2006

[12] [12]

K. Fraser. Practical lock-freedom. Technical report, Cambridge University Computer Laboratory, 2003

2003

[13] [13]

Principles of Transactional Memory, Synthesis Lectures on Distributed Computing Theory

Rachid Guerraoui and Michal Kapalka. Principles of Transactional Memory, Synthesis Lectures on Distributed Computing Theory. Morgan and Claypool, 2010

2010

[14] [14]

Scherer, III

Maurice Herlihy, Victor Luchangco, Mark Moir, and William N. Scherer, III. Software transactional memory for dynamic-sized data structures. In Proc. of 22nd Int. Sym. on Principles of Distr. Comp. , PODC ’03, pages 92–101, New York, NY, USA, 2003. ACM. URL: http://doi.acm.org/10.1145/ 872035.872048, doi:10.1145/872035.872048

work page doi:10.1145/872035.872048 2003

[15] [15]

Maurice Herlihy and J. Eliot B. Moss. Transactional memory: architectural support for lock-free data structures. In ISCA, pages 289–300, 1993

1993

[16] [16]

Hughes, Partha Kundu, and Anthony Nguyen

Sanjeev Kumar, Michael Chu, Christopher J. Hughes, Partha Kundu, and Anthony Nguyen. Hybrid transactional memory. In Proceedings of the Eleventh ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming , PPoPP ’06, pages 209–220, New York, NY, USA, 2006. ACM. URL: http://doi.acm.org/10.1145/1122971.1123003

work page doi:10.1145/1122971.1123003 2006

[17] [17]

Progressive transactional memory in time and space

Petr Kuznetsov and Srivatsan Ravi. Progressive transactional memory in time and space. In Proceed- ings of 13th Int. Conf. on Parallel Computing Technologies, PaCT ’15 , pages 410–425, 2015. URL: http://dx.doi.org/10.1007/978-3-319-21909-7_40 , doi:10.1007/978-3-319-21909-7_40

work page doi:10.1007/978-3-319-21909-7_40 2015

[18] [18]

Hung Q. Le, G. L. Guthrie, Derek Williams, Maged M. Michael, Brad Frey, William J. Starke, Cathy May, Rei Odaira, and Takuya Nakaike. Transactional memory support in the IBM POWER8 processor. IBM Journal of Research and Development , 59(1), 2015. URL: http://dx.doi.org/10. 1147/JRD.2014.2380199

work page arXiv 2015

[19] [19]

Phtm: Phased transactional memory

Yossi Lev, Mark Moir, and Dan Nussbaum. Phtm: Phased transactional memory. In In Workshop on Transactional Computing (Transact), 2007

2007

[20] [20]

Reduced hardware transactions: a new approach to hybrid transactional memory

Alexander Matveev and Nir Shavit. Reduced hardware transactions: a new approach to hybrid transactional memory. In Proceedings of the 25th ACM symposium on Parallelism in algorithms and architectures, pages 11–22. ACM, 2013

2013

[21] [21]

Michael, and Hisanobu Tomari

Takuya Nakaike, Rei Odaira, Matthew Gaudet, Maged M. Michael, and Hisanobu Tomari. Quantita- tive comparison of hardware transactional memory for Blue Gene/Q, zEnterprise EC12, Intel Core, and POWER8. In Proc. of 42nd Int. Sym. on Comp. Arch. , ISCA ’15, pages 144–157, NY, USA,

[22] [22]

URL: http://doi.acm.org/10.1145/2749469.2750403

work page doi:10.1145/2749469.2750403

[23] [23]

Andrew T. Nguyen. Investigation of hardware transactional memory. 2015. http://groups.csail.mit.edu/mag/Andrew-Nguyen-Thesis.pdf

2015

[24] [24]

Ravi Rajwar and James R. Goodman. Speculative lock elision: Enabling highly concurrent mul- tithreaded execution. In Proc. of 34th ACM/IEEE Int. Sym. on Microarchitecture , MICRO ’01, pages 294–305, Washington, DC, USA, 2001. URL: http://dl.acm.org/citation.cfm?id=563998. 564036. 12

2001

[25] [25]

Transactional Synchronization in Haswell, 2012

James Reinders. Transactional Synchronization in Haswell, 2012. http://software.intel.com/ en-us/blogs/2012/02/07/transactional-synchronization-in-haswell/

2012

[26] [26]

Optimizing hybrid transactional memory: The importance of nonspeculative operations

Torvald Riegel, Patrick Marlier, Martin Nowack, Pascal Felber, and Christof Fetzer. Optimizing hybrid transactional memory: The importance of nonspeculative operations. In Proc. of 23rd ACM Sym. on Parallelism in Algs. and Arch. , pages 53–64. ACM, 2011

2011

[27] [27]

Software transactional memory

Nir Shavit and Dan Touitou. Software transactional memory. In Principles of Distributed Computing (PODC), pages 204–213, 1995

1995

[28] [28]

Yoo, Christopher J

Richard M. Yoo, Christopher J. Hughes, Konrad Lai, and Ravi Rajwar. Performance evaluation of intel® transactional synchronization extensions for high-performance computing. In Proceedings of Int. Conf. on High Perf. Computing, Networking, Storage and Analysis , SC ’13, pages 19:1– 19:11, New York, NY, USA, 2013. URL: http://doi.acm.org/10.1145/250321...

work page doi:10.1145/2503210.2503232 2013

[29] [29]

E2) that does not include e1 (and resp

be the longest preﬁx of E1 (and resp. E2) that does not include e1 (and resp. e2). Since before accessing b, the execution is step contention-free for T1, E·E′ 1·E′ 2 is an execution ofM. By assumption of lemma, T1 and T2 do not conﬂict in E′ 1·E′

[30] [30]

By construction, E1·E′ 2 is indistinguishable to T2 from E′ 1·E′

[31] [31]

We now consider two cases:

Hence, T1 and T2 are poised to apply contending events e1 and e2 on b in the execution ˜E =E′ 1·E′ 2. We now consider two cases:

[32] [32]

Thus, by Remark 1, transaction T1 must return A1 in any extension of E·e1·e2—a contradiction to the assumption that M is progressive

(e1 is a nontrivial event) After ˜E·e1, b is contained in the tracking set of process p1 in exclusive mode and in the extension ˜E·e1·e2, we have thatτ1 is invalidated. Thus, by Remark 1, transaction T1 must return A1 in any extension of E·e1·e2—a contradiction to the assumption that M is progressive

[33] [33]

Consider the execution ˜E·e2 following which b is contained in the tracking set of process p2 in exclusive mode

(e1 is a trivial event) Recall that e1 may be potentially an event involving a direct access. Consider the execution ˜E·e2 following which b is contained in the tracking set of process p2 in exclusive mode. Clearly, we have an extension ˜E·e2·e1 in which τ2 is invalidated. Thus transaction T2 must return A2 in any extension of E·e2·e1—a contradiction to t...

[34] [34]

readφ(Xi)→Aφ)

is the complete step contention-free execution fragment of readφ(Xi)→v (and resp. readφ(Xi)→Aφ). Note that in the execution so deﬁned above, we assume that each fast-path transactionsTi andTℓ;ℓ≤ (i−1) are executed by distinct processes. Claim 9. For all i∈{ 2,...,m } and ℓ≤ (i− 1),M has an execution of the form Ei 1ℓ or Ei 2ℓ. Proof. Note that by our assu...