pith. machine review for the scientific record. sign in

arxiv: 2605.04172 · v1 · submitted 2026-05-05 · 💻 cs.AR · cs.LO

Recognition: 2 theorem links

· Lean Theorem

t\"{a}k\={o}Formal: Enabling Robust Software for Programmable Memory Hierarchies (Extended Version)

Authors on Pith no claims yet

Pith reviewed 2026-05-08 18:25 UTC · model grok-4.3

classification 💻 cs.AR cs.LO
keywords memory consistency modelprogrammable memory hierarchyformal verificationISA-level semanticsaccelerator programmingmicroarchitectural modelinglitmus teststākō accelerator
0
0 comments X

The pith

An ISA-level memory consistency model for tākō defines the allowed behaviors of programs that use user-defined callbacks on cache events.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a high-level memory consistency model at the ISA level for the tākō programmable memory hierarchy. This model specifies the semantics of operations triggered by cache misses, evictions, and writebacks so that programmers can check their code formally. The authors prove the model is sound by creating a detailed implementation model of the hardware and confirming that every execution allowed by the implementation stays within the high-level rules. If correct, the model gives programmers a reliable way to reason about counterintuitive outcomes from custom data-movement logic. The extended version adds litmus tests that further map out safe and unsafe program behaviors.

Core claim

We develop an ISA-level memory consistency model (MCM) for tākō that captures the semantics of its operation, and we show how it enables programmers to formally reason about their tākō programs. We also prove the soundness of this ISA-level MCM by constructing a detailed tākō implementation model and verifying that all executions of the implementation model are allowed by our ISA-level MCM. Along the way, we discover useful insights about microarchitectural modeling and verification that are applicable to hardware in general.

What carries the argument

The ISA-level memory consistency model for tākō, which defines the ordering and visibility rules for memory operations in the presence of user-defined callback functions triggered by cache events.

If this is right

  • Programmers can use the model and associated litmus tests to check whether their tākō programs produce only allowed behaviors.
  • Any implementation that follows the detailed model is guaranteed to respect the high-level consistency rules.
  • The same modeling approach yields reusable insights for verifying other accelerators with complex data-movement logic.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption of the model could let compiler writers or runtime systems automatically insert safe callback logic.
  • The verification method may scale to other programmable hardware features that alter the hardware-software interface.
  • Further litmus tests beyond those in the extended version could expose additional corner cases in real silicon.

Load-bearing premise

The constructed implementation model accurately represents the real tākō hardware behavior, and the verification process covers all relevant executions and litmus tests without gaps.

What would settle it

A single execution trace on actual tākō hardware that produces a memory ordering or visibility outcome forbidden by the ISA-level MCM.

Figures

Figures reproduced from arXiv: 2605.04172 by Manos Kapritsos, Pranav Srinivasan, Yatin A. Manerkar.

Figure 1
Figure 1. Figure 1: Image and caption from [55] showing the organization view at source ↗
Figure 2
Figure 2. Figure 2: (a) A sample tak¨ o program. (b) A possible execution ¯ of said program. The evictions and misses from the cache, which were previously hidden hardware details, now impact the outcome of the program. and non-volatile memory transactions. Thus, insights gained from modeling and verifying tak¨ o are likely to be more broadly ¯ applicable than those gained from modeling a more specialized accelerator. Finally… view at source ↗
Figure 3
Figure 3. Figure 3: Image from [55] showing the order of events in t view at source ↗
Figure 5
Figure 5. Figure 5: (a) The mp (message passing) litmus test. All addresses are assumed to be 0 initially. (b) An execution graph of mp that is outlawed under sequential consistency (SC). Figure 4b explains this reasoning. If r1=2, the value for (i2) must have been generated as the result of an OnMiss A instead of by (i1). Thus, an interspersed OnMiss must have run between (i1) and (i2). However, since (i1) would have brought… view at source ↗
Figure 6
Figure 6. Figure 6: All axioms for our tak¨ o MCM. Executions must satisfy all axioms to be allowed. ¯ [A] is the elements of type A in relation form [59]. A × B are pairs of an element of type A and an element of type B. A \ B is the elements of A that are not in B. Semicolons (;) denote relational composition, e.g., e1; e2 is two relations e1 and e2 where the destination node of e1 is the source node of e2. R−1 is the inver… view at source ↗
Figure 7
Figure 7. Figure 7: (a-e) Faulty candidate execution graphs for Figure 4a’s t view at source ↗
Figure 8
Figure 8. Figure 8: A pattern forbidden by traditional happens-before ( view at source ↗
Figure 9
Figure 9. Figure 9: (a) The mprmw (no callback) and mpcb (OnMiss for [b]) litmus tests. (b) A forbidden execution of mprmw where the RMWs augment the hb relation to outlaw the reading of 0 for [a]. (c) An analogous forbidden execution of mpcb showing the same pattern when [b] is a phantom address. needing to understand the intricacies of a tak¨ o implementation. ¯ §V shows how to use our MCM to analyze tak¨ o programs. ¯ V. P… view at source ↗
Figure 10
Figure 10. Figure 10: (a) The icbsb litmus test. (b) An execution of icbsb that demonstrates the intra-callback instruction re￾ordering that is allowed by our MCM. designing the callback engine. In particular, tak¨ o designs that ¯ buffer and reorder memory operations in the engine can still use our MCM. D. FlushRange as a Synchronization Primitive While the use of an OnMiss-generated value (Me) by a Rcb or Wcb induces an hb e… view at source ↗
Figure 12
Figure 12. Figure 12: (a) The phiR (with OnWB (1)) and phiNR (with OnWB (2)) litmus tests. (b) Execution snippet showing phiR race due to OnWB from core 0’s write occurring after FlushRange on core 1. (c) Execution snippet showing how phiNR avoids the race by branching on evicted value in the OnWB. of a program (named phiR and phiNR) in which multiple cores concurrently write updates to a phantom address [x] and the OnWB of [x… view at source ↗
Figure 13
Figure 13. Figure 13: (a) hatsR and hatsNR litmus tests. (b) Execution snippet showing hatsR race due to OnEvict writing to the log post-traversal. (c) Snippet showing how hatsNR elimi￾nates the race by only logging valid edges in the OnEvict. graph traversal, which is more difficult to write correctly because it has writes in OnEvict callbacks that can run an arbitrary number of times. F. A tak¨ o Application Case Study: HATS… view at source ↗
Figure 14
Figure 14. Figure 14: Our hierarchical transition system instantiated with view at source ↗
Figure 16
Figure 16. Figure 16: A Venn diagram motivating inductive invariants. view at source ↗
Figure 17
Figure 17. Figure 17: Two formulations of an axiom restricting the number view at source ↗
Figure 18
Figure 18. Figure 18: The two components of our soundness proof. view at source ↗
read the original abstract

Accelerators provide large performance and energy-efficiency benefits, but can significantly change the hardware-software interface. The t\"{a}k\={o} programmable memory hierarchy accelerates data movement by enabling programmers to run user-defined callback functions triggered by cache misses, evictions, and writebacks. However, it also leads to drastically increased complexity and counterintuitive outcomes. In response, we develop an ISA-level memory consistency model (MCM) for t\"{a}k\={o} that captures the semantics of its operation, and we show how it enables programmers to formally reason about their t\"{a}k\={o} programs. We also prove the soundness of this ISA-level MCM by constructing a detailed t\"{a}k\={o} implementation model and verifying that all executions of the implementation model are allowed by our ISA-level MCM. Along the way, we discover useful insights about microarchitectural modeling and verification that are applicable to hardware in general. This is the extended version of the ISCA 2026 paper "t\"{a}k\={o}Formal: Enabling Robust Software for Programmable Memory Hierarchies". This version adds material on additional litmus tests to Section V to further explore the programmability of t\"{a}k\={o} using our ISA-level MCM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper develops an ISA-level memory consistency model (MCM) for the tākō programmable memory hierarchy, which accelerates data movement via user-defined callbacks on cache misses, evictions, and writebacks. It demonstrates how the MCM enables formal reasoning about tākō programs and proves soundness by constructing a detailed implementation model and verifying that all executions of this model are permitted by the MCM. The extended version adds litmus tests in Section V to further explore programmability.

Significance. If the implementation model faithfully represents hardware, the work supplies a formal foundation for safe programming of complex programmable memory hierarchies, addressing counterintuitive behaviors that arise from custom callbacks and coherence. The explicit construction and verification of the implementation model against the MCM, plus the additional litmus tests, constitute a concrete contribution to microarchitectural modeling; the reported insights on verification techniques are broadly applicable beyond tākō.

major comments (2)
  1. [Abstract] Abstract and soundness argument: the claim that the ISA-level MCM 'captures the semantics of its operation' rests on consistency with the authors' constructed implementation model; no independent validation (comparison to RTL, cycle-accurate simulator, or silicon measurements of callback triggering, eviction timing, or coherence interactions) is supplied. This is load-bearing for the utility claim that programmers can rely on the MCM for real tākō programs.
  2. [Section V] Section V (additional litmus tests): the new tests examine MCM expressiveness and programmability but do not test alignment between the implementation model and observable hardware behavior. Because the central soundness result is internal to the two artifacts the authors define, this omission leaves open whether the MCM is too strong or too weak for actual executions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for acknowledging the significance of providing a formal foundation for programmable memory hierarchies. We address each major comment below, clarifying the intended scope of our contributions while offering targeted revisions to improve clarity.

read point-by-point responses
  1. Referee: [Abstract] Abstract and soundness argument: the claim that the ISA-level MCM 'captures the semantics of its operation' rests on consistency with the authors' constructed implementation model; no independent validation (comparison to RTL, cycle-accurate simulator, or silicon measurements of callback triggering, eviction timing, or coherence interactions) is supplied. This is load-bearing for the utility claim that programmers can rely on the MCM for real tākō programs.

    Authors: We thank the referee for this observation. The soundness proof establishes that every execution of the detailed implementation model is permitted by the ISA-level MCM, providing a rigorous guarantee that the MCM soundly abstracts the operational semantics encoded in the implementation model (including callback triggers, evictions, writebacks, and coherence). This enables programmers to reason about tākō programs at the ISA level without directly modeling microarchitectural details. While we agree that direct comparison to RTL, cycle-accurate simulation, or silicon measurements would provide additional empirical support for the implementation model's fidelity to hardware, such validation lies outside the scope of this work. Our focus is the development of the formal models, the verification methodology, and the resulting insights into microarchitectural modeling that apply more broadly. We will add a dedicated paragraph in the introduction (and reference it in the abstract) explicitly discussing the assumptions underlying the implementation model and noting the value of future hardware validation. revision: partial

  2. Referee: [Section V] Section V (additional litmus tests): the new tests examine MCM expressiveness and programmability but do not test alignment between the implementation model and observable hardware behavior. Because the central soundness result is internal to the two artifacts the authors define, this omission leaves open whether the MCM is too strong or too weak for actual executions.

    Authors: The litmus tests added to Section V of the extended version are intended to demonstrate the MCM's expressiveness and to show how it supports formal reasoning about tākō programmability, including counterintuitive outcomes from custom callbacks. They are not designed to validate alignment with hardware observations, as that would require access to concrete executions from RTL or silicon, which is not part of this paper's contribution. The soundness result is deliberately internal to the two models we define, as this provides a machine-checked guarantee that the MCM is neither too strong nor too weak relative to the implementation model. If the implementation model faithfully captures tākō hardware (as constructed from its specification), the MCM inherits that fidelity. We will revise Section V to more explicitly connect the litmus tests to the soundness theorem and to restate the internal nature of the proof. revision: partial

Circularity Check

0 steps flagged

No circularity: soundness is a standard refinement proof between separately defined models

full rationale

The paper defines an ISA-level MCM capturing tākō semantics, constructs a separate detailed implementation model, and proves that every execution of the implementation model is permitted by the MCM. This is a conventional refinement argument with no self-definitional loop, no fitted parameter renamed as prediction, and no load-bearing self-citation that reduces the central claim to its own inputs. The derivation remains self-contained once the two models are accepted as given; external fidelity of the implementation model to silicon is a separate validation question, not a circularity in the proof chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

With only the abstract available, no specific free parameters, axioms, or invented entities can be identified from the text; the work appears to rely on standard formal modeling techniques for memory consistency.

pith-pipeline@v0.9.0 · 5559 in / 1026 out tokens · 30852 ms · 2026-05-08T18:25:39.509197+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

74 extracted references · 53 canonical work pages · 1 internal anchor

  1. [1]

    Shared memory consistency models: A tutorial,

    S. Adve and K. Gharachorloo, “Shared memory consistency models: A tutorial,”IEEE Computer, vol. 29, no. 12, pp. 66–76, 1996

  2. [2]

    Weak ordering—a new definition,

    S. V . Adve and M. D. Hill, “Weak ordering—a new definition,” inProceedings of the 17th Annual International Symposium on Computer Architecture, ser. ISCA ’90. New York, NY , USA: Association for Computing Machinery, 1990, p. 2–14. [Online]. Available: https://doi.org/10.1145/325164.325100

  3. [3]

    GPU Concurrency: Weak Behaviours and Programming Assumptions,

    J. Alglave, M. Batty, A. F. Donaldson, G. Gopalakrishnan, J. Ketema, D. Poetzl, T. Sorensen, and J. Wickerson, “GPU Concurrency: Weak Behaviours and Programming Assumptions,” inProceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. Istanbul Turkey: ACM, Mar. 2015, pp. 577–591. [Onlin...

  4. [4]

    In: Touili, T., Cook, B., Jackson, P.B

    J. Alglave, L. Maranget, S. Sarkar, and P. Sewell, “Fences in Weak Memory Models,” inComputer Aided Verification, D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern, J. C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Sudan, D. Terzopoulos, D. Tygar, M. Y . Vardi, G. Weikum, T. Touili, B. Cook, and P. Jackson, Eds. Berlin...

  5. [5]

    Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory,

    J. Alglave, L. Maranget, and M. Tautschnig, “Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory,”ACM Transactions on Programming Languages and Systems, vol. 36, no. 2, pp. 7:1–7:74, Jul. 2014. [Online]. Available: https://doi.org/10.1145/ 2627752

  6. [6]

    Semantics of Remote Direct Memory Access: Operational and Declarative Models of RDMA on TSO Architectures,

    G. Ambal, B. Dongol, H. Eran, V . Klimis, O. Lahav, and A. Raad, “Semantics of Remote Direct Memory Access: Operational and Declarative Models of RDMA on TSO Architectures,”Proc. ACM Program. Lang., vol. 8, no. OOPSLA2, pp. 341:1982–341:2009, Oct

  7. [7]

    Available: https://dl.acm.org/doi/10.1145/3689781

    [Online]. Available: https://dl.acm.org/doi/10.1145/3689781

  8. [8]

    Cortex-A9 MPCore Programmer Advice Notice Read-after- Read Hazards,

    ARM, “Cortex-A9 MPCore Programmer Advice Notice Read-after- Read Hazards,” ARM, Arm Reference 761319, 2011. [Online]. Available: https://developer.arm.com/documentation/uan0004/a/

  9. [9]

    Overhauling SC Atomics in C11 and OpenCL,

    M. Batty, A. F. Donaldson, and J. Wickerson, “Overhauling SC Atomics in C11 and OpenCL,” inProceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Jan. 2016, pp. 634–648, arXiv:1503.07073 [cs]. [Online]. Available: http://arxiv.org/abs/1503.07073

  10. [10]

    Mathematizing C++ concurrency,

    M. Batty, S. Owens, S. Sarkar, P. Sewell, and T. Weber, “Mathematizing C++ concurrency,” inProceedings of the 38th annual ACM SIGPLAN- SIGACT symposium on Principles of programming languages, ser. POPL ’11. New York, NY , USA: Association for Computing Machinery, Jan. 2011, pp. 55–66. [Online]. Available: https://doi.org/ 10.1145/1926385.1926394

  11. [11]

    Automatic verification of Pipelined Micro- processor Control,

    J. R. Burch and D. L. Dill, “Automatic verification of Pipelined Micro- processor Control,” inProceedings of the 6th International Conference on Computer Aided Verification, ser. CA V ’94. Berlin, Heidelberg: Springer-Verlag, Jun. 1994, pp. 68–80

  12. [12]

    Kami: a platform for high-level parametric hardware specification and its modular verification,

    J. Choi, M. Vijayaraghavan, B. Sherman, A. Chlipala, and Arvind, “Kami: a platform for high-level parametric hardware specification and its modular verification,”Proceedings of the ACM on Programming Languages, vol. 1, no. ICFP, pp. 24:1–24:30, Aug. 2017. [Online]. Available: https://doi.org/10.1145/3110268

  13. [13]

    Z3: An Efficient SMT Solver,

    L. de Moura and N. Bjørner, “Z3: An Efficient SMT Solver,” inTools and Algorithms for the Construction and Analysis of Systems, C. R. Ramakrishnan and J. Rehof, Eds. Berlin, Heidelberg: Springer, 2008, pp. 337–340

  14. [14]

    An Exhaustive Approach to Detecting Transient Execution Side Channels in RTL Designs of Processors ,

    M. R. Fadiheh, A. Wezel, J. Muller, J. Bormann, S. Ray, J. M. Fung, S. Mitra, D. Stoffel, and W. Kunz, “ An Exhaustive Approach to Detecting Transient Execution Side Channels in RTL Designs of Processors ,”IEEE Transactions on Computers, vol. 72, no. 01, pp. 222–235, Jan. 2023. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/TC.2022.3152666

  15. [15]

    Memory consistency and event ordering in scalable shared-memory multiprocessors,

    K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, “Memory consistency and event ordering in scalable shared-memory multiprocessors,” inProceedings of the 17th Annual International Symposium on Computer Architecture, ser. ISCA ’90. New York, NY , USA: Association for Computing Machinery, 1990, p. 15–26. [Online]. Available: htt...

  16. [16]

    CertiKOS: an extensible architecture for building certified concurrent OS kernels,

    R. Gu, Z. Shao, H. Chen, X. Wu, J. Kim, V . Sj ¨oberg, and D. Costanzo, “CertiKOS: an extensible architecture for building certified concurrent OS kernels,” inProceedings of the 12th USENIX conference on Operat- ing Systems Design and Implementation, ser. OSDI’16. USA: USENIX Association, Nov. 2016, pp. 653–669

  17. [17]

    Program synthesis,

    S. Gulwani, O. Polozov, and R. Singh, “Program synthesis,”Foundations and Trends® in Programming Languages, vol. 4, no. 1-2, pp. 1–119,

  18. [18]

    2017 , volume =

    [Online]. Available: http://dx.doi.org/10.1561/2500000010

  19. [19]

    IronFleet: proving practical distributed systems correct,

    C. Hawblitzel, J. Howell, M. Kapritsos, J. R. Lorch, B. Parno, M. L. Roberts, S. Setty, and B. Zill, “IronFleet: proving practical distributed systems correct,” inProceedings of the 25th Symposium on Operating Systems Principles. Monterey California: ACM, Oct. 2015, pp. 1–17. [Online]. Available: https://dl.acm.org/doi/10.1145/2815400.2815428

  20. [20]

    Ironclad apps: end-to-end security via automated full- system verification,

    C. Hawblitzel, J. Howell, J. R. Lorch, A. Narayan, B. Parno, D. Zhang, and B. Zill, “Ironclad apps: end-to-end security via automated full- system verification,” inProceedings of the 11th USENIX conference on Operating Systems Design and Implementation, ser. OSDI’14. USA: USENIX Association, Oct. 2014, pp. 165–181

  21. [21]

    Accelerator-level parallelism,

    M. D. Hill and V . J. Reddi, “Accelerator-level parallelism,” Communications of the ACM, vol. 64, no. 12, pp. 36–38, Dec. 2021. [Online]. Available: https://dl.acm.org/doi/10.1145/3460970

  22. [22]

    Informing memory operations: providing memory performance feedback in modern processors,

    M. Horowitz, M. Martonosi, T. C. Mowry, and M. D. Smith, “Informing memory operations: providing memory performance feedback in modern processors,” inProceedings of the 23rd annual international symposium on Computer architecture, ser. ISCA ’96. New York, NY , USA: Association for Computing Machinery, May 1996, pp. 260–270. [Online]. Available: https://dl...

  23. [23]

    Alloy: a lightweight object modelling notation,

    D. Jackson, “Alloy: a lightweight object modelling notation,”ACM Trans. Softw. Eng. Methodol., vol. 11, no. 2, pp. 256–290, Apr. 2002. [Online]. Available: https://dl.acm.org/doi/10.1145/505145.505149

  24. [24]

    Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings,

    N. Jouppi, G. Kurian, S. Li, P. Ma, R. Nagarajan, L. Nai, N. Patil, S. Subramanian, A. Swing, B. Towles, C. Young, X. Zhou, Z. Zhou, and D. A. Patterson, “Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings,” inProceedings of the 50th Annual International Symposium on Computer Architecture, ser. ISCA...

  25. [25]

    Refinement in the Formal Verification of the seL4 Microkernel,

    G. Klein, T. Sewell, and S. Winwood, “Refinement in the Formal Verification of the seL4 Microkernel,” inDesign and Verification of Microprocessor Systems for High-Assurance Applications, D. S. Hardin, Ed. Boston, MA: Springer US, 2010, pp. 323–339. [Online]. Available: https://doi.org/10.1007/978-1-4419-1539-9 11

  26. [26]

    Spectre Attacks: Exploiting Speculative Execution,

    P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y . Yarom, “Spectre Attacks: Exploiting Speculative Execution,” in2019 IEEE Symposium on Security and Privacy (SP), May 2019, pp. 1–19, iSSN: 2375-1207. [Online]. Available: https: //ieeexplore.ieee.org/document/8835233

  27. [27]

    Effective stateless model checking for C/C++ concurrency,

    M. Kokologiannakis, O. Lahav, K. Sagonas, and V . Vafeiadis, “Effective stateless model checking for C/C++ concurrency,”Proceedings of the ACM on Programming Languages, vol. 2, no. POPL, pp. 1–32, Jan

  28. [28]

    Available: https://dl.acm.org/doi/10.1145/3158105

    [Online]. Available: https://dl.acm.org/doi/10.1145/3158105

  29. [29]

    The civl verifier,

    B. Kragl and S. Qadeer, “The civl verifier,” in2021 Formal Methods in Computer Aided Design (FMCAD), 2021, pp. 143–152

  30. [30]

    The Stanford FLASH multiprocessor,

    J. Kuskin, D. Ofelt, M. Heinrich, J. Heinlein, R. Simoni, K. Gharachorloo, J. Chapin, D. Nakahira, J. Baxter, M. Horowitz, A. Gupta, M. Rosenblum, and J. Hennessy, “The Stanford FLASH multiprocessor,” inProceedings of 21 International Symposium on Computer Architecture, Apr. 1994, pp. 302–313. [Online]. Available: https://ieeexplore.ieee.org/abstract/docu...

  31. [31]

    Repairing sequential consistency in C/C++11,

    O. Lahav, V . Vafeiadis, J. Kang, C.-K. Hur, and D. Dreyer, “Repairing sequential consistency in C/C++11,” inProceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2017. New York, NY , USA: Association for Computing Machinery, Jun. 2017, pp. 618–632. [Online]. Available: https://doi.org/10.1145/3062341.3062352

  32. [32]

    How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs,

    Lamport, “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs,”IEEE Transactions on Computers, vol. C-28, no. 9, pp. 690–691, Sep. 1979. [Online]. Available: https://ieeexplore.ieee.org/document/1675439

  33. [33]

    Specification and verification of strong timing isolation of hardware enclaves,

    S. Lau, T. Bourgeat, C. Pit-Claudel, and A. Chlipala, “Specification and verification of strong timing isolation of hardware enclaves,” in Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’24. New York, NY , USA: Association for Computing Machinery, 2024, p. 1121–1135. [Online]. Available: https://doi.org/...

  34. [34]

    doi: 10.1007/978-3-642-17511-4\ _20

    K. R. M. Leino, “Dafny: An Automatic Program Verifier for Functional Correctness,” inLogic for Programming, Artificial Intelligence, and Reasoning, E. M. Clarke and A. V oronkov, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, vol. 6355, pp. 348–370, series Title: Lecture Notes in Computer Science. [Online]. Available: http://link.springer.com/...

  35. [35]

    Armada: low-effort verification of high-performance concurrent programs,

    J. R. Lorch, Y . Chen, M. Kapritsos, B. Parno, S. Qadeer, U. Sharma, J. R. Wilcox, and X. Zhao, “Armada: low-effort verification of high-performance concurrent programs,” inProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2020. New York, NY , USA: Association for Computing Machinery, Jun. 2020, pp...

  36. [36]

    PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models,

    D. Lustig, M. Pellauer, and M. Martonosi, “PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models,” inProceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-47. USA: IEEE Computer Society, Dec. 2014, pp. 635–646. [Online]. Available: https://dl.acm.org/doi/10.1109/MICRO.2014.38

  37. [37]

    A Formal Analysis of the NVIDIA PTX Memory Consistency Model,

    D. Lustig, S. Sahasrabuddhe, and O. Giroux, “A Formal Analysis of the NVIDIA PTX Memory Consistency Model,” inProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’19. New York, NY , USA: Association for Computing Machinery, Apr. 2019, pp. 257–270. [Online]. Availab...

  38. [38]

    COATCheck: Verifying Memory Ordering at the Hardware-OS Interface,

    D. Lustig, G. Sethi, M. Martonosi, and A. Bhattacharjee, “COATCheck: Verifying Memory Ordering at the Hardware-OS Interface,” in Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems. Atlanta Georgia USA: ACM, Mar. 2016, pp. 233–247. [Online]. Available: https://dl.acm.org/doi/10....

  39. [39]

    I4: incremental inference of inductive invariants for verification of distributed protocols,

    H. Ma, A. Goel, J.-B. Jeannin, M. Kapritsos, B. Kasikci, and K. A. Sakallah, “I4: incremental inference of inductive invariants for verification of distributed protocols,” inProceedings of the 27th ACM Symposium on Operating Systems Principles, ser. SOSP ’19. New York, NY , USA: Association for Computing Machinery, 2019, p. 370–384. [Online]. Available: h...

  40. [40]

    An Axiomatic Memory Model for POWER Multiprocessors,

    S. Mador-Haim, L. Maranget, S. Sarkar, K. Memarian, J. Alglave, S. Owens, R. Alur, M. M. K. Martin, P. Sewell, and D. Williams, “An Axiomatic Memory Model for POWER Multiprocessors,” in Computer Aided Verification, D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern, J. C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Suda...

  41. [41]

    PipeProof: Automated Memory Consistency Proofs for Microarchitectural Specifi- cations,

    Y . A. Manerkar, D. Lustig, M. Martonosi, and A. Gupta, “PipeProof: Automated Memory Consistency Proofs for Microarchitectural Specifi- cations,” in2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). Fukuoka: IEEE, Oct. 2018, pp. 788–801. [Online]. Available: https://ieeexplore.ieee.org/document/8574586/

  42. [42]

    RTLcheck: verifying the memory consistency of RTL designs,

    Y . A. Manerkar, D. Lustig, M. Martonosi, and M. Pellauer, “RTLcheck: verifying the memory consistency of RTL designs,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-50 ’17. New York, NY , USA: Association for Computing Machinery, Oct. 2017, pp. 463–476. [Online]. Available: https://dl.acm.org/doi/10.1...

  43. [43]

    CCICheck: using µhb graphs to verify the coherence-consistency interface,

    Y . A. Manerkar, D. Lustig, M. Pellauer, and M. Martonosi, “CCICheck: using µhb graphs to verify the coherence-consistency interface,” in Proceedings of the 48th International Symposium on Microarchitecture. Waikiki Hawaii: ACM, Dec. 2015, pp. 26–37. [Online]. Available: https://dl.acm.org/doi/10.1145/2830772.2830782

  44. [44]

    Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 Trailing-Sync Compiler Mappings,

    Y . A. Manerkar, C. Trippel, D. Lustig, M. Pellauer, and M. Martonosi, “Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 Trailing-Sync Compiler Mappings,” Nov. 2016, arXiv:1611.01507 [cs]. [Online]. Available: http://arxiv.org/abs/1611.01507

  45. [45]

    The Java memory model,

    J. Manson, W. Pugh, and S. V . Adve, “The Java memory model,” inProceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ser. POPL ’05. New York, NY , USA: Association for Computing Machinery, 2005, p. 378–391. [Online]. Available: https://doi.org/10.1145/1040305.1040336

  46. [46]

    Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling,

    A. Mukkara, N. Beckmann, M. Abeydeera, X. Ma, and D. Sanchez, “Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling,” in2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2018, pp. 1–

  47. [47]

    Available: https://ieeexplore.ieee.org/abstract/document/ 8574527

    [Online]. Available: https://ieeexplore.ieee.org/abstract/document/ 8574527

  48. [48]

    PHI: Architectural Support for Synchronization- and Bandwidth-Efficient Commutative Scatter Updates,

    A. Mukkara, N. Beckmann, and D. Sanchez, “PHI: Architectural Support for Synchronization- and Bandwidth-Efficient Commutative Scatter Updates,” inProceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’52. New York, NY , USA: Association for Computing Machinery, Oct. 2019, pp. 1009–1022. [Online]. Available: https...

  49. [49]

    ECMon: exposing cache events for monitoring,

    V . Nagarajan and R. Gupta, “ECMon: exposing cache events for monitoring,”SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 349–360, Jun. 2009. [Online]. Available: https://dl.acm.org/doi/10.1145/ 1555815.1555798

  50. [50]

    Nagarajan, D

    V . Nagarajan, D. J. Sorin, M. D. Hill, and D. A. Wood,A Primer on Memory Consistency and Cache Coherence, ser. Synthesis Lectures on Computer Architecture. Cham: Springer International Publishing, 2020. [Online]. Available: https://link.springer.com/10.1007/ 978-3-031-01764-3

  51. [51]

    An operational semantics for C/C++11 concurrency,

    K. Nienhuis, K. Memarian, and P. Sewell, “An operational semantics for C/C++11 concurrency,” inProceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. Amsterdam Netherlands: ACM, Oct. 2016, pp. 111–128. [Online]. Available: https://dl.acm.org/doi/10.1145/ 2983990.2983997

  52. [52]

    HieraGen: Automated Generation of Concurrent, Hierarchical Cache Coherence Protocols,

    N. Oswald, V . Nagarajan, and D. J. Sorin, “HieraGen: Automated Generation of Concurrent, Hierarchical Cache Coherence Protocols,” in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). Valencia, Spain: IEEE, May 2020, pp. 888–899. [Online]. Available: https://ieeexplore.ieee.org/document/9138912/

  53. [53]

    A Better x86 Memory Model: x86-TSO,

    S. Owens, S. Sarkar, and P. Sewell, “A Better x86 Memory Model: x86-TSO,” inTheorem Proving in Higher Order Logics, S. Berghofer, T. Nipkow, C. Urban, and M. Wenzel, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, vol. 5674, pp. 391–407, series Title: Lecture Notes in Computer Science. [Online]. Available: http://link.springer.com/10.1007/978-3...

  54. [54]

    Ivy: safety verification by interactive generalization,

    O. Padon, K. L. McMillan, A. Panda, M. Sagiv, and S. Shoham, “Ivy: safety verification by interactive generalization,” inProceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’16. New York, NY , USA: Association for Computing Machinery, 2016, p. 614–630. [Online]. Available: https://doi.org/10.1145/290...

  55. [55]

    Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?

    R. Peled, D. Kroening, M. Tautschnig, and Y . Vizel, “Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?” Nov. 2025, arXiv:2511.02521 [cs]. [Online]. Available: http://arxiv.org/abs/2511. 02521

  56. [56]

    ACM Program

    C. Pulte, S. Flur, W. Deacon, J. French, S. Sarkar, and P. Sewell, “Simplifying ARM concurrency: multicopy-atomic axiomatic and operational models for ARMv8,”Proceedings of the ACM on Programming Languages, vol. 2, no. POPL, pp. 1–29, Jan. 2018. [Online]. Available: https://dl.acm.org/doi/10.1145/3158107

  57. [57]

    Understanding POWER multiprocessors,

    S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams, “Understanding POWER multiprocessors,” inProceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’11. New York, NY , USA: Association for Computing Machinery, Jun. 2011, pp. 175–186. [Online]. Available: https://doi.org/10.1145/1993498.1993520

  58. [58]

    Leviathan: A Unified System for General-Purpose Near-Data Computing,

    B. C. Schwedock and N. Beckmann, “Leviathan: A Unified System for General-Purpose Near-Data Computing,” in2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), Nov. 2024, pp. 1278–1294, iSSN: 2379-3155. [Online]. Available: https://ieeexplore.ieee.org/document/10764520

  59. [59]

    t ¨ak¯o: a polymorphic cache hierarchy for general-purpose optimization of data movement,

    B. C. Schwedock, P. Yoovidhya, J. Seibert, and N. Beckmann, “t ¨ak¯o: a polymorphic cache hierarchy for general-purpose optimization of data movement,” inProceedings of the 49th Annual International Symposium on Computer Architecture, ser. ISCA ’22. New York, NY , USA: Association for Computing Machinery, Jun. 2022, pp. 42–58. [Online]. Available: https:/...

  60. [60]

    Checking Safety Properties Using Induction and a SAT-Solver,

    M. Sheeran, S. Singh, and G. St ˚almarck, “Checking Safety Properties Using Induction and a SAT-Solver,” inProceedings of the Third In- ternational Conference on Formal Methods in Computer-Aided Design, ser. FMCAD ’00. Berlin, Heidelberg: Springer-Verlag, Nov. 2000, pp. 108–125

  61. [61]

    Template-based synthe- sis of instruction-level abstractions for SoC verification,

    P. Subramanyan, Y . Vizel, S. Ray, and S. Malik, “Template-based synthe- sis of instruction-level abstractions for SoC verification,” inProceedings of the 15th Conference on Formal Methods in Computer-Aided Design, ser. FMCAD ’15. Austin, Texas: FMCAD Inc, Sep. 2015, pp. 160–167

  62. [62]

    Formalising CXL Cache Coherence,

    C. Tan, A. F. Donaldson, and J. Wickerson, “Formalising CXL Cache Coherence,” inProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. New York, NY , USA: Association for Computing Machinery, Mar. 2025, pp. 437–450. [Online]. Available: https://doi.org/10.1145/3676641.3715999

  63. [63]

    On the calculus of relations,

    A. Tarski, “On the calculus of relations,”Journal of Symbolic Logic, vol. 6, no. 3, pp. 73–89, Sep. 1941. [Online]. Available: https://www.cambridge.org/core/product/identifier/ S0022481200106528/type/journal article

  64. [64]

    TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA,

    C. Trippel, Y . A. Manerkar, D. Lustig, M. Pellauer, and M. Martonosi, “TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA,” inProceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems. Xi’an China: ACM, Apr. 2017, pp. 119–133. [Online]. Available: h...

  65. [65]

    Client-optimized algorithms and acceleration for encrypted compute offloading,

    M. van der Hagen and B. Lucia, “Client-optimized algorithms and acceleration for encrypted compute offloading,” inProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’22. New York, NY , USA: Association for Computing Machinery, Feb. 2022, pp. 683–696. [Online]. Availabl...

  66. [66]

    Modular deduc- tive verification of multiprocessor hardware designs,

    M. Vijayaraghavan, A. Chlipala, Arvind, and N. Dave, “Modular deduc- tive verification of multiprocessor hardware designs,” inComputer Aided Verification, D. Kroening and C. S. P ˘as˘areanu, Eds. Cham: Springer International Publishing, 2015, pp. 109–127

  67. [67]

    Specification and Verification of Side-channel Security for Open- source Processors via Leakage Contracts,

    Z. Wang, G. Mohr, K. von Gleissenthall, J. Reineke, and M. Guarnieri, “Specification and Verification of Side-channel Security for Open- source Processors via Leakage Contracts,” inProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’23. New York, NY , USA: Association for Computing Machinery, Nov. 2023, pp. 212...

  68. [68]

    Remote-scope promotion: clarified, rectified, and verified,

    J. Wickerson, M. Batty, B. M. Beckmann, and A. F. Donaldson, “Remote-scope promotion: clarified, rectified, and verified,”SIGPLAN Not., vol. 50, no. 10, pp. 731–747, Oct. 2015. [Online]. Available: https://doi.org/10.1145/2858965.2814283

  69. [69]

    Automatically comparing memory consistency models,

    J. Wickerson, M. Batty, T. Sorensen, and G. A. Constantinides, “Automatically comparing memory consistency models,” inProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, ser. POPL ’17. New York, NY , USA: Association for Computing Machinery, Jan. 2017, pp. 190–204. [Online]. Available: https://doi.org/10.1145/3009837.3009838

  70. [70]

    Verdi: a framework for implementing and formally verifying distributed systems,

    J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. Anderson, “Verdi: a framework for implementing and formally verifying distributed systems,” inProceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’15. New York, NY , USA: Association for Computing Machinery, Jun. 2015, pp. ...

  71. [71]

    Pensieve: Microarchitectural Modeling for Security Evaluation,

    Y . Yang, T. Bourgeat, S. Lau, and M. Yan, “Pensieve: Microarchitectural Modeling for Security Evaluation,” inProceedings of the 50th Annual International Symposium on Computer Architecture, ser. ISCA ’23. New York, NY , USA: Association for Computing Machinery, Jun. 2023, pp. 1–15. [Online]. Available: https://dl.acm.org/doi/10.1145/ 3579371.3589094

  72. [72]

    ILA-MCM: Integrating Memory Consistency Models with Instruction-Level Abstractions for Heterogeneous System-on-Chip Verification,

    H. Zhang, C. Trippel, Y . A. Manerkar, A. Gupta, M. Martonosi, and S. Malik, “ILA-MCM: Integrating Memory Consistency Models with Instruction-Level Abstractions for Heterogeneous System-on-Chip Verification,” in2018 Formal Methods in Computer Aided Design (FMCAD). Austin, TX, USA: IEEE, Oct. 2018, pp. 1–10. [Online]. Available: https://ieeexplore.ieee.org...

  73. [73]

    End-to-end automated exploit generation for validating the security of processor designs,

    R. Zhang, C. Deutschbein, P. Huang, and C. Sturton, “End-to-end automated exploit generation for validating the security of processor designs,” in2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018, pp. 815–827

  74. [74]

    Basilisk: using provenance invariants to automate proofs of undecidable proto- cols,

    T. N. Zhang, K. Singh, T. Chajed, M. Kapritsos, and B. Parno, “Basilisk: using provenance invariants to automate proofs of undecidable proto- cols,” inProceedings of the 19th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI ’25. USA: USENIX Association, 2025