arxiv: 2605.04172 · v1 · submitted 2026-05-05 · 💻 cs.AR · cs.LO

Recognition: 2 theorem links

· Lean Theorem

t\"{a}k\={o}Formal: Enabling Robust Software for Programmable Memory Hierarchies (Extended Version)

Pranav Srinivasan , Manos Kapritsos , Yatin A. Manerkar

Authors on Pith no claims yet

Pith reviewed 2026-05-08 18:25 UTC · model grok-4.3

classification 💻 cs.AR cs.LO

keywords memory consistency modelprogrammable memory hierarchyformal verificationISA-level semanticsaccelerator programmingmicroarchitectural modelinglitmus teststākō accelerator

0 comments

The pith

An ISA-level memory consistency model for tākō defines the allowed behaviors of programs that use user-defined callbacks on cache events.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a high-level memory consistency model at the ISA level for the tākō programmable memory hierarchy. This model specifies the semantics of operations triggered by cache misses, evictions, and writebacks so that programmers can check their code formally. The authors prove the model is sound by creating a detailed implementation model of the hardware and confirming that every execution allowed by the implementation stays within the high-level rules. If correct, the model gives programmers a reliable way to reason about counterintuitive outcomes from custom data-movement logic. The extended version adds litmus tests that further map out safe and unsafe program behaviors.

Core claim

We develop an ISA-level memory consistency model (MCM) for tākō that captures the semantics of its operation, and we show how it enables programmers to formally reason about their tākō programs. We also prove the soundness of this ISA-level MCM by constructing a detailed tākō implementation model and verifying that all executions of the implementation model are allowed by our ISA-level MCM. Along the way, we discover useful insights about microarchitectural modeling and verification that are applicable to hardware in general.

What carries the argument

The ISA-level memory consistency model for tākō, which defines the ordering and visibility rules for memory operations in the presence of user-defined callback functions triggered by cache events.

If this is right

Programmers can use the model and associated litmus tests to check whether their tākō programs produce only allowed behaviors.
Any implementation that follows the detailed model is guaranteed to respect the high-level consistency rules.
The same modeling approach yields reusable insights for verifying other accelerators with complex data-movement logic.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Adoption of the model could let compiler writers or runtime systems automatically insert safe callback logic.
The verification method may scale to other programmable hardware features that alter the hardware-software interface.
Further litmus tests beyond those in the extended version could expose additional corner cases in real silicon.

Load-bearing premise

The constructed implementation model accurately represents the real tākō hardware behavior, and the verification process covers all relevant executions and litmus tests without gaps.

What would settle it

A single execution trace on actual tākō hardware that produces a memory ordering or visibility outcome forbidden by the ISA-level MCM.

Figures

Figures reproduced from arXiv: 2605.04172 by Manos Kapritsos, Pranav Srinivasan, Yatin A. Manerkar.

**Figure 1.** Figure 1: Image and caption from [55] showing the organization view at source ↗

**Figure 2.** Figure 2: (a) A sample tak¨ o program. (b) A possible execution ¯ of said program. The evictions and misses from the cache, which were previously hidden hardware details, now impact the outcome of the program. and non-volatile memory transactions. Thus, insights gained from modeling and verifying tak¨ o are likely to be more broadly ¯ applicable than those gained from modeling a more specialized accelerator. Finally… view at source ↗

**Figure 3.** Figure 3: Image from [55] showing the order of events in t view at source ↗

**Figure 5.** Figure 5: (a) The mp (message passing) litmus test. All addresses are assumed to be 0 initially. (b) An execution graph of mp that is outlawed under sequential consistency (SC). Figure 4b explains this reasoning. If r1=2, the value for (i2) must have been generated as the result of an OnMiss A instead of by (i1). Thus, an interspersed OnMiss must have run between (i1) and (i2). However, since (i1) would have brought… view at source ↗

**Figure 6.** Figure 6: All axioms for our tak¨ o MCM. Executions must satisfy all axioms to be allowed. ¯ [A] is the elements of type A in relation form [59]. A × B are pairs of an element of type A and an element of type B. A \ B is the elements of A that are not in B. Semicolons (;) denote relational composition, e.g., e1; e2 is two relations e1 and e2 where the destination node of e1 is the source node of e2. R−1 is the inver… view at source ↗

**Figure 7.** Figure 7: (a-e) Faulty candidate execution graphs for Figure 4a’s t view at source ↗

**Figure 8.** Figure 8: A pattern forbidden by traditional happens-before ( view at source ↗

**Figure 9.** Figure 9: (a) The mprmw (no callback) and mpcb (OnMiss for [b]) litmus tests. (b) A forbidden execution of mprmw where the RMWs augment the hb relation to outlaw the reading of 0 for [a]. (c) An analogous forbidden execution of mpcb showing the same pattern when [b] is a phantom address. needing to understand the intricacies of a tak¨ o implementation. ¯ §V shows how to use our MCM to analyze tak¨ o programs. ¯ V. P… view at source ↗

**Figure 10.** Figure 10: (a) The icbsb litmus test. (b) An execution of icbsb that demonstrates the intra-callback instruction reordering that is allowed by our MCM. designing the callback engine. In particular, tak¨ o designs that ¯ buffer and reorder memory operations in the engine can still use our MCM. D. FlushRange as a Synchronization Primitive While the use of an OnMiss-generated value (Me) by a Rcb or Wcb induces an hb e… view at source ↗

**Figure 12.** Figure 12: (a) The phiR (with OnWB (1)) and phiNR (with OnWB (2)) litmus tests. (b) Execution snippet showing phiR race due to OnWB from core 0’s write occurring after FlushRange on core 1. (c) Execution snippet showing how phiNR avoids the race by branching on evicted value in the OnWB. of a program (named phiR and phiNR) in which multiple cores concurrently write updates to a phantom address [x] and the OnWB of [x… view at source ↗

**Figure 13.** Figure 13: (a) hatsR and hatsNR litmus tests. (b) Execution snippet showing hatsR race due to OnEvict writing to the log post-traversal. (c) Snippet showing how hatsNR eliminates the race by only logging valid edges in the OnEvict. graph traversal, which is more difficult to write correctly because it has writes in OnEvict callbacks that can run an arbitrary number of times. F. A tak¨ o Application Case Study: HATS… view at source ↗

**Figure 14.** Figure 14: Our hierarchical transition system instantiated with view at source ↗

**Figure 16.** Figure 16: A Venn diagram motivating inductive invariants. view at source ↗

**Figure 17.** Figure 17: Two formulations of an axiom restricting the number view at source ↗

**Figure 18.** Figure 18: The two components of our soundness proof. view at source ↗

read the original abstract

Accelerators provide large performance and energy-efficiency benefits, but can significantly change the hardware-software interface. The t\"{a}k\={o} programmable memory hierarchy accelerates data movement by enabling programmers to run user-defined callback functions triggered by cache misses, evictions, and writebacks. However, it also leads to drastically increased complexity and counterintuitive outcomes. In response, we develop an ISA-level memory consistency model (MCM) for t\"{a}k\={o} that captures the semantics of its operation, and we show how it enables programmers to formally reason about their t\"{a}k\={o} programs. We also prove the soundness of this ISA-level MCM by constructing a detailed t\"{a}k\={o} implementation model and verifying that all executions of the implementation model are allowed by our ISA-level MCM. Along the way, we discover useful insights about microarchitectural modeling and verification that are applicable to hardware in general. This is the extended version of the ISCA 2026 paper "t\"{a}k\={o}Formal: Enabling Robust Software for Programmable Memory Hierarchies". This version adds material on additional litmus tests to Section V to further explore the programmability of t\"{a}k\={o} using our ISA-level MCM.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper defines a new ISA-level MCM for tākō and proves soundness against the authors' own implementation model, but that model has no independent check against real hardware.

read the letter

The main point is that the authors built an ISA-level memory consistency model for the tākō programmable memory hierarchy and showed that every execution from their detailed implementation model is allowed by the MCM. The extended version adds litmus tests to explore what programmers can express with it. This is new work because existing MCMs do not cover user-defined callbacks on misses, evictions, and writebacks in this way, and the soundness argument is tied to a concrete model rather than just informal description. They do a reasonable job of turning a messy hardware feature into something a programmer can reason about formally, and the side notes on microarchitectural modeling are practical enough to be useful beyond this specific case. The soft spot is exactly what the stress-test note flags: the soundness result is internal to the two artifacts the authors defined. There is no comparison to RTL, cycle-accurate simulation, or silicon traces that would show the implementation model actually reproduces tākō hardware behavior. If the model misses some coherence timing or callback trigger detail, the MCM could be either too strong or too weak for real programs. The extra litmus tests only test the MCM's expressiveness, not its alignment with hardware. This paper is for people who work at the intersection of computer architecture and formal methods, especially those building or using programmable accelerators. A reader already familiar with MCMs or hardware modeling will get the most out of the definitions and proof structure. It deserves a serious referee because the formal contribution is grounded and the problem is relevant to reliable accelerator software. I would recommend sending it to peer review, with reviewers asked to focus on whether the implementation model needs external validation.

Referee Report

2 major / 0 minor

Summary. The paper develops an ISA-level memory consistency model (MCM) for the tākō programmable memory hierarchy, which accelerates data movement via user-defined callbacks on cache misses, evictions, and writebacks. It demonstrates how the MCM enables formal reasoning about tākō programs and proves soundness by constructing a detailed implementation model and verifying that all executions of this model are permitted by the MCM. The extended version adds litmus tests in Section V to further explore programmability.

Significance. If the implementation model faithfully represents hardware, the work supplies a formal foundation for safe programming of complex programmable memory hierarchies, addressing counterintuitive behaviors that arise from custom callbacks and coherence. The explicit construction and verification of the implementation model against the MCM, plus the additional litmus tests, constitute a concrete contribution to microarchitectural modeling; the reported insights on verification techniques are broadly applicable beyond tākō.

major comments (2)

[Abstract] Abstract and soundness argument: the claim that the ISA-level MCM 'captures the semantics of its operation' rests on consistency with the authors' constructed implementation model; no independent validation (comparison to RTL, cycle-accurate simulator, or silicon measurements of callback triggering, eviction timing, or coherence interactions) is supplied. This is load-bearing for the utility claim that programmers can rely on the MCM for real tākō programs.
[Section V] Section V (additional litmus tests): the new tests examine MCM expressiveness and programmability but do not test alignment between the implementation model and observable hardware behavior. Because the central soundness result is internal to the two artifacts the authors define, this omission leaves open whether the MCM is too strong or too weak for actual executions.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and for acknowledging the significance of providing a formal foundation for programmable memory hierarchies. We address each major comment below, clarifying the intended scope of our contributions while offering targeted revisions to improve clarity.

read point-by-point responses

Referee: [Abstract] Abstract and soundness argument: the claim that the ISA-level MCM 'captures the semantics of its operation' rests on consistency with the authors' constructed implementation model; no independent validation (comparison to RTL, cycle-accurate simulator, or silicon measurements of callback triggering, eviction timing, or coherence interactions) is supplied. This is load-bearing for the utility claim that programmers can rely on the MCM for real tākō programs.

Authors: We thank the referee for this observation. The soundness proof establishes that every execution of the detailed implementation model is permitted by the ISA-level MCM, providing a rigorous guarantee that the MCM soundly abstracts the operational semantics encoded in the implementation model (including callback triggers, evictions, writebacks, and coherence). This enables programmers to reason about tākō programs at the ISA level without directly modeling microarchitectural details. While we agree that direct comparison to RTL, cycle-accurate simulation, or silicon measurements would provide additional empirical support for the implementation model's fidelity to hardware, such validation lies outside the scope of this work. Our focus is the development of the formal models, the verification methodology, and the resulting insights into microarchitectural modeling that apply more broadly. We will add a dedicated paragraph in the introduction (and reference it in the abstract) explicitly discussing the assumptions underlying the implementation model and noting the value of future hardware validation. revision: partial
Referee: [Section V] Section V (additional litmus tests): the new tests examine MCM expressiveness and programmability but do not test alignment between the implementation model and observable hardware behavior. Because the central soundness result is internal to the two artifacts the authors define, this omission leaves open whether the MCM is too strong or too weak for actual executions.

Authors: The litmus tests added to Section V of the extended version are intended to demonstrate the MCM's expressiveness and to show how it supports formal reasoning about tākō programmability, including counterintuitive outcomes from custom callbacks. They are not designed to validate alignment with hardware observations, as that would require access to concrete executions from RTL or silicon, which is not part of this paper's contribution. The soundness result is deliberately internal to the two models we define, as this provides a machine-checked guarantee that the MCM is neither too strong nor too weak relative to the implementation model. If the implementation model faithfully captures tākō hardware (as constructed from its specification), the MCM inherits that fidelity. We will revise Section V to more explicitly connect the litmus tests to the soundness theorem and to restate the internal nature of the proof. revision: partial

Circularity Check

0 steps flagged

No circularity: soundness is a standard refinement proof between separately defined models

full rationale

The paper defines an ISA-level MCM capturing tākō semantics, constructs a separate detailed implementation model, and proves that every execution of the implementation model is permitted by the MCM. This is a conventional refinement argument with no self-definitional loop, no fitted parameter renamed as prediction, and no load-bearing self-citation that reduces the central claim to its own inputs. The derivation remains self-contained once the two models are accepted as given; external fidelity of the implementation model to silicon is a separate validation question, not a circularity in the proof chain.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

With only the abstract available, no specific free parameters, axioms, or invented entities can be identified from the text; the work appears to rely on standard formal modeling techniques for memory consistency.

pith-pipeline@v0.9.0 · 5559 in / 1026 out tokens · 30852 ms · 2026-05-08T18:25:39.509197+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

74 extracted references · 53 canonical work pages · 1 internal anchor

[1]

Shared memory consistency models: A tutorial,

S. Adve and K. Gharachorloo, “Shared memory consistency models: A tutorial,”IEEE Computer, vol. 29, no. 12, pp. 66–76, 1996

1996
[2]

Weak ordering—a new definition,

S. V . Adve and M. D. Hill, “Weak ordering—a new definition,” inProceedings of the 17th Annual International Symposium on Computer Architecture, ser. ISCA ’90. New York, NY , USA: Association for Computing Machinery, 1990, p. 2–14. [Online]. Available: https://doi.org/10.1145/325164.325100

work page doi:10.1145/325164.325100 1990
[3]

GPU Concurrency: Weak Behaviours and Programming Assumptions,

J. Alglave, M. Batty, A. F. Donaldson, G. Gopalakrishnan, J. Ketema, D. Poetzl, T. Sorensen, and J. Wickerson, “GPU Concurrency: Weak Behaviours and Programming Assumptions,” inProceedings of the Twentieth International Conference on Architectural Support for Programming Languages and Operating Systems. Istanbul Turkey: ACM, Mar. 2015, pp. 577–591. [Onlin...

work page doi:10.1145/2694344.2694391 2015
[4]

In: Touili, T., Cook, B., Jackson, P.B

J. Alglave, L. Maranget, S. Sarkar, and P. Sewell, “Fences in Weak Memory Models,” inComputer Aided Verification, D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern, J. C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Sudan, D. Terzopoulos, D. Tygar, M. Y . Vardi, G. Weikum, T. Touili, B. Cook, and P. Jackson, Eds. Berlin...

work page doi:10.1007/978-3-642-14295-6 2010
[5]

Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory,

J. Alglave, L. Maranget, and M. Tautschnig, “Herding Cats: Modelling, Simulation, Testing, and Data Mining for Weak Memory,”ACM Transactions on Programming Languages and Systems, vol. 36, no. 2, pp. 7:1–7:74, Jul. 2014. [Online]. Available: https://doi.org/10.1145/ 2627752

2014
[6]

Semantics of Remote Direct Memory Access: Operational and Declarative Models of RDMA on TSO Architectures,

G. Ambal, B. Dongol, H. Eran, V . Klimis, O. Lahav, and A. Raad, “Semantics of Remote Direct Memory Access: Operational and Declarative Models of RDMA on TSO Architectures,”Proc. ACM Program. Lang., vol. 8, no. OOPSLA2, pp. 341:1982–341:2009, Oct

1982
[7]

Available: https://dl.acm.org/doi/10.1145/3689781

[Online]. Available: https://dl.acm.org/doi/10.1145/3689781

work page doi:10.1145/3689781
[8]

Cortex-A9 MPCore Programmer Advice Notice Read-after- Read Hazards,

ARM, “Cortex-A9 MPCore Programmer Advice Notice Read-after- Read Hazards,” ARM, Arm Reference 761319, 2011. [Online]. Available: https://developer.arm.com/documentation/uan0004/a/

2011
[9]

Overhauling SC Atomics in C11 and OpenCL,

M. Batty, A. F. Donaldson, and J. Wickerson, “Overhauling SC Atomics in C11 and OpenCL,” inProceedings of the 43rd Annual ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, Jan. 2016, pp. 634–648, arXiv:1503.07073 [cs]. [Online]. Available: http://arxiv.org/abs/1503.07073

work page arXiv 2016
[10]

Mathematizing C++ concurrency,

M. Batty, S. Owens, S. Sarkar, P. Sewell, and T. Weber, “Mathematizing C++ concurrency,” inProceedings of the 38th annual ACM SIGPLAN- SIGACT symposium on Principles of programming languages, ser. POPL ’11. New York, NY , USA: Association for Computing Machinery, Jan. 2011, pp. 55–66. [Online]. Available: https://doi.org/ 10.1145/1926385.1926394

work page doi:10.1145/1926385.1926394 2011
[11]

Automatic verification of Pipelined Micro- processor Control,

J. R. Burch and D. L. Dill, “Automatic verification of Pipelined Micro- processor Control,” inProceedings of the 6th International Conference on Computer Aided Verification, ser. CA V ’94. Berlin, Heidelberg: Springer-Verlag, Jun. 1994, pp. 68–80

1994
[12]

Kami: a platform for high-level parametric hardware specification and its modular verification,

J. Choi, M. Vijayaraghavan, B. Sherman, A. Chlipala, and Arvind, “Kami: a platform for high-level parametric hardware specification and its modular verification,”Proceedings of the ACM on Programming Languages, vol. 1, no. ICFP, pp. 24:1–24:30, Aug. 2017. [Online]. Available: https://doi.org/10.1145/3110268

work page doi:10.1145/3110268 2017
[13]

Z3: An Efficient SMT Solver,

L. de Moura and N. Bjørner, “Z3: An Efficient SMT Solver,” inTools and Algorithms for the Construction and Analysis of Systems, C. R. Ramakrishnan and J. Rehof, Eds. Berlin, Heidelberg: Springer, 2008, pp. 337–340

2008
[14]

An Exhaustive Approach to Detecting Transient Execution Side Channels in RTL Designs of Processors ,

M. R. Fadiheh, A. Wezel, J. Muller, J. Bormann, S. Ray, J. M. Fung, S. Mitra, D. Stoffel, and W. Kunz, “ An Exhaustive Approach to Detecting Transient Execution Side Channels in RTL Designs of Processors ,”IEEE Transactions on Computers, vol. 72, no. 01, pp. 222–235, Jan. 2023. [Online]. Available: https://doi.ieeecomputersociety.org/10.1109/TC.2022.3152666

work page doi:10.1109/tc.2022.3152666 2023
[15]

Memory consistency and event ordering in scalable shared-memory multiprocessors,

K. Gharachorloo, D. Lenoski, J. Laudon, P. Gibbons, A. Gupta, and J. Hennessy, “Memory consistency and event ordering in scalable shared-memory multiprocessors,” inProceedings of the 17th Annual International Symposium on Computer Architecture, ser. ISCA ’90. New York, NY , USA: Association for Computing Machinery, 1990, p. 15–26. [Online]. Available: htt...

work page doi:10.1145/325164.325102 1990
[16]

CertiKOS: an extensible architecture for building certified concurrent OS kernels,

R. Gu, Z. Shao, H. Chen, X. Wu, J. Kim, V . Sj ¨oberg, and D. Costanzo, “CertiKOS: an extensible architecture for building certified concurrent OS kernels,” inProceedings of the 12th USENIX conference on Operat- ing Systems Design and Implementation, ser. OSDI’16. USA: USENIX Association, Nov. 2016, pp. 653–669

2016
[17]

Program synthesis,

S. Gulwani, O. Polozov, and R. Singh, “Program synthesis,”Foundations and Trends® in Programming Languages, vol. 4, no. 1-2, pp. 1–119,
[18]

2017 , volume =

[Online]. Available: http://dx.doi.org/10.1561/2500000010

work page doi:10.1561/2500000010
[19]

IronFleet: proving practical distributed systems correct,

C. Hawblitzel, J. Howell, M. Kapritsos, J. R. Lorch, B. Parno, M. L. Roberts, S. Setty, and B. Zill, “IronFleet: proving practical distributed systems correct,” inProceedings of the 25th Symposium on Operating Systems Principles. Monterey California: ACM, Oct. 2015, pp. 1–17. [Online]. Available: https://dl.acm.org/doi/10.1145/2815400.2815428

work page doi:10.1145/2815400.2815428 2015
[20]

Ironclad apps: end-to-end security via automated full- system verification,

C. Hawblitzel, J. Howell, J. R. Lorch, A. Narayan, B. Parno, D. Zhang, and B. Zill, “Ironclad apps: end-to-end security via automated full- system verification,” inProceedings of the 11th USENIX conference on Operating Systems Design and Implementation, ser. OSDI’14. USA: USENIX Association, Oct. 2014, pp. 165–181

2014
[21]

Accelerator-level parallelism,

M. D. Hill and V . J. Reddi, “Accelerator-level parallelism,” Communications of the ACM, vol. 64, no. 12, pp. 36–38, Dec. 2021. [Online]. Available: https://dl.acm.org/doi/10.1145/3460970

work page doi:10.1145/3460970 2021
[22]

Informing memory operations: providing memory performance feedback in modern processors,

M. Horowitz, M. Martonosi, T. C. Mowry, and M. D. Smith, “Informing memory operations: providing memory performance feedback in modern processors,” inProceedings of the 23rd annual international symposium on Computer architecture, ser. ISCA ’96. New York, NY , USA: Association for Computing Machinery, May 1996, pp. 260–270. [Online]. Available: https://dl...

work page doi:10.1145/232973.233000 1996
[23]

Alloy: a lightweight object modelling notation,

D. Jackson, “Alloy: a lightweight object modelling notation,”ACM Trans. Softw. Eng. Methodol., vol. 11, no. 2, pp. 256–290, Apr. 2002. [Online]. Available: https://dl.acm.org/doi/10.1145/505145.505149

work page doi:10.1145/505145.505149 2002
[24]

Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings,

N. Jouppi, G. Kurian, S. Li, P. Ma, R. Nagarajan, L. Nai, N. Patil, S. Subramanian, A. Swing, B. Towles, C. Young, X. Zhou, Z. Zhou, and D. A. Patterson, “Tpu v4: An optically reconfigurable supercomputer for machine learning with hardware support for embeddings,” inProceedings of the 50th Annual International Symposium on Computer Architecture, ser. ISCA...

work page doi:10.1145/3579371.3589350 2023
[25]

Refinement in the Formal Verification of the seL4 Microkernel,

G. Klein, T. Sewell, and S. Winwood, “Refinement in the Formal Verification of the seL4 Microkernel,” inDesign and Verification of Microprocessor Systems for High-Assurance Applications, D. S. Hardin, Ed. Boston, MA: Springer US, 2010, pp. 323–339. [Online]. Available: https://doi.org/10.1007/978-1-4419-1539-9 11

work page doi:10.1007/978-1-4419-1539-9 2010
[26]

Spectre Attacks: Exploiting Speculative Execution,

P. Kocher, J. Horn, A. Fogh, D. Genkin, D. Gruss, W. Haas, M. Hamburg, M. Lipp, S. Mangard, T. Prescher, M. Schwarz, and Y . Yarom, “Spectre Attacks: Exploiting Speculative Execution,” in2019 IEEE Symposium on Security and Privacy (SP), May 2019, pp. 1–19, iSSN: 2375-1207. [Online]. Available: https: //ieeexplore.ieee.org/document/8835233

work page arXiv 2019
[27]

Effective stateless model checking for C/C++ concurrency,

M. Kokologiannakis, O. Lahav, K. Sagonas, and V . Vafeiadis, “Effective stateless model checking for C/C++ concurrency,”Proceedings of the ACM on Programming Languages, vol. 2, no. POPL, pp. 1–32, Jan
[28]

Available: https://dl.acm.org/doi/10.1145/3158105

[Online]. Available: https://dl.acm.org/doi/10.1145/3158105

work page doi:10.1145/3158105
[29]

The civl verifier,

B. Kragl and S. Qadeer, “The civl verifier,” in2021 Formal Methods in Computer Aided Design (FMCAD), 2021, pp. 143–152

2021
[30]

The Stanford FLASH multiprocessor,

J. Kuskin, D. Ofelt, M. Heinrich, J. Heinlein, R. Simoni, K. Gharachorloo, J. Chapin, D. Nakahira, J. Baxter, M. Horowitz, A. Gupta, M. Rosenblum, and J. Hennessy, “The Stanford FLASH multiprocessor,” inProceedings of 21 International Symposium on Computer Architecture, Apr. 1994, pp. 302–313. [Online]. Available: https://ieeexplore.ieee.org/abstract/docu...

1994
[31]

Repairing sequential consistency in C/C++11,

O. Lahav, V . Vafeiadis, J. Kang, C.-K. Hur, and D. Dreyer, “Repairing sequential consistency in C/C++11,” inProceedings of the 38th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2017. New York, NY , USA: Association for Computing Machinery, Jun. 2017, pp. 618–632. [Online]. Available: https://doi.org/10.1145/3062341.3062352

work page doi:10.1145/3062341.3062352 2017
[32]

How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs,

Lamport, “How to Make a Multiprocessor Computer That Correctly Executes Multiprocess Programs,”IEEE Transactions on Computers, vol. C-28, no. 9, pp. 690–691, Sep. 1979. [Online]. Available: https://ieeexplore.ieee.org/document/1675439

work page arXiv 1979
[33]

Specification and verification of strong timing isolation of hardware enclaves,

S. Lau, T. Bourgeat, C. Pit-Claudel, and A. Chlipala, “Specification and verification of strong timing isolation of hardware enclaves,” in Proceedings of the 2024 on ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’24. New York, NY , USA: Association for Computing Machinery, 2024, p. 1121–1135. [Online]. Available: https://doi.org/...

work page doi:10.1145/3658644.3690203 2024
[34]

doi: 10.1007/978-3-642-17511-4\ _20

K. R. M. Leino, “Dafny: An Automatic Program Verifier for Functional Correctness,” inLogic for Programming, Artificial Intelligence, and Reasoning, E. M. Clarke and A. V oronkov, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2010, vol. 6355, pp. 348–370, series Title: Lecture Notes in Computer Science. [Online]. Available: http://link.springer.com/...

work page doi:10.1007/978-3-642-17511-4 2010
[35]

Armada: low-effort verification of high-performance concurrent programs,

J. R. Lorch, Y . Chen, M. Kapritsos, B. Parno, S. Qadeer, U. Sharma, J. R. Wilcox, and X. Zhao, “Armada: low-effort verification of high-performance concurrent programs,” inProceedings of the 41st ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI 2020. New York, NY , USA: Association for Computing Machinery, Jun. 2020, pp...

work page doi:10.1145/3385412.3385971 2020
[36]

PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models,

D. Lustig, M. Pellauer, and M. Martonosi, “PipeCheck: Specifying and Verifying Microarchitectural Enforcement of Memory Consistency Models,” inProceedings of the 47th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-47. USA: IEEE Computer Society, Dec. 2014, pp. 635–646. [Online]. Available: https://dl.acm.org/doi/10.1109/MICRO.2014.38

work page doi:10.1109/micro.2014.38 2014
[37]

A Formal Analysis of the NVIDIA PTX Memory Consistency Model,

D. Lustig, S. Sahasrabuddhe, and O. Giroux, “A Formal Analysis of the NVIDIA PTX Memory Consistency Model,” inProceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’19. New York, NY , USA: Association for Computing Machinery, Apr. 2019, pp. 257–270. [Online]. Availab...

work page doi:10.1145/3297858.3304043 2019
[38]

COATCheck: Verifying Memory Ordering at the Hardware-OS Interface,

D. Lustig, G. Sethi, M. Martonosi, and A. Bhattacharjee, “COATCheck: Verifying Memory Ordering at the Hardware-OS Interface,” in Proceedings of the Twenty-First International Conference on Architectural Support for Programming Languages and Operating Systems. Atlanta Georgia USA: ACM, Mar. 2016, pp. 233–247. [Online]. Available: https://dl.acm.org/doi/10....

work page doi:10.1145/2872362.2872399 2016
[39]

I4: incremental inference of inductive invariants for verification of distributed protocols,

H. Ma, A. Goel, J.-B. Jeannin, M. Kapritsos, B. Kasikci, and K. A. Sakallah, “I4: incremental inference of inductive invariants for verification of distributed protocols,” inProceedings of the 27th ACM Symposium on Operating Systems Principles, ser. SOSP ’19. New York, NY , USA: Association for Computing Machinery, 2019, p. 370–384. [Online]. Available: h...

work page doi:10.1145/3341301.3359651 2019
[40]

An Axiomatic Memory Model for POWER Multiprocessors,

S. Mador-Haim, L. Maranget, S. Sarkar, K. Memarian, J. Alglave, S. Owens, R. Alur, M. M. K. Martin, P. Sewell, and D. Williams, “An Axiomatic Memory Model for POWER Multiprocessors,” in Computer Aided Verification, D. Hutchison, T. Kanade, J. Kittler, J. M. Kleinberg, F. Mattern, J. C. Mitchell, M. Naor, O. Nierstrasz, C. Pandu Rangan, B. Steffen, M. Suda...

work page doi:10.1007/978-3-642-31424-7 2012
[41]

PipeProof: Automated Memory Consistency Proofs for Microarchitectural Specifi- cations,

Y . A. Manerkar, D. Lustig, M. Martonosi, and A. Gupta, “PipeProof: Automated Memory Consistency Proofs for Microarchitectural Specifi- cations,” in2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO). Fukuoka: IEEE, Oct. 2018, pp. 788–801. [Online]. Available: https://ieeexplore.ieee.org/document/8574586/

work page arXiv 2018
[42]

RTLcheck: verifying the memory consistency of RTL designs,

Y . A. Manerkar, D. Lustig, M. Martonosi, and M. Pellauer, “RTLcheck: verifying the memory consistency of RTL designs,” in Proceedings of the 50th Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO-50 ’17. New York, NY , USA: Association for Computing Machinery, Oct. 2017, pp. 463–476. [Online]. Available: https://dl.acm.org/doi/10.1...

work page doi:10.1145/3123939.3124536 2017
[43]

CCICheck: using µhb graphs to verify the coherence-consistency interface,

Y . A. Manerkar, D. Lustig, M. Pellauer, and M. Martonosi, “CCICheck: using µhb graphs to verify the coherence-consistency interface,” in Proceedings of the 48th International Symposium on Microarchitecture. Waikiki Hawaii: ACM, Dec. 2015, pp. 26–37. [Online]. Available: https://dl.acm.org/doi/10.1145/2830772.2830782

work page doi:10.1145/2830772.2830782 2015
[44]

Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 Trailing-Sync Compiler Mappings,

Y . A. Manerkar, C. Trippel, D. Lustig, M. Pellauer, and M. Martonosi, “Counterexamples and Proof Loophole for the C/C++ to POWER and ARMv7 Trailing-Sync Compiler Mappings,” Nov. 2016, arXiv:1611.01507 [cs]. [Online]. Available: http://arxiv.org/abs/1611.01507

work page arXiv 2016
[45]

The Java memory model,

J. Manson, W. Pugh, and S. V . Adve, “The Java memory model,” inProceedings of the 32nd ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, ser. POPL ’05. New York, NY , USA: Association for Computing Machinery, 2005, p. 378–391. [Online]. Available: https://doi.org/10.1145/1040305.1040336

work page doi:10.1145/1040305.1040336 2005
[46]

Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling,

A. Mukkara, N. Beckmann, M. Abeydeera, X. Ma, and D. Sanchez, “Exploiting Locality in Graph Analytics through Hardware-Accelerated Traversal Scheduling,” in2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Oct. 2018, pp. 1–

2018
[47]

Available: https://ieeexplore.ieee.org/abstract/document/ 8574527

[Online]. Available: https://ieeexplore.ieee.org/abstract/document/ 8574527
[48]

PHI: Architectural Support for Synchronization- and Bandwidth-Efficient Commutative Scatter Updates,

A. Mukkara, N. Beckmann, and D. Sanchez, “PHI: Architectural Support for Synchronization- and Bandwidth-Efficient Commutative Scatter Updates,” inProceedings of the 52nd Annual IEEE/ACM International Symposium on Microarchitecture, ser. MICRO ’52. New York, NY , USA: Association for Computing Machinery, Oct. 2019, pp. 1009–1022. [Online]. Available: https...

work page doi:10.1145/3352460.3358254 2019
[49]

ECMon: exposing cache events for monitoring,

V . Nagarajan and R. Gupta, “ECMon: exposing cache events for monitoring,”SIGARCH Comput. Archit. News, vol. 37, no. 3, pp. 349–360, Jun. 2009. [Online]. Available: https://dl.acm.org/doi/10.1145/ 1555815.1555798

work page arXiv 2009
[50]

Nagarajan, D

V . Nagarajan, D. J. Sorin, M. D. Hill, and D. A. Wood,A Primer on Memory Consistency and Cache Coherence, ser. Synthesis Lectures on Computer Architecture. Cham: Springer International Publishing, 2020. [Online]. Available: https://link.springer.com/10.1007/ 978-3-031-01764-3

2020
[51]

An operational semantics for C/C++11 concurrency,

K. Nienhuis, K. Memarian, and P. Sewell, “An operational semantics for C/C++11 concurrency,” inProceedings of the 2016 ACM SIGPLAN International Conference on Object-Oriented Programming, Systems, Languages, and Applications. Amsterdam Netherlands: ACM, Oct. 2016, pp. 111–128. [Online]. Available: https://dl.acm.org/doi/10.1145/ 2983990.2983997

work page arXiv 2016
[52]

HieraGen: Automated Generation of Concurrent, Hierarchical Cache Coherence Protocols,

N. Oswald, V . Nagarajan, and D. J. Sorin, “HieraGen: Automated Generation of Concurrent, Hierarchical Cache Coherence Protocols,” in 2020 ACM/IEEE 47th Annual International Symposium on Computer Architecture (ISCA). Valencia, Spain: IEEE, May 2020, pp. 888–899. [Online]. Available: https://ieeexplore.ieee.org/document/9138912/

work page arXiv 2020
[53]

A Better x86 Memory Model: x86-TSO,

S. Owens, S. Sarkar, and P. Sewell, “A Better x86 Memory Model: x86-TSO,” inTheorem Proving in Higher Order Logics, S. Berghofer, T. Nipkow, C. Urban, and M. Wenzel, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, vol. 5674, pp. 391–407, series Title: Lecture Notes in Computer Science. [Online]. Available: http://link.springer.com/10.1007/978-3...

work page doi:10.1007/978-3-642-03359-9 2009
[54]

Ivy: safety verification by interactive generalization,

O. Padon, K. L. McMillan, A. Panda, M. Sagiv, and S. Shoham, “Ivy: safety verification by interactive generalization,” inProceedings of the 37th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’16. New York, NY , USA: Association for Computing Machinery, 2016, p. 614–630. [Online]. Available: https://doi.org/10.1145/290...

work page doi:10.1145/2908080.2908118 2016
[55]

Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?

R. Peled, D. Kroening, M. Tautschnig, and Y . Vizel, “Large Lemma Miners: Can LLMs do Induction Proofs for Hardware?” Nov. 2025, arXiv:2511.02521 [cs]. [Online]. Available: http://arxiv.org/abs/2511. 02521

work page internal anchor Pith review Pith/arXiv arXiv 2025
[56]

ACM Program

C. Pulte, S. Flur, W. Deacon, J. French, S. Sarkar, and P. Sewell, “Simplifying ARM concurrency: multicopy-atomic axiomatic and operational models for ARMv8,”Proceedings of the ACM on Programming Languages, vol. 2, no. POPL, pp. 1–29, Jan. 2018. [Online]. Available: https://dl.acm.org/doi/10.1145/3158107

work page doi:10.1145/3158107 2018
[57]

Understanding POWER multiprocessors,

S. Sarkar, P. Sewell, J. Alglave, L. Maranget, and D. Williams, “Understanding POWER multiprocessors,” inProceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’11. New York, NY , USA: Association for Computing Machinery, Jun. 2011, pp. 175–186. [Online]. Available: https://doi.org/10.1145/1993498.1993520

work page doi:10.1145/1993498.1993520 2011
[58]

Leviathan: A Unified System for General-Purpose Near-Data Computing,

B. C. Schwedock and N. Beckmann, “Leviathan: A Unified System for General-Purpose Near-Data Computing,” in2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO), Nov. 2024, pp. 1278–1294, iSSN: 2379-3155. [Online]. Available: https://ieeexplore.ieee.org/document/10764520

work page arXiv 2024
[59]

t ¨ak¯o: a polymorphic cache hierarchy for general-purpose optimization of data movement,

B. C. Schwedock, P. Yoovidhya, J. Seibert, and N. Beckmann, “t ¨ak¯o: a polymorphic cache hierarchy for general-purpose optimization of data movement,” inProceedings of the 49th Annual International Symposium on Computer Architecture, ser. ISCA ’22. New York, NY , USA: Association for Computing Machinery, Jun. 2022, pp. 42–58. [Online]. Available: https:/...

work page doi:10.1145/3470496.3527379 2022
[60]

Checking Safety Properties Using Induction and a SAT-Solver,

M. Sheeran, S. Singh, and G. St ˚almarck, “Checking Safety Properties Using Induction and a SAT-Solver,” inProceedings of the Third In- ternational Conference on Formal Methods in Computer-Aided Design, ser. FMCAD ’00. Berlin, Heidelberg: Springer-Verlag, Nov. 2000, pp. 108–125

2000
[61]

Template-based synthe- sis of instruction-level abstractions for SoC verification,

P. Subramanyan, Y . Vizel, S. Ray, and S. Malik, “Template-based synthe- sis of instruction-level abstractions for SoC verification,” inProceedings of the 15th Conference on Formal Methods in Computer-Aided Design, ser. FMCAD ’15. Austin, Texas: FMCAD Inc, Sep. 2015, pp. 160–167

2015
[62]

Formalising CXL Cache Coherence,

C. Tan, A. F. Donaldson, and J. Wickerson, “Formalising CXL Cache Coherence,” inProceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2. New York, NY , USA: Association for Computing Machinery, Mar. 2025, pp. 437–450. [Online]. Available: https://doi.org/10.1145/3676641.3715999

work page doi:10.1145/3676641.3715999 2025
[63]

On the calculus of relations,

A. Tarski, “On the calculus of relations,”Journal of Symbolic Logic, vol. 6, no. 3, pp. 73–89, Sep. 1941. [Online]. Available: https://www.cambridge.org/core/product/identifier/ S0022481200106528/type/journal article

1941
[64]

TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA,

C. Trippel, Y . A. Manerkar, D. Lustig, M. Pellauer, and M. Martonosi, “TriCheck: Memory Model Verification at the Trisection of Software, Hardware, and ISA,” inProceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems. Xi’an China: ACM, Apr. 2017, pp. 119–133. [Online]. Available: h...

work page doi:10.1145/3037697.3037719 2017
[65]

Client-optimized algorithms and acceleration for encrypted compute offloading,

M. van der Hagen and B. Lucia, “Client-optimized algorithms and acceleration for encrypted compute offloading,” inProceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, ser. ASPLOS ’22. New York, NY , USA: Association for Computing Machinery, Feb. 2022, pp. 683–696. [Online]. Availabl...

work page arXiv 2022
[66]

Modular deduc- tive verification of multiprocessor hardware designs,

M. Vijayaraghavan, A. Chlipala, Arvind, and N. Dave, “Modular deduc- tive verification of multiprocessor hardware designs,” inComputer Aided Verification, D. Kroening and C. S. P ˘as˘areanu, Eds. Cham: Springer International Publishing, 2015, pp. 109–127

2015
[67]

Specification and Verification of Side-channel Security for Open- source Processors via Leakage Contracts,

Z. Wang, G. Mohr, K. von Gleissenthall, J. Reineke, and M. Guarnieri, “Specification and Verification of Side-channel Security for Open- source Processors via Leakage Contracts,” inProceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, ser. CCS ’23. New York, NY , USA: Association for Computing Machinery, Nov. 2023, pp. 212...

work page doi:10.1145/3576915.3623192 2023
[68]

Remote-scope promotion: clarified, rectified, and verified,

J. Wickerson, M. Batty, B. M. Beckmann, and A. F. Donaldson, “Remote-scope promotion: clarified, rectified, and verified,”SIGPLAN Not., vol. 50, no. 10, pp. 731–747, Oct. 2015. [Online]. Available: https://doi.org/10.1145/2858965.2814283

work page doi:10.1145/2858965.2814283 2015
[69]

Automatically comparing memory consistency models,

J. Wickerson, M. Batty, T. Sorensen, and G. A. Constantinides, “Automatically comparing memory consistency models,” inProceedings of the 44th ACM SIGPLAN Symposium on Principles of Programming Languages, ser. POPL ’17. New York, NY , USA: Association for Computing Machinery, Jan. 2017, pp. 190–204. [Online]. Available: https://doi.org/10.1145/3009837.3009838

work page doi:10.1145/3009837.3009838 2017
[70]

Verdi: a framework for implementing and formally verifying distributed systems,

J. R. Wilcox, D. Woos, P. Panchekha, Z. Tatlock, X. Wang, M. D. Ernst, and T. Anderson, “Verdi: a framework for implementing and formally verifying distributed systems,” inProceedings of the 36th ACM SIGPLAN Conference on Programming Language Design and Implementation, ser. PLDI ’15. New York, NY , USA: Association for Computing Machinery, Jun. 2015, pp. ...

work page doi:10.1145/2737924.2737958 2015
[71]

Pensieve: Microarchitectural Modeling for Security Evaluation,

Y . Yang, T. Bourgeat, S. Lau, and M. Yan, “Pensieve: Microarchitectural Modeling for Security Evaluation,” inProceedings of the 50th Annual International Symposium on Computer Architecture, ser. ISCA ’23. New York, NY , USA: Association for Computing Machinery, Jun. 2023, pp. 1–15. [Online]. Available: https://dl.acm.org/doi/10.1145/ 3579371.3589094

work page arXiv 2023
[72]

ILA-MCM: Integrating Memory Consistency Models with Instruction-Level Abstractions for Heterogeneous System-on-Chip Verification,

H. Zhang, C. Trippel, Y . A. Manerkar, A. Gupta, M. Martonosi, and S. Malik, “ILA-MCM: Integrating Memory Consistency Models with Instruction-Level Abstractions for Heterogeneous System-on-Chip Verification,” in2018 Formal Methods in Computer Aided Design (FMCAD). Austin, TX, USA: IEEE, Oct. 2018, pp. 1–10. [Online]. Available: https://ieeexplore.ieee.org...

work page arXiv 2018
[73]

End-to-end automated exploit generation for validating the security of processor designs,

R. Zhang, C. Deutschbein, P. Huang, and C. Sturton, “End-to-end automated exploit generation for validating the security of processor designs,” in2018 51st Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 2018, pp. 815–827

2018
[74]

Basilisk: using provenance invariants to automate proofs of undecidable proto- cols,

T. N. Zhang, K. Singh, T. Chajed, M. Kapritsos, and B. Parno, “Basilisk: using provenance invariants to automate proofs of undecidable proto- cols,” inProceedings of the 19th USENIX Conference on Operating Systems Design and Implementation, ser. OSDI ’25. USA: USENIX Association, 2025

2025