Optimized Memory Tagging on AmpereOne Processors
Pith reviewed 2026-05-17 19:48 UTC · model grok-4.3
The pith
AmpereOne implements memory tagging with no capacity overhead and a single-digit-percentage performance cost.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The AmpereOne processor is the first datacenter processor to support MTE. Its optimized implementation uniquely incurs no memory capacity overhead for tag storage and provides synchronous tag checking with a single-digit-percentage performance impact across a broad range of datacenter-class workloads. The paper analyzes the complete hardware-software stack and identifies application memory management as the primary remaining source of overhead, which points to opportunities for software optimization.
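To see what "no memory capacity overhead" is measured against: in the Arm MTE architecture every 16-byte granule carries a 4-bit allocation tag, so a design that kept tags in a dedicated region of DRAM would have to set aside 4 bits per 128 data bits. The short Python sketch below only does that arithmetic; the 512 GiB capacity is an arbitrary example, and the code says nothing about how AmpereOne itself avoids the carve-out.

```python
from fractions import Fraction

TAG_BITS = 4          # MTE allocation tag width, in bits
GRANULE_BYTES = 16    # MTE tag granule size, in bytes


def tag_storage_fraction(tag_bits: int = TAG_BITS, granule_bytes: int = GRANULE_BYTES) -> Fraction:
    """Fraction of data capacity needed if tags lived in a separate memory region."""
    return Fraction(tag_bits, granule_bytes * 8)


def tag_storage_bytes(dram_bytes: int) -> int:
    """Bytes a separate tag store would need to hold one tag per granule of dram_bytes."""
    return int(dram_bytes * tag_storage_fraction())


if __name__ == "__main__":
    frac = tag_storage_fraction()
    print(f"overhead fraction: {frac} ({float(frac):.3%})")
    gib = 1 << 30
    print(f"512 GiB of DRAM -> {tag_storage_bytes(512 * gib) // gib} GiB of tag storage")
```

The result, 1/32 or about 3.1% of installed memory, is the capacity a separate tag store would consume; this is the figure the core claim says AmpereOne reduces to zero.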
What carries the argument
The optimized MTE hardware in AmpereOne that integrates tag storage without extra memory capacity and supports efficient synchronous checking.
Load-bearing premise
The selected workloads and measurement methods represent typical production datacenter usage, and the optimizations introduce no new compatibility or security issues.
What would settle it
A production datacenter workload that shows measurable memory capacity overhead for tags, or a performance impact beyond single-digit percentages, would falsify the central performance and capacity claims.
Original abstract
Memory-safety escapes continue to form the launching pad for a wide range of security attacks, especially for the substantial base of deployed software that is coded in pointer-based languages such as C/C++. Although compiler and Instruction Set Architecture (ISA) extensions have been introduced to address elements of this issue, the overhead and/or comprehensive applicability have limited broad production deployment. The Memory Tagging Extension (MTE) to the ARM AArch64 Instruction Set Architecture is a valuable tool to address memory-safety escapes; when used in synchronous tag-checking mode, MTE provides deterministic detection and prevention of sequential buffer overflow attacks, and probabilistic detection and prevention of exploits resulting from temporal use-after-free pointer programming bugs. The AmpereOne processor, launched in 2024, is the first datacenter processor to support MTE. Its optimized MTE implementation uniquely incurs no memory capacity overhead for tag storage and provides synchronous tag-checking with single-digit performance impact across a broad range of datacenter class workloads. Furthermore, this paper analyzes the complete hardware-software stack, identifying application memory management as the primary remaining source of overhead and highlighting clear opportunities for software optimization. The combination of an efficient hardware foundation and a clear path for software improvement makes the MTE implementation of the AmpereOne processor highly attractive for deployment in production cloud environments.
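A toy model may help make the synchronous tag-checking behaviour concrete. The Python sketch below assumes a bump allocator that rounds each request up to whole 16-byte granules, stores a 4-bit logical tag in pointer bits 59:56, and gives each new allocation a tag that differs from its immediate predecessor; `TaggedMemory`, `malloc`, `load`, and `TagCheckFault` are illustrative names, not a real API. Under those assumptions a sequential overflow from one buffer into the next always faults at the first out-of-bounds access.

```python
import random

GRANULE = 16                      # MTE tag granule size in bytes
TAG_SHIFT = 56                    # logical tag lives in pointer bits 59:56
TAG_MASK = 0xF                    # 4-bit tags
ADDR_MASK = (1 << TAG_SHIFT) - 1


class TagCheckFault(Exception):
    """Stands in for the synchronous fault raised on a tag mismatch."""


class TaggedMemory:
    """Toy model: one allocation tag per granule plus a bump allocator."""

    def __init__(self, size: int) -> None:
        self.alloc_tags = [0] * (size // GRANULE)
        self.next_granule = 0
        self.prev_tag = None

    def malloc(self, nbytes: int) -> int:
        ngran = -(-nbytes // GRANULE)            # round up to whole granules
        # Assumption: never reuse the previous allocation's tag, so neighbours differ.
        tag = random.choice([t for t in range(16) if t != self.prev_tag])
        start = self.next_granule
        for g in range(start, start + ngran):
            self.alloc_tags[g] = tag
        self.next_granule += ngran
        self.prev_tag = tag
        return (tag << TAG_SHIFT) | (start * GRANULE)

    def load(self, ptr: int) -> int:
        """Compare the pointer's logical tag with the granule's tag before the access."""
        logical = (ptr >> TAG_SHIFT) & TAG_MASK
        addr = ptr & ADDR_MASK
        if logical != self.alloc_tags[addr // GRANULE]:
            raise TagCheckFault(hex(addr))
        return addr


if __name__ == "__main__":
    mem = TaggedMemory(4096)
    a = mem.malloc(24)        # granules 0-1 (bytes 0-31)
    b = mem.malloc(32)        # granules 2-3, guaranteed a different tag
    mem.load(a + 31)          # still inside a's last granule: no fault
    try:
        mem.load(a + 32)      # first byte of b reached through a's pointer
    except TagCheckFault as fault:
        print("synchronous tag-check fault at", fault)
```

The `a + 31` case shows a limitation shared by any 16-byte-granule scheme: an overflow that stays within the padding of a buffer's own last granule is not caught.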
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript describes the AmpereOne processor's optimized implementation of the ARM Memory Tagging Extension (MTE). It claims that this design incurs no memory capacity overhead for tag storage and delivers synchronous tag-checking with single-digit performance impact across a broad range of datacenter-class workloads. The paper analyzes the full hardware-software stack, identifies application memory management as the dominant remaining overhead source, and outlines opportunities for software-level optimizations.
Significance. If the central claims hold, the work is significant for computer architecture and systems security: it provides the first detailed evaluation of production-grade MTE on a datacenter processor, showing that hardware memory tagging can be deployed with negligible capacity and performance cost. The hardware-software co-analysis supplies concrete guidance for reducing overhead further, which could accelerate adoption of hardware memory-safety mechanisms in cloud environments.
major comments (2)
- §5 (Performance Evaluation) and associated tables: the single-digit overhead claim is presented without error bars, number of measurement repetitions, or statistical details on variability. Because this result is load-bearing for the abstract's performance conclusion, the absence of these elements prevents assessment of whether the reported impact is robust or sensitive to workload selection.
- §4 (Workload Selection and Methodology): the paper does not demonstrate that the chosen benchmarks include high-allocation-rate, pointer-intensive, or long-running server applications whose memory-management patterns match production datacenter usage. If allocation churn or temporal-safety stress is under-sampled, the measured overhead may underestimate costs once application memory managers are updated to use MTE.
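One way to report the missing variability, sketched in Python under stated assumptions: per-run execution times for a baseline and an MTE-enabled configuration are summarized as mean ± sample standard deviation, the overhead is the ratio of means, and Welch's t-statistic (which does not assume equal variances) indicates whether the difference exceeds run-to-run noise. The run times in the example are made-up numbers for illustration, not results from the paper.

```python
import math
import statistics


def summarize(runs):
    """Mean and sample standard deviation of repeated measurements."""
    return statistics.mean(runs), statistics.stdev(runs)


def overhead_pct(baseline, mte):
    """Slowdown in percent for a time-like metric (lower is better).

    For throughput metrics, invert the ratio: mean(baseline) / mean(mte) - 1.
    """
    return 100.0 * (statistics.mean(mte) / statistics.mean(baseline) - 1.0)


def welch_t(baseline, mte):
    """Welch's t-statistic and Welch-Satterthwaite degrees of freedom."""
    m1, s1 = summarize(baseline)
    m2, s2 = summarize(mte)
    v1 = s1 ** 2 / len(baseline)
    v2 = s2 ** 2 / len(mte)
    t = (m2 - m1) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(baseline) - 1) + v2 ** 2 / (len(mte) - 1))
    return t, df


if __name__ == "__main__":
    # Hypothetical run times in seconds (illustrative only).
    baseline = [100.0, 101.0, 99.0, 100.5, 99.5]
    mte = [104.0, 105.0, 103.0, 104.5, 103.5]
    for name, runs in (("baseline", baseline), ("mte", mte)):
        mean, sd = summarize(runs)
        print(f"{name}: {mean:.2f} s ± {sd:.2f} s over {len(runs)} runs")
    t, df = welch_t(baseline, mte)
    print(f"overhead {overhead_pct(baseline, mte):.1f}%, Welch t = {t:.2f}, df = {df:.1f}")
```

With these illustrative inputs the overhead is 4.0% and t is 8.0 with 8 degrees of freedom, well above the two-sided 95% critical value of about 2.31 for df = 8. A real report would also state the number of repetitions and how outliers were handled.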
minor comments (2)
- Abstract: the phrase 'broad range of datacenter class workloads' is used without even a high-level enumeration of, or forward reference to, the specific benchmarks; adding this would improve immediate clarity.
- §6 (Software Optimization Opportunities): several directions are listed but lack even brief pseudocode or quantitative estimates of expected gains, making the section less actionable.
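As an example of the kind of pseudocode this comment asks for, here is a hypothetical Python model of one allocator-side technique: retagging a block when it is freed, with a new tag chosen to differ from the one the freed pointer carries. `RetaggingPool` and its methods are illustrative names, not the paper's design or any real allocator's API. In this model a dangling pointer or a double free faults on its next checked use, even after the slot has been handed out again.

```python
import random


class TagCheckFault(Exception):
    """Stands in for a synchronous MTE tag-check fault."""


class RetaggingPool:
    """Hypothetical pool of equal-sized slots that retags a slot on every free."""

    def __init__(self, nslots: int, seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.tags = [self.rng.randrange(16) for _ in range(nslots)]  # memory tag per slot
        self.free_slots = list(range(nslots))

    def malloc(self):
        slot = self.free_slots.pop()           # LIFO: the most recently freed slot is reused first
        return (self.tags[slot], slot)         # "pointer" = (logical tag, slot index)

    def check(self, ptr) -> None:
        tag, slot = ptr
        if self.tags[slot] != tag:
            raise TagCheckFault(f"slot {slot}: pointer tag {tag}, memory tag {self.tags[slot]}")

    def free(self, ptr) -> None:
        self.check(ptr)                        # an invalid or double free faults here
        tag, slot = ptr
        # Retag so the freed pointer's tag no longer matches the memory.
        self.tags[slot] = self.rng.choice([t for t in range(16) if t != tag])
        self.free_slots.append(slot)


if __name__ == "__main__":
    pool = RetaggingPool(4, seed=1)
    p = pool.malloc()
    pool.free(p)
    q = pool.malloc()                          # same slot, new tag
    try:
        pool.check(p)                          # use-after-free through the stale pointer
    except TagCheckFault as fault:
        print("use-after-free caught:", fault)
```

On real hardware the retag step has to write an allocation tag for every 16-byte granule of the block, so its cost grows with block size and allocation rate; that is one plausible reason allocator behaviour shows up as the main remaining overhead. Because the new tag only has to differ from the previous one, a pointer that survives several reuse cycles can still collide with a later tag, which is why temporal protection stays probabilistic.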
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below and have revised the manuscript to incorporate additional statistical details and methodological clarifications where feasible.
Point-by-point responses
- Referee: §5 (Performance Evaluation) and associated tables: the single-digit overhead claim is presented without error bars, number of measurement repetitions, or statistical details on variability. Because this result is load-bearing for the abstract's performance conclusion, the absence of these elements prevents assessment of whether the reported impact is robust or sensitive to workload selection.
Authors: We agree that the absence of error bars and repetition counts limits the ability to evaluate robustness. Each benchmark was run multiple times under controlled conditions to confirm result stability, but these details were omitted from the original submission. In the revised version we will add error bars (standard deviation across runs) to all relevant tables and figures in §5, explicitly state the number of repetitions performed for each workload, and include a brief discussion of observed variability. This directly strengthens the single-digit overhead claim without altering the reported results. revision: yes
- Referee: §4 (Workload Selection and Methodology): the paper does not demonstrate that the chosen benchmarks include high-allocation-rate, pointer-intensive, or long-running server applications whose memory-management patterns match production datacenter usage. If allocation churn or temporal-safety stress is under-sampled, the measured overhead may underestimate costs once application memory managers are updated to use MTE.
Authors: The selected workloads were chosen to span a range of datacenter-relevant behaviors, including server-style applications with non-trivial allocation activity, and our analysis already identifies application memory management as the dominant remaining overhead source. However, we acknowledge that an explicit characterization of allocation rates and pointer intensity relative to production traces would improve transparency. In the revision we will expand §4 with a new table summarizing allocation frequency, heap churn, and pointer density for each benchmark, together with a justification of their representativeness. We do not believe the current measurements systematically underestimate future costs, because the paper already highlights memory-management patterns as the primary bottleneck and the hardware overhead remains low even under the tested conditions. revision: partial
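The promised per-benchmark table could be derived from an allocation trace. The Python sketch below assumes a hypothetical trace of timestamped `alloc`/`free` events (the `Event` record and the metric names are illustrative, not the paper's definitions) and computes allocation rate, bytes freed per second as a simple churn measure, peak live heap, and turnover (bytes freed relative to peak live bytes). Pointer density would need a heap scan and is not covered here.

```python
from dataclasses import dataclass


@dataclass
class Event:
    t: float        # seconds since the start of the trace
    op: str         # "alloc" or "free"
    obj: int        # object identifier
    size: int = 0   # bytes; only used for "alloc"


def allocation_profile(events):
    """Summarize allocation behaviour from a time-ordered alloc/free trace."""
    live = {}
    allocs = freed_bytes = live_bytes = peak_live = 0
    for e in events:
        if e.op == "alloc":
            live[e.obj] = e.size
            allocs += 1
            live_bytes += e.size
            peak_live = max(peak_live, live_bytes)
        elif e.op == "free":
            size = live.pop(e.obj)
            freed_bytes += size
            live_bytes -= size
    duration = events[-1].t - events[0].t
    return {
        "allocs_per_s": allocs / duration,
        "churn_bytes_per_s": freed_bytes / duration,
        "peak_live_bytes": peak_live,
        "turnover": freed_bytes / peak_live if peak_live else 0.0,
    }


if __name__ == "__main__":
    trace = [
        Event(0.0, "alloc", 1, 64),
        Event(0.5, "alloc", 2, 128),
        Event(1.0, "free", 1),
        Event(1.5, "alloc", 3, 32),
        Event(2.0, "free", 2),
    ]
    print(allocation_profile(trace))
```

A high turnover means the heap is recycled many times over, which is the regime where tagging work on allocation or free would be exercised most heavily.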
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper is a hardware evaluation describing AmpereOne MTE design choices and reporting direct benchmark measurements. No equations, fitted parameters, self-referential definitions, or load-bearing self-citations appear in the provided abstract or described content. Central claims rest on empirical results and hardware specifics rather than reducing to inputs by construction. This matches the expected pattern for a non-circular hardware paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: MTE in synchronous mode provides deterministic detection of sequential buffer overflows and probabilistic detection of temporal use-after-free bugs.
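The probabilistic half of this assumption can be quantified under a simple model, which is an assumption and not a statement about AmpereOne's allocator: if a reallocated block receives a tag drawn uniformly from T usable tag values, a stale pointer's tag matches it with probability 1/T. With all sixteen 4-bit values usable:

```latex
P_{\text{miss}} = \frac{1}{T}, \qquad
P_{\text{detect}} = 1 - \frac{1}{T}, \qquad
T = 2^{4} = 16 \;\Rightarrow\; P_{\text{detect}} = \frac{15}{16} = 93.75\%
```

Reserving tag values reduces T (for example, T = 15 gives a miss probability of 1/15, about 6.7%), and the sequential-overflow case is deterministic only when adjacent allocations are guaranteed to receive different tags.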
Forward citations
Cited by 1 Pith paper
- SPEC CPU: The Next Generation
SPEC CPU 2026 presents a new benchmark suite using open-source apps, expanded multithreading, and Rolling-Round-Robin Rate to address gaps in evaluating heterogeneous multiprogrammed CPU performance.