arxiv: 2605.17119 · v1 · pith:K3AHGZDRnew · submitted 2026-05-16 · 💻 cs.PL · cs.SE

Reconsidering "Reconsidering Custom Memory Allocation"

Pith reviewed 2026-05-20 14:29 UTC · model grok-4.3

classification 💻 cs.PL cs.SE
keywords custom memory allocation · region-based allocators · memory locality · performance evaluation · memory fragmentation · benchmarks · C++ · Rust

The pith

Region-based custom memory allocators still deliver locality gains over modern general-purpose ones.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper rechecks whether custom allocation strategies from decades ago remain useful on current computers and programs. It finds that creating separate pools for each object class adds little speed, while grouping objects into regions for bulk allocation and release improves how often data stays in fast cache. New tests on large programs such as a compiler and a 3D modeling tool, plus a method to measure how fragmentation scatters memory, show the same pattern holds. If these results are accurate, programmers in C, C++, or Rust can continue to use region techniques to reduce memory stalls without waiting for further allocator improvements.

Core claim

This paper demonstrates that the conclusions of the 2002 work on custom memory allocation continue to hold on modern hardware. Per-class allocators yield negligible speedups compared to state-of-the-art general-purpose allocators, whereas region-based allocators improve performance through better locality by managing objects in bulk. The evidence comes from extended benchmarks that include Clang and Blender, together with a new methodology for studying fragmentation effects.
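For readers unfamiliar with the term, a per-class allocator can be sketched as a fixed-size pool with a free list dedicated to one type. The C++ below is only an illustration under stated assumptions, not code from the paper: `ClassPool`, `create`, and `destroy` are hypothetical names, the pool is not thread-safe, and objects still live when the pool is destroyed are not destructed.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical per-class pool: serves fixed-size slots for a single type T
// and recycles freed slots through a LIFO free list. Not thread-safe.
template <typename T>
class ClassPool {
    union Slot {
        Slot* next;                                   // used while the slot is free
        alignas(T) unsigned char storage[sizeof(T)];  // used while the slot holds a T
    };

    std::vector<Slot*> chunks_;  // chunks obtained from the general-purpose allocator
    Slot* free_ = nullptr;       // head of the free list
    std::size_t chunk_slots_;    // number of slots added per refill

    void refill() {
        Slot* chunk = new Slot[chunk_slots_];
        chunks_.push_back(chunk);
        // Thread slots in reverse so the list hands them out in address order.
        for (std::size_t i = chunk_slots_; i-- > 0;) {
            chunk[i].next = free_;
            free_ = &chunk[i];
        }
    }

public:
    explicit ClassPool(std::size_t chunk_slots = 256) : chunk_slots_(chunk_slots) {
        assert(chunk_slots > 0);
    }
    ClassPool(const ClassPool&) = delete;
    ClassPool& operator=(const ClassPool&) = delete;

    // Releases the raw chunks; does not run destructors of live objects.
    ~ClassPool() {
        for (Slot* chunk : chunks_) delete[] chunk;
    }

    // Pops a slot from the free list and constructs a T in it.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_ == nullptr) refill();
        Slot* slot = free_;
        free_ = slot->next;
        return ::new (static_cast<void*>(slot->storage)) T(std::forward<Args>(args)...);
    }

    // Destroys the object and pushes its slot back onto the free list.
    void destroy(T* object) {
        object->~T();
        Slot* slot = reinterpret_cast<Slot*>(object);
        slot->next = free_;
        free_ = slot;
    }
};
```

Each object is still allocated and freed individually here, which is the main structural difference from the bulk management that region allocators perform.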

What carries the argument

Region-based allocators that allocate and free objects in bulk, combined with a fragmentation methodology to isolate locality effects in general-purpose allocators.
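Bulk allocation and release can be sketched as a bump pointer over large chunks plus a single call that frees everything at once. This is a minimal C++ illustration under stated assumptions, not the allocator evaluated in the paper: `Region`, `allocate`, `release`, and `chunk_count` are hypothetical names, the default chunk size is arbitrary, the code is not thread-safe, and it does not run destructors for objects placed in the region.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical region (arena): objects are carved out of large chunks with a
// bump pointer and are all released together.
class Region {
    struct Chunk {
        std::unique_ptr<unsigned char[]> data;
        std::size_t size;  // capacity of the chunk in bytes
        std::size_t used;  // bytes handed out so far
    };

    std::vector<Chunk> chunks_;
    std::size_t chunk_size_;

public:
    explicit Region(std::size_t chunk_size = 64 * 1024) : chunk_size_(chunk_size) {
        assert(chunk_size > 0);
    }

    // Returns `size` bytes aligned to `align`, which must be a power of two.
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);
        if (!chunks_.empty()) {
            Chunk& chunk = chunks_.back();
            auto base = reinterpret_cast<std::uintptr_t>(chunk.data.get());
            // Round the next free address up to the requested alignment.
            std::uintptr_t aligned =
                (base + chunk.used + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
            std::size_t offset = static_cast<std::size_t>(aligned - base);
            if (offset <= chunk.size && size <= chunk.size - offset) {
                chunk.used = offset + size;
                return reinterpret_cast<void*>(aligned);
            }
        }
        // Current chunk is missing or full: add one large enough for the
        // request plus worst-case alignment padding, then retry.
        std::size_t capacity = std::max(chunk_size_, size + align);
        chunks_.push_back(
            Chunk{std::unique_ptr<unsigned char[]>(new unsigned char[capacity]), capacity, 0});
        return allocate(size, align);
    }

    // Frees every object in the region in one step.
    void release() { chunks_.clear(); }

    std::size_t chunk_count() const { return chunks_.size(); }
};
```

Consecutive allocations land next to each other in memory, which is the property the locality argument relies on; there is no per-object free at all.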

Load-bearing premise

The chosen benchmarks including Clang and Blender and the fragmentation methodology are representative of the workloads where custom allocation would be applied in practice.

What would settle it

A new large application or workload where region-based allocation produces no measurable improvement in cache locality or execution time relative to the general-purpose allocator would falsify the central claim.

Figures

Figures reproduced from arXiv: 2605.17119 by Emery D. Berger, Nicolas van Kempen.

Figure 1: Comparing the performance of short-lived benchmarks from a clean-state heap masks potential locality ...
Figure 2: Distribution of allocations for 197.parser, ...
Figure 3: Adversarial allocation occupancy controls ...
Figure 4: Per-class custom allocation in boxed-sim only provides marginal execution time improvements (§ ...
Figure 5: Region allocators significantly outperform naïve general-purpose allocation in four benchmarks (§ ...
Figure 6: Region allocators offer significant resilience to adversarial allocation, and more generally heap fragmen...
Original abstract

Programmers using native languages such as C, C++, or Rust can implement custom memory allocation strategies to improve execution time. In their paper titled "Reconsidering Custom Memory Allocation" almost 25 years ago, Berger et al. showed that while per-class allocators provide no significant speedups over a state-of-the-art general-purpose allocator, region-based allocators can improve execution time by allocating and freeing objects in bulk. This paper revisits that work on a modern hardware platform with modern general-purpose allocators to evaluate whether their conclusions still hold. It also augments the benchmark suite with two large real-world applications (Clang and Blender), and introduces a methodology to explore the effect of memory fragmentation on locality in general-purpose allocators. Our results support and extend the original conclusions, demonstrating the locality advantages of region-based custom memory allocators.
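The fragmentation methodology is not spelled out on this page, so the following is only one plausible way to put a general-purpose heap into a fragmented state before a benchmark runs: allocate many small blocks, then free a random subset so that a chosen fraction stays live. The function `fragment_heap` and its parameters (`count`, `size`, `occupancy`, `seed`) are assumptions made for illustration and are not taken from the paper.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <random>
#include <vector>

// Hypothetical helper: leaves the malloc heap with scattered holes.
// Allocates `count` blocks of `size` bytes, then frees a random subset so
// that roughly `occupancy` (0.0 to 1.0) of them remain live.
// Returns the surviving blocks; the caller frees them with std::free.
std::vector<void*> fragment_heap(std::size_t count, std::size_t size,
                                 double occupancy, unsigned seed) {
    assert(occupancy >= 0.0 && occupancy <= 1.0);

    std::vector<void*> blocks;
    blocks.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        void* block = std::malloc(size);
        if (block == nullptr) break;  // stop early if the heap is exhausted
        blocks.push_back(block);
    }

    // Shuffle so that the survivors are spread across the heap.
    std::mt19937 rng(seed);
    std::shuffle(blocks.begin(), blocks.end(), rng);

    std::size_t keep =
        static_cast<std::size_t>(occupancy * static_cast<double>(blocks.size()));
    for (std::size_t i = keep; i < blocks.size(); ++i) std::free(blocks[i]);
    blocks.resize(keep);
    return blocks;
}
```

Running a workload after such a step means its allocations may be served from holes between the survivors rather than from one contiguous range, which is the kind of condition under which locality differences between allocators could become visible.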

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper re-evaluates Berger et al. (2002) on custom memory allocation, performing new measurements on modern hardware with updated general-purpose allocators. It augments the original benchmark suite with two large applications, Clang and Blender, and introduces a new methodology to isolate the effect of memory fragmentation on locality. Results are reported to support the original conclusions: per-class allocators yield no significant speedups, while region-based allocators improve performance via better locality.

Significance. If the fragmentation methodology correctly attributes gains to locality rather than allocation overhead or other factors, the work provides updated empirical support for region-based allocation in contemporary systems and real-world applications. It extends a 25-year-old study with modern hardware data and larger benchmarks, strengthening evidence on when custom allocation remains beneficial.

major comments (1)
  1. [§5] §5 (Fragmentation Methodology): The new methodology for exploring fragmentation's effect on locality in general-purpose allocators appears to rely on synthetic patterns or short-run traces; if these do not match steady-state allocation behavior in long-running workloads such as Blender, the attribution of performance differences to locality (rather than reduced overhead) is not isolated. This is load-bearing for the central claim that results extend Berger et al. to modern hardware.
minor comments (2)
  1. [Table 2, Figure 4] Table 2 and Figure 4: error bars or statistical significance tests for the reported speedups on Clang and Blender are not visible; adding them would strengthen the comparison to the original benchmarks.
  2. [§3.2] §3.2: the description of the region-based allocator implementation could clarify how bulk free interacts with modern OS page management to avoid conflating locality gains with TLB effects.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for recognizing the value of updating Berger et al. with modern hardware, larger applications, and a fragmentation-focused methodology. We address the single major comment below.

Point-by-point responses
  1. Referee: [§5] §5 (Fragmentation Methodology): The new methodology for exploring fragmentation's effect on locality in general-purpose allocators appears to rely on synthetic patterns or short-run traces; if these do not match steady-state allocation behavior in long-running workloads such as Blender, the attribution of performance differences to locality (rather than reduced overhead) is not isolated. This is load-bearing for the central claim that results extend Berger et al. to modern hardware.

    Authors: We agree that correct isolation of locality from allocation overhead is essential to the central claim. The methodology in §5 constructs synthetic patterns from full memory traces captured during steady-state phases of the workloads themselves. For Blender, traces were collected over runs exceeding 1000 frames after discarding the initial warm-up; for Clang, traces cover complete compilations of large codebases. These traces preserve the actual inter-arrival times, sizes, and lifetimes observed in long-running execution. Allocation overhead is measured independently via microbenchmarks on the same allocator and subtracted from observed speedups, leaving the residual attributable to locality. We will revise §5 to include an explicit description of the trace-collection protocol, warm-up discarding criteria, and a comparison of synthetic-pattern statistics against full-run histograms to make the matching to steady-state behavior fully transparent.

    Revision: partial
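The subtraction described in this response can be written out as follows. The symbols are introduced here for illustration only and do not come from the paper; the formula also assumes that allocation overhead and locality effects add up independently in execution time, which the response does not establish.

```latex
% T_general : execution time with the general-purpose allocator
% T_region  : execution time with the region allocator
% Delta_alloc : time saved inside allocator calls, estimated from microbenchmarks
\[
  \Delta_{\text{observed}} = T_{\text{general}} - T_{\text{region}},
  \qquad
  \Delta_{\text{locality}} = \Delta_{\text{observed}} - \Delta_{\text{alloc}}
\]
```

If the two effects interact, for example when cheaper allocation changes which data is resident in cache, the residual would no longer isolate locality cleanly.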

Circularity Check

0 steps flagged

No significant circularity; results rest on new empirical measurements

Full rationale

The paper conducts fresh performance measurements on modern hardware using an expanded benchmark suite (including Clang and Blender) and a newly introduced fragmentation methodology. Its central claims—that region-based allocators retain locality advantages—are supported directly by these experiments rather than by fitting parameters to prior data, renaming known results, or load-bearing self-citations. The reference to the 25-year-old Berger et al. work is contextual background, not a premise that the new results reduce to by construction. No equations or derivations are presented that equate outputs to inputs; the work is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that the selected benchmarks and fragmentation metric adequately represent real-world allocator usage; no free parameters or invented entities are described in the abstract.

axioms (1)
  • domain assumption: Benchmarks including Clang and Blender are representative of workloads that would benefit from custom allocation.
    Invoked when extending the original benchmark suite and claiming the results generalize.

Reference graph

Works this paper leans on

  1. Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. 2000. Hoard: a scalable memory allocator for multithreaded applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (Cambridge, Massachusetts, USA) (ASPLOS IX). Association for Computing Machi...
  2. Emery D. Berger, Benjamin G. Zorn, and Kathryn S. McKinley. 2002. Reconsidering Custom Memory Allocation. In Proceedings of the 17th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (Seattle, Washington, USA) (OOPSLA ’02). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/582419.582421
  3. Blender 1994. Blender - The Free and Open Source 3D Creation Software. Blender. Retrieved April 3, 2026 from https://blender.org/
  4. LLVM 2010. Clang: A C Language Family Frontend for LLVM. LLVM. Retrieved February 2, 2026 from https://clang.llvm.org
  5. Jason Evans. 2006. A Scalable Concurrent malloc(3) Implementation for FreeBSD. BSDCan (2006)
  6. David Gay and Alex Aiken. 1998. Memory management with explicit regions. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (Montreal, Quebec, Canada) (PLDI ’98). Association for Computing Machinery, New York, NY, USA, 313–323. doi:10.1145/277650.277748
  7. David Gay and Alex Aiken. 2001. Language support for regions. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation (Snowbird, Utah, USA) (PLDI ’01). Association for Computing Machinery, New York, NY, USA, 70–80. doi:10.1145/378795.378815
  8. Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and James Cheney. 2002. Region-based memory management in cyclone. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (Berlin, Germany) (PLDI ’02). Association for Computing Machinery, New York, NY, USA, 282–293. doi:10.1145/512529.512563
  9. Pablo Halpern. 2023. Making C++ Software Allocator Aware. C++ Standards Committee Paper P2127. https://wg21.link/P2127
  10. Pablo Halpern. 2024. Policies for Using Allocators in Library Classes. C++ Standards Committee Paper P3002. https://wg21.link/P3002
  11. Pablo Halpern and Dietmar Kühl. 2019. polymorphic_allocator<> as a Vocabulary Type. C++ Standards Committee Paper P0339. https://wg21.link/P0339
  12. Pablo Halpern and John Lakos. 2020. Unleashing the Power of Allocator-Aware Software Infrastructure. C++ Standards Committee Paper P2126. https://wg21.link/P2126
  13. Pablo Halpern and John Lakos. 2020. Value Proposition: Allocator-Aware (AA) Software. C++ Standards Committee Paper P2035. https://wg21.link/P2035
  14. Matthew Hertz and Emery D. Berger. 2005. Quantifying the performance of garbage collection vs. explicit memory management. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (San Diego, CA, USA) (OOPSLA ’05). Association for Computing Machinery, New York, NY, USA, 313–326. doi:10....
  15. John Lakos, Jeffrey Mendelsohn, Alisdair Meredith, and Nathan Myers. On Quantifying Memory-Allocation Strategies. C++ Standards Committee Paper N4468. https://wg21.link/N4468
  16. Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (Palo Alto, California) (CGO ’04). IEEE Computer Society, USA, 75
  17. Doug Lea. 1996. A Memory Allocator. http://gee.cs.oswego.edu/dl/html/malloc.html
  18. Benjamin Lee. 2025. Flow Wins Best Animated Feature Oscar. The Guardian. Retrieved March 16, 2026 from https://theguardian.com/film/2025/mar/03/oscar-flow-best-animated-feature
  19. Daan Leijen, Benjamin Zorn, and Leonardo de Moura. 2019. Mimalloc: Free List Sharding in Action. Technical Report MSR-TR-2019-18. Microsoft Research
  20. Martin Maas, David G. Andersen, Michael Isard, Mohammad Mahdi Javanmard, Kathryn S. McKinley, and Colin Raffel. 2020. Learning-based Memory Allocation for C++ Server Workloads. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS ’20). Associati...
  21. Sally A. McKee. 2004. Reflections on the memory wall. In Proceedings of the 1st Conference on Computing Frontiers (Ischia, Italy) (CF ’04). Association for Computing Machinery, New York, NY, USA, 162. doi:10.1145/977091.977115
  22. Bobby Powers, David Tench, Emery D. Berger, and Andrew McGregor. 2019. Mesh: compacting memory management for C/C++ applications. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). Association for Computing Machinery, New York, NY, USA, 333–346. doi:10.1145/3314221.3314582
  23. SPEC. 1999. SPEC CPU 2000 v1.3. Retrieved April 3, 2026 from https://www.spec.org/cpu2000/
  24. Mads Tofte and Jean-Pierre Talpin. 1997. Region-based Memory Management. Inf. Comput. 132, 2 (1997), 109–176. doi:10.1006/INCO.1996.2613
  25. Paul R. Wilson, Mark S. Johnstone, Michael Neely, and David Boles. 1995. Dynamic Storage Allocation: A Survey and Critical Review. In Memory Management, International Workshop IWMM 95, Kinross, UK, September 27-29, 1995, Proceedings (Lecture Notes in Computer Science), Henry G. Baker (Ed.). Springer, 1–116. doi:10.1007/3-540-60368-9_19
  26. Wm. A. Wulf and Sally A. McKee. 1995. Hitting the memory wall: implications of the obvious. SIGARCH Comput. Archit. News 23, 1 (March 1995), 20–24. doi:10.1145/216585.216588
  27. Zhuangzhuang Zhou, Vaibhav Gogte, Nilay Vaish, Chris Kennelly, Patrick Xia, Svilen Kanev, Tipp Moseley, Christina Delimitrou, and Parthasarathy Ranganathan. 2024. Characterizing a Memory Allocator at Warehouse Scale. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3 (...