Reconsidering "Reconsidering Custom Memory Allocation"
Pith reviewed 2026-05-20 14:29 UTC · model grok-4.3
The pith
Region-based custom memory allocators still deliver locality gains over modern general-purpose ones.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
This paper demonstrates that the conclusions of Berger et al.'s 2002 study of custom memory allocation continue to hold on modern hardware. Per-class allocators yield negligible speedups over state-of-the-art general-purpose allocators, whereas region-based allocators improve performance through better locality by managing objects in bulk. The evidence comes from an extended benchmark suite that adds Clang and Blender, together with a new methodology for studying fragmentation effects.
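For readers unfamiliar with the term, a per-class allocator can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation benchmarked in the paper: `Node`, `FreeSlot`, and `per_class_demo` are hypothetical names, and the single-threaded free-list design is an assumption about what a typical per-class allocator looks like.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical free-list link stored inside slots that are currently unused.
struct FreeSlot { FreeSlot* next; };

// Hypothetical class with its own allocator: freed Node slots are kept on a
// free list and recycled for later Node allocations (not thread-safe).
class Node {
public:
    int value = 0;
    Node* next = nullptr;

    static void* operator new(std::size_t size) {
        assert(size == sizeof(Node));      // sketch handles Node only
        if (free_list_ != nullptr) {       // fast path: reuse a freed slot
            FreeSlot* slot = free_list_;
            free_list_ = slot->next;
            ++reused_;
            return slot;
        }
        return ::operator new(size);       // slow path: general-purpose allocator
    }

    static void operator delete(void* p) noexcept {
        static_assert(sizeof(Node) >= sizeof(FreeSlot), "slot too small");
        if (p == nullptr) return;
        // Push the slot onto the free list; it is never returned to the system.
        free_list_ = ::new (p) FreeSlot{free_list_};
    }

    static std::size_t reused() { return reused_; }

private:
    static FreeSlot* free_list_;
    static std::size_t reused_;
};

FreeSlot* Node::free_list_ = nullptr;
std::size_t Node::reused_ = 0;

// Usage sketch: the second allocation is served from the free list.
std::size_t per_class_demo() {
    Node* a = new Node;
    delete a;
    Node* b = new Node;
    delete b;
    return Node::reused();
}
```

The fast path saves a call into the general-purpose allocator, which is the kind of saving the core claim says has become negligible against modern allocators.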
What carries the argument
Region-based allocators that allocate and free objects in bulk, combined with a fragmentation methodology to isolate locality effects in general-purpose allocators.
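The bulk allocate-and-free idea can be illustrated with a small bump-pointer region. This is a sketch under assumptions, not the allocator evaluated in the paper: `Region`, `Point`, `region_demo`, and the 64 KiB default chunk size are hypothetical choices made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <vector>

// Hypothetical bump-pointer region: objects are carved out of large chunks
// and every chunk is released at once when the region is destroyed.
class Region {
public:
    explicit Region(std::size_t chunk_size = 64 * 1024) : chunk_size_(chunk_size) {}
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;
    ~Region() { for (void* c : chunks_) std::free(c); }   // the bulk free

    // Returns `size` bytes aligned to `align`, which must be a power of two
    // no larger than alignof(std::max_align_t) (the alignment malloc provides).
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (offset_ + align - 1) & ~(align - 1);
        if (chunks_.empty() || start + size > capacity_) {
            std::size_t new_cap = size > chunk_size_ ? size : chunk_size_;
            void* chunk = std::malloc(new_cap);
            if (chunk == nullptr) throw std::bad_alloc();
            chunks_.push_back(chunk);
            capacity_ = new_cap;
            start = 0;
        }
        offset_ = start + size;
        return static_cast<char*>(chunks_.back()) + start;
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    std::size_t chunk_size_;
    std::size_t capacity_ = 0;   // size of the current (last) chunk
    std::size_t offset_ = 0;     // next free byte in the current chunk
    std::vector<void*> chunks_;
};

struct Point { double x, y; };   // trivially destructible, so no destructor calls needed

// Usage sketch: 1000 small objects land back to back in one chunk and are
// released by a single region teardown instead of 1000 individual frees.
bool region_demo() {
    Region region;
    Point* prev = nullptr;
    bool contiguous = true;
    for (int i = 0; i < 1000; ++i) {
        Point* p = new (region.allocate(sizeof(Point), alignof(Point)))
            Point{static_cast<double>(i), 0.0};
        if (prev != nullptr && p != prev + 1) contiguous = false;
        prev = p;
    }
    return contiguous && region.chunk_count() == 1;
}
```

Objects allocated one after another end up adjacent in memory, which is the locality property the review attributes to region-based allocation; a general-purpose allocator gives no such guarantee once its heap is fragmented.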
Load-bearing premise
The chosen benchmarks including Clang and Blender and the fragmentation methodology are representative of the workloads where custom allocation would be applied in practice.
What would settle it
A new large application or workload where region-based allocation produces no measurable improvement in cache locality or execution time relative to the general-purpose allocator would falsify the central claim.
Original abstract
Programmers using native languages such as C, C++, or Rust can implement custom memory allocation strategies to improve execution time. In their paper titled "Reconsidering Custom Memory Allocation" almost 25 years ago, Berger et al. showed that while per-class allocators provide no significant speedups over a state-of-the-art general-purpose allocator, region-based allocators can improve execution time by allocating and freeing objects in bulk. This paper revisits that work on a modern hardware platform with modern general-purpose allocators to evaluate whether their conclusions still hold. It also augments the benchmark suite with two large real-world applications (Clang and Blender), and introduces a methodology to explore the effect of memory fragmentation on locality in general-purpose allocators. Our results support and extend the original conclusions, demonstrating the locality advantages of region-based custom memory allocators.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper re-evaluates Berger et al. (2002) on custom memory allocation, performing new measurements on modern hardware with updated general-purpose allocators. It augments the original benchmark suite with two large applications, Clang and Blender, and introduces a new methodology to isolate the effect of memory fragmentation on locality. The reported results support the original conclusions: per-class allocators yield no significant speedups, while region-based allocators improve performance via better locality.
Significance. If the fragmentation methodology correctly attributes gains to locality rather than allocation overhead or other factors, the work provides updated empirical support for region-based allocation in contemporary systems and real-world applications. It extends a 25-year-old study with modern hardware data and larger benchmarks, strengthening evidence on when custom allocation remains beneficial.
major comments (1)
- [§5] §5 (Fragmentation Methodology): The new methodology for exploring fragmentation's effect on locality in general-purpose allocators appears to rely on synthetic patterns or short-run traces; if these do not match steady-state allocation behavior in long-running workloads such as Blender, the attribution of performance differences to locality (rather than reduced overhead) is not isolated. This is load-bearing for the central claim that results extend Berger et al. to modern hardware.
minor comments (2)
- [Table 2, Figure 4] Table 2 and Figure 4: error bars or statistical significance tests for the reported speedups on Clang and Blender are not visible; adding them would strengthen the comparison to the original benchmarks.
- [§3.2] §3.2: the description of the region-based allocator implementation could clarify how bulk free interacts with modern OS page management to avoid conflating locality gains with TLB effects.
Simulated Author's Rebuttal
We thank the referee for the constructive review and for recognizing the value of updating Berger et al. with modern hardware, larger applications, and a fragmentation-focused methodology. We address the single major comment below.
Point-by-point responses
Referee: [§5] §5 (Fragmentation Methodology): The new methodology for exploring fragmentation's effect on locality in general-purpose allocators appears to rely on synthetic patterns or short-run traces; if these do not match steady-state allocation behavior in long-running workloads such as Blender, the attribution of performance differences to locality (rather than reduced overhead) is not isolated. This is load-bearing for the central claim that results extend Berger et al. to modern hardware.
Authors: We agree that correct isolation of locality from allocation overhead is essential to the central claim. The methodology in §5 constructs synthetic patterns from full memory traces captured during steady-state phases of the workloads themselves. For Blender, traces were collected over runs exceeding 1000 frames after discarding the initial warm-up; for Clang, traces cover complete compilations of large codebases. These traces preserve the actual inter-arrival times, sizes, and lifetimes observed in long-running execution. Allocation overhead is measured independently via microbenchmarks on the same allocator and subtracted from observed speedups, leaving the residual attributable to locality. We will revise §5 to include an explicit description of the trace-collection protocol, warm-up discarding criteria, and a comparison of synthetic-pattern statistics against full-run histograms to make the matching to steady-state behavior fully transparent.
Revision: partial
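The subtraction described in the authors' response can be written out as simple arithmetic. The sketch below is one possible reading, assuming that "speedup" means absolute time saved in seconds; the paper may define it differently (for example as a ratio). `RunTimes` and `locality_residual` are hypothetical names, and the numbers in the usage note are made up for illustration, not results from the paper.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical inputs, all in seconds. Field names are illustrative only.
struct RunTimes {
    double total;      // end-to-end execution time of the workload
    double allocator;  // time spent in allocate/free calls, from a microbenchmark
};

// Time saved by the region allocator that is not explained by cheaper
// allocation calls. Under the rebuttal's reasoning, this residual is the
// part attributed to locality.
double locality_residual(const RunTimes& general, const RunTimes& region) {
    double observed_saving = general.total - region.total;
    double overhead_saving = general.allocator - region.allocator;
    return observed_saving - overhead_saving;
}
```

With invented inputs of 10.0 s total and 1.5 s allocator time for the general-purpose run, and 8.0 s total and 0.5 s allocator time for the region run, the observed saving is 2.0 s, the overhead saving is 1.0 s, and the residual is 1.0 s. The referee's concern amounts to asking whether the microbenchmark figure plugged into `allocator` reflects steady-state behavior.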
Circularity Check
No significant circularity; results rest on new empirical measurements
Full rationale
The paper conducts fresh performance measurements on modern hardware using an expanded benchmark suite (including Clang and Blender) and a newly introduced fragmentation methodology. Its central claims—that region-based allocators retain locality advantages—are supported directly by these experiments rather than by fitting parameters to prior data, renaming known results, or load-bearing self-citations. The reference to the 25-year-old Berger et al. work is contextual background, not a premise that the new results reduce to by construction. No equations or derivations are presented that equate outputs to inputs; the work is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Benchmarks including Clang and Blender are representative of workloads that would benefit from custom allocation.
Reference graph
Works this paper leans on
[1] Emery D. Berger, Kathryn S. McKinley, Robert D. Blumofe, and Paul R. Wilson. 2000. Hoard: a scalable memory allocator for multithreaded applications. In Proceedings of the Ninth International Conference on Architectural Support for Programming Languages and Operating Systems (Cambridge, Massachusetts, USA) (ASPLOS IX). Association for Computing Machi...
[2] Emery D. Berger, Benjamin G. Zorn, and Kathryn S. McKinley. 2002. Reconsidering Custom Memory Allocation. In Proceedings of the 17th ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (Seattle, Washington, USA) (OOPSLA ’02). Association for Computing Machinery, New York, NY, USA, 1–12. doi:10.1145/582419.582421
[3] Blender 1994. Blender - The Free and Open Source 3D Creation Software. Blender. Retrieved April 3, 2026 from https://blender.org/
[4] LLVM 2010. Clang: A C Language Family Frontend for LLVM. LLVM. Retrieved February 2, 2026 from https://clang.llvm.org
[5] Jason Evans. 2006. A Scalable Concurrent malloc(3) Implementation for FreeBSD. BSDCan (2006)
[6] David Gay and Alex Aiken. 1998. Memory management with explicit regions. In Proceedings of the ACM SIGPLAN 1998 Conference on Programming Language Design and Implementation (Montreal, Quebec, Canada) (PLDI ’98). Association for Computing Machinery, New York, NY, USA, 313–323. doi:10.1145/277650.277748
[7] David Gay and Alex Aiken. 2001. Language support for regions. In Proceedings of the ACM SIGPLAN 2001 Conference on Programming Language Design and Implementation (Snowbird, Utah, USA) (PLDI ’01). Association for Computing Machinery, New York, NY, USA, 70–80. doi:10.1145/378795.378815
[8] Dan Grossman, Greg Morrisett, Trevor Jim, Michael Hicks, Yanling Wang, and James Cheney. 2002. Region-based memory management in cyclone. In Proceedings of the ACM SIGPLAN 2002 Conference on Programming Language Design and Implementation (Berlin, Germany) (PLDI ’02). Association for Computing Machinery, New York, NY, USA, 282–293. doi:10.1145/512529.512563
[9] Pablo Halpern. 2023. Making C++ Software Allocator Aware. C++ Standards Committee Paper P2127. https://wg21.link/P2127
[10] Pablo Halpern. 2024. Policies for Using Allocators in Library Classes. C++ Standards Committee Paper P3002. https://wg21.link/P3002
[11] Pablo Halpern and Dietmar Kühl. 2019. polymorphic_allocator<> as a Vocabulary Type. C++ Standards Committee Paper P0339. https://wg21.link/P0339
[12] Pablo Halpern and John Lakos. 2020. Unleashing the Power of Allocator-Aware Software Infrastructure. C++ Standards Committee Paper P2126. https://wg21.link/P2126
[13] Pablo Halpern and John Lakos. 2020. Value Proposition: Allocator-Aware (AA) Software. C++ Standards Committee Paper P2035. https://wg21.link/P2035
[14] Matthew Hertz and Emery D. Berger. 2005. Quantifying the performance of garbage collection vs. explicit memory management. In Proceedings of the 20th Annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (San Diego, CA, USA) (OOPSLA ’05). Association for Computing Machinery, New York, NY, USA, 313–326. doi:10....
[15-16] John Lakos, Jeffrey Mendelsohn, Alisdair Meredith, and Nathan Myers. On Quantifying Memory-Allocation Strategies. C++ Standards Committee Paper N4468. https://wg21.link/N4468
[17] Chris Lattner and Vikram Adve. 2004. LLVM: A Compilation Framework for Lifelong Program Analysis & Transformation. In Proceedings of the International Symposium on Code Generation and Optimization: Feedback-Directed and Runtime Optimization (Palo Alto, California) (CGO ’04). IEEE Computer Society, USA, 75
[18] Doug Lea. 1996. A Memory Allocator. http://gee.cs.oswego.edu/dl/html/malloc.html
[19] Benjamin Lee. 2025. Flow Wins Best Animated Feature Oscar. The Guardian. Retrieved March 16, 2026 from https://theguardian.com/film/2025/mar/03/oscar-flow-best-animated-feature
[20] Daan Leijen, Benjamin Zorn, and Leonardo de Moura. 2019. Mimalloc: Free List Sharding in Action. Technical Report MSR-TR-2019-18. Microsoft Research
[21] Martin Maas, David G. Andersen, Michael Isard, Mohammad Mahdi Javanmard, Kathryn S. McKinley, and Colin Raffel. 2020. Learning-based Memory Allocation for C++ Server Workloads. In Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems (Lausanne, Switzerland) (ASPLOS ’20). Associati...
[23-24] Bobby Powers, David Tench, Emery D. Berger, and Andrew McGregor. 2019. Mesh: compacting memory management for C/C++ applications. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (Phoenix, AZ, USA) (PLDI 2019). Association for Computing Machinery, New York, NY, USA, 333–346. doi:10.1145/3314221.3314582
[25] SPEC. 1999. SPEC CPU 2000 v1.3. Retrieved April 3, 2026 from https://www.spec.org/cpu2000/
[26] Mads Tofte and Jean-Pierre Talpin. 1997. Region-based Memory Management. Inf. Comput. 132, 2 (1997), 109–176. doi:10.1006/INCO.1996.2613
[28] Dynamic Storage Allocation: A Survey and Critical Review. In Memory Management, International Workshop IWMM 95, Kinross, UK, September 27-29, 1995, Proceedings (Lecture Notes in Computer Science), Henry G. Baker (Ed.). Springer, 1–116. doi:10.1007/3-540-60368-9_19
[29] Wm. A. Wulf and Sally A. McKee. 1995. Hitting the memory wall: implications of the obvious. SIGARCH Comput. Archit. News 23, 1 (March 1995), 20–24. doi:10.1145/216585.216588
[30] Zhuangzhuang Zhou, Vaibhav Gogte, Nilay Vaish, Chris Kennelly, Patrick Xia, Svilen Kanev, Tipp Moseley, Christina Delimitrou, and Parthasarathy Ranganathan. 2024. Characterizing a Memory Allocator at Warehouse Scale. In Proceedings of the 29th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 3(...