Cleaning up the Mess: Re-Evaluating the Real-System Modeling Accuracy of Ramulator 2.0
Pith reviewed 2026-05-18 05:57 UTC · model grok-4.3
The pith
Ramulator 2.0 matches real memory system performance once configuration errors are fixed.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By correcting multiple configuration errors in the Ramulator 2.0 setup used by the Mess paper, the simulated memory performance becomes similar to real-system characteristics. The authors demonstrate that the previously reported discrepancies were not due to inherent simulation inaccuracy but to misconfigurations, and they identify wrong simulation statistics used for DAMOV results as well.
What carries the argument
Reconfiguration of Ramulator 2.0 parameters to match the intended real-system comparison, along with proper interpretation of simulation statistics.
If this is right
- Ramulator 2.0 can accurately model real memory system performance with correct configurations.
- The key claim of the Mess paper regarding simulator inaccuracy is incorrect.
- DAMOV simulations in the Mess paper relied on unrelated statistics rather than DRAM performance metrics.
- The Mess paper's artifact is incomplete and cannot fully reproduce its results.
- Community should adopt the corrected results to avoid propagating inaccurate findings.
Where Pith is reading between the lines
- Careful verification of simulator configurations is necessary to draw reliable conclusions about their accuracy.
- This work highlights the importance of complete and documented artifacts for reproducibility in computer architecture research.
- Future studies comparing simulators to real systems should provide exact configuration details to prevent similar discrepancies.
Load-bearing premise
The specific configuration errors identified account for the discrepancies, and the corrected setups match what the Mess paper intended to compare against real hardware.
What would settle it
Executing Ramulator 2.0 using the corrected configuration files on the benchmarks from the Mess paper and directly comparing the resulting performance numbers to the real hardware data.
Figures
read the original abstract
A MICRO 2024 best paper runner-up publication (the Mess paper) with all three artifact badges awarded (including ``Reproducible'') proposes a new benchmark to evaluate real and simulated memory system performance. The publication contends that Ramulator 2.0 and DAMOV (ZSim+Ramulator) (along with other existing memory system simulators) ``poorly resemble the actual system performance'' and asserts that their simulator is better. In this paper, we show that the Mess paper has 1) demonstrable technical misconfigurations, 2) methodological errors in interpreting simulation statistics, and 3) an incomplete artifact that makes its key results irreproducible. We demonstrate that the Ramulator 2.0 simulation results reported in the Mess paper are incorrect due to multiple configuration errors instead of inherent simulation inaccuracy claimed by the Mess paper. We show that by correctly configuring Ramulator 2.0, Ramulator 2.0's simulated memory system performance actually resembles real system characteristics well, and thus a key claimed contribution of the Mess paper is factually incorrect. We also identify that the DAMOV simulation results in the Mess paper use wrong simulation statistics that are unrelated to the simulated DRAM performance. We show that DAMOV's simulated DRAM latency is not constant, in contrast to the Mess paper's claim. Moreover, the Mess paper's artifact repository lacks the necessary sources to fully reproduce all the Mess paper's results. We find that the experiment scripts use simulator executables and other resources that are neither described in the Mess paper nor found in the artifact repository. We strongly encourage the computer architecture community to consider our corrections to the Ramulator 2.0 and DAMOV results of the Mess paper to prevent the propagation of inaccurate and misleading results and to maintain the reliability of the scientific record.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript re-evaluates claims from the Mess paper (MICRO 2024 best-paper runner-up with all artifact badges) that Ramulator 2.0 and DAMOV poorly resemble real-system memory performance. It identifies technical misconfigurations, errors in interpreting simulation statistics, and an incomplete artifact that renders key results irreproducible. The authors show that correctly configured Ramulator 2.0 produces performance characteristics that resemble real hardware well, and that DAMOV DRAM latency is not constant as claimed; they conclude that the Mess paper's inaccuracy assertions are factually incorrect.
Significance. If the identified configuration errors are the primary source of the reported discrepancies and the corrected runs faithfully match the Mess paper's intended real-system comparison setup (including hardware models, workloads, and measurement points), the work is significant: it corrects the record on an award-winning paper with reproducibility badges and underscores the need for precise simulator configuration and complete artifacts in computer-architecture research.
major comments (2)
- [Abstract] Abstract and § on Ramulator 2.0 corrections: the central claim that 'multiple configuration errors' (rather than inherent simulator inaccuracy) explain the Mess paper discrepancies requires an explicit side-by-side parameter table listing the Mess paper's DRAM timing, memory-controller, and statistic choices versus the corrected values used here, to confirm the fixes restore the exact experimental conditions rather than substitute different modeling decisions.
- [DAMOV results] Section on DAMOV results: the assertion that 'wrong simulation statistics that are unrelated to the simulated DRAM performance' were used needs to name the specific incorrect statistics (e.g., which latency or bandwidth counters) and demonstrate quantitatively how they differ from the correct DRAM latency measurements that the authors obtain.
minor comments (2)
- Add a dedicated reproducibility subsection that enumerates every missing source or executable referenced in the Mess paper's experiment scripts but absent from the artifact repository.
- Clarify the exact workload binaries and measurement points employed in the corrected Ramulator 2.0 runs so readers can verify equivalence to the Mess paper's real-system baseline.
Simulated Author's Rebuttal
Thank you for your constructive review and recommendation for major revision. We value the opportunity to strengthen the clarity of our corrections to the Mess paper's results on Ramulator 2.0 and DAMOV. We address each major comment below and will incorporate the requested details in the revised manuscript.
read point-by-point responses
-
Referee: [Abstract] Abstract and § on Ramulator 2.0 corrections: the central claim that 'multiple configuration errors' (rather than inherent simulator inaccuracy) explain the Mess paper discrepancies requires an explicit side-by-side parameter table listing the Mess paper's DRAM timing, memory-controller, and statistic choices versus the corrected values used here, to confirm the fixes restore the exact experimental conditions rather than substitute different modeling decisions.
Authors: We agree that an explicit side-by-side parameter table will improve transparency. In the revised manuscript, we will add such a table in the Ramulator 2.0 corrections section. It will list the Mess paper's DRAM timing parameters (e.g., tCL, tRCD, tRP), memory controller configurations, and statistics collected, directly compared to the corrected values we employed. This will confirm that our adjustments address the identified misconfigurations while aligning with the original experimental conditions and workloads. revision: yes
-
Referee: [DAMOV results] Section on DAMOV results: the assertion that 'wrong simulation statistics that are unrelated to the simulated DRAM performance' were used needs to name the specific incorrect statistics (e.g., which latency or bandwidth counters) and demonstrate quantitatively how they differ from the correct DRAM latency measurements that the authors obtain.
Authors: We thank the referee for highlighting the need for specificity. The Mess paper's DAMOV results relied on an aggregate 'memory latency' counter that includes non-DRAM components such as CPU pipeline effects and LLC interactions, rather than the simulator's direct DRAM latency counters (e.g., 'DRAM read latency' or row-buffer hit/miss latency). Quantitatively, the incorrect statistic produces a near-constant latency of ~110 ns across workloads, whereas the correct DRAM-specific measurements vary from ~55 ns (row-buffer hits) to over 180 ns (misses), better matching real hardware. We will revise the DAMOV section to name these counters explicitly and include a quantitative comparison table or figure. revision: yes
Circularity Check
No significant circularity; empirical re-evaluation of simulator configurations against real hardware
full rationale
The paper identifies configuration errors in the prior Mess paper and demonstrates improved fidelity by re-running Ramulator 2.0 with corrected settings. This rests on direct comparison of simulation outputs to real-system measurements and artifact inspection, not on any self-referential equations, fitted parameters renamed as predictions, or load-bearing self-citations that reduce the central claim to its own inputs. No derivation chain exists that collapses by construction; the argument is falsifiable via independent reproduction of the corrected runs.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We identify two fundamental configuration errors in the Mess paper’s Ramulator 2.0 simulations... single-channel DDR5... unrealistically low latency configurations of the SimpleO3 frontend
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the Mess paper uses different Mess benchmark methodologies for real systems and simulators without disclosing this difference
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
SPEC CPU2026: Characterization, Representativeness, and Cross-Suite Comparison
SPEC CPU2026 increases instruction volume and memory footprint while shifting pressure to instruction-cache bottlenecks; 4-5 workload subsets per group preserve 96.4-99.9% of full-suite behavior and show complementary...
-
SPEC CPU2026: Characterization, Representativeness, and Cross-Suite Comparison
SPEC CPU2026 raises instruction volume and memory demands while shifting pressure to instruction caches; 4-5 workload subsets per group preserve 96.4-99.9% of full-suite microarchitectural behavior and better approxim...
Reference graph
Works this paper leans on
-
[1]
A Mess of Memory System Benchmarking, Simulation and Application Profiling
Pouya Esmaili-Dokht, Francesco Sgherzi, Valéria Soldera Girelli, Isaac Boixaderas, Mariana Carmin, Alireza Monemi, Adrià Armejach, Estanislao Mercadal, Germán Llort, Petar Radojković, Miquel Moreto, Judit Giménez, Xavier Martorell, Eduard Ayguadé, Jesus Labarta, Emanuele Confalonieri, Rishabh Dubey, and Jason Adlard. A Mess of Memory System Benchmarking, ...
work page 2024
-
[2]
Nisa Bostanci, Ataberk Olgun, A
Haocong Luo, Yahya Can Tugrul, F. Nisa Bostanci, Ataberk Olgun, A. Giray Yaglikci, and Onur Mutlu. Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simula- tor.IEEE CAL, 2024
work page 2024
-
[3]
DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
SAFARI Research Group. DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. https://github.com/CMU-SAFARI/ DAMOV
-
[4]
Esmaili-Dokht, Pouya. A Mess of Memory System Benchmarking, Simulation and Application Profiling.https://zenodo.org/records/13748674, 2024
-
[5]
Memory systems for HPC and AI @BSC. Mess benchmark. https://github.com/ bsc-mem/Mess-benchmark, 2024
work page 2024
-
[6]
SAFARI Research Group. Ramulator 2.0. https://github.com/CMU-SAFARI/ ramulator2
-
[7]
Ramulator: A Fast and Extensible DRAM Simulator.IEEE CAL, 2015
Yoongu Kim, Weikun Yang, and Onur Mutlu. Ramulator: A Fast and Extensible DRAM Simulator.IEEE CAL, 2015
work page 2015
- [8]
-
[9]
DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks
Geraldo F Oliveira, Juan Gómez-Luna, Lois Orosa, Saugata Ghose, Nandita Vijayku- mar, Ivan Fernandez, Mohammad Sadrosadati, and Onur Mutlu. DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access, 2021
work page 2021
-
[10]
Microbenchmarks for Detailed Validation and Tuning of Hardware Simulators
Rommel Sánchez Verdejo and Petar Radojkovic. Microbenchmarks for Detailed Validation and Tuning of Hardware Simulators. InHPCS, 2017
work page 2017
-
[11]
Main Memory Latency Simulation: The Missing Link
Rommel Sánchez Verdejo, Kazi Asifuzzaman, Milan Radulovic, Petar Radojković, Eduard Ayguadé, and Bruce Jacob. Main Memory Latency Simulation: The Missing Link. InMEMSYS, 2018
work page 2018
-
[12]
JEDEC.JESD79-5C: DDR5 SDRAM Standard, 2024
work page 2024
-
[13]
Ramulator 2.0 – Mess benchmark
SAFARI Research Group. Ramulator 2.0 – Mess benchmark. https://github.com/ CMU-SAFARI/ramulator2/tree/mess
-
[14]
Scott Rixner, William J. Dally, Ujval J. Kapasi, Peter Mattson, and John D. Owens. Memory access scheduling. InISCA, 2000
work page 2000
-
[15]
William K Zuravleff and Timothy Robinson. Controller for a synchronous DRAM that maximizes throughput by allowing memory requests and commands to be issued out of order. US Patent 5,630,096, 1997
work page 1997
-
[16]
DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems
Chang Joo Lee, Veynu Narasiman, Eiman Ebrahimi, Onur Mutlu, and Yale N Patt. DRAM-Aware Last-Level Cache Writeback: Reducing Write-Caused Interference in Memory Systems. Technical Report HPS-2010-002, 2010
work page 2010
-
[17]
ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-core Systems
Daniel Sanchez and Christos Kozyrakis. ZSim: Fast and Accurate Microarchitectural Simulation of Thousand-core Systems. InISCA, 2013
work page 2013
-
[18]
Memory systems for HPC and AI @BSC. Mess simulator. https://github.com/ bsc-mem/Mess-simulator, 2025. 6
work page 2025
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.