Different Perspectives of Memory System Simulation

Adrian Cristal; Arash Yadegari; Eduard Ayguade; Julian Pavon; Petar Radojkovic; Pouya Esmaili-Dokht; Victor Xirau

arxiv: 2604.16965 · v1 · submitted 2026-04-18 · 💻 cs.AR

Pith reviewed 2026-05-10 06:54 UTC · model grok-4.3

keywords memory simulation · CPU-memory interface · simulator accuracy · Ramulator · DRAMsim3 · performance validation · memory systems · simulation discrepancies

The pith

Memory simulator inaccuracies arise mainly from the CPU-memory interface, not the core simulator logic.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Memory simulators frequently produce performance numbers that diverge from real hardware runs. The paper checks this mismatch by measuring memory behavior from three angles at once: what the simulator itself reports internally, how the CPU talks to the memory controller, and what the application actually experiences. These three views often disagree sharply, and the CPU-memory interface turns out to be the biggest source of error. The authors apply targeted fixes to this interface inside Ramulator, Ramulator 2, and DRAMsim3 running under ZSim, and the revised simulators track hardware measurements much more closely. Reliable simulation matters because it lets architects test new memory designs without first building costly prototypes.

Core claim

Evaluating memory performance through the combined lenses of the memory simulator, the CPU-memory interface, and the application shows that these perspectives can diverge substantially, with application-level results often decoupled from internal simulator statistics. The CPU-memory interface is the dominant source of the observed inaccuracies. Implementing a set of corrections and enhancements at this interface in integrated simulators improves fidelity, producing outcomes that more closely match actual system performance across the tested tools and workloads.

What carries the argument

Three-perspective evaluation methodology that cross-checks simulator statistics against CPU-memory interface events and application-level performance metrics to isolate discrepancy sources.
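As a rough illustration of the cross-check, the sketch below compares the same run as seen from the three perspectives and flags pairs that disagree by more than a tolerance. The class name `PerspectiveSample`, its fields, the 10% tolerance, and all numbers are hypothetical; the paper's actual tooling is not shown here.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class PerspectiveSample:
    """One measurement of the same run from one perspective (hypothetical schema)."""
    name: str             # "simulator", "interface", or "application"
    bandwidth_gbs: float  # used memory bandwidth in GB/s
    latency_ns: float     # memory access latency in ns


def relative_gap(a: float, b: float) -> float:
    """Symmetric relative difference between two positive values."""
    return abs(a - b) / max(a, b)


def divergent_pairs(samples, tolerance=0.10):
    """Return (name_a, name_b, metric, gap) for every pair exceeding the tolerance."""
    flagged = []
    for a, b in combinations(samples, 2):
        for metric in ("bandwidth_gbs", "latency_ns"):
            gap = relative_gap(getattr(a, metric), getattr(b, metric))
            if gap > tolerance:
                flagged.append((a.name, b.name, metric, round(gap, 3)))
    return flagged


# Made-up numbers: the simulator's internal latency looks fine,
# but the interface and the application both see a much slower memory.
samples = [
    PerspectiveSample("simulator", bandwidth_gbs=80.0, latency_ns=100.0),
    PerspectiveSample("interface", bandwidth_gbs=78.0, latency_ns=180.0),
    PerspectiveSample("application", bandwidth_gbs=77.0, latency_ns=190.0),
]
flagged = divergent_pairs(samples)
for row in flagged:
    print(row)
```

In this made-up case only the latency metric diverges, and only between the simulator's internal view and the other two, which is the kind of pattern the methodology is meant to expose.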

If this is right

Simulators must model CPU-memory interface timing and queuing accurately before their internal DRAM statistics can be trusted for performance prediction.
Application-level speedups reported by simulators will align better with hardware once interface mismatches are removed.
Validation of future memory simulators should routinely include side-by-side comparison of all three perspectives rather than relying on any single metric.
Architectural studies that used uncorrected versions of Ramulator or DRAMsim3 may have drawn incorrect conclusions about memory-bound workload scaling.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same multi-perspective check could be adapted to validate cache or interconnect simulators where similar hidden interface mismatches may exist.
Memory technology papers that rely on simulation should now include explicit interface-fidelity measurements before claiming performance gains.
Past published speedups for new DRAM organizations may need re-evaluation if the original studies used simulators with uncorrected CPU-memory interfaces.

Load-bearing premise

That the three selected perspectives are enough to find every major cause of inaccuracy and that interface problems are the dominant, fixable driver in the simulators and workloads examined.

What would settle it

Run a workload that previously showed large simulator-to-hardware gaps, apply only the interface corrections, and measure whether the performance delta to real hardware shrinks to near zero while internal simulator statistics remain largely unchanged.
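The experiment described above could be scored along these lines. This is a minimal sketch under stated assumptions: the function names, both 5% thresholds, and every number are placeholders, not results from the paper.

```python
def relative_error(value: float, reference: float) -> float:
    """Relative error of a value against a reference."""
    return abs(value - reference) / reference


def settles_the_claim(hw_runtime, sim_before, sim_after,
                      internal_before, internal_after,
                      gap_threshold=0.05, drift_threshold=0.05):
    """Check both halves of the test.

    1. The application-level gap to hardware shrinks below gap_threshold.
    2. The simulator's internal statistic moves by less than drift_threshold.
    Both thresholds are hypothetical choices.
    """
    gap_before = relative_error(sim_before, hw_runtime)
    gap_after = relative_error(sim_after, hw_runtime)
    drift = relative_error(internal_after, internal_before)
    supports = (gap_after < gap_before
                and gap_after < gap_threshold
                and drift < drift_threshold)
    return {
        "gap_before": gap_before,
        "gap_after": gap_after,
        "internal_drift": drift,
        "supports_claim": supports,
    }


# Placeholder numbers for one workload (seconds for runtime, ns for DRAM latency).
result = settles_the_claim(
    hw_runtime=10.0,
    sim_before=6.5,        # simulator before interface corrections
    sim_after=9.8,         # simulator after interface corrections only
    internal_before=95.0,  # internal average DRAM latency before
    internal_after=96.0,   # internal average DRAM latency after
)
print(result)
```

Requiring both conditions matters: a smaller gap accompanied by a large shift in internal statistics would suggest the DRAM model changed as well, which would weaken the attribution to the interface.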

Figures

Figures reproduced from arXiv: 2604.16965 by Adrian Cristal, Arash Yadegari, Eduard Ayguade, Julian Pavon, Petar Radojkovic, Pouya Esmaili-Dokht, Victor Xirau.

**Figure 6.** Close-to-hardware accuracy of memory simulation also requires correct address mappings, detailed network-on-chip models and data prefetchers.

… the memory performance and reach the saturation point sooner. The higher the percentage of writes, the higher the performance impact, and the actual system follows a clear gradient from the lightest (100%-read) to darker memory curves, as seen in Fig. 2a. This trend …

**Figure 8.** Top-level structure of the artifact repository.

Original abstract

Memory simulators are used to estimate application performance on advanced memory systems, yet they may exhibit significant discrepancies compared to real hardware. This paper investigates two key questions: (1) what causes these inaccuracies, and (2) how can simulators be properly validated to ensure reliable performance predictions. We propose a methodology that evaluates memory performance from three complementary perspectives: the memory simulator, the CPU-memory interface, and the application. Our analysis reveals that these perspectives can diverge substantially, with application-level performance often decoupled from internal simulator statistics. We identify the CPU-memory interface as the primary source of these inaccuracies. To address these problems, we implement a set of corrections and enhancements that improve the fidelity of integrated simulators. We evaluate these changes across multiple widely used simulators, including Ramulator, Ramulator 2, and DRAMsim3 integrated with ZSim. The results show that correcting interface-related issues is essential to achieve simulation outcomes that closely resemble actual system performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper investigates discrepancies between memory simulators (Ramulator, Ramulator 2, DRAMsim3+ZSim) and real hardware. It proposes evaluating memory performance from three perspectives—memory simulator internals, CPU-memory interface, and application-level behavior—to diagnose causes. The central claim is that the CPU-memory interface is the primary inaccuracy source; the authors implement interface corrections and report that these changes produce simulation results closer to hardware.

Significance. If the empirical findings hold with proper controls and quantitative attribution, the work would be useful for the computer-architecture simulation community by offering a diagnostic framework and concrete fixes for a common validation problem. The three-perspective lens is a constructive contribution even if not exhaustive.

major comments (3)

[Abstract and §3 (Methodology)] The claim that the CPU-memory interface is the dominant source of discrepancy is load-bearing yet unsupported by any quantitative breakdown (e.g., fraction of total error attributable to interface queuing/handshakes versus DRAM timing models or CPU artifacts). Without such attribution or ablation across the tested simulators and workloads, the identification of the interface as “primary” cannot be evaluated.
[§4 (Evaluation)] The manuscript states that corrections improve fidelity but supplies no before/after error metrics, workload list, hardware platform details, or statistical tests. This absence prevents assessment of whether the reported improvements are consistent or generalizable.
[§3] The assumption that the three chosen perspectives suffice to isolate interface effects from simulator model errors or measurement variance is not justified by controls or sensitivity analysis; the skeptic's concern that other factors could dominate therefore remains unaddressed.

minor comments (1)

[Abstract] The phrase “a set of corrections and enhancements” is vague; listing the specific interface changes (e.g., request queuing, timing handshake adjustments) would improve clarity.
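One way to produce the attribution that the major comments ask for is a staged ablation: enable corrections one at a time and record how much of the baseline error each step removes. The sketch below is illustrative only. The stage labels `interface-queuing` and `interface-timing` are hypothetical, the error values are made up, and this is not the paper's analysis.

```python
def attribute_error(stage_errors):
    """Given ordered (stage_label, error) pairs, starting with the baseline,
    return each later stage's share of the baseline error that it removed.

    Shares are fractions of the baseline error; a negative share means the
    stage made the error worse. The leftover error is reported as "residual".
    """
    _, baseline_error = stage_errors[0]
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    shares = {}
    previous = baseline_error
    for label, error in stage_errors[1:]:
        shares[label] = (previous - error) / baseline_error
        previous = error
    shares["residual"] = previous / baseline_error
    return shares


# Hypothetical stages with made-up mean relative errors versus hardware.
stages = [
    ("baseline", 0.40),
    ("interface-queuing", 0.22),
    ("interface-timing", 0.12),
    ("address-mapping", 0.08),
]
shares = attribute_error(stages)
for label, share in shares.items():
    print(f"{label}: {share:.0%}")
```

Because the stages are applied in a fixed order, the shares are order-dependent; corrections that interact would need a fuller design, such as repeating the ablation under several orderings.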

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where additional evidence and detail will strengthen the manuscript. We address each major comment below and will incorporate revisions to provide the requested quantitative support, evaluation details, and methodological justification.

Referee: [Abstract and §3 (Methodology)] The claim that the CPU-memory interface is the dominant source of discrepancy is load-bearing yet unsupported by any quantitative breakdown (e.g., fraction of total error attributable to interface queuing/handshakes versus DRAM timing models or CPU artifacts). Without such attribution or ablation across the tested simulators and workloads, the identification of the interface as “primary” cannot be evaluated.

Authors: We agree that an explicit quantitative attribution strengthens the central claim. Our analysis demonstrates divergence between perspectives and shows that interface corrections reduce discrepancies with hardware, but we did not present a formal breakdown or ablation isolating interface contributions from DRAM timing or CPU effects. In the revision we will add an error attribution analysis and ablation study across the evaluated simulators and workloads, using the collected data to quantify the relative impact of interface mismatches.
Revision: yes

Referee: [§4 (Evaluation)] The manuscript states that corrections improve fidelity but supplies no before/after error metrics, workload list, hardware platform details, or statistical tests. This absence prevents assessment of whether the reported improvements are consistent or generalizable.

Authors: The referee correctly notes the lack of detailed metrics. The original evaluation emphasized the overall methodology and high-level outcomes rather than exhaustive numerical results. We will revise §4 to include before-and-after error metrics (e.g., relative differences in latency and throughput), the complete workload list, hardware platform specifications used for validation, and statistical tests to evaluate consistency and generalizability of the fidelity gains.
Revision: yes

Referee: [§3] The assumption that the three chosen perspectives suffice to isolate interface effects from simulator model errors or measurement variance is not justified by controls or sensitivity analysis; the skeptic's concern that other factors could dominate therefore remains unaddressed.

Authors: The three perspectives were chosen to provide complementary diagnostic views that together highlight interface issues. We acknowledge that explicit controls and sensitivity analysis are needed to justify their sufficiency. The revised §3 will expand the methodology discussion with sensitivity analysis on key parameters, controls for measurement variance, and an explicit treatment of limitations to address potential confounding factors.
Revision: yes
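For the statistical check mentioned in the response on §4, a percentile bootstrap over per-workload error reductions is one simple option. The sketch below uses only the Python standard library; the per-workload numbers, the seed, the resample count, and the 95% level are assumptions chosen for illustration.

```python
import random
import statistics


def bootstrap_mean_ci(values, n_resamples=2000, level=0.95, seed=0):
    """Percentile-bootstrap confidence interval for the mean of `values`."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    means = sorted(
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_resamples)
    )
    lower_index = int((1 - level) / 2 * n_resamples)
    upper_index = int((1 + level) / 2 * n_resamples) - 1
    return means[lower_index], means[upper_index]


# Hypothetical per-workload relative errors versus hardware.
error_before = [0.35, 0.42, 0.28, 0.51, 0.39, 0.33]
error_after = [0.04, 0.09, 0.03, 0.12, 0.06, 0.05]

# Paired reduction in error for each workload.
reductions = [before - after for before, after in zip(error_before, error_after)]
low, high = bootstrap_mean_ci(reductions)
print(f"mean reduction: {statistics.fmean(reductions):.3f}")
print(f"95% CI: [{low:.3f}, {high:.3f}]")
```

An interval that stays clear of zero would indicate that the reduction is consistent across the sampled workloads rather than driven by one or two of them.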

Circularity Check

0 steps flagged

No circularity: empirical comparison of simulators to hardware

The paper conducts an empirical investigation by measuring discrepancies between memory simulators (Ramulator, Ramulator 2, DRAMsim3+ZSim) and real hardware across three perspectives. No equations, fitted parameters, derivations, or predictions that reduce to inputs by construction appear in the abstract or described methodology. Claims rest on direct experimental observations and proposed interface corrections validated externally against hardware, not on self-definitions, self-citations as load-bearing premises, or renamed known results. The analysis is therefore self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The paper is an empirical validation study; the abstract mentions no mathematical derivations, fitted constants, or newly postulated entities.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1] Lukas Steiner et al. DRAMSys4.0: An open-source simulation framework for in-depth DRAM Analyses. International Journal of Parallel Programming, 2022.
[2] Shang Li et al. DRAMsim3: A Cycle-Accurate, Thermal-Capable DRAM Simulator. IEEE CAL, 2020.
[3] Yoongu Kim et al. Ramulator: A Fast and Extensible DRAM Simulator. In IEEE CAL, 2016.
[4] Haocong Luo et al. Ramulator 2.0: A Modern, Modular, and Extensible DRAM Simulator. IEEE CAL, 2023.
[5] Pouya Esmaili-Dokht et al. A Mess of Memory System Benchmarking, Simulation and Application Profiling. In MICRO, 2024.
[6] https://github.com/bsc-mem/ZSim-mem-Interface, 2026.
[7] Daniel Sanchez and Christos Kozyrakis. ZSim: fast and accurate microarchitectural simulation of thousand-core systems. In ISCA, 2013.
[8] Pouya Esmaili-Dokht et al. O(n) Key–value Sort with Active Compute Memory. IEEE Transactions on Computers, 2024.
[9] Geraldo F Oliveira et al. DAMOV: A New Methodology and Benchmark Suite for Evaluating Data Movement Bottlenecks. IEEE Access, 2021.
[10] MareNostrum 4 System Overview. https://www.bsc.es/supportkc/docs/MareNostrum4/overview/, 2017.
[11] DAMOV Simulator Templates. https://github.com/CMU-SAFARI/DAMOV/tree/main/simulator/templates, 2021. Accessed: 2026-03-31.
[12] Shang Li et al. Rethinking Cycle Accurate DRAM Simulation. In MEMSYS, 2019.
[13] Stijn Eyerman et al. Modeling DRAM Timing in Parallel Simulators With Immediate-Response Memory Model. IEEE CAL, 2021.
[14] John D. McCalpin. STREAM: Sustainable Memory Bandwidth in High Performance Computers. https://www.cs.virginia.edu/stream/, 2022.
[15] G Franklin et al. Feedback Control Of Dynamic Systems. 1994.
[16] Minghua Wang et al. DRAMDig: a knowledge-assisted tool to uncover DRAM address mapping. In DAC, 2020.
[17] Miles Dai. Reverse Engineering the Intel Cascade Lake Mesh Interconnect. Master of engineering in electrical engineering and computer science, Massachusetts Institute of Technology, 2021.
[18] Avinash Sodani et al. Knights landing: Second-generation Intel Xeon Phi product. IEEE MICRO, 2016.
[19] John D. McCalpin. Mapping Core and L3 Slice Numbering to Die Location in Intel Xeon Scalable Processors. Technical report, 2021.
[20] Andreas Hansson et al. Simulating DRAM controllers for future system architecture exploration. In ISPASS, 2014.

[Figure: memory access latency [ns] versus used memory bandwidth [GB/s]; max. theoretical BW = 128 GB/s; series: Copy, Scale, Add, Triad, Rd:Wr 50:50, Rd:Wr 100:0.]

Artifact Appendix

Abstract

This artifact includes the source code and data required to replicate all experiments conducted in our study. We also provide the 00-damov-native experiment to demonstrate that the inaccuracies identified in this paper also exist in the original DAMOV platform. This artifact enables readers to understand how the results were obtained, reproduce t...

Artifact check-list (meta-information)

• Program: ZSim-based CPU–memory simulation platform with Ramulator, Ramulator2, and DRAMsim3 backends; pointer-chasing and traffic-generation benchmarks.
• Compilation: GCC 11 or later (C++20 required by Ramulator2), scons, make, and Python 3.
• Data set: Committed processed CSV and PDF outputs for all figure-producing s...

doi:10.5281/zenodo.19629351

Description

The artifact is organized around the refinement sequence presented in the paper. All experiment stages share the same simulator sources and benchmarks; stages differ only in their sb.cfg configuration and a small number of stage-specific overrides. This design makes it possible to compare intermediate states directly without duplicating the co...

Installation

After cloning the artifact repository, create a local .zsim-env file with the required dependency paths, source it, build ZSim, and build the benchmarks:

    git clone https://github.com/bsc-mem/ZSim-mem-Interface.git
    cd ZSim-mem-Interface

    # edit .zsim-env to define PINPATH, HDF5_HOME,
    # DRAMSIM3PATH, RAMULATORPATH, and RAMULATOR2PATH
    sour...
    ./scripts/build-benchmarks.sh

This preparation step is sufficient to inspect committed results and to regenerate figures from an available raw bw-lat tree.

Experiment workflow

The default artifact workflow is stage-based. Reviewers can first inspect the committed outputs under each stage’s processed/ and figures/ directories, then regenerate the same outputs from a raw bw-lat tree, and finally compare stages. For example, the baseline stage corresponding to Figure 2 can be exercised as follows:

    # summa...
    ./scripts/reproduce-paper-results.sh 01-baseline

    # regenerate processed outputs and plots
    ./experiments/plot.py ./raw-results/01-baseline/bw-lat \
        --config-dir ./experiments/01-baseline

    # compare the baseline against the model-correct stage
    ./scripts/compare-results.sh 01-baseline 04-model-correct

The plotter writes regenerated files under test-output/ by default, so it does not overwrite committed outputs. A full stage rerun is started with runner.sh inside experiments/; it expands the parameter sweep and invokes run-one.sh for each point.

Evaluation and expected results

The main validation criterion is that regenerated outputs match the committed ones. For the baseline example, the CSV and plots under test-output/01-baseline/ should match those under experiments/01-baseline/, and the figures should reproduce the three views from Figure 2. Stages can also be compared directly. compare-resul...

Notes

All interface refinements are controlled through ZSim configuration files, except the Skylake-specific address mapping used in Figure 6a, which is selected through a dedicated Ramulator configuration file rather than through manual source editing. This makes the address-mapping stage fully reproducible without code modifications.
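The artifact's main validation criterion, that regenerated outputs match the committed ones, can be checked mechanically. The sketch below compares two CSV tables cell by cell with a numeric tolerance. The column names, the tolerance, and the inline sample data are hypothetical, and the artifact's own compare-results.sh may work differently.

```python
import csv
import io
import math


def load_rows(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def csv_mismatches(committed_text, regenerated_text, rel_tol=1e-6):
    """Return a list of human-readable differences between two CSV tables."""
    committed = load_rows(committed_text)
    regenerated = load_rows(regenerated_text)
    if len(committed) != len(regenerated):
        return [f"row count differs: {len(committed)} vs {len(regenerated)}"]
    problems = []
    for index, (old, new) in enumerate(zip(committed, regenerated)):
        if old.keys() != new.keys():
            problems.append(f"row {index}: column names differ")
            continue
        for column in old:
            try:
                same = math.isclose(float(old[column]), float(new[column]),
                                    rel_tol=rel_tol)
            except (TypeError, ValueError):
                same = old[column] == new[column]  # non-numeric or missing cell
            if not same:
                problems.append(
                    f"row {index}, {column}: {old[column]} vs {new[column]}")
    return problems


# Hypothetical bandwidth-latency tables; column names are made up.
committed = "rd_pct,bandwidth_gbs,latency_ns\n100,20.0,90.5\n50,18.0,120.0\n"
regenerated = "rd_pct,bandwidth_gbs,latency_ns\n100,20.0,90.5\n50,18.0,131.0\n"
print(csv_mismatches(committed, committed))
print(csv_mismatches(committed, regenerated))
```

To apply this to real outputs, read the committed and regenerated files (for example with `open(path, newline="")`) and pass their contents to `csv_mismatches`; an empty list means the two tables agree within the tolerance.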
