The gem5 Simulator: Version 20.0+
The open-source and community-supported gem5 simulator is one of the most popular tools for computer architecture research. This simulation infrastructure allows researchers to model modern computer hardware at the cycle level, and it has enough fidelity to boot unmodified Linux-based operating systems and run full applications for multiple architectures including x86, Arm, and RISC-V. The gem5 simulator has been under active development over the last nine years since the original gem5 release. In this time, there have been over 7500 commits to the codebase from over 250 unique contributors, which have improved the simulator by adding new features, fixing bugs, and increasing the code quality. In this paper, we give an overview of gem5's usage and features, describe the current state of the gem5 simulator, and enumerate the major changes since the initial release of gem5. We also discuss how the gem5 simulator has transitioned to a formal governance model to enable continued improvement and community support for the next 20 years of computer architecture research.
Forward citations
Cited by 17 Pith papers
- Speed Kills: Exploring Confused Deputy Attacks Through Edge AI Accelerators
  An empirical security study shows confused deputy attacks are practical on most edge AI accelerators via a new LLM-assisted analysis framework, with vendor-confirmed impact on over 100 million devices.
- CHIA: An open-source framework for principled, agentic AI-driven hardware/software co-design research
  CHIA introduces a framework for building and deploying agentic AI co-design flows as CHIA loops with tool nodes, reliability mechanisms, and five case-study demonstrations.
- CHIA: An open-source framework for principled, agentic AI-driven hardware/software co-design research
  CHIA is an open-source framework for agentic AI-driven hardware/software co-design using CHIA loops as directed cyclic graphs, a tool library, and features for reliable experimentation, shown via five case studies.
- OpenURMA: A Clean-Room Open Implementation of the Unified Bus Protocol
  OpenURMA is the first clean-room open implementation of the Unified Bus transport and transaction layers, showing ~500 ns end-to-end latency for 64-byte remote loads versus 2186 ns for RoCEv2 RC.
- Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation
  Presents scalable packed layouts and extensions to tiling/fusion/vectorization in MLIR/IREE for VLA ML code generation on Arm SVE, achieving up to 1.45x speedup over NEON and outperforming PyTorch frameworks.
- SPEC CPU: The Next Generation
  SPEC CPU 2026 presents a new benchmark suite using open-source apps, expanded multithreading, and Rolling-Round-Robin Rate to address gaps in evaluating heterogeneous multiprogrammed CPU performance.
- InjectV: Modeling Fault Injection Attacks in RISC-V Simulation Environment
  InjectV is a gem5-based framework for precise fault injection in RISC-V that identifies attack points on FISSC security benchmarks with a claimed 95.8% time saving versus traditional methods.
- Distributed Persistence Domain for Persistent Memory Pooling
  Proposes Distributed Persistence Domain and Persistent CXL Switch to enable low-latency persistence operations at CXL switch level while maintaining crash consistency in disaggregated memory.
- Throughput-Optimized Networks at Scale
  TONS uses linear optimization and heuristics to synthesize deadlock-free network topologies and routing for datacenter AI training, reporting 2.1x and 1.6x geometric mean speedups over best TPU torus variants for unif...
- HammerSim: A System-Level Tool to Model RowHammer
  HammerSim is a gem5-based full-system framework for modeling RowHammer with probability-driven bitflip simulation, validated against real DDR4 DIMMs via JS divergence.
- Scalable Packed Layouts for Vector-Length-Agnostic ML Code Generation
  Packed layouts and extensions to tiling/fusion/vectorization in MLIR/IREE enable VLA ML code generation for SVE, achieving up to 1.45x speedup over NEON and outperforming PyTorch frameworks while scaling with vector length.
- Understanding Simulated Architecture via gem5 Call-Stack Profiling
  A specialized profiling tool using Linux perf_event samples gem5 call-stacks to expose simulated architecture behaviors such as TimingSimpleCPU inefficiencies and cache coherence deadlocks not visible in conventional stats.
- PG-MDP: Profile-Guided Memory Dependence Prediction for Area-Constrained Cores
  Profile-guided opcode labeling removes consistently independent loads from the MDP working set, cutting queries 79%, false dependencies 77%, and raising small-core IPC 1.47% on SPEC2017 intspeed.
- DARTH-PUM: A Hybrid Processing-Using-Memory Architecture
  DARTH-PUM integrates analog and Boolean PUM with optimized peripherals, coordination hardware, and a programming interface to run kernels like AES, CNNs, and LLMs fully in memory, achieving speedups of 59.4x, 14.8x, a...
- ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling
  ASTRA-sim 3.0 introduces cache-line load-store simulation, a detailed GPU execution model, and InfraGraph to support high-fidelity distributed machine learning infrastructure simulations.
- Akita: A High Usability Simulation Framework for Computer Architecture
  Akita is a decoupled simulation engine that lets developers write simple single-threaded cycle-based code while automatically delivering event-driven performance, transparent parallel execution, and built-in tracing f...
- Ramulator 2.1: A Composable Memory System Simulator for Modern DRAM Systems
  Ramulator 2.1 is an updated open-source DRAM simulator adding support for recent memory standards, a Python modeling interface, and enhanced validation workflows.