MGSim + MGMark: A Framework for Multi-GPU System Research

Ajay Joshi; David Kaeli; John Kim; José L. Abellán; Rafael Ubal; Saiful A. Mojumder; Shane Treadway; Shi Dong; Trinayan Baruah; Vincent Zhao

arxiv: 1811.02884 · v3 · pith:PTMONQYUnew · submitted 2018-10-15 · 💻 cs.DC · cs.AR

Yifan Sun, Trinayan Baruah, Saiful A. Mojumder, Shi Dong, Rafael Ubal, Xiang Gong, Shane Treadway, Yuhui Bao, Vincent Zhao, José L. Abellán, John Kim, Ajay Joshi, David Kaeli

keywords: multi-gpu, mgsim, simulation, system, systems, design, programming, simulator

The rapidly growing popularity and scale of data-parallel workloads demand a corresponding increase in the raw computational power of GPUs (Graphics Processing Units). As single-GPU systems struggle to satisfy these performance demands, multi-GPU systems have begun to dominate the high-performance computing world. The advent of such systems raises a number of design challenges, spanning GPU microarchitecture, multi-GPU interconnect fabrics, runtime libraries, and associated programming models. The research community currently lacks a publicly available and comprehensive multi-GPU simulation framework and benchmark suite to evaluate multi-GPU system design solutions. In this work, we present MGSim, a cycle-accurate, extensively validated, multi-GPU simulator based on AMD's Graphics Core Next 3 (GCN3) instruction set architecture. We complement MGSim with MGMark, a suite of multi-GPU workloads that explores multi-GPU collaborative execution patterns. Our simulator is scalable and comes with built-in support for multi-threaded execution to enable fast and efficient simulations. In terms of performance accuracy, MGSim differs by 5.5% on average when compared against actual GPU hardware. We also achieve a 3.5× and a 2.5× average speedup in functional emulation and architectural simulation, respectively, with 4 CPU cores, while delivering the same accuracy as serial simulation. We illustrate the novel simulation capabilities provided by our simulator through a case study exploring programming models based on a unified multi-GPU system (U-MGPU) and a discrete multi-GPU system (D-MGPU), both of which use a unified memory space and cross-GPU memory access. We evaluate the design implications from our case study, which suggest that D-MGPU is an attractive programming model for future multi-GPU systems.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Analyzing Reverse Address Translation Overheads in Multi-GPU Scale-Up Pods
cs.DC 2026-04 unverdicted novelty 7.0

Simulation study shows cold TLB misses in reverse address translation dominate latency for small collectives in multi-GPU pods, causing up to 1.4x degradation, while larger ones see diminishing returns.