
arxiv: 1811.02883 · v2 · submitted 2018-10-16 · 💻 cs.DC · cs.AR

Recognition: unknown

SCALE-Sim: Systolic CNN Accelerator Simulator

classification: cs.DC · cs.AR
keywords: scale-sim, simulator, systolic, accelerator, accelerators, deep, design, insights

Systolic arrays are one of the most popular compute substrates within Deep Learning accelerators today, as they provide extremely high efficiency for running dense matrix multiplications. However, the research community lacks tools to gain insights into both the design trade-offs and efficient mapping strategies for systolic-array-based accelerators. We introduce the Systolic CNN Accelerator Simulator (SCALE-Sim), a configurable, systolic-array-based, cycle-accurate DNN accelerator simulator. SCALE-Sim exposes various micro-architectural features as well as system integration parameters to the designer to enable comprehensive design space exploration. To the best of our knowledge, this is the first systolic-array simulator tuned for running DNNs. Using SCALE-Sim, we conduct a suite of case studies and demonstrate the effect of bandwidth, dataflow, and aspect ratio on the overall runtime and energy of Deep Learning kernels across vision, speech, text, and games. We believe that these insights will be highly beneficial to architects and ML practitioners.

This paper has not been read by Pith yet.


Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DRIFT: Harnessing Inherent Fault Tolerance for Efficient and Reliable Diffusion Model Inference

    cs.AR 2026-04 unverdicted novelty 7.0

    DRIFT uses resilience analysis, targeted DVFS, and adaptive rollback ABFT to deliver 36% average energy savings or 1.7x speedup in diffusion model inference while preserving generation quality.

  2. AMMA: A Multi-Chiplet Memory-Centric Architecture for Low-Latency 1M Context Attention Serving

    cs.AR 2026-04 unverdicted novelty 6.0

    AMMA is a memory-centric multi-chiplet architecture using HBM-PNM cubes, custom logic dies, hybrid parallelism, and reordered collectives that delivers 15.5X lower attention latency and 6.9X lower energy than NVIDIA H...

  3. FireBridge: Cycle-Accurate Hardware + Firmware Co-Verification for Modern Accelerators

    cs.AR 2026-03 conditional novelty 6.0

    FireBridge enables cycle-accurate hardware-firmware co-verification in standard simulators using randomized memory bridges, delivering up to 50x faster debug iterations than FPGA-based flows for accelerators such as s...

  4. CHICO-Agent: An LLM Agent for the Cross-layer Optimization of 2.5D and 3D Chiplet-based Systems

    cs.AR 2026-04 unverdicted novelty 5.0

    CHICO-Agent uses LLM agents with a knowledge base to find lower-cost configurations for 2.5D/3D chiplet systems than simulated annealing while providing an interpretable audit trail.