A comprehensive study on ILP acceleration accounting for sparsity, area, energy, data movement using near-memory architecture
pith:R5NAPKH2 Add to your LaTeX paper
What is a Pith Number?\usepackage{pith}
\pithnumber{R5NAPKH2}
Prints a linked pith:R5NAPKH2 badge after your title and writes the identifier into PDF metadata. Compiles on arXiv with no extra files. Learn more
read the original abstract
Integer Linear Programming (ILP) is widely used for solving real-world optimization problems, including network routing, map routing, and traffic scheduling. However, ILP algorithms are sparse and branch-intensive, making them inefficient on conventional CPUs and GPUs. Prior work has shown that large-scale ILP problems can require tens of hours of execution time even on massively parallel systems, limiting their applicability to time-sensitive decision-making workloads. Existing ILP solvers such as Gurobi employ software-level optimizations to handle sparsity on CPUs, but still face throughput limitations. GPU-based ILP solvers are also constrained because GPUs are not well suited for sparse and branch-heavy workloads, leading to thread divergence, under-utilization of streaming multiprocessors, and frequent host-device interactions. This paper presents SPARK, a sparsity-aware, reuse-aware, energy-efficient, reconfigurable near-cache ILP accelerator. SPARK repurposes the existing L1 cache in CPUs to provide near-cache acceleration with minimal hardware overhead of approximately 1.4\% of the CPU area. The architecture performs near-cache sparsity detection and sparsity-aware computation to reduce insignificant computations and data movement energy. SPARK also exploits computational reuse patterns in ILP algorithms to improve parallelism and efficiency. The proposed design supports both sparse and dense ILPs as well as Linear Programs (LPs). Evaluations on real-world workloads from MIPLIB 2017 show that SPARK achieves up to 15x and 20x performance improvement, and up to 152x and 740x energy reduction compared to AMD Zen3 CPUs and NVIDIA Tesla V100 GPUs, respectively, for sparse ILPs. For sparse LPs, SPARK achieves 7-17x performance improvement and 103-250x energy reduction over CPU and GPU baselines, demonstrating the broad applicability of the proposed architecture.
This paper has not been read by Pith yet.
Forward citations
Cited by 1 Pith paper
-
A complete discussion on fully reconfigurable, digital, scalable, graph and sparsity-aware near-memory accelerator for graph neural networks
NEM-GNN is a scalable DAC/ADC-less processing-in-memory architecture for GNNs that uses early compute termination, reconfigurable SoC pre-computation, and compute-as-soon-as-ready broadcast execution to deliver large ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.