PID-Comm: A Fast and Flexible Collective Communication Framework for Commodity Processing-in-DIMM Devices

2024

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM

cs.AR · 2026-04-06 · conditional · novelty 6.0

LOCALUT delivers 1.82x geometric mean speedup for quantized DNN inference on real UPMEM DRAM-PIM devices by using operation-packed LUTs with canonicalization, reordering, and slice streaming.

DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures

cs.AR · 2025-11-19 · unverdicted · novelty 6.0

DCC is a data-centric compiler that co-optimizes data partitioning with compute loop partitioning for ML kernels on multiple PIM architectures, reporting speedups of up to 13.17x on AttAcc PIM and 4.52x on average for LLM inference over GPU.

citing papers explorer

Showing 2 of 2 citing papers.

LOCALUT: Harnessing Capacity-Computation Tradeoffs for LUT-Based Inference in DRAM-PIM

cs.AR · 2026-04-06 · conditional · none · ref 67

DCC: Data-Centric Compilation of Machine Learning Kernels for Processing-In-Memory Architectures

cs.AR · 2025-11-19 · unverdicted · none · ref 93
