Zehuan Wang, Yingcan Wei, Minseok Lee, Matthias Langer, Fan Yu, Jie Liu, Shijie Liu, Daniel G. Abel, Xu Guo, Jianbing Dong, Ji Shi, and Kunlun Li · 2022 · arXiv 3227.354740

2 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary: baseline 1

citation-polarity summary: baseline 1

representative citing papers

NestPipe: Large-Scale Recommendation Training on 1,500+ Accelerators via Nested Pipelining

cs.DC · 2026-04-08 · unverdicted · novelty 7.0

NestPipe achieves up to 3.06x speedup and 94.07% scaling efficiency on 1,536 workers via dual-buffer inter-batch and frozen-window intra-batch pipelining that overlaps communication with computation.

Toward Robust and Efficient ML-Based GPU Caching for Modern Inference

cs.LG · 2025-09-25 · unverdicted · novelty 5.0

Learning-augmented LRU achieves 1-consistency and O(k)-robustness for GPU caching with low overhead, implemented in LCR to cut P99 TTFT by up to 28.3% on LLM workloads and raise throughput by up to 24.2% on DLRM workloads.

citing papers explorer

NestPipe: Large-Scale Recommendation Training on 1,500+ Accelerators via Nested Pipelining cs.DC · 2026-04-08 · unverdicted · none · ref 36
Toward Robust and Efficient ML-Based GPU Caching for Modern Inference cs.LG · 2025-09-25 · unverdicted · none · ref 41