
Title resolution pending

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.DC 1

years

2025 1

verdicts

CONDITIONAL 1

representative citing papers

Understanding and Improving Communication Performance in Multi-node LLM Inference

cs.DC · 2025-11-12 · conditional · novelty 5.0

A performance analysis of multi-node LLM inference identifies all-reduce bottlenecks and introduces NVRAR, a hierarchical all-reduce that achieves 1.9x to 3.6x lower latency than NCCL and up to a 1.72x reduction in end-to-end batch latency for Llama 3.1 405B in decode-heavy tensor-parallel workloads.

citing papers explorer

Showing 1 of 1 citing paper.

  • Understanding and Improving Communication Performance in Multi-node LLM Inference cs.DC · 2025-11-12 · conditional · none · ref 18
