T., Hausdörfer, O., and Verma, A

Hansen-Palmus, J · 2024 · arXiv 2411.09510

2 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network

cs.AR · 2026-03-30 · unverdicted · novelty 7.0

SCIN uses an in-switch accelerator for direct memory access and 8-bit in-network quantization during All-Reduce, delivering up to 8.7x faster small-message reduction and 1.74x TTFT speedup on LLaMA-2 models.

Understanding and Improving Communication Performance in Multi-node LLM Inference

cs.DC · 2025-11-12 · conditional · novelty 5.0

Performance analysis of multi-node LLM inference identifies all-reduce bottlenecks and introduces NVRAR hierarchical all-reduce achieving 1.9-3.6x lower latency than NCCL and up to 1.72x end-to-end batch latency reduction for Llama 3.1 405B in decode-heavy tensor-parallel workloads.

citing papers explorer

Showing 2 of 2 citing papers.

A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network

cs.AR · 2026-03-30 · unverdicted · none · ref 26

Understanding and Improving Communication Performance in Multi-node LLM Inference

cs.DC · 2025-11-12 · conditional · none · ref 6
