SCIN uses an in-switch accelerator for direct memory access and 8-bit in-network quantization during All-Reduce, delivering up to 8.7x faster small-message reduction and 1.74x TTFT speedup on LLaMA-2 models.
Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Introduces Switching Efficiency (η) decomposed into data, routing efficiency, and port utilization factors to analyze and improve communication bottlenecks in AI data center networks for LLM training.
citing papers explorer
-
A Switch-Centric In-Network Architecture for Accelerating LLM Inference in Shared-Memory Network
SCIN uses an in-switch accelerator for direct memory access and 8-bit in-network quantization during All-Reduce, delivering up to 8.7x faster small-message reduction and 1.74x TTFT speedup on LLaMA-2 models.
-
Switching Efficiency: A Novel Framework for Dissecting AI Data Center Network Efficiency
Introduces Switching Efficiency (η) decomposed into data, routing efficiency, and port utilization factors to analyze and improve communication bottlenecks in AI data center networks for LLM training.