ByteScale: Communication-Efficient Scaling of LLM Training with a 2048K Context Length on 16384 GPUs

3 Pith papers cite this work. Polarity classification is still indexing.

citation summary

  • years: 2026 (3)
  • verdicts: unverdicted (3)
  • roles: background (1)
  • polarities: background (1)

representative citing papers

RNG: Flat Datacenter Networks at Scale

cs.NI · 2026-04-16 · unverdicted · novelty 7.0 · 3 refs

RNG deploys the first production flat datacenter network using quasi-random graphs, a new distributed routing protocol, and a passive optical cabling shuffle device, achieving fat-tree performance at substantially lower cost.

SCENIC: Stream Computation-Enhanced SmartNIC

cs.AR · 2026-04-16 · unverdicted · novelty 7.0

SCENIC delivers a programmable 200G SmartNIC with offloaded protocol stacks, stream compute units, and full OS transparency that matches commercial performance for custom offloads like collective communication and GPU data partitioning.

citing papers explorer

  • RNG: Flat Datacenter Networks at Scale · cs.NI · 2026-04-16 · unverdicted · none · ref 42 · 3 links

  • SCENIC: Stream Computation-Enhanced SmartNIC · cs.AR · 2026-04-16 · unverdicted · none · ref 80

  • Unleashing Scalable Context Parallelism for Foundation Models Pre-Training via FCP · cs.DC · 2026-05-08 · unverdicted · none · ref 47

    FCP shards sequences at block level with flexible P2P communication and bin-packing to achieve near-linear scaling up to 256 GPUs and 1.13x-2.21x higher attention MFU in foundation model pre-training.