Training LLMs with Fault Tolerant HSDP on 100,000 GPUs

6 Pith papers cite this work. Polarity classification is still indexing.

citation summary

years: 2026 (6)

verdicts: unverdicted (6)

roles: background (1)

polarities: background (1)

representative citing papers

Decoupled DiLoCo for Resilient Distributed Pre-training

cs.CL · 2026-04-23 · unverdicted · novelty 6.0

Decoupled DiLoCo enables asynchronous distributed pre-training with zero global downtime under simulated failures while preserving competitive performance on text and vision tasks.

Libra: Efficient Resource Management for Agentic RL Post-Training

cs.LG · 2026-06-02 · unverdicted · novelty 4.0

Libra optimizes GPU allocation across rollout and training in agentic RL via an elastic hybrid pool and C-MLFQ scheduler based on tool-return causal signals, claiming up to 3.0x throughput and 2.5x faster reward convergence on 48 A800 GPUs.
