MegaScale: Scaling large language model training to more than 10,000 GPUs

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.SE 1

years

2025 1

verdicts

UNVERDICTED 1

citing papers explorer

Showing 1 of 1 citing paper.

  • TSGuard: Automated User-Centric Incident Diagnosis for AI Workloads in the Cloud
    cs.SE · 2025-06-02 · unverdicted · none · ref 25

    TSGuard builds domain knowledge bases offline from historical incidents and applies online multi-agent structured reasoning to diagnose AI workload failures, delivering 19.8% higher accuracy and 63.4% lower verification time than baselines on Azure production data.