A new partitioning algorithm that provably load-balances arbitrary sparse tensor algebra expressions by generalizing parallel merging to multi-operand, multi-dimensional hierarchical structures, implemented in a compiler framework.
In Proceedings of the 31st ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 2 (USA) (ASPLOS ’26)
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
years: 2026 (3)
verdicts: UNVERDICTED 3
polarities: background 1
representative citing papers
Spotlight achieves 4x faster DiT RL post-training on spot GPUs via stale-weight exploration and elastic sequence parallelism, cutting costs 1.4-6.4x with better image quality.
DeltaBox achieves 14 ms checkpoint and 5 ms rollback for AI agent sandboxes via layered DeltaFS and incremental DeltaCR mechanisms that exploit similarity between consecutive states.
citing papers explorer
- Spotlight: Synergizing Seed Exploration and Spot GPUs for DiT RL Post-Training