Uni-Address Threads: Scalable Thread Management for RDMA-Based Work Stealing
2 Pith papers cite this work.

Fields: cs.DC (2). Years: 2026 (2). Verdicts: unverdicted (2).

Representative citing papers
Citing papers
- Promise-Future Synchronization for Cluster Asynchronous Many-Task Runtimes via MPI One-Sided Communication
  Extends ItoyoriFBC with promise-future synchronization, built on MPI one-sided communication, to support dynamic dependencies in asynchronous many-task (AMT) runtimes. HLU is used as the demonstration, achieving a 15.6x speedup on 16 nodes.
- ShuntServe: Cost-Efficient LLM Serving on Heterogeneous Spot GPU Clusters
  ShuntServe reports 1.42x and 1.35x higher throughput than the baselines, plus cost-efficiency gains of 31.9% and 31.2% over on-demand instances, for Llama-3.1-70B and Qwen3-32B on heterogeneous AWS spot GPU clusters.