PopPy combines an ahead-of-time compiler and runtime to extract parallelism from Python compound AI applications, delivering up to 6.4x end-to-end speedups while preserving sequential semantics.
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.DC 3
verdicts: UNVERDICTED 3
roles: background 1
polarities: background 1
representative citing papers
HetRL delivers up to 9.17x higher throughput for LLM RL training on heterogeneous GPUs by using hybrid and ILP-based schedulers to solve a joint optimization problem over computation and data dependencies.
FractalSortCPU achieves up to 6x better bandwidth efficiency than prior radix sorts on CPUs for 512MB-32GB datasets at 16-bit precision by using compressed histograms and parallel updates without pre-bucketing.
citing papers explorer
- PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications
  PopPy combines an ahead-of-time compiler and runtime to extract parallelism from Python compound AI applications, delivering up to 6.4x end-to-end speedups while preserving sequential semantics.
- HetRL: Efficient Reinforcement Learning for LLMs in Heterogeneous Environments
  HetRL delivers up to 9.17x higher throughput for LLM RL training on heterogeneous GPUs by using hybrid and ILP-based schedulers to solve a joint optimization problem over computation and data dependencies.
- FractalSortCPU: Bandwidth-Efficient Compressed Radix Sort on CPU
  FractalSortCPU achieves up to 6x better bandwidth efficiency than prior radix sorts on CPUs for 512MB-32GB datasets at 16-bit precision by using compressed histograms and parallel updates without pre-bucketing.