3 Pith papers cite this work.
Citing papers
- PopPy: Opportunistically Exploiting Parallelism in Python Compound AI Applications
  PopPy combines an ahead-of-time compiler and runtime to extract parallelism from Python compound AI applications, delivering up to 6.4x end-to-end speedups while preserving sequential semantics.
- Debugging Performance Issues in WebAssembly Runtimes via Mutation-based Inference
  WarpL uses mutation to find and isolate suboptimal instruction sequences causing performance issues in WebAssembly runtimes by comparing machine code of original and non-problematic mutant programs.
- Amoeba: Runtime Tensor Parallel Transformation for LLM Inference Services
  Amoeba adaptively adjusts tensor parallelism at runtime for LLM inference services to handle mixed short and long context requests, delivering 1.75x-6.57x throughput gains over prior solutions in real-world trace evaluations.