FaaSnap: FaaS Made Fast Using Snapshot-based VMs
3 Pith papers cite this work.
citing papers explorer
- FlexPipe: Adapting Dynamic LLM Serving Through Inflight Pipeline Refactoring in Fragmented Serverless Clusters
FlexPipe introduces runtime pipeline refactoring for LLM serving to achieve higher resource efficiency and lower latency in fragmented serverless GPU clusters.
- Flare: Leveraging Serverless Elasticity to Absorb Microservice Load Spikes
Flare proposes selectively routing microservice spike load to serverless while keeping steady load on VMs, and claims this requires minimal integration changes.