FlexPipe introduces runtime pipeline refactoring for LLMs to achieve higher resource efficiency and lower latency in fragmented serverless GPU clusters.
FaaSnap: FaaS Made Fast Using Snapshot-Based VMs
3 Pith papers cite this work.
Representative citing papers
Flare proposes selectively routing microservice spike load to serverless platforms while keeping steady-state load on VMs, and claims this requires minimal integration changes.