Morphling: Fast, near-optimal auto-configuration for cloud-native model serving
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
SLO-Guard: Crash-Aware, Budget-Consistent Autotuning for SLO-Constrained LLM Serving
SLO-Guard improves tuning budget consistency for SLO-constrained LLM serving by handling crashes explicitly and using a two-phase feasible-first exploration plus exploitation strategy.