In: Proceedings of the ACM International Conference on Supercomputing, ICS ’19, p
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
years: 2026 (2)
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
Review chapter summarizing advances in parallel sparse direct solvers along communication reduction and data-sparse compression axes.
citing papers explorer
- Energy-Aware Scheduling for Serverless LLM Serving on Shared GPUs
Festina reduces energy consumption by up to 56% for serverless LLM inference on shared GPUs while keeping TTFT/TBT SLO attainment within 2% of four state-of-the-art baselines.
- Parallel Sparse and Data-Sparse Factorization-based Linear Solvers
Review chapter summarizing advances in parallel sparse direct solvers along communication reduction and data-sparse compression axes.