SLO-aware scheduling for large language model inferences

Jinqi Huang, Yi Xiong, Xuebing Yu, Wenjie Huang, Entong Li, Li Zeng, Xin Chen · 2025

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

TCM-Serve: Modality-aware Scheduling for Multimodal Large Language Model Inference

cs.DC · 2026-03-27 · unverdicted · novelty 7.0

TCM-Serve applies modality-aware scheduling to reduce average TTFT by 54% and 78.5% for latency-critical requests in MLLM inference.

HFX: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling

cs.DC · 2025-08-21 · unverdicted · novelty 5.0

HFX jointly designs scheduling and scaling for multi-SLO LLM serving, achieving up to 4.44x higher SLO attainment, 65.82% lower latency, and 49.81% lower cost than prior systems on multi-task workloads.

citing papers explorer

TCM-Serve: Modality-aware Scheduling for Multimodal Large Language Model Inference
cs.DC · 2026-03-27 · unverdicted · none · ref 12

HFX: Joint Design of Algorithms and Systems for Multi-SLO Serving and Fast Scaling
cs.DC · 2025-08-21 · unverdicted · none · ref 18
