Efficient LLM scheduling by learning to rank. Advances in Neural Information Processing Systems, 37:59006–59029, 2024

Yichao Fu, Siqi Zhu, Runlong Su, Aurick Qiao, Ion Stoica, Hao Zhang · 2024

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

CascadeInfer: Length-Aware Scheduling of LLM Serving with Low Latency and Load Balancing

cs.DC · 2025-12-22 · conditional · novelty 6.0

CascadeInfer partitions LLM instances into length-specialized groups, uses dynamic programming for stage partitioning, and applies runtime refinement plus decentralized load balancing to cut latency and raise throughput.

citing papers explorer

Showing 1 of 1 citing paper.

CascadeInfer: Length-Aware Scheduling of LLM Serving with Low Latency and Load Balancing · cs.DC · 2025-12-22 · conditional · none · ref 8
