Helix: Serving large language models over heterogeneous GPUs and network via max-flow
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.DC (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- FATE: Future-State-Aware Scheduling for Heterogeneous LLM Workflows
  FATE reduces normalized makespan and P95 latency in real LLM workflow DAGs to 0.675 and 0.677, respectively, by jointly preserving multiple future execution states, outperforming RoundRobin by 32.5% and the strongest baseline by 8.9%.
- GoodServe: Towards High-Goodput Serving of Agentic LLM Inferences over Heterogeneous Resources
  GoodServe proposes a predict-and-rectify routing system for agentic LLM inference on heterogeneous GPUs that improves goodput by up to 27.4%.