Accessed: 2024-09-09

OpenAI, ChatGPT (GPT-4), https://www · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

DuoServe-MoE: Dual-Phase Expert Prefetch and Caching for LLM Inference QoS Assurance

cs.DC · 2025-09-09 · unverdicted · novelty 7.0

DuoServe-MoE decouples the prefill and decode phases of MoE LLM inference, using a two-stream CUDA pipeline for prefill and an offline-trained predictor for decode, and reports up to 5.34x improvement in time to first token (TTFT) and 7.55x in end-to-end latency.

citing papers explorer

DuoServe-MoE: Dual-Phase Expert Prefetch and Caching for LLM Inference QoS Assurance

cs.DC · 2025-09-09 · unverdicted · none · ref 2
