Hochreiter, Long short-term memory, Neural Computation, MIT Press (1997)
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 2
representative citing papers
WaveGraphNet is a graph-based coupled inverse-forward model that localizes damage in CFRP plates from sparse guided-wave measurements with improved extrapolation to unseen locations.
citing papers explorer
- DuoServe-MoE: Dual-Phase Expert Prefetch and Caching for LLM Inference QoS Assurance
  DuoServe-MoE decouples prefill and decode phases in MoE LLM inference with a two-stream CUDA pipeline for prefill and an offline-trained predictor for decode, reporting up to 5.34x TTFT and 7.55x end-to-end latency gains.
- WaveGraphNet: Physics-Consistent Guided-Wave Damage Localization through Coupled Inverse-Forward Graph Learning
  WaveGraphNet is a graph-based coupled inverse-forward model that localizes damage in CFRP plates from sparse guided-wave measurements with improved extrapolation to unseen locations.