Gonzalez and Hao Zhang and Ion Stoica , title =

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU

cs.LG · 2026-05-21 · unverdicted · novelty 5.0

A rule-based controller selects among FP16, quantized, speculative, and hybrid modes for single-GPU LLM inference, delivering 2.1x latency speedup and 51.7% lower energy per token with near-baseline accuracy on Llama-3.1-8B.

citing papers explorer

Showing 1 of 1 citing paper.

ModeSwitch-LLM: A Lightweight Phase-Aware Controller for Cross-Mode LLM Inference on a Single GPU cs.LG · 2026-05-21 · unverdicted · none · ref 12
A rule-based controller selects among FP16, quantized, speculative, and hybrid modes for single-GPU LLM inference, delivering 2.1x latency speedup and 51.7% lower energy per token with near-baseline accuracy on Llama-3.1-8B.

Gonzalez and Hao Zhang and Ion Stoica , title =

fields

years

verdicts

representative citing papers

citing papers explorer