SpecFed accelerates federated LLM inference via speculative decoding for parallel processing and top-K compression with server-side reconstruction, achieving high fidelity with reduced communication overhead.
When to Ensemble: Identifying Token-Level Points for Stable and Fast LLM Ensembling
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: eess.SP (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
SpecFed: Accelerating Federated LLM Inference with Speculative Decoding and Compressed Transmission