Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers:
WISP suppresses wasted drafting time and verification interference in edge-cloud speculative LLM serving through dynamic drafting and SLO-aware batching, delivering up to 2.1x capacity and 1.94x goodput gains over centralized and prior baselines.
Citing papers explorer
- What Does the Server See? Understanding Privacy Leakage from Large Language Models in Split Inference
  ActInv reconstructs client inputs from server-visible activations in LLM split inference despite common defenses, PAF quantifies per-layer leakage risk, and PriPert improves defenses via calibrated perturbations.
- WISP: Waste- and Interference-Suppressed Distributed Speculative LLM Serving at the Edge via Dynamic Drafting and SLO-Aware Batching
  WISP suppresses wasted drafting time and verification interference in edge-cloud speculative LLM serving through dynamic drafting and SLO-aware batching, delivering up to 2.1x capacity and 1.94x goodput gains over centralized and prior baselines.