AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration. Proceedings of Machine Learning and Systems, 6:87–100.
3 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: method (1)
polarities: use method (1)
citing papers explorer
- Robust Ultra Low-Bit Post-Training Quantization via Stable Diagonal Curvature Estimate
  DASH-Q uses a stable diagonal curvature estimate and weighted least squares to achieve robust ultra-low-bit post-training quantization of LLMs, improving zero-shot accuracy by 7% on average over baselines.
- Lever: Speculative LLM Inference on Smartphones
  Lever optimizes the drafting, verification, and execution stages of speculative decoding for flash-backed LLM inference on smartphones, reporting 2.93x average latency reduction over baseline flash-offloaded inference.
- Agents Should Replace Narrow Predictive AI as the Orchestrator in 6G AI-RAN
  Position paper proposes replacing fragmented narrow AI models with LLMs as the cognitive orchestrator in the RAN Intelligent Controller for Level 5 autonomous 6G networks.