5 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (5)
Representative citing papers
BlendIn replaces binary guidance acceptance with confidence-weighted distribution blending between base and guidance models, mitigating cascading failures in inference-time LLM alignment.
Citing papers explorer
- AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization
  AAAC learns two 64-byte codebooks per layer for 4-bit LLM weights and lets each group pick the one minimizing activation-weighted reconstruction error, storing the choice at zero extra cost.
- Parallel Prefix Verification for Speculative Generation
  PARSE accelerates LLM inference via parallel semantic prefix verification in a single forward pass, delivering 1.25x-4.3x speedups alone and up to 4.5x when combined with EAGLE-3.
- Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding
  EVICT adaptively truncates draft trees in MoE speculative decoding by combining drafter signals with profiled costs to retain only cost-effective prefixes, delivering up to 2.35x speedup over autoregressive decoding.
- PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding
  PipeSD is a cloud-edge collaborative inference framework that overlaps token generation and communication via dynamic programming pipeline scheduling and uses Bayesian-optimized dual-threshold NAV triggering, delivering 1.16x-2.16x speedup and 14.3%-25.3% energy reduction over baselines.