AAAC uses two adaptive 64-byte codebooks per layer for 4-bit LLM weight quantization, choosing the optimal one per group to minimize activation-weighted error with zero storage overhead and fast runtime.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
years
2026 4representative citing papers
PARSE accelerates LLM inference via parallel semantic prefix verification in a single forward pass, delivering 1.25x-4.3x speedups alone and up to 4.5x when combined with EAGLE-3.
EVICT adaptively truncates draft trees in MoE speculative decoding by combining drafter signals with profiled costs to retain only cost-effective prefixes, delivering up to 2.35x speedup over autoregressive decoding.
citing papers explorer
-
AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization
AAAC uses two adaptive 64-byte codebooks per layer for 4-bit LLM weight quantization, choosing the optimal one per group to minimize activation-weighted error with zero storage overhead and fast runtime.
-
Parallel Prefix Verification for Speculative Generation
PARSE accelerates LLM inference via parallel semantic prefix verification in a single forward pass, delivering 1.25x-4.3x speedups alone and up to 4.5x when combined with EAGLE-3.
-
Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding
EVICT adaptively truncates draft trees in MoE speculative decoding by combining drafter signals with profiled costs to retain only cost-effective prefixes, delivering up to 2.35x speedup over autoregressive decoding.
- PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding