Title resolution pending

Accelerating Large Language Model Decoding with Speculative Sampling · 2023

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

AAAC uses two adaptive 64-byte codebooks per layer for 4-bit LLM weight quantization, choosing the optimal one per group to minimize activation-weighted error with zero storage overhead and fast runtime.

Parallel Prefix Verification for Speculative Generation

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

PARSE accelerates LLM inference via parallel semantic prefix verification in a single forward pass, delivering 1.25x-4.3x speedups alone and up to 4.5x when combined with EAGLE-3.

Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

EVICT adaptively truncates draft trees in MoE speculative decoding by combining drafter signals with profiled costs to retain only cost-effective prefixes, delivering up to 2.35x speedup over autoregressive decoding.

PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding

cs.DC · 2026-05-13 · 2 refs

citing papers explorer

Showing 4 of 4 citing papers.

AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization · cs.LG · 2026-05-09 · unverdicted · none · ref 16
Parallel Prefix Verification for Speculative Generation · cs.AI · 2026-05-05 · unverdicted · none · ref 10
Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding · cs.CL · 2026-05-01 · unverdicted · none · ref 2
PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding · cs.DC · 2026-05-13 · unreviewed · ref 6 · 2 links
