Title resolution pending

Accelerating Large Language Model Decoding with Speculative Sampling · 2023

5 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

AAAC learns two 64-byte codebooks per layer for 4-bit LLM weights and lets each group pick the one minimizing activation-weighted reconstruction error, storing the choice at zero extra cost.

Parallel Prefix Verification for Speculative Generation

cs.AI · 2026-05-05 · unverdicted · novelty 6.0

PARSE accelerates LLM inference via parallel semantic prefix verification in a single forward pass, delivering 1.25x-4.3x speedups alone and up to 4.5x when combined with EAGLE-3.

Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding

cs.CL · 2026-05-01 · unverdicted · novelty 6.0

EVICT adaptively truncates draft trees in MoE speculative decoding by combining drafter signals with profiled costs to retain only cost-effective prefixes, delivering up to 2.35x speedup over autoregressive decoding.

To Intervene or Not: Guiding Inference-time Alignment with Probabilistic Model Blending

cs.LG · 2026-04-22 · conditional · novelty 5.0

BlendIn replaces binary guidance acceptance with confidence-weighted distribution blending between base and guidance models, mitigating cascading failures in inference-time LLM alignment.

PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding

cs.DC · 2026-05-13 · unverdicted · novelty 4.0 · 2 refs

PipeSD is a cloud-edge collaborative inference framework that overlaps token generation and communication via dynamic programming pipeline scheduling and uses Bayesian-optimized dual-threshold NAV triggering, delivering 1.16x-2.16x speedup and 14.3%-25.3% energy reduction over baselines.

citing papers explorer

Showing 4 of 4 citing papers after filters.

AAAC: Activation-Aware Adaptive Codebooks for 4-bit LLM Weight Quantization · cs.LG · 2026-05-09 · unverdicted · none · ref 16
Parallel Prefix Verification for Speculative Generation · cs.AI · 2026-05-05 · unverdicted · none · ref 10
Making Every Verified Token Count: Adaptive Verification for MoE Speculative Decoding · cs.CL · 2026-05-01 · unverdicted · none · ref 2
PipeSD: An Efficient Cloud-Edge Collaborative Pipeline Inference Framework with Speculative Decoding · cs.DC · 2026-05-13 · unverdicted · none · ref 6 · 2 links
