RocketKV: Accelerating Long-Context LLM Inference via Two-Stage KV Cache Compression. arXiv preprint arXiv:2502.14051

Behnam, P · 2025 · arXiv 2502.14051

3 Pith papers cite this work.

representative citing papers

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents

cs.MA · 2026-05-09 · unverdicted · novelty 6.0

Slipstream uses asynchronous compaction with trajectory-grounded judge validation to improve long-horizon agent accuracy by up to 8.8 percentage points and reduce latency by up to 39.7%.

BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding

cs.CL · 2025-12-12 · unverdicted · novelty 6.0

BLASST dynamically sparsifies attention by thresholding softmax scores to skip blocks, delivering 1.5x speedups at 70%+ sparsity while preserving benchmark accuracy.

HARD-KV: Head-Adaptive Regularization for Decoding-time KV Compression

cs.LG · 2026-06-27 · unverdicted · novelty 5.0

HARD-KV bridges dynamic head-adaptive KV cache compression with static inference engine constraints via Cascade Cache and Logits Calibration, reporting up to 2x throughput gains on long-context math benchmarks.

citing papers explorer

Slipstream: Trajectory-Grounded Compaction Validation for Long-Horizon Agents cs.MA · 2026-05-09 · unverdicted · none · ref 62
BLASST: Dynamic BLocked Attention Sparsity via Softmax Thresholding cs.CL · 2025-12-12 · unverdicted · none · ref 3
HARD-KV: Head-Adaptive Regularization for Decoding-time KV Compression cs.LG · 2026-06-27 · unverdicted · none · ref 2
