hub Canonical reference

W., Shao, Y

Canonical reference. 80% of citing Pith papers cite this work as background.

16 Pith papers citing it
Background: 80% of classified citations

citation-role summary

background: 4 · method: 1

representative citing papers

VORT: Adaptive Power-Law Memory for NLP Transformers

cs.LG · 2026-05-09 · unverdicted · novelty 7.0

VORT assigns learnable fractional orders to tokens and approximates their power-law retention kernels via sum-of-exponentials for efficient long-range dependency modeling in transformers.

Runtime-Certified Bounded-Error Quantized Attention

cs.LG · 2026-05-20 · unverdicted · novelty 6.0

A tiered KV cache architecture computes per-head per-step error bounds on quantized attention and uses adaptive fallback to guarantee bounded or exact outputs relative to FP16 reference.

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

cs.LG · 2025-04-28 · unverdicted · novelty 6.0

TurboQuant achieves near-optimal vector quantization distortion for both MSE and inner products via random rotation and per-coordinate scalar quantization, with a formal proof that it matches lower bounds within a factor of approximately 2.7.

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

cs.CL · 2024-02-05 · conditional · novelty 6.0

KIVI applies asymmetric 2-bit quantization to KV cache with per-channel keys and per-token values, reducing memory 2.6x and boosting throughput up to 3.47x with near-identical quality on Llama, Falcon, and Mistral.
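The per-channel versus per-token distinction in the summary above can be illustrated with a minimal sketch. It assumes plain min/max asymmetric quantization over a small tokens-by-channels matrix. All function names here are hypothetical, and details of the actual KIVI method, such as grouping and any full-precision residual, are not modeled.

```python
from typing import List, Tuple

Matrix = List[List[float]]  # rows = tokens, columns = channels (assumed layout)


def quantize_asymmetric(xs: List[float], bits: int = 2) -> Tuple[List[int], float, float]:
    """Asymmetric (min/max) uniform quantization of one group of values."""
    levels = (1 << bits) - 1  # 2 bits -> integer codes 0..3
    lo, hi = min(xs), max(xs)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [min(levels, max(0, round((x - lo) / scale))) for x in xs]
    return codes, scale, lo  # lo acts as the zero point


def dequantize(codes: List[int], scale: float, zero: float) -> List[float]:
    return [c * scale + zero for c in codes]


def roundtrip_per_token(m: Matrix, bits: int = 2) -> Matrix:
    """One (scale, zero) pair per row, i.e. per token."""
    return [dequantize(*quantize_asymmetric(row, bits)) for row in m]


def roundtrip_per_channel(m: Matrix, bits: int = 2) -> Matrix:
    """One (scale, zero) pair per column, i.e. per channel."""
    cols = [list(col) for col in zip(*m)]
    deq_cols = [dequantize(*quantize_asymmetric(col, bits)) for col in cols]
    return [list(row) for row in zip(*deq_cols)]


def max_abs_error(a: Matrix, b: Matrix) -> float:
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


# Toy "key" matrix with one large-magnitude channel (illustrative data only).
keys = [
    [0.1, 0.5, 8.0],
    [0.2, 0.6, 9.0],
    [0.3, 0.7, 10.0],
    [0.4, 0.8, 11.0],
]

err_channel = max_abs_error(keys, roundtrip_per_channel(keys))
err_token = max_abs_error(keys, roundtrip_per_token(keys))
print(f"per-channel error: {err_channel:.4f}, per-token error: {err_token:.4f}")
```

On this toy input, grouping by channel keeps the large-magnitude column from stretching the quantization range of the small ones, so the per-channel round trip has the lower error. The example is constructed to show that effect and is not evidence about real model activations.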

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.
