pith. sign in

Title resolution pending

23 Pith papers cite this work. Polarity classification is still indexing.

23 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

roles

background 2

polarities

background 2

clear filters

representative citing papers

TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate

cs.LG · 2025-04-28 · unverdicted · novelty 6.0

TurboQuant achieves near-optimal vector quantization distortion for both MSE and inner products via random rotation and per-coordinate scalar quantization, with a formal proof that it matches lower bounds within a factor of approximately 2.7.

QVGGT: Post-Training Quantized Visual Geometry Grounded Transformer

cs.CV · 2026-05-29 · unverdicted · novelty 5.0

QVGGT uses per-block mixed-precision analysis, outlier token filtering with PCA compensation, and task-aware scale search to achieve near-lossless W4A16 quantization of VGGT with 3-4.9x memory savings and 2.8x speedup.

High-Rate Quantized Matrix Multiplication I

cs.IT · 2026-01-23 · unverdicted · novelty 5.0

High-rate quantization theory yields accurate approximations for the distortion of absmax INT and FP schemes in generic weight-plus-activation matrix multiplication.

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

cs.CV · 2026-04-20 · unverdicted · novelty 4.0

DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 with lower cost.

A Survey on Efficient Inference for Large Language Models

cs.CL · 2024-04-22 · accept · novelty 3.0

The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • A Survey on Efficient Inference for Large Language Models cs.CL · 2024-04-22 · accept · none · ref 217

    The paper surveys techniques to speed up and reduce the resource needs of LLM inference, organized by data-level, model-level, and system-level changes, with comparative experiments on representative methods.