Rethinking RoPE scaling in quantized LLM: Theory, outlier, and channel-band analysis with weight rescaling
2 Pith papers cite this work (2026: 2 papers; verdicts: 2 unverdicted).
Citing papers
- Rotation-Aligned Key Channel Pruning for Efficient Vision-Language Model Inference
  RotateK uses online PCA-based rotation to align token-dependent key-channel importance into a shared subspace, enabling accurate head-wise structured pruning and faster decoding in VLMs than prior token- and channel-pruning methods.
- FASQ: Flexible Accelerated Subspace Quantization for Calibration-Free LLM Compression
  FASQ delivers calibration-free LLM compression with continuous size trade-offs, using product quantization and custom CUDA kernels that make decoding faster than FP16 on consumer hardware.