LASER introduces curvature-weighted SVD from second-order loss approximation and loss-aware rank allocation to compress VLMs, reporting over 2.3x decoding speedup under low-precision settings.
Effectively com- press kv heads for llm
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4representative citing papers
DREAM-S combines neural architecture search, target-aware supernet training, and attention-entropy-guided distillation to accelerate speculative decoding in VLMs, reporting up to 3.85x speedup over standard methods.
LightTransfer identifies lazy layers in LLMs like LLaMA and replaces their attention with streaming attention to form hybrid models, delivering up to 2.17x throughput with under 1.5% drop on LongBench and strong results on reasoning benchmarks.
WSVD delivers over 1.8x faster VLM decoding via weighted low-rank approximation at fine granularity plus quantization, without accuracy loss.
citing papers explorer
-
LASER: Loss-Aware Singular-value Decomposition and Rank Allocation for Efficient Low-Precision Vision-Language Models
LASER introduces curvature-weighted SVD from second-order loss approximation and loss-aware rank allocation to compress VLMs, reporting over 2.3x decoding speedup under low-precision settings.
-
DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation
DREAM-S combines neural architecture search, target-aware supernet training, and attention-entropy-guided distillation to accelerate speculative decoding in VLMs, reporting up to 3.85x speedup over standard methods.
-
LightTransfer: Your Long-Context LLM is Secretly a Hybrid Model with Effortless Adaptation
LightTransfer identifies lazy layers in LLMs like LLaMA and replaces their attention with streaming attention to form hybrid models, delivering up to 2.17x throughput with under 1.5% drop on LongBench and strong results on reasoning benchmarks.
-
WSVD: Weighted Low-Rank Approximation for Fast and Efficient Execution of Low-Precision Vision-Language Models
WSVD delivers over 1.8x faster VLM decoding via weighted low-rank approximation at fine granularity plus quantization, without accuracy loss.