LLM 2-bit quantization fails via either cumulative signal degradation or early computation collapse in key components.
Eora: Training- free compensation for compressed llm with eigenspace low-rank approximation
3 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
MLorc compresses optimizer momentum with low-rank methods to enable memory-efficient full fine-tuning of LLMs, outperforming LoRA and GaLore while matching full-parameter performance at small ranks.
HCInfer recovers up to 5.2% accuracy over compressed LLMs and delivers 10.4x speedup versus full-precision models by offloading compensation parameters to CPU with async execution on resource-limited hardware.
citing papers explorer
-
From Signal Degradation to Computation Collapse: Uncovering the Two Failure Modes of LLM Quantization
LLM 2-bit quantization fails via either cumulative signal degradation or early computation collapse in key components.
-
MLorc: Momentum Low-rank Compression for Memory Efficient Large Language Model Adaptation
MLorc compresses optimizer momentum with low-rank methods to enable memory-efficient full fine-tuning of LLMs, outperforming LoRA and GaLore while matching full-parameter performance at small ranks.
-
HCInfer: An Efficient Inference System via Error Compensation for Resource-Constrained Devices
HCInfer recovers up to 5.2% accuracy over compressed LLMs and delivers 10.4x speedup versus full-precision models by offloading compensation parameters to CPU with async execution on resource-limited hardware.