Quantized neural networks: Training neural networks with low precision weights and activations
1 Pith paper cites this work.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: CONDITIONAL (1)
Representative citing papers:
EntroLLM: Entropy Encoded Weight Compression for Efficient Large Language Model Inference on Edge Devices
EntroLLM applies tensor-level mixed quantization to reduce weight entropy, then uses Huffman coding to achieve up to 65% storage savings and faster inference on memory-limited edge devices, without retraining.
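
To make the idea of quantizing and then entropy coding concrete, here is a minimal sketch in Python. It is not the EntroLLM implementation: it assumes plain uniform 4-bit quantization of a synthetic Gaussian weight tensor rather than the paper's tensor-level mixed scheme, and the helper names (quantize, huffman_code_lengths, huffman_bits) are hypothetical. It only illustrates why a peaked distribution of quantized values can be stored in fewer bits with Huffman coding than with a fixed-width encoding.

```python
import heapq
import random
from collections import Counter


def quantize(weights, bits=4):
    """Uniformly quantize floats to integer levels in [0, 2**bits - 1].

    Assumption: simple min/max uniform quantization, used only for illustration.
    """
    lo, hi = min(weights), max(weights)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant tensor
    return [round((w - lo) / scale) for w in weights]


def huffman_code_lengths(symbols):
    """Return {symbol: code length in bits} for a Huffman code over the symbols."""
    freq = Counter(symbols)
    if len(freq) == 1:  # a single distinct symbol still needs one bit
        return {next(iter(freq)): 1}
    # Heap entries: (total count, unique tie-breaker, {symbol: depth so far}).
    heap = [(n, i, {s: 0}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        n1, _, d1 = heapq.heappop(heap)
        n2, _, d2 = heapq.heappop(heap)
        # Merging two subtrees pushes every symbol in them one level deeper.
        merged = {s: depth + 1 for s, depth in {**d1, **d2}.items()}
        heapq.heappush(heap, (n1 + n2, tie, merged))
        tie += 1
    return heap[0][2]


def huffman_bits(symbols):
    """Total payload size in bits when each symbol uses its Huffman code."""
    lengths = huffman_code_lengths(symbols)
    return sum(lengths[s] for s in symbols)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic stand-in for a weight tensor
q = quantize(weights, bits=4)
fixed_bits = 4 * len(q)
coded_bits = huffman_bits(q)
print(f"fixed-width: {fixed_bits} bits, Huffman: {coded_bits} bits")
```

The sketch counts only the coded payload; a real format would also store the code table and the quantization range per tensor, and the savings depend on how peaked the quantized distribution is.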