AWQ: Activation-aware weight quantization for on-device LLM compression and acceleration. Proceedings of Machine Learning and Systems, 6:87–100, 2024

Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, Song Han · 2024

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Fast Tensorization of Neural Networks via Slice-wise Feature Distillation

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

A slice-wise feature distillation framework for independent tensorization of neural network slices to achieve scalable compression with reduced fine-tuning costs.

citing papers explorer

Showing 1 of 1 citing paper.

Fast Tensorization of Neural Networks via Slice-wise Feature Distillation · cs.LG · 2026-05-19 · unverdicted · none · ref 13
A slice-wise feature distillation framework for independent tensorization of neural network slices to achieve scalable compression with reduced fine-tuning costs.
