Sherry: Hardware-efficient 1.25-bit ternary quantization via fine-grained sparsification

Hong Huang, Decheng Wu, Qiangqiang Hu, Guanghua Yu, Jinhai Yang, Jianchen Zhu, Xue Liu, Dapeng Wu · 2026 · arXiv 2601.07892

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

cs.CL · 2026-05-21 · unverdicted · novelty 4.0

Hy-MT2 is a new family of fast multilingual translation models that claim to outperform several open-source LLMs and commercial APIs across diverse evaluation settings while supporting efficient on-device deployment.

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

cs.CV · 2026-04-20 · unverdicted · novelty 4.0

DuQuant++ adapts outlier-aware fine-grained rotation to MXFP4 by matching block size to the 32-element microscaling group, enabling a single rotation that smooths distributions and achieves SOTA performance on LLaMA-3 with lower cost.

citing papers explorer

Showing 2 of 2 citing papers.

Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild

cs.CL · 2026-05-21 · unverdicted · none · ref 83

DuQuant++: Fine-grained Rotation Enhances Microscaling FP4 Quantization

cs.CV · 2026-04-20 · unverdicted · none · ref 9
