A survey of low-bit large language models: Basics, systems, and algorithms. Neural Networks, 192:107856, 2025

Ruihao Gong, Yifu Ding, Zining Wang, Chengtao Lv, Xingyu Zheng, Jinyang Du, Yang Yong, Shiqiao Gu, Haotong Qin, Jinyang Guo, Dahua Lin, Michele Magno, Xianglong Liu · 2025 · arXiv 2025.107856

2 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery

cs.AI · 2026-06-01 · conditional · novelty 7.0

2-bit quantized reasoning models exhibit process failures like loops and delayed commitment that degrade end-to-end performance, but FP16 planning and loop rescue recover accuracy on MATH-500 from 17.2% to 74.2% for Qwen3-8B while retaining speed gains.

Token-Operations-Oriented Inference Optimization Techniques for Large Models

cs.SE · 2026-06-18 · unverdicted · novelty 3.0

The paper introduces a four-layer technical architecture for token-operations-oriented inference optimization in large models and reviews key technologies and industry status at each layer.

citing papers explorer

Showing 1 of 1 citing paper after filters.

Extreme Low-Bit Inference in Reasoning Models: Failure Modes and Targeted Recovery · cs.AI · 2026-06-01 · conditional · none · ref 13
