{"paper":{"title":"Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Extremely quantized LLMs lose generation quality from smoothness degradation in token predictions, beyond numerical accuracy loss alone.","cross_cats":["cs.AI"],"primary_cat":"cs.CL","authors_text":"Pengzhan Li, Wanxiang Che, Xu Han, Yuxuan Li, Yuzhuang Xu","submitted_at":"2026-05-09T11:19:51Z","abstract_excerpt":"Large language models (LLMs) achieve strong performance but incur high deployment costs, motivating extremely low-bit but lossy quantization. Existing quantization algorithms mainly focus on improving the numerical accuracy of forward computation to eliminate performance degradation. In this paper, we show that extremely quantized LLMs suffer from systematic smoothness degradation beyond numerical precision loss. Through a smoothness proxy, we observe that such degradation becomes increasingly severe as the quantization bit-width decreases. 
Furthermore, based on sequence neighborhood modeling,"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Extremely quantized LLMs suffer from systematic smoothness degradation beyond numerical precision loss; preserving smoothness via a simple principle in post-training quantization and quantization-aware training brings additional gains beyond numerical accuracy.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The smoothness proxy used in the paper accurately captures the degradation that matters for generation quality, and the observed reduction in effective token candidates is a direct causal consequence of that smoothness loss rather than an artifact of the proxy or the evaluation setup.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"Extremely quantized LLMs exhibit systematic smoothness degradation that reduces effective token candidates and degrades generation; a smoothness-preserving principle in PTQ and QAT delivers gains beyond numerical accuracy.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Extremely quantized LLMs lose generation quality from smoothness degradation in token predictions, beyond numerical accuracy loss alone.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"41be0ebf50b7a99dbc62b2837c6ac3f9ab7bc0fdebe8726e446a09086598e9c9"},"source":{"id":"2605.08894","kind":"arxiv","version":2},"verdict":{"id":"cf5ade6b-7f5c-4fce-bc1b-efaa863d66ec","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-19T15:12:38.831731Z","strongest_claim":"Extremely quantized LLMs suffer from systematic smoothness degradation beyond numerical precision loss; preserving smoothness via a 
simple principle in post-training quantization and quantization-aware training brings additional gains beyond numerical accuracy.","one_line_summary":"Extremely quantized LLMs exhibit systematic smoothness degradation that reduces effective token candidates and degrades generation; a smoothness-preserving principle in PTQ and QAT delivers gains beyond numerical accuracy.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The smoothness proxy used in the paper accurately captures the degradation that matters for generation quality, and the observed reduction in effective token candidates is a direct causal consequence of that smoothness loss rather than an artifact of the proxy or the evaluation setup.","pith_extraction_headline":"Extremely quantized LLMs lose generation quality from smoothness degradation in token predictions, beyond numerical accuracy loss alone."},"integrity":{"clean":true,"summary":{"advisory":0,"critical":0,"by_detector":{},"informational":0},"endpoint":"/pith/2605.08894/integrity.json","findings":[],"available":true,"detectors_run":[{"name":"ai_meta_artifact","ran_at":"2026-05-19T20:42:19.892389Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_title_agreement","ran_at":"2026-05-19T14:01:21.505557Z","status":"completed","version":"1.0.0","findings_count":0},{"name":"doi_compliance","ran_at":"2026-05-19T10:41:57.600294Z","status":"completed","version":"1.0.0","findings_count":0}],"snapshot_sha256":"e931c3aaeda60ea11ec31626d2e9afcd0e00b2a4de340b938e8af4a0b818cb84"},"references":{"count":46,"sample":[{"doi":"10.18653/v1/n19-1245","year":2019,"title":"A. Amini, S. Gabriel, S. Lin, R. Koncel-Kedziorski, Y. Choi, and H. Hajishirzi. MathQA: Towards interpretable math word problem solving with operation-based formalisms. In Proceedings of the 2019 Co","work_id":"b73d762e-5f33-41a0-a44d-b2e613bffd36","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.18653/v1/2022","year":2022,"title":"D. 
Bahri, H. Mobahi, and Y. Tay. Sharpness-aware minimization improves language model generalization. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (ACL), ","work_id":"ed940a1a-3e19-4658-88d1-34194ac465ee","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"P. L. Bartlett, D. J. Foster, and M. J. Telgarsky. Spectrally-normalized margin bounds for neural networks. Advances in neural information processing systems (NeurIPS), 30:6240–6249, 2017. URL https://","work_id":"9b7bec3d-b809-45a5-8990-2faa2c149696","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"10.48550/arxiv.1911.11641","year":2020,"title":"PIQA: Reasoning about Physical Commonsense in Natural Language","work_id":"0d865a62-6376-4606-8d3a-eeb3b6e9ba6d","ref_index":4,"cited_arxiv_id":"1911.11641","is_internal_anchor":true},{"doi":"","year":2020,"title":"A. Chan, Y. Tay, and Y.-S. Ong. What it thinks is important is important: Robustness transfers through input gradients. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogn","work_id":"0e74a83c-30e7-4e17-8b4b-35ff4e29bc48","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":46,"snapshot_sha256":"dc0a59e4af667197f76335cc65ccc69b565e5f68ea609ddcf1e0473264f7f1cc","internal_anchors":8},"formal_canon":{"evidence_count":2,"snapshot_sha256":"c2b02002443755a01c6eab353c5a81b1c69ca5c7f4cc835c62864fc3c4da2f8c"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}