{"paper":{"title":"The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Ternary-weight LLMs achieve full-precision performance at far lower computational cost","cross_cats":["cs.LG"],"primary_cat":"cs.CL","authors_text":"Furu Wei, Hongyu Wang, Jilong Xue, Lei Wang, Li Dong, Lingxiao Ma, Ruiping Wang, Shaohan Huang, Shuming Ma, Wenhui Wang","submitted_at":"2024-02-27T18:56:19Z","abstract_excerpt":"Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}. It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption. More profoundly, the 1.58-bit LLM defines a new scaling law and recipe f"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the training procedure and scaling law developed for the 1.58-bit ternary setting will continue to produce competitive performance when model size or data volume increases beyond the scales tested.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"BitNet b1.58 shows that ternary 1.58-bit LLMs can match full-precision performance at substantially lower inference cost.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Ternary-weight LLMs achieve full-precision performance at far lower computational cost","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ddc322bdb29f2bfbbc4fa69825a4f1f1d3674a93e814ceed7cf341ae4361abfb"},"source":{"id":"2402.17764","kind":"arxiv","version":1},"verdict":{"id":"133e1b69-a422-44d1-aadb-0eb10183f4ad","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-17T20:06:27.706475Z","strongest_claim":"It matches the full-precision (i.e., FP16 or BF16) Transformer LLM with the same model size and training tokens in terms of both perplexity and end-task performance, while being significantly more cost-effective in terms of latency, memory, throughput, and energy consumption.","one_line_summary":"BitNet b1.58 shows that ternary 1.58-bit LLMs can match full-precision performance at substantially lower inference cost.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the training procedure and scaling law developed for the 1.58-bit ternary setting will continue to produce competitive performance when model size or data volume increases beyond the scales tested.","pith_extraction_headline":"Ternary-weight LLMs achieve full-precision performance at far lower computational cost"},"references":{"count":15,"sample":[{"doi":"","year":1911,"title":"PIQA: Reasoning about Physical Commonsense in Natural Language","work_id":"0d865a62-6376-4606-8d3a-eeb3b6e9ba6d","ref_index":1,"cited_arxiv_id":"1911.11641","is_internal_anchor":true},{"doi":"","year":null,"title":"arXiv preprint arXiv:2307.13304 , year=","work_id":"bd3fe3b4-ccc3-419b-969e-9a80ded56858","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":1905,"title":"BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions","work_id":"511eeb84-4b95-46d5-b14f-50da43f4f19f","ref_index":3,"cited_arxiv_id":"1905.10044","is_internal_anchor":true},{"doi":"","year":2014,"title":"1.1 computing’s energy problem (and what we can do about it)","work_id":"85c69f95-61c6-46d7-a2cd-7076728504f6","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"AWQ: Activation-aware Weight Quantization for LLM Compression and Acceleration","work_id":"ea9d1d72-db24-4cae-8c89-4ecd83dd87c1","ref_index":5,"cited_arxiv_id":"2306.00978","is_internal_anchor":true}],"resolved_work":15,"snapshot_sha256":"e30fcd93a9d7cb80a19c153ea14ecd4c6b57ef294d43ba70ac52e8d0802b8538","internal_anchors":9},"formal_canon":{"evidence_count":1,"snapshot_sha256":"c821ee11b34325432ec0696a2de201f332054e83b69091e7efddad8e94353a8e"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}