K-Quantization and its Impact on Output Performance

Pierre Nugues; Robin Baki Davidsson

arxiv: 2605.19645 · v1 · pith:V3NW37ZUnew · submitted 2026-05-19 · 💻 cs.CL

Pith reviewed 2026-05-20 05:33 UTC · model grok-4.3

keywords LLM quantization · model compression · performance evaluation · MMLU-Pro · CRUXEval · MuSR · bit precision · large language models

The pith

Quantization from 8-bit to 2-bit reduces LLM performance on reasoning and comprehension tasks, with larger models showing greater resilience and mid-sized models offering the best efficiency balance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper evaluates eight large language models after reducing their numerical precision through quantization ranging from 2 to 8 bits. It tracks accuracy changes on MMLU-Pro for knowledge and reasoning, CRUXEval for code comprehension, and MuSR for reading comprehension. Results indicate that higher precision consistently improves outcomes with diminishing returns, while aggressive low-bit settings usually preserve acceptable accuracy though some models suffer notable drops. Larger models withstand aggressive quantization better, yet even they can lose significant performance at the lowest levels, and models in the 7-9 billion parameter range emerge as the practical sweet spot for balancing speed and quality.
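To make the efficiency side of this trade-off concrete, the following is a rough sketch of weight-storage arithmetic. It assumes a nominal bits-per-weight equal to the quantization label and ignores per-block scale and metadata overhead, so real model files will be somewhat larger. The function name and the 8-billion-parameter example are illustrative choices, not figures from the paper.

```python
def approx_weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate weight storage in GiB, ignoring scale/metadata overhead."""
    total_bytes = n_params * bits_per_weight / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    n_params = 8e9  # hypothetical mid-sized model, chosen for illustration
    for bits in (16, 8, 6, 4, 2):
        print(f"{bits:>2}-bit: {approx_weight_gib(n_params, bits):6.2f} GiB")
```

Under these assumptions, storage scales linearly with bit width, which is why halving the precision roughly halves the memory needed for the weights.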

Core claim

The authors report that performance improves with higher bit precision (e.g., 8-bit Q8_0), albeit with diminishing returns, while aggressive quantization (e.g., 2-bit Q2_K) usually retains acceptable accuracy across most models and tasks. Larger models are more resilient to precision loss, and mid-sized models in the 7-9 billion parameter range strike the best balance between efficiency and resource usage.

What carries the argument

Evaluation of eight LLMs at discrete quantization levels from Q2_K to Q8_0 on MMLU-Pro, CRUXEval, and MuSR to track accuracy as bit precision decreases.
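One simple way to track accuracy as precision decreases is to express each quantized score as a fraction of the same model's highest-precision score. The sketch below assumes Q8_0 as the baseline; the helper name and the example numbers are placeholders for illustration and are not results from the paper.

```python
from typing import Dict


def retention(scores: Dict[str, float], baseline: str = "Q8_0") -> Dict[str, float]:
    """Return each level's accuracy as a fraction of the baseline level's accuracy."""
    base = scores[baseline]
    if base <= 0:
        raise ValueError("baseline accuracy must be positive")
    return {level: acc / base for level, acc in scores.items()}


if __name__ == "__main__":
    # Placeholder accuracies for illustration only; NOT results from the paper.
    example = {"Q8_0": 0.50, "Q2_K": 0.40}
    for level, frac in retention(example).items():
        print(f"{level}: {frac:.0%} of baseline")
```

Comparing retention rather than raw accuracy makes it easier to see whether larger models lose proportionally less at low bit widths.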

Load-bearing premise

The chosen benchmarks and the specific eight LLMs tested are representative enough to support general statements about quantization impacts across models and tasks.

What would settle it

Repeating the tests on a fresh collection of models or benchmarks and observing that mid-sized models no longer provide the best efficiency-performance trade-off or that all models lose accuracy equally at 2 bits would undermine the reported trends.

Figures

Figures reproduced from arXiv: 2605.19645 by Pierre Nugues, Robin Baki Davidsson.

**Figure 3.** The structure of a K-quantized super-block.

**Figure 4.** Model scores.

In addition, large language models may show a bias and be subject to hallucination. The rankings we observed do not guarantee that the most effective models are not prone to mistakes or misleading answers.

Acknowledgments: This work was partially supported by Vetenskapsrådet, the Swedish Research Council, registration number 2021-04533.
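The caption of Figure 3 refers to a K-quantized super-block, but the layout itself is not reproduced on this page. The following is only a simplified sketch of the general two-level idea: weights are grouped into sub-blocks, each sub-block gets a small integer scale, and one floating-point scale is shared across the whole super-block. The sizes (16 sub-blocks of 16 weights), the bit widths, and the symmetric rounding are all assumptions made for illustration; they do not reproduce the exact formats evaluated in the paper.

```python
from typing import List, Tuple

SUB_BLOCKS = 16  # assumed number of sub-blocks per super-block
SUB_SIZE = 16    # assumed weights per sub-block (16 * 16 = 256 weights)


def quantize_super_block(
    weights: List[float], bits: int = 4, scale_bits: int = 6
) -> Tuple[float, List[int], List[List[int]]]:
    """Quantize one super-block with two-level scales (symmetric, simplified)."""
    if len(weights) != SUB_BLOCKS * SUB_SIZE:
        raise ValueError("expected exactly 256 weights")
    qmax = 2 ** (bits - 1) - 1   # largest signed code, e.g. 7 for 4 bits
    smax = 2 ** scale_bits - 1   # largest integer scale, e.g. 63 for 6 bits
    blocks = [weights[i:i + SUB_SIZE] for i in range(0, len(weights), SUB_SIZE)]
    # Ideal real-valued step size for each sub-block.
    raw_scales = [max(abs(w) for w in b) / qmax for b in blocks]
    # One float scale for the super-block; sub-block scales become small ints.
    top = max(raw_scales)
    d = top / smax if top > 0 else 1.0
    q_scales = [round(s / d) for s in raw_scales]
    codes: List[List[int]] = []
    for b, qs in zip(blocks, q_scales):
        step = d * qs
        if step > 0:
            # Clamp because the rounded scale may be slightly below the ideal one.
            codes.append([max(-qmax, min(qmax, round(w / step))) for w in b])
        else:
            codes.append([0] * len(b))
    return d, q_scales, codes


def dequantize_super_block(
    d: float, q_scales: List[int], codes: List[List[int]]
) -> List[float]:
    """Reconstruct approximate weights from the two-level representation."""
    return [d * qs * c for qs, row in zip(q_scales, codes) for c in row]
```

Storing sub-block scales as small integers relative to a shared float scale keeps the per-weight overhead low, which is the usual motivation for a two-level layout of this kind.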

Original abstract

Recent advancements in large language models (LLMs) have shown their remarkable capacities in many NLP tasks. However, their substantial size often presents challenges for deployment. This necessitates efficient techniques for model compression, with quantization emerging as a prominent solution. Despite its benefits, the exact impact of quantization (from 2- to 6-bit) on the performance and accuracy of LLMs remains an active area of research. This paper investigates the performance of eight LLMs at various quantization levels, focusing on tasks such as MMLU-Pro for knowledge processing and reasoning, CRUXEval for code comprehension, and MuSR for reading comprehension. Our results show a consistent trend where higher precision (e.g., 8-bit Q8_0) yields improved performance, albeit with diminishing returns. Aggressive quantization (e.g., 2-bit Q2_K) usually retains acceptable accuracy, though some models show a substantial loss in performance. Our findings indicate that while lower bit precision generally reduces performance, the impact varies across models and tasks. Larger models show greater resilience to aggressive quantization, but can still undergo significant drops at lower precision levels. Mid-sized models in the 7-9 billion parameter range strike an optimal balance between efficiency and resource usage. Such results provide insights into the trade-offs between model size, quantization, and performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Quantization benchmark study on eight LLMs finds practical trends in performance but relies on a narrow model and task sample.

The punchline is that this is an empirical study that tests quantization levels on eight LLMs and reports some size-dependent trends in performance loss. It does a solid job of applying standard quantization methods across a range of bit widths and measuring results on knowledge, code, and reading tasks. The consistent observation of diminishing returns at higher precision and the note that aggressive 2-bit quantization often keeps acceptable accuracy are useful data points. Larger models showing more resilience fits with some earlier observations in the field.

Nothing fundamentally new appears here. The paper does not propose new quantization techniques or provide theoretical analysis. It mainly adds benchmark results for these particular models and tasks to the existing body of work on LLM compression.

The soft spots center on how far the results can be generalized. Eight models is a limited sample, and the three benchmarks may not capture all relevant variations in model behavior or task difficulty. The claim that mid-sized models strike an optimal balance could look different with a broader set of architectures or more challenging evaluations. The abstract gives trends but the full details on experimental controls would help confirm the findings.

This paper is mainly for engineers and practitioners who need guidance on quantization choices when deploying models with hardware limits. It offers practical insights rather than deep theoretical advances. I think it deserves peer review. A referee could push for more models, statistical analysis, and clearer reporting to strengthen the contribution.

Referee Report

2 major / 1 minor

Summary. The paper presents an empirical study on the effects of quantization levels (specifically K-quantization from 2-bit to 8-bit) on the performance of eight large language models across three benchmarks: MMLU-Pro, CRUXEval, and MuSR. The authors observe that higher precision generally leads to better performance with diminishing returns, that 2-bit quantization often preserves acceptable accuracy, that larger models are more resilient to aggressive quantization, and that models in the 7-9 billion parameter range offer an optimal trade-off between efficiency and performance.

Significance. Should the trends prove robust, the results offer valuable practical insights for selecting quantization strategies and model sizes for efficient LLM deployment. The direct measurement approach avoids circularity and provides falsifiable observations on quantization impacts.

major comments (2)

Abstract: The general claims about larger models showing greater resilience to aggressive quantization and mid-sized (7-9B) models striking an optimal balance rest on a sample of only eight LLMs and three benchmarks. The manuscript provides no justification for model or benchmark selection, no sensitivity analysis, and no additional models to test generalizability, which is load-bearing for the broad statements on quantization impacts.
Abstract: The reported trends lack any mention of experimental controls, statistical tests, error bars, run-to-run variance, or exact model identities and architectures, making it impossible to verify the degree to which the data support the stated performance claims.

minor comments (1)

Abstract: The title uses the term 'K-Quantization' without definition or explanation of what 'K' denotes, which could be clarified for readers unfamiliar with the specific quantization scheme.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive feedback, which highlights important areas for improving the clarity and robustness of our empirical study. We address each major comment point by point below, indicating planned revisions to the manuscript.

Referee: Abstract: The general claims about larger models showing greater resilience to aggressive quantization and mid-sized (7-9B) models striking an optimal balance rest on a sample of only eight LLMs and three benchmarks. The manuscript provides no justification for model or benchmark selection, no sensitivity analysis, and no additional models to test generalizability, which is load-bearing for the broad statements on quantization impacts.

Authors: We acknowledge that the observed trends are derived from eight models and three benchmarks, and that the abstract does not explicitly justify these choices. In the revised manuscript we will add a methodology subsection detailing the rationale for model selection (to span a representative range of sizes and families) and benchmark selection (standard tasks covering knowledge, code, and reasoning). We will also insert a limitations paragraph noting the sample size and calling for future work with additional models and sensitivity checks. These additions provide necessary context without overclaiming generalizability. revision: yes
Referee: Abstract: The reported trends lack any mention of experimental controls, statistical tests, error bars, run-to-run variance, or exact model identities and architectures, making it impossible to verify the degree to which the data support the stated performance claims.

Authors: We agree that greater experimental transparency is required. The revised version will expand the experimental setup to list exact model identities and architectures, describe quantization parameters and any reproducibility controls (e.g., fixed random seeds), and clarify that the quantization process itself is deterministic once parameters are set. Where multiple evaluation runs exist we will report variance or error bars; otherwise we will explicitly note the deterministic nature and any single-run limitations. These details will be added to both the main text and abstract where space permits. revision: yes
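As an illustration of the kind of error bar discussed in this exchange, the sketch below computes a Wilson score interval for a benchmark accuracy measured as correct answers out of a fixed number of questions. This is a generic statistical technique, not a description of the authors' planned analysis; the function name and the counts in the example are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(correct: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (z = 1.96 gives roughly 95%)."""
    if total <= 0 or not 0 <= correct <= total:
        raise ValueError("need 0 <= correct <= total and total > 0")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    # Hypothetical counts for illustration only; NOT results from the paper.
    low, high = wilson_interval(correct=50, total=100)
    print(f"accuracy 0.50, interval [{low:.3f}, {high:.3f}]")
```

When two quantization levels have overlapping intervals on a benchmark, the observed gap between them may not be distinguishable from sampling noise, which is the concern the referee raises.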

Circularity Check

0 steps flagged

No circularity: direct empirical reporting of quantization effects

The paper conducts and reports direct experimental measurements of eight LLMs across quantization levels on three fixed benchmarks (MMLU-Pro, CRUXEval, MuSR). No derivation chain, fitted parameters, equations, or predictions exist; claims consist of observed trends and comparisons from the collected data. No self-citations, ansatzes, or renamings reduce any result to its own inputs by construction. The analysis is self-contained against external benchmarks and therefore exhibits no circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central observations rest on the assumption that standard benchmark tasks adequately proxy real-world performance and that the tested quantization implementations (Q8_0, Q2_K, etc.) are representative of common practice.

axioms (1)

Domain assumption: Benchmark scores on MMLU-Pro, CRUXEval, and MuSR reflect meaningful differences in model capability under quantization. Invoked implicitly when generalizing from the three tasks to broader performance claims.

pith-pipeline@v0.9.0 · 5760 in / 1370 out tokens · 49438 ms · 2026-05-20T05:33:32.830312+00:00 · methodology



work page 2024

[13] [13]

The bfloat16 numerical format

Google. The bfloat16 numerical format. The BFLOAT16 Numerical Format , publisher=

work page

[14] [14]

Lower perplexity is not always human-like

Tatsuki Kuribayashi and Yohei Oseki and Takumi Ito and Ryo Yoshida and Masayuki Asahara and Kentaro Inui , editor =. Lower Perplexity is Not Always Human-Like , booktitle =. 2021 , url =. doi:10.18653/V1/2021.ACL-LONG.405 , timestamp =

work page doi:10.18653/v1/2021.acl-long.405 2021

[15] [15]

2024 , editor =

Gu, Alex and Roziere, Baptiste and Leather, Hugh James and Solar-Lezama, Armando and Synnaeve, Gabriel and Wang, Sida , booktitle =. 2024 , editor =

work page 2024

[16] [16]

Mohamed Nejjar and Luca Zacharias and Fabian Stiehle and Ingo Weber , title =. J. Softw. Evol. Process. , volume =. 2025 , url =. doi:10.1002/SMR.2723 , timestamp =

work page doi:10.1002/smr.2723 2025

[17] [17]

The Thirty-Third

Peixiang Zhong and Di Wang and Chunyan Miao , title =. The Thirty-Third. 2019 , url =. doi:10.1609/AAAI.V33I01.33017492 , timestamp =

work page doi:10.1609/aaai.v33i01.33017492 2019

[18] [18]

2025 , note =

Julia Turc , title =. 2025 , note =

work page 2025

[19] [19]

Llama.cpp --- Wikipedia , The Free Encyclopedia

Wikipedia contributors. Llama.cpp --- Wikipedia , The Free Encyclopedia. 2024

work page 2024

[20] [20]

2024 , note =

HuggingFace , title =. 2024 , note =

work page 2024

[21] [21]

2025 , note =

HuggingFace , title =. 2025 , note =

work page 2025

[22] [22]

2023 , eprint=

QLoRA: Efficient Finetuning of Quantized LLMs , author=. 2023 , eprint=

work page 2023

[23] [23]

k-quants by ikawrakow , howpublished =

Georgi Gerganov and. k-quants by ikawrakow , howpublished =. 2023 , note =

work page 2023

[24] [24]

2024 , note =

Joshua Noble , title =. 2024 , note =

work page 2024

[25] [25]

My Answer is C

Xinpeng Wang and Bolei Ma and Chengzhi Hu and Leon Weber. "My Answer is C": First-Token Probabilities Do Not Match Text Answers in Instruction-Tuned Language Models , booktitle =. 2024 , url =. doi:10.18653/V1/2024.FINDINGS-ACL.441 , timestamp =

work page doi:10.18653/v1/2024.findings-acl.441 2024

[26] [26]

5th International Conference on Learning Representations,

Stephen Merity and Caiming Xiong and James Bradbury and Richard Socher , title =. 5th International Conference on Learning Representations,. 2017 , url =

work page 2017

[27] [27]

2024 , note =

Chat Completion API , author=. 2024 , note =

work page 2024

[28] [28]

The Tenth International Conference on Learning Representations,

Tim Dettmers and Mike Lewis and Sam Shleifer and Luke Zettlemoyer , title =. The Tenth International Conference on Learning Representations,. 2022 , url =

work page 2022

[29] [29]

A Comprehensive Evaluation of Quantization Strategies for Large Language Models , booktitle =

Renren Jin and Jiangcun Du and Wuwei Huang and Wei Liu and Jian Luan and Bin Wang and Deyi Xiong , editor =. A Comprehensive Evaluation of Quantization Strategies for Large Language Models , booktitle =. 2024 , url =. doi:10.18653/V1/2024.FINDINGS-ACL.726 , timestamp =

work page doi:10.18653/v1/2024.findings-acl.726 2024

[30] [30]

2024 , note =

Georgi Gerganov , title =. 2024 , note =

work page 2024

[31] [31]

2023 , note =

Georgi Gerganov , title =. 2023 , note =

work page 2023

[32] [32]

File:IEEE 754r Half Floating Point Format.svg --- Wikimedia Commons , the free media repository

Wikimedia Commons. File:IEEE 754r Half Floating Point Format.svg --- Wikimedia Commons , the free media repository. 2020

work page 2020

[33] [33]

File:Bfloat16 format.svg --- Wikimedia Commons , the free media repository

Wikimedia Commons. File:Bfloat16 format.svg --- Wikimedia Commons , the free media repository. 2023

work page 2023

[34] [34]

2024 , month=

Introducing Meta Llama 3: The most capable openly available LLM to date , author=. 2024 , month=

work page 2024

[35] [35]

Gemma: Introducing new state-of-the-art open models , url=

Banks, Jeanine and Warkentin, Tris , year=. Gemma: Introducing new state-of-the-art open models , url=. Google , publisher=

work page

[36] [36]

Gemma 2 is now available to researchers and developers , url=

Farabet, Clement and Warkentin, Tris , year=. Gemma 2 is now available to researchers and developers , url=. Google , publisher=

work page

[37] [37]

Microsoft Azure Blog , author=

Introducing Phi-3: Redefining what’s possible with SLMs , url =. Microsoft Azure Blog , author=. 2024 , month=

work page 2024

[38] [38]

Albert Q. Jiang and Alexandre Sablayrolles and Arthur Mensch and Chris Bamford and Devendra Singh Chaplot and Diego de Las Casas and Florian Bressand and Gianna Lengyel and Guillaume Lample and Lucile Saulnier and L. Mistral 7B , journal =. 2023 , url =. doi:10.48550/ARXIV.2310.06825 , eprinttype =. 2310.06825 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2310.06825 2023

[39] [39]

Mistral AI Large Language Models , author=

Tokenization , howpublished =. Mistral AI Large Language Models , author=

work page

[40] [40]

2023 , month =

Johannes Gäßler , title =. 2023 , month =

work page 2023

[41] [41]

Proceedings of the AAAI Conference on Artificial Intelligence , author=

What Makes Quantization for Large Language Model Hard? An Empirical Study from the Lens of Perturbation , volume=. Proceedings of the AAAI Conference on Artificial Intelligence , author=. 2024 , month=. doi:10.1609/aaai.v38i16.29765 , abstractNote=

work page doi:10.1609/aaai.v38i16.29765 2024

[42] [42]

Gomez and Lukasz Kaiser and Illia Polosukhin , editor =

Ashish Vaswani and Noam Shazeer and Niki Parmar and Jakob Uszkoreit and Llion Jones and Aidan N. Gomez and Lukasz Kaiser and Illia Polosukhin , editor =. Attention is All you Need , booktitle =. 2017 , url =

work page 2017

[43] [43]

Liu and Mohammad Saleh and Etienne Pot and Ben Goodrich and Ryan Sepassi and Lukasz Kaiser and Noam Shazeer , title =

Peter J. Liu and Mohammad Saleh and Etienne Pot and Ben Goodrich and Ryan Sepassi and Lukasz Kaiser and Noam Shazeer , title =. 6th International Conference on Learning Representations,. 2018 , url =

work page 2018

[44] [44]

CoRR , volume =

Luis Perez and Lizi Ottens and Sudharshan Viswanathan , title =. CoRR , volume =. 2021 , url =. 2102.10535 , timestamp =

work page arXiv 2021

[45] [45]

1949 , publisher=

The Mathematical Theory of Communication , author=. 1949 , publisher=

work page 1949

[46] [46]

The Llama 3 Herd of Models

Abhimanyu Dubey and Abhinav Jauhri and Abhinav Pandey and Abhishek Kadian and Ahmad Al. The Llama 3 Herd of Models , journal =. 2024 , url =. doi:10.48550/ARXIV.2407.21783 , eprinttype =. 2407.21783 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2407.21783 2024

[47] [47]

Morgane Rivière and Shreya Pathak and Pier Giuseppe Sessa and Cassidy Hardin and Surya Bhupatiraju and Léonard Hussenot and Thomas Mesnard and Bobak Shahriari and Alexandre Ramé and Johan Ferret and Peter Liu and Pouya Tafti and Abe Friesen and Michelle Casbon and Sabela Ramos and Ravin Kumar and Charline Le Lan and Sammy Jerome and Anton Tsitsulin and Ni...

work page 2024

[48] [48]

Microsoft Developer Blogs , author=

Infinite Chat using a sliding window , url=. Microsoft Developer Blogs , author=. 2023 , month=

work page 2023

[49] [49]

Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics , pages =

Deep Sparse Rectifier Neural Networks , author =. Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics , pages =. 2011 , editor =

work page 2011

[50] [50]

Improving Text Embeddings with Large Language Models

Liang Wang and Nan Yang and Xiaolong Huang and Linjun Yang and Rangan Majumder and Furu Wei , editor =. Improving Text Embeddings with Large Language Models , booktitle =. 2024 , url =. doi:10.18653/V1/2024.ACL-LONG.642 , timestamp =

work page doi:10.18653/v1/2024.acl-long.642 2024

[51] [51]

Jianlin Su and Murtadha H. M. Ahmed and Yu Lu and Shengfeng Pan and Wen Bo and Yunfeng Liu , title =. Neurocomputing , volume =. 2024 , url =. doi:10.1016/J.NEUCOM.2023.127063 , timestamp =

work page doi:10.1016/j.neucom.2023.127063 2024

[52] [52]

and Lang, Tomás , year=

Ercegovac, Milos D. and Lang, Tomás , year=. Digital Arithmetic , publisher=

work page

[53] [53]

W., and Keutzer, K

Amir Gholami and Sehoon Kim and Zhen Dong and Zhewei Yao and Michael W. Mahoney and Kurt Keutzer , title =. CoRR , volume =. 2021 , url =. 2103.13630 , timestamp =

work page arXiv 2021

[54] [54]

Understanding and Overcoming the Challenges of Efficient Transformer Quantization , booktitle =

Yelysei Bondarenko and Markus Nagel and Tijmen Blankevoort , editor =. Understanding and Overcoming the Challenges of Efficient Transformer Quantization , booktitle =. 2021 , url =. doi:10.18653/V1/2021.EMNLP-MAIN.627 , timestamp =

work page doi:10.18653/v1/2021.emnlp-main.627 2021

[55] [55]

The Bell system technical journal , volume=

A mathematical theory of communication , author=. The Bell system technical journal , volume=. 1948 , publisher=

work page 1948

[56] [56]

Exploring Post-training Quantization in

Zhewei Yao and Xiaoxia Wu and Cheng Li and Stephen Youn and Yuxiong He , editor =. Exploring Post-training Quantization in. Thirty-Eighth. 2024 , url =. doi:10.1609/AAAI.V38I17.29908 , timestamp =

work page doi:10.1609/aaai.v38i17.29908 2024

[57] [57]

CoRR , volume =

Sher Badshah and Hassan Sajjad , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2405.03146 , eprinttype =. 2405.03146 , timestamp =

work page doi:10.48550/arxiv.2405.03146 2024

[58] [58]

CoRR , volume =

Yijun Liu and Yuan Meng and Fang Wu and Shenhao Peng and Hang Yao and Chaoyu Guan and Chen Tang and Xinzhu Ma and Zhi Wang and Wenwu Zhu , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2406.12928 , eprinttype =. 2406.12928 , timestamp =

work page doi:10.48550/arxiv.2406.12928 2024

[59] [59]

Visual Intelligence , volume =

Wei Huang and Xingyu Zheng and Xudong Ma and Haotong Qin and Chengtao Lv and Hong Chen and Jie Luo and Xiaojuan Qi and Xianglong Liu and Michele Magno , title =. Visual Intelligence , volume =. 2024 , url =. doi:10.1007/S44267-024-00070-X , timestamp =

work page doi:10.1007/s44267-024-00070-x 2024

[60] [60]

Aggregating empirical evidence from data strategy studies: a case on model quantization , journal =

Santiago del Rey and Paulo S. Aggregating empirical evidence from data strategy studies: a case on model quantization , journal =. 2025 , url =. doi:10.48550/ARXIV.2505.00816 , eprinttype =. 2505.00816 , timestamp =

work page doi:10.48550/arxiv.2505.00816 2025

[61] [61]

CoRR , volume =

Jemin Lee and Sihyeong Park and Jinse Kwon and Jihun Oh and Yongin Kwon , title =. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2409.11055 , eprinttype =. 2409.11055 , timestamp =

work page doi:10.48550/arxiv.2409.11055 2024

[62] [62]

Vechev , editor =

Kazuki Egashira and Mark Vero and Robin Staab and Jingxuan He and Martin T. Vechev , editor =. Exploiting. Advances in Neural Information Processing Systems 38: Annual Conference on Neural Information Processing Systems 2024, NeurIPS 2024, Vancouver, BC, Canada, December 10 - 15, 2024 , year =

work page 2024

[63] [63]

Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression

Xu, Zhichao and Gupta, Ashim and Li, Tao and Bentham, Oliver and Srikumar, Vivek. Beyond Perplexity: Multi-dimensional Safety Evaluation of LLM Compression. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.901

work page doi:10.18653/v1/2024.findings-emnlp.901 2024

[64] [64]

How Does Quantization Affect Multilingual LLM s?

Marchisio, Kelly and Dash, Saurabh and Chen, Hongyu and Aumiller, Dennis and. How Does Quantization Affect Multilingual LLM s?. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.935

work page doi:10.18653/v1/2024.findings-emnlp.935 2024

[65] [65]

Proceedings of the 41st International Conference on Machine Learning , articleno =

Egiazarian, Vage and Panferov, Andrei and Kuznedelev, Denis and Frantar, Elias and Babenko, Artem and Alistarh, Dan , title =. Proceedings of the 41st International Conference on Machine Learning , articleno =. 2024 , publisher =

work page 2024

[66] [66]

LLMC : Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit

Gong, Ruihao and Yong, Yang and Gu, Shiqiao and Huang, Yushi and Lv, Chengtao and Zhang, Yunchen and Tao, Dacheng and Liu, Xianglong. LLMC : Benchmarking Large Language Model Quantization with a Versatile Compression Toolkit. Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track. 2024. doi:10.18653/v1/2024....

work page doi:10.18653/v1/2024.emnlp-industry.12 2024

[67] [67]

Proceedings of the 38th International Conference on Neural Information Processing Systems , articleno =

Wang, Yubo and Ma, Xueguang and Zhang, Ge and Ni, Yuansheng and Chandra, Abhranil and Guo, Shiguang and Ren, Weiming and Arulraj, Aaran and He, Xuan and Jiang, Ziyan and Li, Tianle and Ku, Max and Wang, Kai and Zhuang, Alex and Fan, Rongqi and Yue, Xiang and Chen, Wenhu , title =. Proceedings of the 38th International Conference on Neural Information Proc...

work page 2025

[68] [68]

The Twelfth International Conference on Learning Representations,

Zayne Sprague and Xi Ye and Kaj Bostrom and Swarat Chaudhuri and Greg Durrett , title =. The Twelfth International Conference on Learning Representations,. 2024 , url =

work page 2024

[69] [69]

2024 , note =

Andrei Betlen , title =. 2024 , note =

work page 2024