pith. sign in

arxiv: 2509.17428 · v4 · pith:VFMDO24Fnew · submitted 2025-09-22 · 💻 cs.CL

QWHA: Quantization-Aware Walsh-Hadamard Adaptation for Parameter-Efficient Fine-Tuning on Large Language Models

Pith reviewed 2026-05-21 21:46 UTC · model grok-4.3

classification 💻 cs.CL
keywords quantization-aware fine-tuningparameter-efficient fine-tuningWalsh-Hadamard transformlarge language modelslow-bit quantizationadaptersmodel compression
0
0 comments X

The pith

QWHA uses Walsh-Hadamard transforms and adaptive initialization to reduce quantization errors in fine-tuned language models while lowering training costs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models require both quantization to cut inference costs and parameter-efficient fine-tuning to limit training overhead. Low-rank adapters often lack enough capacity for accurate results after quantization, and earlier Fourier-based adapters add overhead without fully correcting errors. QWHA solves this by adopting the Walsh-Hadamard Transform as the core kernel together with a new initialization method that selects and refines parameters. The approach mitigates quantization errors, supports fine-tuning, and cuts computational demands. Experiments show higher accuracy at low bit widths and faster training than prior adapters.

Core claim

QWHA integrates Fourier-related adapters into quantized models by using the Walsh-Hadamard Transform as the kernel and a novel initialization scheme with adaptive parameter selection and value refinement, which mitigates quantization errors, facilitates fine-tuning, and substantially reduces computational cost compared with existing methods.

What carries the argument

Walsh-Hadamard Transform kernel combined with adaptive parameter selection and value refinement for adapter initialization

Load-bearing premise

Prior Fourier-related transform adapters suffer from ineffective error reduction and added overhead when used directly in quantized models, and the Walsh-Hadamard kernel plus adaptive initialization overcomes this limitation.

What would settle it

Repeating the reported experiments on the same low-bit quantized models and finding no accuracy gain over baselines or no training speedup would show the method does not deliver the claimed benefits.

Figures

Figures reproduced from arXiv: 2509.17428 by Beomseok Kang, Hyesung Jeon, Jae-Joon Kim, Seojune Lee, Yulhwa Kim.

Figure 1
Figure 1. Figure 1: Overview of Quantization-aware Walsh-Hadamard Adaptation ( [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: (a) Comparison of rank in weight updates between low-rank and FT-based adapters across [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: (a) Average coverage of outlier components within the selected parameters. (b) [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Rank of adapter weights for each parameter selection methods [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Effect of refinement on average layer output error. This allows the selected basis vectors to account for the impact of unselected vectors, yielding a more accurate approximation. With￾out this step, interactions among basis vectors are ignored, leading to suboptimal error reduction. Note that the refinement is applica￾ble regardless of the parameter selection strategy [PITH_FULL_IMAGE:figures/full_fig_p0… view at source ↗
Figure 6
Figure 6. Figure 6: Accuracy of CLoQ and QWHA [PITH_FULL_IMAGE:figures/full_fig_p009_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: (a) Weight quantization error distribution and (b) its channel-wise similarity to the pre [PITH_FULL_IMAGE:figures/full_fig_p015_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Singular value and coefficient magnitude (squared) distributions with the Pareto hill index [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Parameter selection patterns and two example zoomed-in results of each method in the [PITH_FULL_IMAGE:figures/full_fig_p020_9.png] view at source ↗
read the original abstract

The demand for efficient deployment of large language models (LLMs) has driven interest in quantization, which reduces inference cost, and parameter-efficient fine-tuning (PEFT), which lowers training overhead. This motivated the development of quantization-aware PEFT to produce accurate yet efficient quantized models. In this setting, reducing quantization error prior to fine-tuning is crucial for achieving high model accuracy. However, existing methods that rely on low-rank adaptation suffer from limited representational capacity. Recent Fourier-related transform (FT)-based adapters offer greater representational power than low-rank adapters, but their direct integration into quantized models often results in ineffective error reduction and increased computational overhead. To overcome these limitations, we propose QWHA, a method that integrates FT-based adapters into quantized models by employing the Walsh-Hadamard Transform (WHT) as the transform kernel, together with a novel adapter initialization scheme incorporating adaptive parameter selection and value refinement. We demonstrate that QWHA effectively mitigates quantization errors while facilitating fine-tuning, and that its design substantially reduces computational cost. Experimental results show that QWHA consistently outperforms baselines in low-bit quantization accuracy and achieves significant training speedups over existing FT-based adapters. The code is available at https://github.com/vantaa89/qwha.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes QWHA, a quantization-aware parameter-efficient fine-tuning method for large language models. It integrates Walsh-Hadamard Transform (WHT) kernels into Fourier-related transform adapters together with an adaptive initialization scheme (parameter selection and value refinement) to mitigate quantization errors, enable effective fine-tuning of quantized models, reduce computational overhead relative to prior FT-based adapters, and achieve higher low-bit quantization accuracy along with training speedups.

Significance. If the central claims hold, QWHA would provide a concrete advance in quantization-aware PEFT by addressing representational and overhead limitations of both low-rank adapters and existing FT-based methods, with direct relevance to efficient LLM deployment. The public code release at the cited GitHub repository is a clear strength for reproducibility.

major comments (1)
  1. [Experimental Results] Experimental Results section: The paper's core claim that QWHA 'effectively mitigates quantization errors' via the WHT kernel plus adaptive initialization lacks direct empirical support. Downstream task accuracies and speedups are reported as outperforming FT-based baselines, yet no pre-/post-adapter quantization error metrics (e.g., Frobenius norm, element-wise error, or reconstruction error between original and quantized weights) are provided to isolate the claimed error-reduction mechanism from general PEFT or fine-tuning effects. This gap is load-bearing because the motivation explicitly contrasts QWHA against prior FT adapters on the basis of ineffective error reduction.
minor comments (1)
  1. [Abstract] Abstract: The statement that QWHA 'consistently outperforms baselines in low-bit quantization accuracy' would be strengthened by including at least one concrete quantitative example (e.g., average accuracy delta or specific bit-width results) rather than remaining purely qualitative.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback and for recognizing the potential significance of QWHA for quantization-aware PEFT. We address the single major comment below and will incorporate the suggested improvements in the revised manuscript.

read point-by-point responses
  1. Referee: The paper's core claim that QWHA 'effectively mitigates quantization errors' via the WHT kernel plus adaptive initialization lacks direct empirical support. Downstream task accuracies and speedups are reported as outperforming FT-based baselines, yet no pre-/post-adapter quantization error metrics (e.g., Frobenius norm, element-wise error, or reconstruction error between original and quantized weights) are provided to isolate the claimed error-reduction mechanism from general PEFT or fine-tuning effects. This gap is load-bearing because the motivation explicitly contrasts QWHA against prior FT adapters on the basis of ineffective error reduction.

    Authors: We agree that direct quantification of quantization error reduction would more rigorously isolate the contribution of the WHT kernel and adaptive initialization from general fine-tuning effects. Our current experiments focus on end-to-end downstream accuracy and training speed, which provide indirect evidence of effective error mitigation through consistent outperformance over FT-based baselines. To address this, we will add new experiments in the revised manuscript that report pre- and post-adaptation quantization error metrics (including Frobenius norm and mean squared reconstruction error) on selected layers across the evaluated models and bit-widths. These additions will directly support the motivation section's contrast with prior FT adapters. revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is empirical design validated externally

full rationale

The paper introduces QWHA as a practical combination of Walsh-Hadamard Transform kernel and adaptive initialization for quantization-aware PEFT. No equations, derivations, or first-principles predictions appear in the provided text that reduce the claimed error mitigation or speedups to fitted parameters, self-definitions, or self-citation chains. Claims rest on experimental comparisons to baselines rather than internal reductions, rendering the approach self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

Review is based solely on the abstract; therefore the ledger is necessarily incomplete. The method introduces a novel initialization scheme whose internal parameters are not specified here. No new physical entities are postulated.

axioms (1)
  • domain assumption Reducing quantization error prior to fine-tuning is crucial for achieving high model accuracy.
    Explicitly stated in the abstract as the key motivation for the work.

pith-pipeline@v0.9.0 · 5777 in / 1416 out tokens · 72849 ms · 2026-05-21T21:46:13.858374+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · 8 internal anchors

  1. [1]

    Systematic outliers in large language models, 2025

    Yongqi An, Xu Zhao, Tao Yu, Ming Tang, and Jinqiao Wang. Systematic outliers in large language models, 2025

  2. [2]

    Barry C. Arnold. Pareto Distributions. International Co-operative Publishing House, 1983. ISBN 9780429169410. doi:https://doi.org/10.1201/b18141

  3. [3]

    Quarot: Outlier-free 4-bit inference in rotated llms

    Saleh Ashkboos, Amirkeivan Mohtashami, Maximilian Croci, Bo Li, Pashmina Cameron, Martin Jaggi, Dan Alistarh, Torsten Hoefler, and James Hensman. Quarot: Outlier-free 4-bit inference in rotated llms. Proceedings of the 37th International Conference on Neural Information Processing Systems (NeurIPS), 37: 0 100213--100240, 2024

  4. [4]

    Sparse high rank adapters

    Kartikeya Bhardwaj, Nilesh Prasad Pandey, Sweta Priyadarshi, Viswanath Ganapathy, Shreya Kadambi, Rafael Esteves, Shubhankar Borse, Paul Whatmough, Risheek Garrepalli, Mart Van Baalen, Harris Teague, and Markus Nagel. Sparse high rank adapters. In Proceedings of the 38th International Conference on Neural Information Processing Systems, NeurIPS '24, 2024

  5. [5]

    Piqa: Reasoning about physical commonsense in natural language

    Yonatan Bisk, Rowan Zellers, Jianfeng Gao, Yejin Choi, et al. Piqa: Reasoning about physical commonsense in natural language. In Proceedings of the AAAI conference on artificial intelligence, volume 34, pp.\ 7432--7439, 2020

  6. [6]

    BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions

    Christopher Clark, Kenton Lee, Ming-Wei Chang, Tom Kwiatkowski, Michael Collins, and Kristina Toutanova. Boolq: Exploring the surprising difficulty of natural yes/no questions. arXiv preprint arXiv:1905.10044, 2019

  7. [7]

    Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge

    Peter Clark, Isaac Cowhey, Oren Etzioni, Tushar Khot, Ashish Sabharwal, Carissa Schoenick, and Oyvind Tafjord. Think you have solved question answering? try arc, the ai2 reasoning challenge. arXiv preprint arXiv:1803.05457, 2018

  8. [8]

    Training Verifiers to Solve Math Word Problems

    Karl Cobbe, Vineet Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, Lukasz Kaiser, Matthias Plappert, Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, and John Schulman. Training verifiers to solve math word problems, 2021. URL https://arxiv.org/abs/2110.14168

  9. [9]

    Ergur, Pu Gao, Samuel Hetterich, and Maurice Rolvien

    Amin Coja-Oghlan, Alperen A. Ergur, Pu Gao, Samuel Hetterich, and Maurice Rolvien. The rank of sparse random matrices. The Proceedings of the 31th ACM-SIAM Symposium on Discrete Algorithms (SODA), pp.\ 579--591, 2020

  10. [10]

    fast-hadamard-transform

    Dao-AILab. fast-hadamard-transform. https://github.com/Dao-AILab/fast-hadamard-transform, 2024. Accessed: 2025-05-17

  11. [11]

    DeepSeek-AI, Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, Xiaokang Zhang, Xingkai Yu, Yu Wu, Z. F. Wu, Zhibin Gou, Zhihong Shao, Zhuoshu Li, Ziyi Gao, Aixin Liu, Bing Xue, Bingxuan Wang, Bochao Wu, Bei Feng, Chengda Lu, Chenggang Zhao, Chengqi Deng, Chenyu Zhang, Chong Ruan, Damai D...

  12. [12]

    Cloq: Enhancing fine-tuning of quantized llms via calibrated lora initialization

    Yanxia Deng, Aozhong Zhang, Naigang Wang, Selcuk Gurses, Zi Yang, and Penghang Yin. Cloq: Enhancing fine-tuning of quantized llms via calibrated lora initialization. Transactions on Machine Learning Research (TMLR), 2025

  13. [13]

    Qlora: Efficient finetuning of quantized llms

    Tim Dettmers, Artidoro Pagnoni, Ari Holtzman, and Luke Zettlemoyer. Qlora: Efficient finetuning of quantized llms. Proceedings of the 36th International Conference on Neural Information Processing Systems (NeurIPS), 36: 0 10088--10115, 2023

  14. [14]

    Spqr: A sparse-quantized representation for near-lossless llm weight compression, 2024

    Tim Dettmers, Ruslan Svirschevski, Vage Egiazarian, Denis Kuznedelev, Elias Frantar, Saleh Ashkboos, Alexander Borzunov, Torsten Hoefler, and Dan Alistarh. Spqr: A sparse-quantized representation for near-lossless llm weight compression, 2024

  15. [15]

    Loca: Location-aware cosine adaptation for parameter-efficient fine-tuning

    Zhekai Du, Yinjie Min, Jingjing Li, Ke Lu, Changliang Zou, Liuhua Peng, Tingjin Chu, and Mingming Gong. Loca: Location-aware cosine adaptation for parameter-efficient fine-tuning. 13th International Conference on Learning Representations (ICLR), 2025

  16. [16]

    Gptq: Accurate post-training quantization for generative pre-trained transformers

    Elias Frantar, Saleh Ashkboos, Torsten Hoefler, and Dan-Adrian Alistarh. Gptq: Accurate post-training quantization for generative pre-trained transformers. In 11th International Conference on Learning Representations (ICLR), 2023

  17. [17]

    He, B., Yin, L., Zhen, H.-L., Liu, S., Wu, H., Zhang, X., Yuan, M., and Ma, C

    Leo Gao, Jonathan Tow, Baber Abbasi, Stella Biderman, Sid Black, Anthony DiPofi, Charles Foster, Laurence Golding, Jeffrey Hsu, Alain Le Noac'h, Haonan Li, Kyle McDonell, Niklas Muennighoff, Chris Ociepa, Jason Phang, Laria Reynolds, Hailey Schoelkopf, Aviya Skowron, Lintang Sutawika, Eric Tang, Anish Thite, Ben Wang, Kevin Wang, and Andy Zou. The languag...

  18. [18]

    Parameter-efficient fine-tuning with discrete fourier transform

    Ziqi Gao, Qichao Wang, Aochuan Chen, Zijing Liu, Bingzhe Wu, Liang Chen, and Jia Li. Parameter-efficient fine-tuning with discrete fourier transform. Proceedings of the 41st International Conference on Machine Learning (ICML), 2024 b

  19. [19]

    Gerakoulis and Saeed S

    Diakoumis P. Gerakoulis and Saeed S. Ghassemzadeh. System and method for generating orthogonal codes, Mar 2004

  20. [20]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  21. [21]

    Xing, and Yoon Kim

    Han Guo, Philip Greengard, Eric P. Xing, and Yoon Kim. Lq-lora: Low-rank plus quantized matrix decomposition for efficient language model finetuning, 2024

  22. [22]

    A. Hedayat. Hadamard matrices and their applications. The Annals of Statistics, 6, 11 1978. doi:10.1214/aos/1176344370

  23. [23]

    Hedayat, Neil J

    Ashok S. Hedayat, Neil J. A. Sloane, and John Stufken. Orthogonal Arrays: Theory and Applications. Springer Series in Statistics. Springer, 1999. ISBN 978-0-387-98766-8

  24. [24]

    Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. 10th International Conference on Learning Representations (ICLR), 2022

  25. [25]

    Ra-lora: Rank-adaptive parameter-efficient fine-tuning for accurate 2-bit quantized large language models

    Minsoo Kim, Sihwa Lee, Wonyong Sung, and Jungwook Choi. Ra-lora: Rank-adaptive parameter-efficient fine-tuning for accurate 2-bit quantized large language models. In Findings of the Association for Computational Linguistics 2024 (ACL), pp.\ 15773--15786, 2024 a

  26. [26]

    Mahoney, and Kurt Keutzer

    Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, and Kurt Keutzer. Squeezellm: Dense-and-sparse quantization, 2024 b

  27. [27]

    Kopiczko, Tijmen Blankevoort, and Yuki M

    Dawid J. Kopiczko, Tijmen Blankevoort, and Yuki M. Asano. Vera: Vector-based random matrix adaptation, 2024

  28. [28]

    Henry O. Kunz. On the equivalence between one-dimensional discrete walsh-hadamard and multidimensional discrete fourier transforms. IEEE Transactions on Computers, C-28 0 (3): 0 267--268, 1979. doi:10.1109/TC.1979.1675334

  29. [29]

    Prefix-tuning: Optimizing continuous prompts for generation

    Xiang Lisa Li and Percy Liang. Prefix-tuning: Optimizing continuous prompts for generation. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics (ACL), 2021

  30. [30]

    Loftq: Lora-fine-tuning-aware quantization for large language models

    Yixiao Li, Yifan Yu, Chen Liang, Nikos Karampatziakis, Pengcheng He, Weizhu Chen, and Tuo Zhao. Loftq: Lora-fine-tuning-aware quantization for large language models. In 12th International Conference on Learning Representations (ICLR), 2024

  31. [31]

    Apiq: Finetuning of 2-bit quantized large language model

    Baohao Liao, Christian Herold, Shahram Khadivi, and Christof Monz. Apiq: Finetuning of 2-bit quantized large language model. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp.\ 20996--21020, 2024

  32. [32]

    Awq: Activation-aware weight quantization for llm compression and acceleration, 2024

    Ji Lin, Jiaming Tang, Haotian Tang, Shang Yang, Wei-Ming Chen, Wei-Chen Wang, Guangxuan Xiao, Xingyu Dang, Chuang Gan, and Song Han. Awq: Activation-aware weight quantization for llm compression and acceleration, 2024

  33. [33]

    Haokun Liu, Derek Tam, Mohammed Muqeeth, Jay Mohta, Tenghao Huang, Mohit Bansal, and Colin A. Raffel. Few-shot parameter-efficient fine-tuning is better and cheaper than in-context learning. In The 35th Annual Conference on Neural Information Processing Systems (NeurIPS), 2022

  34. [34]

    Visual instruction tuning, 2023

    Haotian Liu, Chunyuan Li, Qingyang Wu, and Yong Jae Lee. Visual instruction tuning, 2023

  35. [35]

    Dora: Weight-decomposed low-rank adaptation, 2024

    Shih-Yang Liu, Chien-Yi Wang, Hongxu Yin, Pavlo Molchanov, Yu-Chiang Frank Wang, Kwang-Ting Cheng, and Min-Hung Chen. Dora: Weight-decomposed low-rank adaptation, 2024

  36. [36]

    Spinquant: Llm quantization with learned rotations, 2025

    Zechun Liu, Changsheng Zhao, Igor Fedorov, Bilge Soran, Dhruv Choudhary, Raghuraman Krishnamoorthi, Vikas Chandra, Yuandong Tian, and Tijmen Blankevoort. Spinquant: Llm quantization with learned rotations, 2025

  37. [37]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017

  38. [38]

    Pointer Sentinel Mixture Models

    Stephen Merity, Caiming Xiong, James Bradbury, and Richard Socher. Pointer sentinel mixture models. CoRR, abs/1609.07843, 2016. URL http://arxiv.org/abs/1609.07843

  39. [39]

    Mistral 7b v0.3

    Mistral AI . Mistral 7b v0.3. https://huggingface.co/mistralai/Mistral-7B-v0.3, 2024. Model card, Apache 2.0 license, released 2024/11/30

  40. [40]

    B. K. Natarajan. Sparse approximate solutions to linear systems. SIAM Journal on Computing, 24 0 (2): 0 227--234, 1995. doi:10.1137/S0097539792240406. URL https://doi.org/10.1137/S0097539792240406

  41. [41]

    Toolllm: Facilitating large language models to master 16000+ real-world apis, 2024

    Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. Toolllm: Facilitating large language models to master 16000+ real-world apis, 2024

  42. [42]

    Winogrande: An adversarial winograd schema challenge at scale

    Keisuke Sakaguchi, Ronan Le Bras, Chandra Bhagavatula, and Yejin Choi. Winogrande: An adversarial winograd schema challenge at scale. Communications of the ACM, 64 0 (9): 0 99--106, 2021

  43. [43]

    Seberry and Mieko Yamada

    Jennifer R. Seberry and Mieko Yamada. Hadamard matrices, sequences, and block designs. In Jeffrey H. Dinitz and Douglas R. Stinson (eds.), Contemporary Design Theory: A Collection of Surveys, pp.\ 431--560. Wiley, 1992

  44. [44]

    Omniquant: Omnidirectionally calibrated quantization for large language models, 2024

    Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, and Ping Luo. Omniquant: Omnidirectionally calibrated quantization for large language models, 2024

  45. [45]

    Ssh: Sparse spectrum adaptation via discrete hartley transformation

    Yixian Shen, Qi Bi, Jia-Hong Huang, Hongyi Zhu, Andy D Pimentel, and Anuj Pathania. Ssh: Sparse spectrum adaptation via discrete hartley transformation. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics (NAACL), 2025

  46. [46]

    N. J. A. Sloane. A library of hadamard matrices. http://neilsloane.com/hadamard/, 2004. Accessed: 2025-05-16

  47. [47]

    Commonsenseqa: A question answering challenge targeting commonsense knowledge

    Alon Talmor, Jonathan Herzig, Nicholas Lourie, and Jonathan Berant. Commonsenseqa: A question answering challenge targeting commonsense knowledge. In North American Chapter of the Association for Computational Linguistics (NAACL), 2019

  48. [48]

    Hashimoto

    Rohan Taori, Ishaan Gulrajani, Tianyi Zhang, Yann Dubois, Xuechen Li, Carlos Guestrin, Percy Liang, and Tatsunori B. Hashimoto. Stanford alpaca: An instruction-following llama model. https://github.com/tatsu-lab/stanford_alpaca, 2023

  49. [49]

    Quip\#: Even better llm quantization with hadamard incoherence and lattice codebooks, 2024

    Albert Tseng, Jerry Chee, Qingyao Sun, Volodymyr Kuleshov, and Christopher De Sa. Quip\#: Even better llm quantization with hadamard incoherence and lattice codebooks, 2024

  50. [50]

    Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M

    Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, and Quoc V. Le. Finetuned language models are zero-shot learners, 2022

  51. [51]

    HellaSwag: Can a Machine Really Finish Your Sentence?

    Rowan Zellers, Ari Holtzman, Yonatan Bisk, Ali Farhadi, and Yejin Choi. Hellaswag: Can a machine really finish your sentence? arXiv preprint arXiv:1905.07830, 2019

  52. [52]

    Magr: Weight magnitude reduction for enhancing post-training quantization, 2024

    Aozhong Zhang, Naigang Wang, Yanxia Deng, Xin Li, Zi Yang, and Penghang Yin. Magr: Weight magnitude reduction for enhancing post-training quantization, 2024

  53. [53]

    Pan, Zhangyang Wang, and Jinwon Lee

    Hanqing Zhu, Zhenyu Zhang, Wenyan Cong, Xi Liu, Sem Park, Vikas Chandra, Bo Long, David Z. Pan, Zhangyang Wang, and Jinwon Lee. Apollo: Sgd-like memory, adamw-level performance, 2025

  54. [54]

    @esa (Ref

    \@ifxundefined[1] #1\@undefined \@firstoftwo \@secondoftwo \@ifnum[1] #1 \@firstoftwo \@secondoftwo \@ifx[1] #1 \@firstoftwo \@secondoftwo [2] @ #1 \@temptokena #2 #1 @ \@temptokena \@ifclassloaded agu2001 natbib The agu2001 class already includes natbib coding, so you should not add it explicitly Type <Return> for now, but then later remove the command n...

  55. [55]

    \@lbibitem[] @bibitem@first@sw\@secondoftwo \@lbibitem[#1]#2 \@extra@b@citeb \@ifundefined br@#2\@extra@b@citeb \@namedef br@#2 \@nameuse br@#2\@extra@b@citeb \@ifundefined b@#2\@extra@b@citeb @num @parse #2 @tmp #1 NAT@b@open@#2 NAT@b@shut@#2 \@ifnum @merge>\@ne @bibitem@first@sw \@firstoftwo \@ifundefined NAT@b*@#2 \@firstoftwo @num @NAT@ctr \@secondoft...

  56. [56]

    @open @close @open @close and [1] URL: #1 \@ifundefined chapter * \@mkboth \@ifxundefined @sectionbib * \@mkboth * \@mkboth\@gobbletwo \@ifclassloaded amsart * \@ifclassloaded amsbook * \@ifxundefined @heading @heading NAT@ctr thebibliography [1] @ \@biblabel @NAT@ctr \@bibsetup #1 @NAT@ctr @ @openbib .11em \@plus.33em \@minus.07em 4000 4000 `\.\@m @bibit...