Recognition: 2 Lean theorem links
Fitting Is Not Enough: Smoothness in Extremely Quantized LLMs
Pith reviewed 2026-05-12 01:14 UTC · model grok-4.3
The pith
Extremely quantized LLMs lose generation quality through smoothness degradation that is distinct from numerical error.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Extremely quantized LLMs suffer from systematic smoothness degradation beyond numerical precision loss. A smoothness proxy shows that this degradation becomes increasingly severe as the quantization bit-width decreases. Sequence neighborhood modeling further shows that quantized models exhibit a rapid reduction of effective token candidates within the prediction neighborhood, which directly leads to a sparser decoding tree and degraded generation quality. Introducing a simple smoothness-preserving principle into both post-training quantization and quantization-aware training demonstrates that preserving smoothness brings additional gains beyond numerical accuracy.
What carries the argument
The smoothness proxy paired with sequence neighborhood modeling that counts the shrinking set of effective token candidates around each prediction point.
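The smoothness proxy the paper leans on resembles an expected input-gradient norm (the passage quoted further down this page writes it as C_avg = E[||∇_x f_θ||_2]). A minimal sketch of how such a proxy could be measured on a toy softmax model, with uniform round-to-nearest quantization standing in for the paper's quantizers; the model, shapes, and quantizer here are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def smoothness_proxy(W, x, eps=1e-4, n_dirs=16, seed=0):
    # Finite-difference stand-in for E[||grad_x f_theta(x)||]:
    # average output change per unit input change over random unit directions.
    rng = np.random.default_rng(seed)
    base = softmax(W @ x)
    total = 0.0
    for _ in range(n_dirs):
        d = rng.standard_normal(x.shape[0])
        d /= np.linalg.norm(d)
        total += np.linalg.norm(softmax(W @ (x + eps * d)) - base) / eps
    return total / n_dirs

def quantize(W, bits):
    # Uniform symmetric round-to-nearest quantization at the given bit-width.
    scale = np.abs(W).max() / (2 ** (bits - 1) - 1)
    return np.round(W / scale) * scale

rng = np.random.default_rng(1)
W = rng.standard_normal((32, 64)) / np.sqrt(64)
x = rng.standard_normal(64)
for bits in (8, 4, 2):
    print(bits, round(smoothness_proxy(quantize(W, bits), x), 4))
```

In this toy setup, lower bit-widths distort the logits and shift the measured proxy; the paper's claim concerns the systematic direction of that shift on real LLMs, which this sketch does not establish.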
If this is right
- Smoothness degradation grows worse as bit-width is reduced further.
- Fewer effective token candidates in local neighborhoods produce measurably sparser decoding trees.
- Adding the smoothness-preserving principle to existing quantization pipelines yields quality gains that numerical accuracy improvements alone do not deliver.
- Future extreme quantization algorithms should treat smoothness preservation as an explicit objective alongside numerical fidelity.
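The second prediction above can be made concrete with a toy next-token distribution. Two common stand-ins for an "effective candidate" count are the nucleus size (tokens needed to cover a fixed probability mass) and exp-entropy; both are assumptions here, since the paper's exact neighborhood metric is not reproduced on this page:

```python
import numpy as np

def nucleus_size(p, top_p=0.9):
    # Number of tokens needed to cover top_p of the probability mass:
    # a simple stand-in for an effective-token-candidate count.
    q = np.sort(p)[::-1]
    return int(np.searchsorted(np.cumsum(q), top_p) + 1)

def exp_entropy(p):
    # exp(H(p)): the distribution's perplexity, another effective support size.
    p = p[p > 0]
    return float(np.exp(-np.sum(p * np.log(p))))

flat = np.full(10, 0.1)                      # smooth next-token distribution
collapsed = np.array([0.9] + [0.1 / 9] * 9)  # what a degraded model might emit
print(nucleus_size(flat), nucleus_size(collapsed))
print(round(exp_entropy(flat), 2), round(exp_entropy(collapsed), 2))
```

Under either measure, the collapsed distribution offers far fewer candidates per step, which is the mechanism the paper ties to a sparser decoding tree.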
Where Pith is reading between the lines
- The same smoothness loss may appear in other compression methods such as pruning or low-rank adaptation if they also flatten local prediction distributions.
- Architectural changes that encourage smoother output distributions could make models more tolerant of extreme quantization from the start.
- Smoothness metrics could be added to standard evaluation suites for compressed models to catch quality drops not visible in perplexity alone.
Load-bearing premise
The chosen smoothness proxy and sequence neighborhood modeling correctly isolate a degradation mechanism that is causally responsible for reduced generation quality independent of numerical error.
What would settle it
A controlled comparison in which a quantization method augmented with the smoothness-preserving principle produces no improvement in generation quality metrics over a numerically optimized baseline would falsify the independence of the smoothness effect.
Figures
Original abstract
Large language models (LLMs) achieve strong performance but incur high deployment costs, motivating extremely low-bit but lossy quantization. Existing quantization algorithms mainly focus on improving the numerical accuracy of forward computation to eliminate performance degradation. In this paper, we show that extremely quantized LLMs suffer from systematic smoothness degradation beyond numerical precision loss. Through a smoothness proxy, we observe that such degradation becomes increasingly severe as the quantization bit-width decreases. Furthermore, based on sequence neighborhood modeling, we find that quantized models exhibit a rapid reduction of effective token candidates within the prediction neighborhood, which directly leads to a sparser decoding tree and degraded generation quality. To validate it, we introduce a simple smoothness-preserving principle in both post-training quantization and quantization-aware training, and demonstrate that preserving smoothness brings additional gains beyond numerical accuracy. The core goal of this paper is to highlight smoothness preservation as an important design consideration for future extreme quantization methods. Code is available at https://github.com/xuyuzhuang11/FINE.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that extremely quantized LLMs suffer from systematic smoothness degradation beyond numerical precision loss. Using a smoothness proxy, the authors observe that this degradation worsens as bit-width decreases. Via sequence neighborhood modeling, they report a rapid reduction in effective token candidates within the prediction neighborhood, which they argue directly produces a sparser decoding tree and lower generation quality. They introduce a simple smoothness-preserving principle for both post-training quantization and quantization-aware training, and show that it yields additional performance gains beyond those from numerical accuracy improvements alone. The core contribution is positioning smoothness preservation as an important new design consideration for extreme quantization methods.
Significance. If the claimed causal mechanism holds independently of numerical error, the work identifies a previously under-emphasized factor that could meaningfully improve the quality of extremely low-bit LLMs without additional compute. The public code release is a clear strength that enables direct verification and extension. The result, if substantiated, would shift quantization research toward joint optimization of numerical fit and distributional smoothness.
major comments (3)
- Section 3 (sequence neighborhood modeling): the assertion that reduced effective token candidates 'directly leads' to a sparser decoding tree and degraded generation quality is supported only by correlational trends across bit-widths. No controlled experiment is described that holds numerical forward-pass accuracy fixed while varying the neighborhood structure or smoothness proxy; without such isolation, the causal claim that smoothness degradation operates independently of numerical precision loss remains unproven.
- Section 4 (smoothness-preserving principle): the reported gains in PTQ and QAT are presented as evidence that smoothness preservation adds value beyond numerical accuracy. However, the manuscript does not include ablations that match the numerical error of the baseline while applying the principle, nor does it quantify how much of the intervention's effect is on forward-pass error versus on the neighborhood metrics. This leaves open the possibility that the gains are downstream of improved numerical fit rather than an independent smoothness mechanism.
- Experimental protocol (throughout results): the abstract and main text report consistent observations and gains from the proxy and principle, yet provide no error bars, variance estimates across random seeds, or detailed ablation tables. This absence makes it impossible to judge the statistical reliability of the claimed separation between smoothness effects and numerical precision effects.
minor comments (2)
- The smoothness proxy and sequence neighborhood definitions would benefit from explicit equations or pseudocode in the main text (rather than only in the appendix or code) to improve clarity and reproducibility.
- Figure captions and axis labels could more explicitly indicate which curves correspond to the smoothness-preserving intervention versus standard quantization baselines.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. The comments highlight important opportunities to strengthen the causal evidence and statistical reporting for our claims about smoothness degradation in extremely quantized LLMs. We address each major comment below and will incorporate the suggested improvements in the revised manuscript.
Point-by-point responses
- Referee: Section 3 (sequence neighborhood modeling): the assertion that reduced effective token candidates 'directly leads' to a sparser decoding tree and degraded generation quality is supported only by correlational trends across bit-widths. No controlled experiment is described that holds numerical forward-pass accuracy fixed while varying the neighborhood structure or smoothness proxy; without such isolation, the causal claim that smoothness degradation operates independently of numerical precision loss remains unproven.
Authors: We agree that the primary evidence in the current manuscript consists of correlational trends across bit-widths. The sequence neighborhood modeling shows a clear reduction in effective token candidates as smoothness degrades, which provides a mechanistic explanation for the sparser decoding tree. To isolate the effect more rigorously, we will add a controlled experiment in the revision that applies targeted perturbations to the smoothness proxy while monitoring and holding numerical forward-pass accuracy constant, thereby demonstrating the independent contribution to decoding sparsity and generation quality. revision: partial
- Referee: Section 4 (smoothness-preserving principle): the reported gains in PTQ and QAT are presented as evidence that smoothness preservation adds value beyond numerical accuracy. However, the manuscript does not include ablations that match the numerical error of the baseline while applying the principle, nor does it quantify how much of the intervention's effect is on forward-pass error versus on the neighborhood metrics. This leaves open the possibility that the gains are downstream of improved numerical fit rather than an independent smoothness mechanism.
Authors: We acknowledge that the current presentation does not fully isolate the contributions. The smoothness-preserving principle is designed to act on distributional properties rather than directly minimizing numerical error. In the revision, we will include new ablations that explicitly match numerical error between the baseline and our method (e.g., by adjusting quantization parameters to equalize forward-pass metrics) and will quantify the differential effects on neighborhood metrics versus numerical accuracy to clarify the independent role of smoothness preservation. revision: partial
- Referee: Experimental protocol (throughout results): the abstract and main text report consistent observations and gains from the proxy and principle, yet provide no error bars, variance estimates across random seeds, or detailed ablation tables. This absence makes it impossible to judge the statistical reliability of the claimed separation between smoothness effects and numerical precision effects.
Authors: We agree that the absence of error bars and variance reporting limits the ability to assess statistical reliability. We will revise all experimental sections to report means and standard deviations across multiple random seeds, include variance estimates for key metrics, and expand the ablation tables with statistical details to support evaluation of the separation between smoothness and numerical effects. revision: yes
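The reporting promised here is straightforward to implement; a minimal sketch, with hypothetical perplexity numbers used purely for illustration:

```python
import statistics

def report(metric_by_seed):
    # Mean and sample standard deviation across seeds, per the referee's request.
    m = statistics.mean(metric_by_seed)
    s = statistics.stdev(metric_by_seed) if len(metric_by_seed) > 1 else 0.0
    return f"{m:.2f} +/- {s:.2f} (n={len(metric_by_seed)})"

# Hypothetical perplexity values for illustration only; not from the paper.
baseline = [12.4, 12.7, 12.5]
with_smoothness = [11.8, 12.0, 11.9]
print("baseline:", report(baseline))
print("smoothness-preserving:", report(with_smoothness))
```

Reporting in this form is the minimum needed to judge whether the claimed smoothness gains clear seed-to-seed variance.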
Circularity Check
No significant circularity
full rationale
The paper defines a smoothness proxy as an independent diagnostic tool, uses sequence neighborhood modeling to observe token candidate reduction, and then applies a separate smoothness-preserving intervention in PTQ and QAT to obtain gains beyond numerical accuracy. No derivation step reduces a claimed prediction or result to its own inputs by construction, no load-bearing premise rests on self-citation, and no ansatz or uniqueness claim is smuggled in. The central chain remains empirically additive rather than tautological.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: The smoothness proxy isolates degradation beyond numerical precision loss.
- domain assumption: Reduction in effective token candidates directly produces sparser decoding trees and lower generation quality.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: "We define a smoothness proxy... C_avg = E[||∇_x f_θ||_2]... sequence neighborhood modeling... rPPL-k... LGP... LGR"
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · tag: unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: "quantized models exhibit a rapid reduction of effective token candidates within the prediction neighborhood, which directly leads to a sparser decoding tree"
What do these tags mean?
- matches: the paper's claim is directly supported by a theorem in the formal canon.
- supports: the theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: the paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: the paper appears to rely on the theorem as machinery.
- contradicts: the paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.