Nonlinear Bipolar Compensation: Handling Outliers in Post-Training Quantization
Pith reviewed 2026-05-20 20:50 UTC · model grok-4.3
The pith
Nonlinear compensation via logarithmic mapping reduces outlier damage in post-training quantization while keeping computation light.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NBC introduces nonlinear compensation to reduce the effect of outliers, and BLT maps both the quantized input and the quantization error into a transformed space where a simple linear layer performs compensation while preserving efficiency.
What carries the argument
Bipolar Logarithmic Transformation (BLT), a mapping applied jointly to the quantized input and the quantization error that compresses outliers so a subsequent linear layer can perform compensation.
If this is right
- Quantized networks achieve higher accuracy than prior linear-compensation methods on the same bit-widths.
- The added layer remains cheap enough that overall inference speed stays comparable to standard post-training quantization.
- The approach works across multiple quantization algorithms and network architectures without retraining.
- Outlier sensitivity drops, allowing lower bit-widths to remain usable on tasks where they previously failed.
Where Pith is reading between the lines
- The same transformed-space idea might be tested on other post-training compression steps such as pruning or low-rank approximation.
- If the log mapping proves stable, it could be applied once per layer rather than per tensor to further reduce overhead.
- A natural next measurement is whether the recovered accuracy holds when the quantized model is fine-tuned for only a few epochs.
Load-bearing premise
Mapping both input and error through the bipolar log transform will compress outliers enough that the linear compensation layer recovers accuracy without leaving model-specific or bit-width-specific distortions unaddressed.
What would settle it
Run the method on a held-out model and bit-width combination; if top-1 accuracy remains more than a few points below the unquantized baseline while the same linear layer without BLT performs no worse, the central claim fails.
Original abstract
Network quantization has emerged as one of the most practical model compression techniques, which significantly reduces a model's memory and compute consumption by mapping floating-point numbers to low-bit representations. However, existing quantization methods typically suffer from the speed-accuracy tradeoff and limited generalization. To address these issues, recent compensation-based methods offer an efficient yet general solution by introducing additional lightweight linear layers into the quantized network. However, the accuracy of these methods suffers from their limited compensation capability and high sensitivity to outliers. In this paper, we propose Nonlinear Bipolar Compensation (NBC), a post-training quantization approach that introduces nonlinear compensation to reduce the effect of outliers. We further design Bipolar Logarithmic Transformation (BLT), which compresses outliers by mapping both the quantized input and the quantization error into a transformed space. A simple linear layer is then applied for compensation in the transformed space, preserving the efficiency of our method. Extensive experiments across various tasks, models, and quantization methods confirm the effectiveness, efficiency, robustness, and generality of our NBC approach.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Nonlinear Bipolar Compensation (NBC) as a post-training quantization technique to mitigate the impact of outliers on accuracy. It introduces the Bipolar Logarithmic Transformation (BLT) that maps both the quantized input activations and the quantization error into a transformed space; a lightweight linear layer then performs compensation in that space before inversion back to the original domain. The authors assert that this nonlinear compensation improves upon prior linear compensation methods while preserving efficiency, with claims of effectiveness, robustness, and generality backed by extensive experiments across tasks, models, and quantization schemes.
Significance. If the central construction proves sound, NBC would supply a practical, low-overhead route to outlier-robust PTQ that retains the efficiency advantages of linear compensation layers. The explicit use of a simple linear layer inside the transformed space is a clear engineering strength. However, the absence of any derivation or bound on the residual error after the nonlinear round-trip limits the ability to assess whether the method systematically reduces error or merely redistributes it across the distribution.
Major comments (2)
- [Method / BLT construction] The construction applies a nonlinear BLT to both input and error, followed by a linear layer and inversion. Because BLT is nonlinear, the net operator in the original domain is a magnitude-dependent nonlinear correction. No derivation of this effective operator or bound on the residual error (especially for the non-outlier mass of the distribution) is supplied, leaving the claim that the method reliably reduces rather than redistributes quantization error unanalyzed. This analysis is load-bearing for the robustness and generality assertions.
- [Abstract / Experiments] The abstract states that extensive experiments confirm effectiveness, efficiency, robustness, and generality, yet the provided text contains no quantitative accuracy deltas, error bars, dataset specifications, or ablation results. Without these concrete numbers it is impossible to verify whether the claimed improvements hold across bit-widths and models or whether they are driven by the nonlinear compensation itself.
Minor comments (2)
- Clarify the precise functional form of the Bipolar Logarithmic Transformation (including any scaling or offset parameters) and the exact inversion step so that readers can reproduce the nonlinear composition.
- Add a short complexity analysis (FLOPs or latency overhead of the added linear layer) to substantiate the efficiency claim relative to prior compensation methods.
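The complexity analysis requested in the second minor comment could start from a back-of-the-envelope count like the one below. It is only a sketch under stated assumptions, none of which come from the manuscript: one square `d_model x d_model` compensation layer per transformer block, a standard block with four attention projections and a two-layer MLP, and each multiply-add counted as 2 FLOPs. Attention score matmuls and the elementwise log/exp cost of BLT are left out. All function names are hypothetical.

```python
def linear_flops(d_in: int, d_out: int, tokens: int = 1) -> int:
    """FLOPs of a dense layer, counting each multiply-add as 2 FLOPs."""
    return 2 * d_in * d_out * tokens


def block_linear_flops(d_model: int, mlp_ratio: int = 4, tokens: int = 1) -> int:
    """Linear-layer FLOPs of one transformer block (attention matmuls ignored)."""
    # Q, K, V and output projections, each d_model x d_model.
    attn = 4 * linear_flops(d_model, d_model, tokens)
    # MLP up- and down-projection have the same FLOP count.
    mlp = 2 * linear_flops(d_model, mlp_ratio * d_model, tokens)
    return attn + mlp


def compensation_overhead(d_model: int, mlp_ratio: int = 4) -> float:
    """Assumed overhead: one d_model x d_model layer relative to the block."""
    return linear_flops(d_model, d_model) / block_linear_flops(d_model, mlp_ratio)
```

Under these assumptions `compensation_overhead(768)` evaluates to 1/12, roughly 8% extra linear-layer FLOPs per block; a measured latency figure from the authors would still be needed to settle the efficiency claim.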
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments. We address each major comment below and indicate revisions to be incorporated in the next version of the manuscript.
-
Referee: [Method / BLT construction] The construction applies a nonlinear BLT to both input and error, followed by a linear layer and inversion. Because BLT is nonlinear, the net operator in the original domain is a magnitude-dependent nonlinear correction. No derivation of this effective operator or bound on the residual error (especially for the non-outlier mass of the distribution) is supplied, leaving the claim that the method reliably reduces rather than redistributes quantization error unanalyzed. This analysis is load-bearing for the robustness and generality assertions.
Authors: We acknowledge that the current manuscript does not supply a closed-form derivation of the composed nonlinear operator in the original domain or theoretical bounds on the residual error after the round-trip transformation. The method was developed from the empirical observation that logarithmic compression allows a linear compensator to more effectively attenuate large-magnitude outliers while leaving the bulk distribution largely unaffected. In the revised manuscript we will add a dedicated analysis subsection that (i) derives the effective correction operator obtained by composing BLT, the linear layer, and the inverse BLT, and (ii) reports the empirical distribution of residual quantization error on both outlier and non-outlier activations across representative layers, thereby providing quantitative support for the claim that error is reduced rather than merely redistributed.
Revision: yes
-
Referee: [Abstract / Experiments] The abstract states that extensive experiments confirm effectiveness, efficiency, robustness, and generality, yet the provided text contains no quantitative accuracy deltas, error bars, dataset specifications, or ablation results. Without these concrete numbers it is impossible to verify whether the claimed improvements hold across bit-widths and models or whether they are driven by the nonlinear compensation itself.
Authors: We agree that the abstract would be more informative if it contained concrete performance numbers. In the revised version we will shorten the general claims and insert a concise statement of the principal empirical results, for example the average top-1 accuracy gain on ImageNet for ResNet-50 and ViT-B/16 under W4A4 quantization relative to the strongest linear-compensation baseline, together with a brief reference to the evaluation protocol.
Revision: yes
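The effective operator discussed in the first exchange can be previewed in a scalar special case. The sketch below is not the authors' derivation: it assumes a scalar weight w and bias b (the paper's W is a matrix), and it uses the piecewise BLT form f and the update y_nbc = y_q + f^{-1}(W f(x_q) + b) quoted in the Lean theorems section of this page.

```latex
% Assumption: scalar w, b; logarithmic regime x_q > 2^{-N} and z = w f(x_q) + b > 1.
\begin{align*}
f(x_q) &= \log_2 x_q + N + 1, \\
z &= w\,(\log_2 x_q + N + 1) + b, \\
f^{-1}(z) &= 2^{\,z - N - 1} = 2^{\,b + (w - 1)(N + 1)}\; x_q^{\,w}.
\end{align*}
% Linear regime: |x_q| \le 2^{-N} and |z| \le 1.
\begin{align*}
f^{-1}(z) &= 2^{-N}\,\bigl(w\,2^{N} x_q + b\bigr) = w\,x_q + 2^{-N} b.
\end{align*}
```

In this special case the correction is affine near zero and a power law in x_q for large magnitudes, so with 0 < w < 1 it grows sublinearly with the outlier size. That is consistent with the referee's description of a magnitude-dependent nonlinear correction, but it says nothing about the mixed cases (input in one regime, z in the other) or about residual error bounds.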
Circularity Check
No significant circularity in the NBC/BLT construction
The paper presents NBC and BLT as new algorithmic constructions for post-training quantization compensation. No equations, derivations, or self-citations are exhibited that reduce any claimed prediction or result to a fitted parameter, self-definition, or prior author work by construction. The approach is framed as an empirical method whose effectiveness is demonstrated through experiments across models and bit-widths, leaving the central claims independent of any circular reduction.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel
  Tag: unclear
  Paper passage: Bipolar Logarithmic Transformation (BLT) ... $f(x) = \begin{cases} \log_2(x) + N + 1, & x > 2^{-N} \\ 2^{N} x, & |x| \le 2^{-N} \\ -\log_2(-x) - N - 1, & x < -2^{-N} \end{cases}$
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: J_uniquely_calibrated_via_higher_derivative
  Tag: unclear
  Paper passage: $y_{nbc} = y_q + f^{-1}(W f(x_q) + b)$
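The two quoted passages are enough to sketch the transform, its inverse, and the compensated output in plain Python. This is a minimal illustration, not the authors' implementation: the default `n = 4`, the function names, and the list-based matrix-vector product are all assumptions made here, and how W and b are obtained is not shown on this page. The inverse is derived by solving each branch of the quoted piecewise form for x.

```python
import math


def blt(x: float, n: int = 4) -> float:
    """Bipolar Logarithmic Transformation with threshold 2**-n (n is assumed)."""
    t = 2.0 ** -n
    if x > t:
        return math.log2(x) + n + 1
    if x < -t:
        return -math.log2(-x) - n - 1
    return (2.0 ** n) * x  # linear segment around zero, maps [-t, t] to [-1, 1]


def blt_inverse(z: float, n: int = 4) -> float:
    """Inverse of blt, obtained by solving each branch for x."""
    if z > 1.0:
        return 2.0 ** (z - n - 1)
    if z < -1.0:
        return -(2.0 ** (-z - n - 1))
    return z / (2.0 ** n)


def nbc_output(y_q, x_q, weight, bias, n: int = 4):
    """y_nbc = y_q + f^{-1}(W f(x_q) + b), with W given as a list of rows."""
    fx = [blt(v, n) for v in x_q]
    out = []
    for y, row, b in zip(y_q, weight, bias):
        z = sum(w * v for w, v in zip(row, fx)) + b
        out.append(y + blt_inverse(z, n))
    return out


print(blt(1024.0))  # 15.0
```

With `n = 4` an activation of 1024 maps to 15, which shows the compression of outliers that the linear layer then operates on. The two branches meet at ±2**-n with value ±1, so the transform is continuous and invertible.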
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.