arxiv: 2606.27884 · v1 · pith:B5AWWGQ3new · submitted 2026-06-26 · 💻 cs.AR · cs.AI

SEADA: An efficient methodology for optimizing mixed-precision DNNs on multi-precision spatial architectures

Pith reviewed 2026-06-29 02:26 UTC · model grok-4.3

classification 💻 cs.AR cs.AI
keywords mixed-precision DNNs · spatial architectures · design space exploration · analytical cost model · bit-level entropy · precision selection · multi-precision accelerators · DNN mapping

The pith

SEADA provides an efficient methodology using analytical cost models and bit-level entropy to optimize mixed-precision DNN mappings on multi-precision spatial architectures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SEADA to solve challenges in assigning precisions across DNN layers while accounting for accuracy sensitivity, architectural heterogeneity, and system-level costs on spatial accelerators. It combines a configurable analytical cost model, a fast mapping tool for near-optimal workload placement, floating-point layer models, and entropy-based precision selection. This setup targets efficient design-space exploration without exhaustive simulations. A sympathetic reader would care because it promises quicker identification of precision assignments that trade off latency, energy, and accuracy. If correct, it supports better co-design of hardware and mixed-precision networks for inference.

Core claim

SEADA comprises (i) a configurable system-level analytical cost model of a multi-precision spatial accelerator architecture; (ii) a fast mapping tool that identifies near-optimal mappings of DNN workloads onto the target integer accelerator; (iii) analytical models for floating-point layers to estimate the overall benefits of mixed-precision execution; and (iv) a per-layer precision selection methodology based on bit-level entropy, enabling efficient assignment across multiple numerical precisions.

What carries the argument

The SEADA four-component framework that integrates an analytical cost model with bit-level entropy for per-layer precision selection and fast mapping.
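
As a rough illustration of what a bit-level entropy signal could look like, the sketch below computes the Shannon entropy of each bit plane in a list of unsigned integer codes. The abstract does not define SEADA's metric, so the helper name `bit_entropies`, the unsigned encoding, and the per-bit-plane formulation are all assumptions made for this example.

```python
import math


def bit_entropies(codes, n_bits):
    """Shannon entropy (in bits) of each bit plane of unsigned integer codes.

    Hypothetical helper, not SEADA's definition. Index 0 is the least
    significant bit.
    """
    if not codes:
        raise ValueError("codes must be non-empty")
    entropies = []
    for b in range(n_bits):
        ones = sum((c >> b) & 1 for c in codes)
        p = ones / len(codes)
        if p in (0.0, 1.0):
            entropies.append(0.0)  # a constant bit carries no information
        else:
            entropies.append(-(p * math.log2(p) + (1 - p) * math.log2(1 - p)))
    return entropies


# Illustrative 8-bit codes whose upper four bits are always zero.
codes = list(range(16))
per_bit = bit_entropies(codes, 8)
```

In this toy input the four low bit planes each carry one bit of entropy and the four high planes carry none, the kind of signal that might suggest a tensor tolerates a narrower format. Whether SEADA uses entropy in this way is not stated in the abstract.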

If this is right

  • Designers gain a framework for rapid design-space exploration of multi-precision architectures.
  • Near-optimal mappings of DNN workloads can be found without exhaustive search or full simulation.
  • Overall benefits of mixed-precision execution, including floating-point layers, become estimable analytically.
  • Precision assignments across multiple numerical formats can be performed efficiently while balancing accuracy and constraints.
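
To make "estimable analytically" concrete, here is a toy roofline-style latency estimate for one layer at a given integer precision. It is a minimal sketch and not SEADA's cost model: the function name, its parameters, and the assumption that each processing element performs `8 // bits` MACs per cycle are hypothetical choices for illustration.

```python
def layer_latency_cycles(macs, operand_elems, bits, n_pes, bytes_per_cycle):
    """Toy estimate: the slower of compute time and data-movement time.

    Assumptions (not from the paper): sub-word parallelism scales inversely
    with bit width, and every operand element is moved exactly once.
    """
    if bits not in (2, 4, 8):
        raise ValueError("this sketch only models 2-, 4- and 8-bit integers")
    macs_per_pe = 8 // bits
    compute_cycles = macs / (n_pes * macs_per_pe)
    memory_cycles = operand_elems * bits / 8 / bytes_per_cycle
    return max(compute_cycles, memory_cycles)


# Placeholder workload and hardware numbers, chosen only for the example.
lat8 = layer_latency_cycles(1_000_000, 100_000, 8, n_pes=256, bytes_per_cycle=16)
lat4 = layer_latency_cycles(1_000_000, 100_000, 4, n_pes=256, bytes_per_cycle=16)
```

With these placeholder numbers the layer is memory-bound at both precisions, so halving the bit width halves the estimate. A real system-level model would also need to account for the memory hierarchy, data reuse, and the chosen mapping.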

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The entropy-based selection might generalize as a lightweight proxy for quantization sensitivity in other hardware mapping problems.
  • The analytical models could reduce early-stage reliance on hardware prototypes in accelerator design flows.
  • Combining SEADA with automated search techniques might further improve mapping quality on heterogeneous systems.
  • Validation on additional accelerator topologies beyond spatial ones would test broader applicability.

Load-bearing premise

The configurable system-level analytical cost model and per-layer precision selection based on bit-level entropy accurately capture system-level costs and accuracy sensitivity without needing full simulation or post-hoc adjustments.

What would settle it

Direct comparison of SEADA-predicted latency, energy, accuracy, and selected precisions against cycle-accurate simulations or hardware measurements on multiple DNN models and precision configurations.
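
One simple way to score such a comparison is the mean absolute percentage error between model predictions and reference measurements. The sketch below assumes paired per-layer values; the function name is hypothetical and the numbers are placeholders, not results from the paper.

```python
def mean_abs_pct_error(predicted, reference):
    """Mean absolute percentage error of predictions against reference values."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(r == 0 for r in reference):
        raise ValueError("reference values must be non-zero")
    total = sum(abs(p - r) / abs(r) for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)


# Placeholder latencies (arbitrary units), not measurements.
model_estimates = [110.0, 90.0, 200.0]
simulator_values = [100.0, 100.0, 200.0]
error_pct = mean_abs_pct_error(model_estimates, simulator_values)
```

Reporting this figure separately for latency and energy, and per precision configuration, would show where an analytical model drifts from simulation.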

Original abstract

Mixed-precision computation has been introduced in deep neural networks (DNNs) as an effective approach to reduce latency, energy consumption, and memory footprint. However, efficiently mapping mixed-precision networks onto multi-precision spatial architectures poses several challenges. These include determining the appropriate precision for each layer, balancing layer-wise accuracy sensitivity to quantization against architectural heterogeneity and system-level constraints, and accurately estimating the system-level cost of heterogeneous precision assignments. This work presents SEADA, an efficient methodology designed to address these challenges. SEADA comprises: (i) a configurable system-level analytical cost model of a multi-precision spatial accelerator architecture; (ii) a fast mapping tool that identifies near-optimal mappings of DNN workloads onto the target integer accelerator; (iii) analytical models for floating-point layers to estimate the overall benefits of mixed-precision execution; and (iv) a per-layer precision selection methodology based on bit-level entropy, enabling efficient assignment across multiple numerical precisions. SEADA's efficiency provides designers with a robust framework for the design-space exploration of multi-precision architectures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript presents SEADA, a methodology for optimizing mixed-precision DNNs on multi-precision spatial architectures. It comprises (i) a configurable system-level analytical cost model, (ii) a fast mapping tool for near-optimal workload mappings, (iii) analytical models for floating-point layers, and (iv) per-layer precision selection based on bit-level entropy. The approach aims to determine suitable per-layer precisions while balancing accuracy sensitivity, architectural heterogeneity, and system-level constraints, ultimately providing a framework for design-space exploration.

Significance. If the analytical cost model and entropy-based selection are shown to accurately predict system costs and accuracy impacts without post-hoc simulation adjustments, SEADA could streamline DSE for heterogeneous accelerators by reducing reliance on full-system simulations. The provision of a mapping tool and floating-point models would add practical value for mixed-precision hardware design.

major comments (2)
  1. [Abstract] Abstract: The description of SEADA's four components provides no validation data, error analysis, baseline comparisons, or quantitative results. This is load-bearing for the central claim that the configurable cost model and bit-level entropy selection 'accurately capture system-level costs and accuracy sensitivity without needing full simulation,' as no evidence is supplied to support the efficiency or robustness assertions.
  2. [Abstract] Abstract: The weakest assumption—that the analytical models suffice for system-level estimation and per-layer assignment—remains untested in the provided description. Without experiments demonstrating that the entropy metric correlates with accuracy sensitivity across precisions or that the cost model matches simulated results within acceptable error bounds, the claim of an 'efficient' and 'robust' framework cannot be evaluated.
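
The correlation the second comment asks for could be checked with a rank correlation between per-layer entropy and the measured accuracy drop when that layer alone is quantized. The sketch below implements Spearman's coefficient in plain Python. The function names and the choice of Spearman over other measures are assumptions for illustration, not the authors' evaluation protocol.

```python
import math


def average_ranks(values):
    """1-based ranks, with tied values sharing the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)
```

A coefficient close to 1 or -1 across several models and precisions would support using entropy as a sensitivity proxy. A value near 0 would undercut it.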

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments on the abstract. The full manuscript contains the requested validation experiments, error analysis, and comparisons, but we agree the abstract would be strengthened by briefly referencing key quantitative outcomes to better support the efficiency and robustness claims.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The description of SEADA's four components provides no validation data, error analysis, baseline comparisons, or quantitative results. This is load-bearing for the central claim that the configurable cost model and bit-level entropy selection 'accurately capture system-level costs and accuracy sensitivity without needing full simulation,' as no evidence is supplied to support the efficiency or robustness assertions.

    Authors: The abstract is a high-level overview; the manuscript body provides the validation data, error bounds, baseline comparisons, and quantitative results for the cost model and entropy selection. To address the concern directly, we will revise the abstract to include concise references to these results (e.g., model accuracy within X% of simulation and entropy correlation with accuracy sensitivity). revision: yes

  2. Referee: [Abstract] Abstract: The weakest assumption—that the analytical models suffice for system-level estimation and per-layer assignment—remains untested in the provided description. Without experiments demonstrating that the entropy metric correlates with accuracy sensitivity across precisions or that the cost model matches simulated results within acceptable error bounds, the claim of an 'efficient' and 'robust' framework cannot be evaluated.

    Authors: The manuscript includes the requested experiments on entropy-accuracy correlation and cost-model vs. simulation matching. We will update the abstract to note these findings at a summary level so the central claims are supported even in the condensed description. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper describes a methodology with four components: a configurable analytical cost model, a mapping tool, floating-point layer models, and bit-level entropy-based precision selection. No equations, derivations, or load-bearing steps are visible in the provided abstract or description that reduce any claimed result to its inputs by construction, self-definition, or self-citation chains. The claims concern the efficiency of the proposed framework for design-space exploration rather than any fitted prediction or uniqueness theorem that collapses to the input data. The derivation chain is therefore self-contained at the level of a high-level methodology proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities; all fields left empty as details are unavailable.

pith-pipeline@v0.9.1-grok · 5712 in / 995 out tokens · 34340 ms · 2026-06-29T02:26:21.187405+00:00 · methodology

