SEADA: An efficient methodology for optimizing mixed-precision DNNs on multi-precision spatial architectures

Cristina Silvano; Leandro Fiorin; Marco Ronzani

arxiv: 2606.27884 · v1 · pith:B5AWWGQ3 · submitted 2026-06-26 · 💻 cs.AR · cs.AI


Leandro Fiorin, Marco Ronzani, Cristina Silvano

Pith reviewed 2026-06-29 02:26 UTC · model grok-4.3

classification 💻 cs.AR cs.AI

keywords · mixed-precision DNNs · spatial architectures · design space exploration · analytical cost model · bit-level entropy · precision selection · multi-precision accelerators · DNN mapping


The pith

SEADA provides an efficient methodology using analytical cost models and bit-level entropy to optimize mixed-precision DNN mappings on multi-precision spatial architectures.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SEADA to solve challenges in assigning precisions across DNN layers while accounting for accuracy sensitivity, architectural heterogeneity, and system-level costs on spatial accelerators. It combines a configurable analytical cost model, a fast mapping tool for near-optimal workload placement, floating-point layer models, and entropy-based precision selection. This setup targets efficient design-space exploration without exhaustive simulations. A sympathetic reader would care because it promises quicker identification of precision assignments that trade off latency, energy, and accuracy. If correct, it supports better co-design of hardware and mixed-precision networks for inference.

Core claim

SEADA comprises (i) a configurable system-level analytical cost model of a multi-precision spatial accelerator architecture; (ii) a fast mapping tool that identifies near-optimal mappings of DNN workloads onto the target integer accelerator; (iii) analytical models for floating-point layers to estimate the overall benefits of mixed-precision execution; and (iv) a per-layer precision selection methodology based on bit-level entropy, enabling efficient assignment across multiple numerical precisions.
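A per-layer analytical cost model of the kind named in (i) can be sketched in a few lines. The example below is a hypothetical roofline-style model, not SEADA's: every `Accel` parameter (PE count, clock, MACs per PE per cycle at each integer precision, per-MAC and per-byte energies, off-chip bandwidth) is a placeholder, and the assumption that lower precision packs more MACs into each PE per cycle is illustrative only.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Tuple


@dataclass
class Accel:
    """Hypothetical multi-precision spatial array; every number is a placeholder."""
    num_pes: int = 256                    # processing elements in the array
    freq_hz: float = 1e9                  # clock frequency
    dram_bytes_per_cycle: float = 32.0    # assumed off-chip bandwidth
    e_dram_pj_per_byte: float = 100.0     # assumed energy per byte moved off-chip
    # Assumed MACs per PE per cycle and energy per MAC (pJ) at each integer precision.
    macs_per_cycle: Dict[int, int] = field(default_factory=lambda: {8: 1, 4: 2, 2: 4})
    e_mac_pj: Dict[int, float] = field(default_factory=lambda: {8: 0.20, 4: 0.08, 2: 0.04})


def layer_cost(acc: Accel, macs: int, num_elems: int, bits: int) -> Tuple[float, float]:
    """Return (latency in s, energy in pJ) of one layer at `bits`-bit precision.

    `num_elems` counts the weights and activations moved off-chip; their
    size in bytes scales with the chosen precision.
    """
    bytes_moved = num_elems * bits / 8
    compute_cycles = math.ceil(macs / (acc.num_pes * acc.macs_per_cycle[bits]))
    memory_cycles = math.ceil(bytes_moved / acc.dram_bytes_per_cycle)
    cycles = max(compute_cycles, memory_cycles)  # compute and transfers assumed to overlap
    energy_pj = macs * acc.e_mac_pj[bits] + bytes_moved * acc.e_dram_pj_per_byte
    return cycles / acc.freq_hz, energy_pj


# Same hypothetical layer evaluated at INT8 and INT4.
lat8, e8 = layer_cost(Accel(), macs=1_000_000, num_elems=100_000, bits=8)
lat4, e4 = layer_cost(Accel(), macs=1_000_000, num_elems=100_000, bits=4)
```

Because latency is the maximum of compute and memory cycles, a precision change shifts whichever bound dominates, while energy falls through both cheaper MACs and fewer bytes moved.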

What carries the argument

SEADA's four-component framework, which integrates an analytical cost model, fast mapping, and floating-point layer models with bit-level entropy for per-layer precision selection.
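The exact definition of bit-level entropy used in the paper is not given here. One plausible reading, sketched below purely as an assumption, quantizes a layer's weights to unsigned b-bit codes, measures for each bit position the fraction p of codes with that bit set, and sums the binary entropies H(p) = -p log2 p - (1 - p) log2(1 - p). The function names are hypothetical.

```python
import math
from typing import List, Sequence


def quantize_unsigned(w: Sequence[float], bits: int) -> List[int]:
    """Uniform asymmetric (min/max) quantization to unsigned `bits`-bit codes."""
    lo, hi = min(w), max(w)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # constant tensors map to code 0
    return [min(levels, max(0, round((x - lo) / scale))) for x in w]


def bit_level_entropy(codes: Sequence[int], bits: int) -> float:
    """Sum over bit positions of the binary entropy of that bit across all codes."""
    total = 0.0
    for k in range(bits):
        p = sum((c >> k) & 1 for c in codes) / len(codes)  # P(bit k is set)
        if 0.0 < p < 1.0:  # H(0) = H(1) = 0, so constant bits contribute nothing
            total += -p * math.log2(p) - (1 - p) * math.log2(1 - p)
    return total
```

Under this reading, codes spread uniformly over all bit patterns score `bits`, while codes that barely vary score near zero, which is what would make the metric usable as a cheap proxy for how many bits a layer actually exercises.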

If this is right

Designers gain a framework for rapid design-space exploration of multi-precision architectures.
Near-optimal mappings of DNN workloads can be found without exhaustive search or full simulation.
Overall benefits of mixed-precision execution, including floating-point layers, become estimable analytically.
Precision assignments across multiple numerical formats can be performed efficiently while balancing accuracy against system-level constraints.
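To illustrate how entropy scores and a cost model could be combined for per-layer assignment, the sketch below starts every layer at the highest precision and repeatedly lowers, by one step, the precision of the lowest-entropy layer that can still be lowered, until a total-latency budget is met. The greedy policy, the candidate precision set, and the name `assign_precisions` are assumptions for illustration, not the procedure described in the paper.

```python
from typing import Callable, Dict, List, Sequence


def assign_precisions(
    layers: Sequence[str],
    entropy: Dict[str, float],
    latency: Callable[[str, int], float],
    budget: float,
    precisions: Sequence[int] = (8, 4, 2),
) -> Dict[str, int]:
    """Greedy per-layer precision assignment under a total-latency budget.

    Layers with lower entropy are assumed to tolerate lower precision, so they
    are demoted first. If the budget is infeasible, every layer ends at the
    lowest precision.
    """
    levels = sorted(precisions, reverse=True)   # e.g. [8, 4, 2]
    choice = {name: 0 for name in layers}       # index into `levels`

    def total() -> float:
        return sum(latency(n, levels[choice[n]]) for n in layers)

    order: List[str] = sorted(layers, key=lambda n: entropy[n])
    while total() > budget:
        # Lowest-entropy layer that has not yet reached the lowest precision.
        candidates = [n for n in order if choice[n] < len(levels) - 1]
        if not candidates:
            break
        choice[candidates[0]] += 1
    return {n: levels[choice[n]] for n in layers}
```

The `latency` callback is where an analytical model such as a per-layer cost function would plug in; a practical method would also need an accuracy check, which this sketch leaves out.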

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The entropy-based selection might generalize as a lightweight proxy for quantization sensitivity in other hardware mapping problems.
The analytical models could reduce early-stage reliance on hardware prototypes in accelerator design flows.
Combining SEADA with automated search techniques might further improve mapping quality on heterogeneous systems.
Validation on additional accelerator topologies beyond spatial ones would test broader applicability.

Load-bearing premise

The configurable system-level analytical cost model and per-layer precision selection based on bit-level entropy accurately capture system-level costs and accuracy sensitivity without needing full simulation or post-hoc adjustments.

What would settle it

Direct comparison of SEADA-predicted latency, energy, accuracy, and selected precisions against cycle-accurate simulations or hardware measurements on multiple DNN models and precision configurations.
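A comparison like this is commonly summarized per metric (latency, energy) with relative-error statistics. The minimal sketch below, with a hypothetical function name, computes the mean absolute percentage error (MAPE) and the worst-case relative error of model predictions against reference values such as cycle-accurate simulation results; it assumes all reference values are nonzero.

```python
from typing import Sequence, Tuple


def prediction_error(predicted: Sequence[float], measured: Sequence[float]) -> Tuple[float, float]:
    """Return (MAPE in %, max relative error in %) of predictions vs. reference values."""
    if len(predicted) != len(measured) or not measured:
        raise ValueError("need two non-empty sequences of equal length")
    if any(m == 0 for m in measured):
        raise ValueError("reference values must be nonzero")
    rel = [abs(p - m) / abs(m) for p, m in zip(predicted, measured)]
    return 100.0 * sum(rel) / len(rel), 100.0 * max(rel)
```

Reporting the worst case alongside the mean matters here, since a model that is accurate on average can still misrank individual layers or precision configurations.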

Original abstract

Mixed-precision computation has been introduced in deep neural networks (DNNs) as an effective approach to reduce latency, energy consumption, and memory footprint. However, efficiently mapping mixed-precision networks onto multi-precision spatial architectures poses several challenges. These include determining the appropriate precision for each layer, balancing layer-wise accuracy sensitivity to quantization against architectural heterogeneity and system-level constraints, and accurately estimating the system-level cost of heterogeneous precision assignments. This work presents SEADA, an efficient methodology designed to address these challenges. SEADA comprises: (i) a configurable system-level analytical cost model of a multi-precision spatial accelerator architecture; (ii) a fast mapping tool that identifies near-optimal mappings of DNN workloads onto the target integer accelerator; (iii) analytical models for floating-point layers to estimate the overall benefits of mixed-precision execution; and (iv) a per-layer precision selection methodology based on bit-level entropy, enabling efficient assignment across multiple numerical precisions. SEADA's efficiency provides designers with a robust framework for the design-space exploration of multi-precision architectures.
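A mapping decides how loop dimensions are split between spatial unrolling on the PE array and temporal iteration. As a toy illustration only, the sketch below exhaustively enumerates spatial unrolling factors for a GEMM of size M x N x K on a `rows` x `cols` array, restricted to exact divisors, and keeps the pair with the fewest modeled compute cycles. It ignores temporal tiling, loop order, the memory hierarchy, and precision, and it assumes one MAC per PE per cycle; a practical mapper searches a far larger space and generally relies on heuristics rather than enumeration. The function names are hypothetical.

```python
from itertools import product
from typing import List, Optional, Tuple


def divisors(n: int) -> List[int]:
    """All positive divisors of n."""
    return [d for d in range(1, n + 1) if n % d == 0]


def best_spatial_mapping(M: int, N: int, K: int, rows: int, cols: int) -> Tuple[int, int, int]:
    """Pick spatial factors (m, n) for a GEMM; M is unrolled over rows, N over columns.

    Returns (m, n, cycles) with the fewest modeled compute cycles; the rest of
    the work runs temporally at one MAC per PE per cycle.
    """
    best: Optional[Tuple[int, int, int]] = None
    for m, n in product(divisors(M), divisors(N)):
        if m > rows or n > cols:
            continue  # factor does not fit on the array
        cycles = (M // m) * (N // n) * K
        if best is None or cycles < best[2]:
            best = (m, n, cycles)
    assert best is not None  # (1, 1) always fits
    return best
```

Restricting factors to exact divisors avoids modeling partially filled rows or columns, at the price of utilization: for M = 30 and N = 20 on a 16 x 16 array, the best choice is a 15 x 10 unrolling, leaving part of the array idle.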

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

SEADA combines an analytical cost model, mapping tool, FP models, and entropy-based selection for mixed-precision DNNs, but the abstract shows no validation or comparisons to support the efficiency claims.


SEADA is a practical-sounding methodology for mixed-precision DNN optimization on spatial architectures, but it needs validation data to back up the efficiency claims.

The paper introduces SEADA with four main components: a configurable system-level analytical cost model for multi-precision spatial accelerators, a fast mapping tool for DNN workloads, analytical models for floating-point layers, and a per-layer precision selection based on bit-level entropy. This combination aims to handle the challenges of choosing precisions while balancing accuracy sensitivity, architectural heterogeneity, and system-level costs.
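The floating-point models matter because layers that remain in floating point bound the end-to-end gain, in the spirit of Amdahl's law. The sketch below is an illustration under assumptions, not SEADA's floating-point model: each hypothetical `Layer` carries a floating-point latency and, if it can be quantized, a latency on the integer array; layers such as softmax or normalization are assumed here to stay in floating point.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Layer:
    name: str
    fp_latency: float             # latency when executed in floating point
    int_latency: Optional[float]  # latency on the integer array, None if the layer stays FP


def mixed_precision_speedup(layers: Sequence[Layer]) -> float:
    """End-to-end speedup of running quantizable layers on the integer array and
    the rest in floating point, relative to an all-floating-point baseline."""
    baseline = sum(layer.fp_latency for layer in layers)
    mixed = sum(
        layer.fp_latency if layer.int_latency is None else layer.int_latency
        for layer in layers
    )
    return baseline / mixed
```

For example, a GEMM that shrinks from 10 to 2 time units next to a softmax fixed at 2 units yields a 3x end-to-end speedup, even though the GEMM alone sped up 5x.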

What the paper does well is outlining a framework that integrates these elements for design-space exploration. The use of entropy for precision assignment is a sensible way to make the selection efficient without full simulations.

The main soft spot is the absence of any validation. The abstract describes the methodology but provides no data on how well the models predict actual costs, no accuracy results for the selected precisions, and no comparisons to existing methods. This leaves the claim that it provides a robust framework untested in the provided text. The assumption that the analytical models accurately capture system costs without post-hoc adjustments is central but unsupported here.

If the full manuscript includes experiments and reproducible results, that would strengthen it considerably. Based on the abstract alone, the work is more of a tool description than a validated contribution.

This paper is aimed at researchers and designers working on multi-precision architectures for deep learning accelerators. Someone in that niche might find the mapping tool and entropy method useful as a starting point.

It deserves a serious referee because the problem it addresses is real in the field, and the proposed components are logically connected to the challenges.

Recommendation: Yes, it should go to peer review so the full details and any experiments can be assessed.

Referee Report

2 major / 0 minor

Summary. The manuscript presents SEADA, a methodology for optimizing mixed-precision DNNs on multi-precision spatial architectures. It comprises (i) a configurable system-level analytical cost model, (ii) a fast mapping tool for near-optimal workload mappings, (iii) analytical models for floating-point layers, and (iv) per-layer precision selection based on bit-level entropy. The approach aims to determine suitable per-layer precisions while balancing accuracy sensitivity, architectural heterogeneity, and system-level constraints, ultimately providing a framework for design-space exploration.

Significance. If the analytical cost model and entropy-based selection are shown to accurately predict system costs and accuracy impacts without post-hoc simulation adjustments, SEADA could streamline DSE for heterogeneous accelerators by reducing reliance on full-system simulations. The provision of a mapping tool and floating-point models would add practical value for mixed-precision hardware design.

major comments (2)

[Abstract] The description of SEADA's four components provides no validation data, error analysis, baseline comparisons, or quantitative results. This is load-bearing for the central claim that the configurable cost model and bit-level entropy selection 'accurately capture system-level costs and accuracy sensitivity without needing full simulation,' as no evidence is supplied to support the efficiency or robustness assertions.
[Abstract] The weakest assumption (that the analytical models suffice for system-level estimation and per-layer assignment) remains untested in the provided description. Without experiments demonstrating that the entropy metric correlates with accuracy sensitivity across precisions or that the cost model matches simulated results within acceptable error bounds, the claim of an 'efficient' and 'robust' framework cannot be evaluated.
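The correlation requested here is typically measured with a rank correlation, since only the ordering of layers by sensitivity matters when assigning precisions. Below is a dependency-free sketch of Spearman's rho (the Pearson correlation of average ranks, so ties are handled); the inputs would be per-layer entropy scores and measured accuracy drops, neither of which is reported here. Function names are hypothetical.

```python
from typing import List, Sequence


def average_ranks(xs: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman rank correlation; raises ZeroDivisionError if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```

A value near +1 or -1 would indicate that entropy orders layers consistently with their quantization sensitivity, whatever the sign convention; a value near 0 would undercut its use as a proxy.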

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their comments on the abstract. The full manuscript contains the requested validation experiments, error analysis, and comparisons, but we agree the abstract would be strengthened by briefly referencing key quantitative outcomes to better support the efficiency and robustness claims.


Referee: [Abstract] The description of SEADA's four components provides no validation data, error analysis, baseline comparisons, or quantitative results. This is load-bearing for the central claim that the configurable cost model and bit-level entropy selection 'accurately capture system-level costs and accuracy sensitivity without needing full simulation,' as no evidence is supplied to support the efficiency or robustness assertions.

Authors: The abstract is a high-level overview; the manuscript body provides the validation data, error bounds, baseline comparisons, and quantitative results for the cost model and entropy selection. To address the concern directly, we will revise the abstract to include concise references to these results (e.g., model accuracy within X% of simulation and entropy correlation with accuracy sensitivity). revision: yes
Referee: [Abstract] The weakest assumption (that the analytical models suffice for system-level estimation and per-layer assignment) remains untested in the provided description. Without experiments demonstrating that the entropy metric correlates with accuracy sensitivity across precisions or that the cost model matches simulated results within acceptable error bounds, the claim of an 'efficient' and 'robust' framework cannot be evaluated.

Authors: The manuscript includes the requested experiments on entropy-accuracy correlation and cost-model vs. simulation matching. We will update the abstract to note these findings at a summary level so the central claims are supported even in the condensed description. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected


The paper describes a methodology with four components: a configurable analytical cost model, a mapping tool, floating-point layer models, and bit-level entropy-based precision selection. No equations, derivations, or load-bearing steps are visible in the provided abstract or description that reduce any claimed result to its inputs by construction, self-definition, or self-citation chains. The claims concern the efficiency of the proposed framework for design-space exploration rather than any fitted prediction or uniqueness theorem that collapses to the input data. The derivation chain is therefore self-contained at the level of a high-level methodology proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no information on free parameters, axioms, or invented entities; all fields left empty as details are unavailable.

pith-pipeline@v0.9.1-grok · 5712 in / 995 out tokens · 34340 ms · 2026-06-29T02:26:21.187405+00:00 · methodology



[1] [1]

2022.NVIDIA Hopper Architecture In-Depth

Michael Andersch, Greg Palmer, Ronny Krashinsky, Nick Stam, Vishal Mehta, Gonzalo Brito, and Sridhar Ramaswamy. 2022.NVIDIA Hopper Architecture In-Depth. Technical Report. NVIDIA. https://developer.nvidia.com/blog/nvidia- hopper-architecture-in-depth/

2022

[2] [2]

Renzo Andri, Enrico Reggiani, and Lukas Cavigelli. 2025. Flex-SFU: Activation Function Acceleration With Nonuniform Piecewise Approximation.IEEE Trans- actions on Computer-Aided Design of Integrated Circuits and Systems44, 11 (2025), 4236–4248. https://doi.org/10.1109/TCAD.2025.3558140

work page doi:10.1109/tcad.2025.3558140 2025

[3] [3]

McKinstry, Steven K

Deepika Bablani, Jeffrey L. McKinstry, Steven K. Esser, Rathinakumar Ap- puswamy, and Dharmendra S. Modha. 2024. Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference. InASPLOS’24 Workshop on Energy Efficient Machine Learning and Cognitive Computing (EMC2). https://doi.org/10.48550/arXiv.2301.13330

work page doi:10.48550/arxiv.2301.13330 2024

[4] [4]

Andrea Belano, Yvan Tortorella, Angelo Garofalo, Luca Benini, Davide Rossi, and Francesco Conti. 2025. A Flexible Template for Edge Generative AI With High-Accuracy Accelerated Softmax and GELU.IEEE Journal on Emerging and Selected Topics in Circuits and Systems15, 2 (2025), 200–216. https://doi.org/10. 1109/JETCAS.2025.3562734

arXiv 2025

[5] [5]

Sami Ben Ali, Silviu-Ioan Filip, and Olivier Sentieys. 2024. A Stochastic Rounding-Enabled Low-Precision Floating-Point MAC for DNN Training. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2024. 1–6. https://doi.org/10.23919/DATE58400.2024.10546735

work page doi:10.23919/date58400.2024.10546735 2024

[6] [6]

Weihan Chen, Peisong Wang, and Jian Cheng. 2021. Towards Mixed-Precision Quantization of Neural Networks via Constrained Optimization. In2021 IEEE/CVF International Conference on Computer Vision (ICCV). 5330–5339. https://doi.org/ 10.1109/ICCV48922.2021.00530

work page doi:10.1109/iccv48922.2021.00530 2021

[7] [7]

Yu-Hsin Chen, Joel Emer, and Vivienne Sze. 2016. Eyeriss: A Spatial Architec- ture for Energy-Efficient Dataflow for Convolutional Neural Networks. In2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA). 367–379. https://doi.org/10.1109/ISCA.2016.40

work page doi:10.1109/isca.2016.40 2016

[8] [8]

Tri Dao. 2024. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. In12th International Conference on Learning Representations (ICLR). https://doi.org/10.48550/arXiv.2307.08691

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2307.08691 2024

[9] [9]

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. Im- ageNet: A large-scale hierarchical image database. In2009 IEEE Conference on Computer Vision and Pattern Recognition. 248–255. https://doi.org/10.1109/CVPR. 2009.5206848

work page doi:10.1109/cvpr 2009

[10] [10]

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. CoRRabs/1810.04805 (2018). https://doi.org/10.48550/arXiv.1810.04805

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1810.04805 2018

[11] [11]

Mahoney, and Kurt Keutzer

Zhen Dong, Zhewei Yao, Daiyaan Arfeen, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. 2020. HAWQ-V2: Hessian Aware Trace-Weighted Quantization of Neural Networks. In34th International Conference on Neural Information Processing Systems(Vancouver, BC, Canada)(NIPS ’20). Article 1555, 12 pages. https://doi.org/doi/abs/10.5555/3495724.3497279

work page doi:10.5555/3495724.3497279 2020

[12] [12]

Steven K Esser, Jeffrey L McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S Modha. 2020. Learned Step Size Quantization. InInternational Conference on Learning Representations. https://doi.org/10.48550/arXiv.1902. 08153

[13] Sameh Galal and Mark Horowitz. 2011. Energy-Efficient Floating-Point Unit Design. IEEE Trans. Comput. 60, 7 (2011), 913–922. https://doi.org/10.1109/TC.2010.121

[14] Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, and Kurt Keutzer. 2021. A Survey of Quantization Methods for Efficient Neural Network Inference. CoRR abs/2103.13630 (2021). https://doi.org/10.48550/arXiv.2103.13630

[15] Mark Harris. 2016. Mixed-Precision Programming with CUDA 8. Technical Report. NVIDIA. https://developer.nvidia.com/blog/mixed-precision-programming-cuda-8/

[16] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 770–778. https://doi.org/10.1109/CVPR.2016.90

[17] Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the Knowledge in a Neural Network. CoRR abs/1503.02531 (2015). https://doi.org/10.48550/arXiv.1503.02531

[18] Qijing Huang, Minwoo Kang, Grace Dinh, Thomas Norell, Aravind Kalaiah, James Demmel, John Wawrzynek, and Yakun Sophia Shao. 2021. CoSA: Scheduling by Constrained Optimization for Spatial Accelerators. In 2021 ACM/IEEE 48th Annual International Symposium on Computer Architecture (ISCA). 554–566. https://doi.org/10.1109/ISCA52012.2021.00050

[19] Ahmet Inci, Siri Virupaksha, Aman Jain, Ting-Wu Chin, Venkata Thallam, Ruizhou Ding, and Diana Marculescu. 2023. QUIDAM: A Framework for Quantization-aware DNN Accelerator and Model Co-Exploration. ACM Trans. Embed. Comput. Syst. 22, 2, Article 33 (Jan. 2023), 21 pages. https://doi.org/10.1145/3555807

[20] Victor J.B. Jung, Arne Symons, Linyan Mei, Marian Verhelst, and Luca Benini. 2023. SALSA: Simulated Annealing based Loop-Ordering Scheduler for DNN Accelerators. In 2023 IEEE 5th International Conference on Artificial Intelligence Circuits and Systems (AICAS). 1–5. https://doi.org/10.1109/AICAS57966.2023.10168625

[22] Dhiraj D. Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, Kunal Banerjee, Sasikanth Avancha, Dharma Teja Vooturi, Nataraj Jammalamadaka, Jianyu Huang, Hector Yuen, Jiyan Yang, Jongsoo Park, Alexander Heinecke, Evangelos Georganas, Sudarshan Srinivasan, Abhisek Kundu, Misha Smelyanskiy, Bharat Kaul, and Pradeep Dubey. 2019. A Study of BF...

[23] Sheng-Chun Kao and Tushar Krishna. 2020. GAMMA: Automating the HW Mapping of DNN Models on Accelerators via Genetic Algorithm. In IEEE/ACM International Conference On Computer Aided Design, ICCAD. 44:1–44:9. https://doi.org/10.1145/3400302.3415639

[24] Sehoon Kim, Amir Gholami, Zhewei Yao, Michael W Mahoney, and Kurt Keutzer. 2021. I-BERT: Integer-only BERT quantization. In International Conference on Machine Learning. PMLR, 5506–5518. https://doi.org/10.48550/arXiv.2101.01321

[26] Jan Klhufek, Miroslav Safar, Vojtech Mrazek, Zdenek Vasicek, and Lukas Sekanina. 2024. Exploring Quantization and Mapping Synergy in Hardware-Aware Deep Neural Network Accelerators. In 2024 27th International Symposium on Design & Diagnostics of Electronic Circuits & Systems (DDECS). 1–6. https://doi.org/10.1109/DDECS60919.2024.10508920

[28] Andrey Kuzmin, Mart Van Baalen, Yuwei Ren, Markus Nagel, Jorn Peters, and Tijmen Blankevoort. 2022. FP8 quantization: the power of the exponent. In 36th International Conference on Neural Information Processing Systems (NIPS). Article 1065, 12 pages. https://doi.org/10.52202/068431-1065

[29] Hyoukjun Kwon, Prasanth Chatarasi, Michael Pellauer, Angshuman Parashar, Vivek Sarkar, and Tushar Krishna. 2019. Understanding Reuse, Performance, and Hardware Cost of DNN Dataflow: A Data-Centric Approach. In 52nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO) (Columbus, OH, USA). 754–768. https://doi.org/10.1145/3352460.3358252

[30] Leandro Fiorin, Luigi Altamura, and Cristina Silvano. 2025. qGAMMA: A Framework for Optimal DNN Mapping on Multi-Precision Accelerators. In Embedded Computer Systems: Architectures, Modeling, and Simulation (SAMOS). Springer International Publishing.

[31] Sae Kyu Lee, Ankur Agrawal, Joel Silberman, Matthew Ziegler, Mingu Kang, Swagath Venkataramani, Nianzheng Cao, Bruce Fleischer, Michael Guillorn, Matthew Cohen, Silvia M. Mueller, Jinwook Oh, Martin Lutz, Jinwook Jung, Siyu Koswatta, Ching Zhou, Vidhi Zalani, Monodeep Kar, James Bonanno, Robert Casatuta, Chia-Yu Chen, Jungwook Choi, Howard Haynie, Alys...

[32] Yujun Lin, Haotian Tang, Shang Yang, Zhekai Zhang, Guangxuan Xiao, Chuang Gan, and Song Han. 2025. QServe: W4A8KV4 Quantization and System Co-design for Efficient LLM serving. Machine Learning and Systems (MLSys) 7 (2025). https://doi.org/10.48550/arXiv.2405.04532

[33] Angshuman Parashar, Priyanka Raina, Yakun Sophia Shao, Yu-Hsin Chen, Victor A. Ying, Anurag Mukkara, Rangharajan Venkatesan, Brucek Khailany, Stephen W. Keckler, and Joel Emer. 2019. Timeloop: A Systematic Approach to DNN Accelerator Evaluation. In 2019 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS). 304–315. https:...

[34] William H. Press, Saul A. Teukolsky, William T. Vetterling, and Brian P. Flannery. 2007. Numerical Recipes: The Art of Scientific Computing (3rd ed.). Cambridge University Press.

[35] Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ Questions for Machine Comprehension of Text. CoRR abs/1606.05250 (2016). https://doi.org/10.48550/arXiv.1606.05250

[36] Marco Ronzani and Cristina Silvano. 2025. FactorFlow: Mapping GEMMs on Spatial Architectures through Adaptive Programming and Greedy Optimization. In 30th Asia and South Pacific Design Automation Conference (ASPDAC). 706–712. https://doi.org/10.1145/3658617.3697670

[37] Marco Ronzani and Cristina Silvano. 2025. QuickFlow: An Efficient Local Search Method to Map Convolutions on Spatial Architectures. In 2025 IEEE/ACM International Conference On Computer Aided Design (ICCAD). 1–9. https://doi.org/10.1109/ICCAD66269.2025.11240877

[38] Cristina Silvano, Daniele Ielmini, Fabrizio Ferrandi, Leandro Fiorin, Serena Curzel, Luca Benini, Francesco Conti, Angelo Garofalo, Cristian Zambelli, Enrico Calore, Sebastiano Schifano, Maurizio Palesi, Giuseppe Ascia, Davide Patti, Nicola Petra, Davide De Caro, Luciano Lavagno, Teodoro Urso, Valeria Cardellini, Gian Carlo Cardarilli, Robert Birke, and...

[39] Xiao Sun, Naigang Wang, Chia-yu Chen, Jia-min Ni, Ankur Agrawal, Xiaodong Cui, Swagath Venkataramani, Kaoutar El Maghraoui, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. 2020. Ultra-low precision 4-bit training of deep neural networks. In 34th International Conference on Neural Information Processing Systems (NIPS). Article 152, 12 pages.

[40] Arne Symons, Linyan Mei, and Marian Verhelst. 2021. LOMA: Fast Auto-Scheduling on DNN Accelerators through Loop-Order-based Memory Allocation. In 2021 IEEE 3rd International Conference on Artificial Intelligence Circuits and Systems (AICAS). 1–4. https://doi.org/10.1109/AICAS51828.2021.9458493

[41] Run Wang, Gamze Islamoglu, Andrea Belano, Viviane Potocnik, Francesco Conti, Angelo Garofalo, and Luca Benini. 2025. VEXP: A Low-Cost RISC-V ISA Extension for Accelerated Softmax Computation in Transformers. In 2025 IEEE 32nd Symposium on Computer Arithmetic (ARITH). 37–44. https://doi.org/10.1109/ARITH64983.2025.00016

[42] B. P. Welford. 1962. Note on a method for calculating corrected sums of squares and products. Technometrics 4, 3 (1962), 419–420.

[43] Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Rémi Louf, Morgan Funtowicz, Joe Davison, Sam Shleifer, Patrick von Platen, Clara Ma, Yacine Jernite, Julien Plu, Canwen Xu, Teven Le Scao, Sylvain Gugger, Mariama Drame, Quentin Lhoest, and Alexander M. Rush. 2020. Transformers: State-of...

[44] Yannan Nellie Wu, Joel S. Emer, and Vivienne Sze. 2019. Accelergy: An Architecture-Level Energy Estimation Methodology for Accelerator Designs. In 2019 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). 1–8. https://doi.org/10.1109/ICCAD45719.2019.8942149

[45] Zhewei Yao, Zhen Dong, Zhangcheng Zheng, Amir Gholami, Jiali Yu, Eric Tan, Leyuan Wang, Qijing Huang, Yida Wang, Michael W. Mahoney, and Kurt Keutzer. 2020. HAWQ-V3: Dyadic Neural Network Quantization. CoRR abs/2011.10680 (2020). https://doi.org/10.48550/arXiv.2011.10680

[47] Jiaqi Zhang. 2025. Survey of Quantization-Aware Training (QAT) Applications in Deep Learning Quantization. In 2025 International Symposium on Artificial Intelligence and Computational Social Sciences (AICSS). 431–442. https://doi.org/10.1145/3776759.3776826

[48] Xiaotian Zhao, Ruge Xu, and Xinfei Guo. 2023. Post-training Quantization or Quantization-aware Training? That is the Question. In 2023 China Semiconductor Technology International Conference (CSTIC). 1–3. https://doi.org/10.1109/CSTIC58779.2023.10219214

[49] Shuchang Zhou, Zekun Ni, Xinyu Zhou, He Wen, Yuxin Wu, and Yuheng Zou. 2016. DoReFa-Net: Training Low Bitwidth Convolutional Neural Networks with Low Bitwidth Gradients. CoRR abs/1606.06160 (2016). https://doi.org/10.48550/arXiv.1606.06160
