pith. machine review for the scientific record.

arxiv: 2604.24492 · v1 · submitted 2026-04-27 · 💻 cs.CV · cs.AI · cs.ET · cs.LG · cs.NE

Recognition: unknown

Deployment-Aligned Low-Precision Neural Architecture Search for Spaceborne Edge AI


Pith reviewed 2026-05-08 04:31 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.ET · cs.LG · cs.NE
keywords neural architecture search · low-precision training · hardware-aware NAS · edge AI · vessel segmentation · FP16 inference · visual processing unit · spaceborne monitoring

The pith

Integrating FP16 constraints into hardware-aware NAS recovers two-thirds of the accuracy lost when deploying models on low-precision edge VPUs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that hardware-aware neural architecture search can close much of the accuracy gap that appears when models move from full-precision optimization to low-precision deployment on edge hardware. Standard pipelines optimize architectures at full precision and apply low-precision adaptation only afterward, creating a mismatch that drops performance on devices like the Myriad X VPU. By exposing every candidate architecture to FP16 numerical constraints during both fine-tuning and evaluation inside the NAS loop, the search jointly improves architectural efficiency and numerical robustness. On a vessel-segmentation task for spaceborne monitoring, this yields 0.826 mIoU on-device for a 95,791-parameter model, compared with 0.78 mIoU after ordinary post-training conversion, recovering roughly two-thirds of the deployment-induced loss without any increase in model size or change to the search space. The result matters for reliable edge AI because the method keeps the same compact architecture while making its deployment behavior far more predictable.

Core claim

By integrating deployment-aligned low-precision training directly into hardware-aware NAS, candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation. This enables joint optimization of architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. Evaluated on vessel segmentation for spaceborne maritime monitoring targeting the Intel Movidius Myriad X VPU, the approach achieves 0.826 mIoU on-device for an architecture of 95,791 parameters, compared with 0.78 mIoU after post-training precision conversion from a full-precision baseline of 0.85 mIoU, thereby recovering approximately two-thirds of the gap.
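The "approximately two-thirds" figure follows directly from the three reported mIoU values; a quick sanity check (not from the paper's code):

```python
# Reported on-device mIoU: full-precision baseline, post-training FP16
# conversion, and deployment-aligned low-precision training.
full_precision = 0.850
post_training = 0.780
aligned = 0.826

gap = full_precision - post_training         # deployment-induced loss: 0.07
recovered = (aligned - post_training) / gap  # fraction of the gap recovered

print(f"recovered {recovered:.1%} of the gap")  # roughly two-thirds
```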

What carries the argument

Deployment-aligned low-precision training, the mechanism that exposes every NAS candidate to FP16 constraints during both fine-tuning and evaluation so that numerical robustness is optimized together with architectural efficiency.
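The mechanism can be illustrated with a toy evolutionary search in which every candidate is scored only after its weights are rounded to half precision, so robustness to FP16 is selected for alongside fit quality. This is an illustrative sketch in pure Python — a two-parameter linear "architecture" stands in for a network, and `struct`'s IEEE 754 half-precision format emulates FP16 — not the paper's framework:

```python
import random
import struct

def fp16(x: float) -> float:
    """Round-trip a float through IEEE 754 half precision (emulates FP16 deployment)."""
    return struct.unpack('e', struct.pack('e', x))[0]

def fitness(weights, data):
    """Score a candidate AFTER rounding its weights to FP16, so the search
    sees deployment-time numerics. Toy model: y = w0 * x + w1."""
    w0, w1 = (fp16(w) for w in weights)
    return -sum((w0 * x + w1 - y) ** 2 for x, y in data)

def evolve(data, generations=30, pop_size=16, seed=0):
    """Minimal elitist evolutionary loop; the search strategy itself is
    unchanged — only the evaluation is low-precision-aware."""
    rng = random.Random(seed)
    pop = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, data), reverse=True)
        elite = pop[: pop_size // 2]                           # selection
        pop = elite + [[w + rng.gauss(0, 0.05) for w in rng.choice(elite)]
                       for _ in range(pop_size - len(elite))]  # mutation
    return max(pop, key=lambda w: fitness(w, data))

data = [(x / 10, 0.5 * (x / 10) + 0.25) for x in range(10)]
best = evolve(data)
```

Because elitism never discards the incumbent best, the FP16-evaluated fitness of the returned candidate can only improve across generations; the paper applies the same idea inside an existing hardware-aware NAS loop rather than a toy search.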

If this is right

  • On-device accuracy rises for an unchanged model size and architecture.
  • The accuracy gap between full-precision NAS and low-precision deployment shrinks without extra parameters.
  • No alterations to the evolutionary search strategy or search space are needed.
  • The same compact network meets stricter latency and accuracy targets on resource-constrained VPUs.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same alignment technique could be tested on other low-precision formats such as INT8 or on different edge accelerators.
  • Space missions that cannot perform post-deployment retraining would benefit from earlier numerical robustness in the search.
  • Hardware-aware NAS frameworks may need to treat numerical constraints as first-class hardware metrics rather than post-processing steps.

Load-bearing premise

That exposing architectures to FP16 constraints during NAS fine-tuning and evaluation accurately predicts and improves real deployment behavior on the target VPU without introducing search biases.

What would settle it

Deploy the architecture selected by the low-precision NAS on the actual Myriad X VPU and measure its mIoU; if the result falls to or below the 0.78 mIoU obtained by post-training conversion of a full-precision NAS model, the central claim does not hold.
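For reference, the metric at stake — mean intersection-over-union — can be computed as follows; a minimal sketch for flat integer label arrays (a hypothetical helper, not the paper's evaluation code):

```python
def mean_iou(pred, target, num_classes=2):
    """Mean IoU across classes for flat lists of integer labels
    (e.g. 0 = background, 1 = vessel)."""
    ious = []
    for c in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == c and t == c)
        union = sum(1 for p, t in zip(pred, target) if p == c or t == c)
        if union:                 # ignore classes absent from both masks
            ious.append(inter / union)
    return sum(ious) / len(ious)

# One false positive on a 4-pixel strip: IoU is 2/3 for background, 1/2 for vessel.
score = mean_iou([0, 1, 1, 0], [0, 1, 0, 0])
```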

Figures

Figures reproduced from arXiv: 2604.24492 by Giacomo Zema, Parampuneet Kaur Thind, Roberto Del Prete, Vaibhav Katturu.

Figure 1. The diagram illustrates the proposed NAS framework in which each candidate architecture generated by the genetic algorithm …
Figure 2. Quantitative comparison between post-training precision conversion and low-precision-aware fine-tuning within the hardware …
Figure 3. Qualitative comparison of vessel segmentation under different numerical conditions. From left to right: input image, ground …
read the original abstract

Designing deep networks that meet strict latency and accuracy constraints on edge accelerators increasingly relies on hardware-aware optimization, including neural architecture search (NAS) guided by device-level metrics. Yet most hardware-aware NAS pipelines still optimize architectures under full-precision assumptions and apply low-precision adaptation only after the search, leading to a mismatch between optimization-time behavior and deployment-time execution on low-precision hardware that can substantially degrade accuracy. We address this limitation by integrating deployment-aligned low-precision training directly into hardware-aware NAS. Candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation, enabling joint optimization of architectural efficiency and numerical robustness without modifying the search space or evolutionary strategy. We evaluate the proposed framework on vessel segmentation for spaceborne maritime monitoring, targeting the Intel Movidius Myriad X Visual Processing Unit (VPU). While post-training precision conversion reduces on-device performance from 0.85 to 0.78 mIoU, deployment-aligned low-precision training achieves 0.826 mIoU on-device for the same architecture (95,791 parameters), recovering approximately two-thirds of deployment-induced accuracy gap without increasing model complexity. These results demonstrate that incorporating deployment-consistent numerical constraints into hardware-aware NAS substantially improves robustness and alignment between optimization and deployment for resource-constrained edge Artificial Intelligence (AI).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 3 minor

Summary. The paper proposes integrating deployment-aligned low-precision (FP16) training directly into hardware-aware neural architecture search (NAS) for edge AI. Candidate architectures are exposed to FP16 numerical constraints during fine-tuning and evaluation phases of the NAS process, enabling joint optimization of efficiency and robustness without changes to the search space or evolutionary strategy. Evaluated on vessel segmentation for spaceborne maritime monitoring targeting the Intel Movidius Myriad X VPU, post-training precision conversion drops performance from 0.85 to 0.78 mIoU, while the proposed method achieves 0.826 mIoU on-device for the same 95,791-parameter architecture, recovering approximately two-thirds of the deployment-induced accuracy gap.

Significance. If the results hold under rigorous validation, this approach could meaningfully advance hardware-aware NAS for low-precision edge devices by reducing the mismatch between optimization and deployment. The empirical on-device measurements on a real VPU for a space application provide practical value, and the lack of added model complexity strengthens the case for adoption in resource-constrained settings.

major comments (2)
  1. Abstract: The headline claim that deployment-aligned low-precision training recovers ~2/3 of the 0.85-to-0.78 mIoU gap (achieving 0.826 mIoU on-device for the 95,791-param model) rests on the assumption that FP16 simulation during NAS fine-tuning and evaluation accurately predicts real Myriad X VPU behavior; no verification of VPU-specific effects such as rounding modes, saturation, or fused multiply-add precision is provided, risking that the reported alignment is an artifact of the simulation rather than true deployment robustness.
  2. Evaluation section: No error bars, standard deviations, or statistics from multiple runs are reported for the mIoU figures, and there is no ablation isolating the contribution of the FP16 exposure within NAS from the base evolutionary search strategy or other factors; this undermines confidence in the cross-condition comparison and the claim that no modifications to search space or strategy were needed.
minor comments (3)
  1. Provide more explicit details on the implementation of FP16 constraints (e.g., quantization simulation method, scaling, or clipping) and how they were selected to approximate the target VPU.
  2. Include additional baselines such as standard post-training quantization applied after full-precision NAS or other low-precision NAS variants for better context on the improvement magnitude.
  3. Clarify reproducibility aspects including exact search space definition, evolutionary parameters, and training hyperparameters even if the strategy itself is unchanged.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. We address each major comment below and indicate planned revisions to strengthen the paper.

read point-by-point responses
  1. Referee: Abstract: The headline claim that deployment-aligned low-precision training recovers ~2/3 of the 0.85-to-0.78 mIoU gap (achieving 0.826 mIoU on-device for the 95,791-param model) rests on the assumption that FP16 simulation during NAS fine-tuning and evaluation accurately predicts real Myriad X VPU behavior; no verification of VPU-specific effects such as rounding modes, saturation, or fused multiply-add precision is provided, risking that the reported alignment is an artifact of the simulation rather than true deployment robustness.

    Authors: We thank the referee for this observation. The reported 0.826 mIoU is measured via direct on-device inference on the Intel Movidius Myriad X VPU after deployment, not in simulation. The FP16 simulation is applied only during NAS fine-tuning and evaluation to select architectures that are numerically robust under low precision. We used standard framework-level FP16 emulation without custom modeling of VPU-specific rounding, saturation, or FMA behaviors. In the revised manuscript we will add a paragraph in the evaluation section clarifying these simulation assumptions and noting that the on-device results provide independent empirical confirmation of the robustness gains. revision: partial

  2. Referee: Evaluation section: No error bars, standard deviations, or statistics from multiple runs are reported for the mIoU figures, and there is no ablation isolating the contribution of the FP16 exposure within NAS from the base evolutionary search strategy or other factors; this undermines confidence in the cross-condition comparison and the claim that no modifications to search space or strategy were needed.

    Authors: We agree that error bars from multiple runs and a dedicated ablation would increase confidence. Performing repeated full NAS runs is computationally prohibitive on the target hardware, so results are from a single evolutionary search (standard practice in many NAS works). The comparison isolates the effect of adding FP16 exposure during NAS versus post-training quantization on the identical architecture found by the unchanged search. In revision we will insert a limitations subsection explaining the single-run reporting, the rationale for leaving the evolutionary strategy unmodified, and any retrievable run-to-run variance from intermediate logs. revision: partial

Circularity Check

0 steps flagged

No circularity: empirical NAS integration with on-device validation

full rationale

The paper reports an empirical framework that inserts FP16 constraints into existing hardware-aware NAS (fine-tuning and evaluation phases) and measures resulting on-device mIoU on the Myriad X VPU. No equations, derivations, or fitted parameters are presented that reduce to their own inputs by construction. The headline recovery (0.826 mIoU) is an observed experimental outcome, not a prediction forced by a model fit or self-citation chain. The method re-uses a standard evolutionary NAS strategy without claimed uniqueness theorems or ansatzes imported from prior author work. Self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, new axioms, or invented entities; the work is an empirical engineering application of existing NAS and quantization techniques.

pith-pipeline@v0.9.0 · 5549 in / 1133 out tokens · 20837 ms · 2026-05-08T04:31:28.345792+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

43 extracted references · 18 canonical work pages

  1. [1]

    Matt Poyser and Toby P. Breckon. Neural architec- ture search: A contemporary literature review for com- puter vision applications.Pattern Recognition, 147:110052,

  2. [2]

    doi: https : / / doi

    ISSN 0031-3203. doi: https : / / doi . org / 10 . 1016 / j . patcog . 2023 . 110052. URLhttps : / / www . sciencedirect . com / science / article / pii / S0031320323007495. 1

  3. [3]

    Estimating the intrinsic dimension of datasets by a minimal neighborhood information

    Roberto Del Prete, Parampuneet Kaur Thind, Andrea Mazzeo, Matthew Whitley, Lorenzo Papa, Nicolas Long´ep´e, and Gabriele Meoni. Optimizing deep learning models for on-orbit deployment through neural architecture search.Sci- entific Reports, 15(1):37783, 2025. doi: 10.1038/s41598- 025-21467-8. 1, 2, 3

  4. [4]

    Naseo: Neural architecture search for earth observation on- board processing

    Parampuneet Kaur Thind, Roberto Del Prete, Matthew Whit- ley, Andrea Mazzeo, Nicolas Long´ep´e, and Gabriele Meoni. Naseo: Neural architecture search for earth observation on- board processing. In2025 European Data Handling & Data Processing Conference (EDHPC), pages 1–8, 2025. 1, 2

  5. [5]

    Dpp-net: Device-aware progressive search for pareto-optimal neural architectures, 2018

    Jin-Dong Dong, An-Chieh Cheng, Da-Cheng Juan, Wei Wei, and Min Sun. Dpp-net: Device-aware progressive search for pareto-optimal neural architectures, 2018. URLhttps: //arxiv.org/abs/1806.08198. 1, 3

  6. [6]

    Assessing the added value of onboard earth observation pro- cessing with the iride heo service segment

    Parampuneet Kaur Thind, Charles Mwangi, Giovanni Varetto, Lorenzo Sarti, Andrea Papa, and Andrea Taramelli. Assessing the added value of onboard earth observation pro- cessing with the iride heo service segment. InProceedings of the IEEE International Geoscience and Remote Sensing Symposium (IGARSS), 2026. Accepted for publication. 1

  7. [7]

    Overview of ESA’s Earth Ob- servation upcoming small satellites missions

    Massimiliano Pastena, Michel Tossaint, Amanda Regan, Michele Castorina, Pierre Mathieu, Josep Rosello, Antonio Gabriele, and Nicola Melega. Overview of ESA’s Earth Ob- servation upcoming small satellites missions. 08 2020

  8. [8]

    Development and implementation of theΦSat-2 mission

    Nicola Melega, Nicolas Longepe, Agne Paskeviciute, Valentina Marchese, Oriol Aragon, Irina Babkina, Alessan- dro Marin, Jakub Nalepa, Leonie Buckley, Giorgia Guer- risi, Sofia Oliveira, and Hano Steyn. Development and implementation of theΦSat-2 mission. In Max Petrozzi- Ilstad, editor,Small Satellites Systems and Services Sym- posium (4S 2024), volume 13...

  9. [9]

    Prediction of annual snow accumulation using a recurrent graph convolutional approach, in: IGARSS2023-2023IEEEInternationalGeoscienceandRemoteSensingSymposium,pp.5344–5347

    Roberto Del Prete, Gabriele Meoni, Nicolas Long ´ep´e, Maria Daniela Graziano, and Alfredo Renga. First results of vessel detection with onboard processing of sentinel-2 raw data by deep learning. InIGARSS 2023 - 2023 IEEE International Geoscience and Remote Sensing Symposium, pages 6262–6265, 2023. doi: 10.1109/IGARSS52108.2023. 10283401. 2

  10. [10]

    Towards onboard thermal hotspots segmentation with raw multispectral satel- lite imagery.International Journal of Applied Earth Obser- vation and Geoinformation, 146:105095, 2026

    Cristopher Castro Traba, David Rijlaarsdam, Jian Guo, Roberto Del Prete, and Gabriele Meoni. Towards onboard thermal hotspots segmentation with raw multispectral satel- lite imagery.International Journal of Applied Earth Obser- vation and Geoinformation, 146:105095, 2026. ISSN 1569-

  11. [11]

    URL https : / / www

    doi: https://doi.org/10.1016/j.jag.2026.105095. URL https : / / www . sciencedirect . com / science / article/pii/S1569843226000117. 2

  12. [12]

    Towards global flood mapping onboard low cost satellites with machine learning.Scientific Reports, 11(1):7249, 2021

    Gonzalo Mateo-Garcia, Joshua Veitch-Michaelis, Lewis Smith, Silviu Vlad Oprea, Guy Schumann, Yarin Gal, Atılım G¨unes ¸ Baydin, and Dietmar Backes. Towards global flood mapping onboard low cost satellites with machine learning.Scientific Reports, 11(1):7249, 2021

  13. [13]

    On-board volcanic eruption detection through CNNss and satellite multispectral imagery.Remote Sensing, 13(17): 3479, 2021

    Maria Pia Del Rosso, Alessandro Sebastianelli, Dario Spiller, Pierre Philippe Mathieu, and Silvia Liberata Ullo. On-board volcanic eruption detection through CNNss and satellite multispectral imagery.Remote Sensing, 13(17): 3479, 2021. 2

  14. [14]

    AI-enabled onboard edge computing for satel- lite intelligence in disaster management.https://www

    UN-SPIDER. AI-enabled onboard edge computing for satel- lite intelligence in disaster management.https://www. un- spider.org/news- and- events, 2025. UN- SPIDER news archive (original article removed); Accessed: 2026-02-12. 2

  15. [15]

    Progress and chal- lenges in intelligent remote sensing satellite systems.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15:1814–1822, 2022

    Bing Zhang, Yuanfeng Wu, Boya Zhao, Jocelyn Chanussot, Danfeng Hong, Jing Yao, and Lianru Gao. Progress and chal- lenges in intelligent remote sensing satellite systems.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15:1814–1822, 2022. 2

  16. [16]

    Carnicero, Nicolas Long ´ep´e, Aiste Paskeviciute, Valerio Marchese, Oriol Aragon, Irina Babkina, Alessandro Marin, Jakub Nalepa, and Leonie Buckley

    Nicola Melega, Bernardo A. Carnicero, Nicolas Long ´ep´e, Aiste Paskeviciute, Valerio Marchese, Oriol Aragon, Irina Babkina, Alessandro Marin, Jakub Nalepa, and Leonie Buckley. Implementation of theϕsat-2 on-board image pro- cessing chain. InSensors, Systems, and Next-Generation Satellites XXVII, volume 12729, pages 264–276. SPIE, 2023. 2

  17. [17]

    Towards the use of artifi- cial intelligence on the edge in space systems: Challenges and opportunities.IEEE Aerospace and Electronic Systems Magazine, 35(12):44–56, 2020

    Gianluca Furano, Gabriele Meoni, Aubrey Dunne, David Moloney, Veronique Ferlet-Cavrois, Antonis Tavoularis, Jonathan Byrne, L ´eonie Buckley, Mihalis Psarakis, Kay- Obbe V oss, and Luca Fanucci. Towards the use of artifi- cial intelligence on the edge in space systems: Challenges and opportunities.IEEE Aerospace and Electronic Systems Magazine, 35(12):44–...

  18. [18]

    doi: 10.1109/access.2024

    Angela Cratere, Leandro Gagliardi, Gabriel A. Sanca, Fed- erico Golmar, and Francesco Dell’Olio. On-board computer for CubeSats: State-of-the-art and future trends.IEEE Ac- cess, 12:99537–99569, 2024. doi: 10.1109/ACCESS.2024. 3428388. 2

  19. [19]

    Object de- tection using deep learning, CNNs and vision transform- ers: A review.IEEE Access, PP:1–1, 01 2023

    Ayoub Benali Amjoud and Mustapha Amrouch. Object de- tection using deep learning, CNNs and vision transform- ers: A review.IEEE Access, PP:1–1, 01 2023. doi: 10.1109/ACCESS.2023.3266093. 2

  20. [20]

    Model compression and acceleration for deep neural networks: The principles, progress, and challenges.IEEE Signal Processing Magazine, 35(1):126–136, 2018

    Yu Cheng, Duo Wang, Pan Zhou, and Tao Zhang. Model compression and acceleration for deep neural networks: The principles, progress, and challenges.IEEE Signal Processing Magazine, 35(1):126–136, 2018. 2

  21. [21]

    A comprehensive survey on model compression and acceleration.Artificial Intelligence Review, 53(7):5113–5155, 2020

    Tejalal Choudhary, Vipul Mishra, Anurag Goswami, and Ja- gannathan Sarangapani. A comprehensive survey on model compression and acceleration.Artificial Intelligence Review, 53(7):5113–5155, 2020. ISSN 1573-7462. doi: 10.1007/ s10462-020-09816-7. URLhttps://doi.org/10. 1007/s10462-020-09816-7. 2, 3

  22. [22]

    Ar- tificial intelligence to advance earth observation: A review of models, recent trends, and pathways forward.IEEE Geo- science and Remote Sensing Magazine, 2024

    Devis Tuia, Konrad Schindler, Beg ¨um Demir, Xiao Xiang Zhu, Mrinalini Kochupillai, Saˇso Dˇzeroski, Jan N Van Rijn, Holger H Hoos, Fabio Del Frate, Mihai Datcu, et al. Ar- tificial intelligence to advance earth observation: A review of models, recent trends, and pathways forward.IEEE Geo- science and Remote Sensing Magazine, 2024. 2

  23. [23]

    Colin Reeves.Genetic Algorithms, volume 146, pages 109–

  24. [24]

    Cartesian vs. Radial – A Comparative Evaluation of Two Visualization Tools

    09 2010. ISBN 978-1-4419-1663-1. doi: 10.1007/978- 1-4419-1665-5 5. 3

  25. [25]

    Littman, and Andrew W

    Leslie Pack Kaelbling, Michael L. Littman, and Andrew W. Moore. Reinforcement learning: A survey.Journal of Arti- ficial Intelligence Research, 4:237–285, 1996. doi: 10.1613/ jair.301. 3

  26. [26]

    Nas-bench- 1shot1: Benchmarking and dissecting one-shot neural archi- tecture search, 2020

    Arber Zela, Julien Siems, and Frank Hutter. Nas-bench- 1shot1: Benchmarking and dissecting one-shot neural archi- tecture search, 2020. URLhttps://arxiv.org/abs/ 2001.10422. 3

  27. [27]

    DARTS: Differentiable Architecture Search

    Hanxiao Liu, Karen Simonyan, and Yiming Yang. DARTS: Differentiable architecture search, 2019. URLhttps:// arxiv.org/abs/1806.09055

  28. [28]

    ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware

    Han Cai, Ligeng Zhu, and Song Han. Proxylessnas: Direct neural architecture search on target task and hardware, 2019. URLhttps://arxiv.org/abs/1812.00332. 3

  29. [29]

    AutoML-based neural architecture search for object recognition in satellite imagery.Remote Sensing, 15(1), 2023

    Povilas Gudzius, Olga Kurasova, Vytenis Darulis, and Ernestas Filatovas. AutoML-based neural architecture search for object recognition in satellite imagery.Remote Sensing, 15(1), 2023. ISSN 2072-4292. doi: 10.3390/ rs15010091. URLhttps://www.mdpi.com/2072- 4292/15/1/91. 3

  30. [30]

    Guangyuan Liu, Yangyang Li, Yanqiao Chen, Ronghua Shang, and Licheng Jiao. Pol-nas: A neural architec- ture search method with feature selection for polsar image classification.IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 15:9339–9354,

  31. [31]

    URLhttps://api.semanticscholar.org/ CorpusID:253321267

  32. [32]

    Designing a classifier for active fire detection from multispectral satellite imagery using neural architecture search, 2024

    Amber Cassimon, Phil Reiter, Siegfried Mercelis, and Kevin Mets. Designing a classifier for active fire detection from multispectral satellite imagery using neural architecture search, 2024. URLhttps://arxiv.org/abs/2410. 05425. 3

  33. [33]

    A white paper on neural network quantization

    Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yely- sei Bondarenko, Mart van Baalen, and Tijmen Blankevoort. A white paper on neural network quantization, 2021. URL https://arxiv.org/abs/2106.08295. 3

  34. [34]

    Post-training 4-bit quantization of convolution networks for rapid-deployment

    Ron Banner, Yury Nahshan, Elad Hoffer, and Daniel Soudry. Post-training 4-bit quantization of convolution networks for rapid-deployment, 2019. URLhttps://arxiv.org/ abs/1810.05723. 3

  35. [35]

    Mahoney, and Kurt Keutzer

    Yaohui Cai, Zhewei Yao, Zhen Dong, Amir Gholami, Michael W. Mahoney, and Kurt Keutzer. Zeroq: A novel zero shot quantization framework, 2020. URLhttps: //arxiv.org/abs/2001.00281. 3

  36. [36]

    Low-bit quantization of neural networks for efficient infer- ence, 2019

    Yoni Choukroun, Eli Kravchik, Fan Yang, and Pavel Kisilev. Low-bit quantization of neural networks for efficient infer- ence, 2019. URLhttps://arxiv.org/abs/1902. 06822. 3

  37. [37]

    Shifted and squeezed 8-bit floating point format for low-precision training of deep neural networks, 2020

    L ´eopold Cambier, Anahita Bhiwandiwalla, Ting Gong, Mehran Nekuii, Oguz H Elibol, and Hanlin Tang. Shifted and squeezed 8-bit floating point format for low-precision training of deep neural networks, 2020. URLhttps: //arxiv.org/abs/2001.05674. 3

  38. [38]

    Neural gradients are near- lognormal: improved quantized and sparse training, 2020

    Brian Chmiel, Liad Ben-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, and Daniel Soudry. Neural gradients are near- lognormal: improved quantized and sparse training, 2020. URLhttps://arxiv.org/abs/2006.08173. 3

  39. [39]

    I.-J., Srini- vasan, V ., and Gopalakrishnan, K

    Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, and Kailash Gopalakrishnan. Pact: Parameterized clipping activation for quantized neural networks, 2018. URLhttps://arxiv. org/abs/1805.06085. 5

  40. [40]

    Hrsc2016, 2025

    Zikun Liu, Hongzhen Wang, Lubin Weng, and Yiping Yang. Hrsc2016, 2025. URLhttps://dx.doi.org/10. 21227/rgx1-sh71. 5

  41. [41]

    AMD Radeon™ Graph- ics.https : / / www

    Advanced Micro Devices, Inc. AMD Radeon™ Graph- ics.https : / / www . amd . com / en / products / graphics / desktops / radeon . html, 2025. Ac- cessed: 2025-05-02. 6

  42. [42]

    Intel® Movidius™ Myriad™ X Vision Processing Unit.https : / / www

    Intel Corporation. Intel® Movidius™ Myriad™ X Vision Processing Unit.https : / / www . intel . com / content / www / us / en / products / sku / 125926 / intel - movidius - myriad - x - vision - processing- unit- 4gb/specifications.html,

  43. [43]

    Accessed: 2025-05-02. 6