
arxiv: 2604.16964 · v1 · submitted 2026-04-18 · 💻 cs.AR

E2AFS: Energy-Efficient Approximate Floating Point Square Rooter for Error Tolerant Computing

Pith reviewed 2026-05-10 06:59 UTC · model grok-4.3

classification 💻 cs.AR
keywords approximate computing · floating-point square root · energy-efficient architecture · FPGA implementation · error-tolerant computing · Sobel edge detection · K-means quantization · power-delay product

The pith

E2AFS introduces a multiplier-free approximate floating-point square-root architecture that achieves lower dynamic power, shorter delay, and a better power-delay product than prior designs while keeping errors acceptable for error-tolerant use.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents E2AFS as a lightweight floating-point square-root design that avoids multipliers entirely to cut energy use in embedded and edge systems. It reduces logic depth and switching activity compared with conventional multiplier-based or iterative methods. On an Artix-7 FPGA the design records 7.63 mW dynamic power, 4.639 ns critical-path delay, and 35.39 pJ power-delay product, outperforming ESAS and CWAHA. Accuracy checks show low deviation from the exact square-root function, and application tests in Sobel edge detection and K-means quantization confirm the errors stay tolerable.

Core claim

E2AFS is a fully multiplier-free floating-point square-root architecture that minimizes logic depth and switching activity. FPGA implementation on Artix-7 demonstrates the lowest dynamic power of 7.63 mW, shortest critical-path delay of 4.639 ns, and minimum power-delay product of 35.39 pJ versus existing ESAS and CWAHA designs. Error metrics and graphical analysis establish consistently low deviation from the exact function, and end-to-end validation in Sobel edge detection and K-means color quantization shows the approximation remains suitable for low-power real-time edge and embedded platforms.
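
As a quick consistency check on the reported figures, the power-delay product can be recomputed from the stated power and delay. The sketch below assumes PDP is simply dynamic power multiplied by critical-path delay (the usual definition; this page does not spell out the paper's formula), and the variable names are illustrative.

```python
# Reported values from the Artix-7 implementation (taken from the text above).
dynamic_power_mw = 7.63   # milliwatts
critical_path_ns = 4.639  # nanoseconds
reported_pdp_pj = 35.39   # picojoules

# Assumed definition: PDP = power * delay. Units: mW * ns = 1e-3 W * 1e-9 s = pJ.
pdp_pj = dynamic_power_mw * critical_path_ns

print(f"recomputed PDP = {pdp_pj:.3f} pJ (reported {reported_pdp_pj} pJ)")
```

The product comes to about 35.396 pJ, which agrees with the reported 35.39 pJ to within 0.01 pJ. The small gap is consistent with truncation or with unrounded inputs, though the page does not say which.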

What carries the argument

The E2AFS multiplier-free floating-point square-root architecture that reduces logic depth and switching activity to improve power and delay.

If this is right

  • Records 7.63 mW dynamic power on Artix-7 FPGA
  • Achieves 4.639 ns critical-path delay
  • Delivers 35.39 pJ power-delay product
  • Maintains low deviation from exact square-root function
  • Supports low-power real-time operation in edge and embedded platforms

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The multiplier-free approach may transfer to other floating-point operations under similar hardware constraints
  • Lower power and delay could allow more complex computations within fixed energy budgets on battery-powered edge devices
  • The error-tolerance validation suggests the unit could fit additional real-time signal-processing pipelines beyond the two tested applications

Load-bearing premise

The approximation errors stay small enough that they do not degrade end-to-end quality in the targeted error-tolerant applications such as Sobel edge detection and K-means quantization.

What would settle it

Power or delay measurements on an Artix-7 device exceeding 7.63 mW or 4.639 ns, or visible quality loss in Sobel edge detection and K-means outputs when using E2AFS versus exact square-root computation, would disprove the efficiency and suitability claims.

Figures

Figures reproduced from arXiv: 2604.16964 by Jatin Kumar Reddy Mothe, Prateek Goyal, Sujit Kumar Sahoo, Swara Rajesh Shelke.

Figure 1: Overall architectural flow of the proposed E2AFS illustrating the key computational stages.
Figure 2: Graphical Analysis of Various Floating-Point Square Rooters.
Figure 3: Normalized Figures of Merit (FoM1 and FoM2) highlighting speed and energy efficiency.
Figure 4: Visual comparison of edge-detection performance across approximate square-root architectures.
Figure 5: Visual outcomes of color quantization using various approximate square-root architectures.

Original abstract

Floating-point square-root computation is a power- and delay-critical operation in edge-AI, signal-processing, and embedded systems. Conventional implementations typically rely on multipliers or iterative pipelines, resulting in increased hardware complexity, switching activity, and energy consumption. This work presents E2AFS, a lightweight and fully multiplier-free floating-point square-root architecture optimized for energy-efficient computation. By reducing logic depth and minimizing switching activity, the proposed design achieves substantial improvements in hardware efficiency and performance. FPGA implementation on an Artix-7 device demonstrates that E2AFS achieves the lowest dynamic power (7.63 mW), the shortest critical-path delay (4.639 ns), and the minimum power-delay product (35.39 pJ) compared to existing ESAS and CWAHA architectures. Error evaluation using multiple accuracy metrics, together with graphical analysis, shows that E2AFS closely approximates the exact square-root function with consistently low deviation. Application-level validation in Sobel edge detection and K-means color quantization further confirms its suitability for low-power real-time edge and embedded platforms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes E2AFS, a lightweight multiplier-free approximate floating-point square-root architecture for energy-efficient computation in error-tolerant domains. It claims that FPGA synthesis on an Artix-7 device yields the lowest dynamic power (7.63 mW), shortest critical-path delay (4.639 ns), and minimum power-delay product (35.39 pJ) relative to the ESAS and CWAHA baselines, while maintaining low approximation error as shown by accuracy metrics, graphical analysis, and end-to-end tests on Sobel edge detection and K-means color quantization.

Significance. If the performance deltas hold under identical synthesis conditions, the design could offer a practical hardware primitive for low-power edge-AI and embedded signal-processing pipelines where square-root operations are frequent. The combination of multiplier elimination, reported PDP improvement, and application-level validation would strengthen the case for approximate arithmetic in resource-constrained platforms.

major comments (3)
  1. [FPGA Implementation Results] FPGA results section (abstract and implementation results): The central superiority claims for dynamic power, delay, and PDP rest on comparisons to ESAS and CWAHA, yet the manuscript provides no explicit statement that all three designs were re-implemented by the authors using identical Vivado synthesis settings, optimization directives, place-and-route constraints, clock frequency for power analysis, and input switching activity. These factors directly affect the reported 7.63 mW / 4.639 ns / 35.39 pJ figures; without this confirmation the deltas cannot be attributed solely to the E2AFS architecture.
  2. [Proposed Architecture / Error Evaluation] Proposed design and error evaluation sections: No design equations, algorithmic steps, or error formulas are supplied for the approximation method. The abstract asserts a “multiplier-free” property and “consistently low deviation,” but without the underlying mapping or error-bound derivation it is impossible to verify the claimed reduction in logic depth or to reproduce the accuracy metrics used for the Sobel and K-means tests.
  3. [Application-Level Validation] Application validation section: The claim that approximation error “does not meaningfully degrade” end-to-end quality in Sobel edge detection and K-means quantization is load-bearing for the error-tolerant suitability argument, yet the manuscript reports only qualitative confirmation without quantitative metrics (e.g., PSNR, clustering accuracy delta, or visual difference scores) comparing exact versus approximate outputs.
minor comments (1)
  1. [Abstract / Figures] Ensure that all figures referenced in the error and application sections are numbered and captioned consistently with the text; the abstract mentions “graphical analysis” without a corresponding figure citation.
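
For orientation on major comment 2, here is what a multiplier-free floating-point square-root approximation can look like in general. This is a well-known shift-and-add trick on the IEEE 754 single-precision bit pattern, not the E2AFS mapping (which this page does not describe). The names `approx_sqrt_shift_add`, `BIAS_HALF`, and `max_relative_error` are hypothetical and exist only for this illustration.

```python
import math
import struct

# Half of the float32 exponent bias (127), aligned to the exponent field.
# Treating the bit pattern as roughly 2**23 * (log2(x) + 127), halving the
# logarithm means: new_bits = bits / 2 + 127 * 2**22.
BIAS_HALF = 127 << 22


def approx_sqrt_shift_add(x: float) -> float:
    """Approximate sqrt(x) with one shift and one integer add (no multiplier).

    Illustrative stand-in only; NOT the E2AFS algorithm.
    Non-positive inputs return 0.0; NaN, infinity, and subnormals
    are not handled.
    """
    if x <= 0.0:
        return 0.0
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    approx_bits = (bits >> 1) + BIAS_HALF
    (y,) = struct.unpack("<f", struct.pack("<I", approx_bits))
    return y


def max_relative_error(samples):
    """Largest |approx - exact| / exact over positive samples."""
    return max(
        abs(approx_sqrt_shift_add(s) - math.sqrt(s)) / math.sqrt(s)
        for s in samples
    )


# Sweep inputs in [1, 5) on a grid that float32 represents exactly.
sweep = [1.0 + k / 256.0 for k in range(1024)]
worst = max_relative_error(sweep)
print(f"worst relative error on sweep: {worst:.4f}")
```

For this particular trick the worst-case relative error is about 6.1% (for example, an input of 2.0 yields 1.5). That gives a sense of the accuracy a bare shift-and-add baseline reaches, and of why an explicit error-bound derivation is needed to judge any refinement of it.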

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help improve the clarity and rigor of our manuscript. We address each major comment point by point below and will revise the paper to incorporate the suggested clarifications and additions.

Point-by-point responses
  1. Referee: [FPGA Implementation Results] The central superiority claims for dynamic power, delay, and PDP rest on comparisons to ESAS and CWAHA, yet the manuscript provides no explicit statement that all three designs were re-implemented by the authors using identical Vivado synthesis settings, optimization directives, place-and-route constraints, clock frequency for power analysis, and input switching activity. These factors directly affect the reported 7.63 mW / 4.639 ns / 35.39 pJ figures; without this confirmation the deltas cannot be attributed solely to the E2AFS architecture.

    Authors: We confirm that E2AFS, ESAS, and CWAHA were all re-implemented by the authors on the same Artix-7 device using identical Vivado synthesis settings, optimization directives, place-and-route constraints, clock frequency, and input switching activity vectors for power estimation. We will add an explicit paragraph in the FPGA Implementation Results section stating these conditions to ensure the performance comparisons are fully reproducible and attributable to architectural differences. revision: yes

  2. Referee: [Proposed Architecture / Error Evaluation] No design equations, algorithmic steps, or error formulas are supplied for the approximation method. The abstract asserts a “multiplier-free” property and “consistently low deviation,” but without the underlying mapping or error-bound derivation it is impossible to verify the claimed reduction in logic depth or to reproduce the accuracy metrics used for the Sobel and K-means tests.

    Authors: The Proposed Architecture section describes the multiplier-free approximation via bit-level manipulations of the mantissa and exponent, along with the resulting logic simplification. To address the concern, we will add explicit design equations for the approximation mapping, a step-by-step algorithmic description (including pseudocode), and the error-bound derivation in the revised manuscript. This will enable verification of the logic-depth reduction and reproduction of the accuracy metrics. revision: yes

  3. Referee: [Application-Level Validation] The claim that approximation error “does not meaningfully degrade” end-to-end quality in Sobel edge detection and K-means quantization is load-bearing for the error-tolerant suitability argument, yet the manuscript reports only qualitative confirmation without quantitative metrics (e.g., PSNR, clustering accuracy delta, or visual difference scores) comparing exact versus approximate outputs.

    Authors: We acknowledge that the current application validation relies primarily on qualitative visual comparisons. In the revision, we will add quantitative metrics: PSNR and SSIM for Sobel edge detection, and clustering accuracy/purity deltas for K-means, directly comparing exact and approximate outputs. These additions will provide objective support for the claim that approximation error does not meaningfully degrade end-to-end quality. revision: yes
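
The kind of quantitative comparison promised in response 3 could be sketched as follows: compute the Sobel gradient magnitude once with an exact square root and once with an approximate one, then report PSNR between the two edge maps. Everything here is an assumption made for illustration. The approximate square root is a generic shift-and-add bit trick (not E2AFS), the test image is a synthetic disc, the PSNR peak is taken as the maximum of the exact edge map, and all function names are hypothetical.

```python
import math
import struct

KX = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # Sobel horizontal kernel
KY = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # Sobel vertical kernel


def approx_sqrt(x):
    """Generic shift-and-add float32 square root; a stand-in, NOT E2AFS."""
    if x <= 0.0:
        return 0.0
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    (y,) = struct.unpack("<f", struct.pack("<I", (bits >> 1) + (127 << 22)))
    return y


def disc_image(n=16, radius=5):
    """Synthetic 8-bit test image: a white disc on a black background."""
    c = n // 2
    return [
        [255 if (r - c) ** 2 + (col - c) ** 2 <= radius ** 2 else 0
         for col in range(n)]
        for r in range(n)
    ]


def sobel_magnitude(img, sqrt_fn):
    """Gradient magnitude sqrt(gx^2 + gy^2) over the interior pixels."""
    h, w = len(img), len(img[0])
    out = []
    for r in range(1, h - 1):
        row = []
        for c in range(1, w - 1):
            gx = sum(KX[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            gy = sum(KY[i][j] * img[r - 1 + i][c - 1 + j]
                     for i in range(3) for j in range(3))
            row.append(sqrt_fn(float(gx * gx + gy * gy)))
        out.append(row)
    return out


def psnr(ref, test):
    """PSNR in dB, using the maximum of the reference map as the peak."""
    flat_ref = [v for row in ref for v in row]
    flat_test = [v for row in test for v in row]
    mse = sum((a - b) ** 2 for a, b in zip(flat_ref, flat_test)) / len(flat_ref)
    if mse == 0.0:
        return math.inf
    peak = max(flat_ref)
    return 10.0 * math.log10(peak * peak / mse)


image = disc_image()
exact_map = sobel_magnitude(image, math.sqrt)
approx_map = sobel_magnitude(image, approx_sqrt)
score = psnr(exact_map, approx_map)
print(f"PSNR of approximate vs exact Sobel magnitude: {score:.2f} dB")
```

Swapping `approx_sqrt` for a bit-accurate model of each architecture under comparison would give per-design PSNR values on the same images. SSIM and the K-means clustering deltas mentioned in the response would need their own reference implementations and are not sketched here.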

Circularity Check

0 steps flagged

No circularity in empirical hardware design and FPGA results

The paper proposes an approximate floating-point square-root hardware architecture (E2AFS) and reports direct FPGA synthesis metrics (power, delay, PDP) plus error and application-level results on Artix-7. It contains no derivation chain, no first-principles prediction, no parameter fit presented as an output, and no self-citation that carries the weight of any claim. The central results are empirical measurements from synthesis and testing, not reductions to inputs by construction or renamings of prior results. This matches the reader's assessment of minimal circularity burden for a design paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract states no explicit free parameters, axioms, or invented entities. The approximation method itself is not described, so any design choices that control error versus efficiency remain unspecified.

pith-pipeline@v0.9.0 · 5506 in / 1213 out tokens · 50536 ms · 2026-05-10T06:59:16.348335+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages

  1. B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs. London, U.K.: Oxford University Press, 2000.

  2. J. Han and M. Orshansky, “Approximate computing: An emerging paradigm for energy-efficient design,” in 2013 18th IEEE European Test Symposium (ETS), 2013, pp. 1–6.

  3. H. Jiang, C. Liu, L. Liu, F. Lombardi, and J. Han, “A review, classification, and comparative evaluation of approximate arithmetic circuits,” J. Emerg. Technol. Comput. Syst., vol. 13, no. 4, Aug. 2017.

  4. H. Jiang, F. J. H. Santiago, H. Mo, L. Liu, and J. Han, “Approximate arithmetic circuits: A survey, characterization, and recent applications,” Proceedings of the IEEE, vol. 108, no. 12, pp. 2108–2135, 2020.

  5. A. Dalloo, A. Najafi, and A. Garcia-Ortiz, “Systematic design of an approximate adder: The optimized lower part constant-or adder,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 26, no. 8, pp. 1595–1599, 2018.

  6. R. NA, A. A. A, and S. P, “Design and evaluation of low power error tolerant adder,” in 2023 International Conference on Next Generation Electronics (NEleX), 2023, pp. 1–6.

  7. W. Liu, L. Qian, C. Wang, H. Jiang, J. Han, and F. Lombardi, “Design of approximate radix-4 Booth multipliers for error-tolerant computing,” IEEE Transactions on Computers, vol. 66, no. 8, pp. 1435–1441, 2017.

  8. W. Liu, T. Cao, P. Yin, Y. Zhu, C. Wang, E. E. Swartzlander, and F. Lombardi, “Design and analysis of approximate redundant binary multipliers,” IEEE Transactions on Computers, vol. 68, no. 6, pp. 804–819, 2019.

  9. N. Arya, M. Pattanaik, and G. Sharma, “Energy-efficient logarithmic square rooter for error-resilient applications,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 29, no. 11, pp. 1994–1997, Nov. 2021.

  10. O. G. Ratnaparkhi and M. Rao, “ESAS: Exponent series based approximate square root design,” in 2022 25th Euromicro Conference on Digital System Design (DSD), 2022, pp. 39–45.

  11. P. Goyal and S. K. Sahoo, “Low-power hardware architecture of optimized logarithmic square rooter with enhanced error compensation for error-tolerant systems,” Integration, vol. 105, p. 102522, 2025.

  12. O. G. Ratnaparkhi and M. Rao, “CWAHA: Cluster-wise approximation for hardware implementation of arithmetic functions,” in 2023 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2023, pp. 1–6.

  13. Xilinx Inc., “Vivado design suite tutorial: Power analysis and optimization,” 2022, available at: https://www.xilinx.com.

  14. J. Liang, J. Han, and F. Lombardi, “New metrics for the reliability of approximate and probabilistic adders,” IEEE Transactions on Computers, vol. 62, pp. 1760–1771, Sep. 2013.

  15. R. Gonzalez and R. Woods, Digital Image Processing. Upper Saddle River, N.J.: Prentice Hall, 2008.

  16. M. Leeser, J. Theiler, M. Estlick, and J. Szymanski, “Design tradeoffs in a hardware implementation of the k-means clustering algorithm,” in Proceedings of the 2000 IEEE Sensor Array and Multichannel Signal Processing Workshop. SAM 2000 (Cat. No.00EX410), 2000, pp. 520–524.