arxiv: 2605.08231 · v1 · submitted 2026-05-06 · 💻 cs.LG · cs.AI· cs.AR

Recognition: 2 theorem links

· Lean Theorem

TRAM: Training Approximate Multiplier Structures for Low-Power AI Accelerators

Chang Meng , Hanyu Wang , Yuyang Ye , Mingfei Yu , Wayne Burleson , Giovanni De Micheli

Authors on Pith no claims yet

Pith reviewed 2026-05-12 00:52 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.AR

keywords approximate computingmultipliersAI acceleratorslow-power designjoint optimizationCNNvision transformerspower reduction

0 comments

The pith

TRAM jointly optimizes approximate multiplier structures with AI model parameters to reduce power while limiting accuracy loss.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TRAM as a method that trains approximate multipliers together with the AI model instead of designing multipliers separately first. This co-optimization targets the power-hungry multiplier units inside neural networks to find structures that consume less energy for the specific computations the model performs. Experiments on CNNs with CIFAR-10 show up to 25 percent lower multiplier power than prior approximate multipliers, and similar gains appear for vision transformers on ImageNet. A sympathetic reader would care because multipliers dominate energy use in AI accelerators, and any reduction that keeps accuracy drops small could extend battery life or lower cooling costs in deployed systems. The approach treats the multiplier configuration as an additional trainable element during backpropagation.

Core claim

TRAM performs joint optimization of approximate multiplier structures and model weights during training, using a power estimation model to penalize high-power configurations while the loss function keeps accuracy high. This produces multiplier designs tailored to each layer's data patterns rather than generic approximations chosen in advance.

What carries the argument

The TRAM joint optimization loop, which alternates or combines updates to model parameters and to the selection or bit-width choices inside approximate multipliers using differentiable power proxies.

If this is right

AI training pipelines can incorporate multiplier structure search as a standard step without separate hardware design phases.
Power reduction scales with model size because the savings apply to every multiplication operation in the network.
The same framework can be reused across different model families by simply changing the training dataset and architecture.
Designers gain a direct knob to trade accuracy for power by adjusting the weight of the power penalty term.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hardware-aware training of this form may become routine for any edge device where power is the primary constraint.
Extending the method to other approximate units such as adders or activation functions would follow the same joint-optimization pattern.
If the power model can be made differentiable at the gate level, the approach could move from simulation to direct silicon optimization.
Mobile and IoT applications could see the largest practical impact because the reported percentage savings compound over millions of inferences.

Load-bearing premise

The power estimation model used inside training must closely match the actual power draw of the final hardware multiplier implementation.

What would settle it

Fabricate the TRAM-designed multipliers in silicon or on an FPGA and measure real power under the same workloads; if the measured savings fall below the simulated figures by more than the reported margin, the joint-optimization benefit does not hold.

Figures

Figures reproduced from arXiv: 2605.08231 by Chang Meng, Giovanni De Micheli, Hanyu Wang, Mingfei Yu, Wayne Burleson, Yuyang Ye.

**Figure 2.** Figure 2: TRAM framework overview. Experimental results show that, compared to state-of-the-art AxM designs, TRAM reduces AxM power by up to 25.05% on CNNs with CIFAR-10 at the same accuracy level, and by 27.09% on vision transformers with ImageNet. Since TRAM allows different structure parameters for different model layers, it naturally supports layerwise application of different AxMs. Compared to the state-of-the… view at source ↗

**Figure 3.** Figure 3: Dataflow for computing the objective function in Eq. [PITH_FULL_IMAGE:figures/full_fig_p003_3.png] view at source ↗

**Figure 4.** Figure 4: Comparison of final accuracy and AxM power con [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Impact of 𝜆 on DenseNet161 accuracy and AxM power under w4a4. Power is normalized to the 4-bit AccMul [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

read the original abstract

Reducing power consumption in AI accelerators is increasingly important. Approximate computing can reduce power consumption while keeping the accuracy loss small. Since multipliers are power-hungry components in AI models, this paper focuses on synthesizing low-power approximate multipliers (AxMs). Unlike prior works that design AxMs separately from AI model training, we present TRAM, which jointly optimizes the AxM structure and AI model parameters to lower power with small accuracy loss. Experiments show that compared to state-of-the-art AxMs, TRAM achieves up to 25.05% AxM power reduction on CNNs with CIFAR-10, and reduces power by up to 27.09% on vision transformers with ImageNet.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TRAM jointly optimizes approximate multiplier structures with model weights during training and reports power cuts versus prior AxMs, but the gains depend on an unvalidated power estimator.

read the letter

The paper's main contribution is TRAM, which optimizes both the approximate multiplier structure and the AI model parameters in a single training loop. This differs from earlier work that designs the multipliers separately before training the model to tolerate their errors. The experiments give concrete numbers on standard tasks: up to 25.05% lower multiplier power on CNNs with CIFAR-10 and up to 27.09% on vision transformers with ImageNet, while keeping accuracy loss small. The model adapts to the chosen approximations during training, which is a reasonable way to find better trade-offs for power-hungry multipliers in accelerators. The approach is practical because it targets a real bottleneck in AI hardware and uses public datasets for the comparisons. The soft spot is the power model itself. Training uses a differentiable estimator to score different multiplier structures, yet the paper provides no correlation data between those estimates and actual post-synthesis or FPGA power measurements on the final designs. If the estimator is biased, the reported savings may not appear in real hardware. That concern is load-bearing and worth checking. There is no circular reasoning or obvious fitting problems in the presented results. This work is aimed at researchers doing hardware-aware training or approximate computing for accelerators. A reader focused on energy efficiency in deployed AI systems would find the benchmark numbers useful to consider. It deserves a serious referee because the core idea is testable and the claims are quantified on relevant tasks. Reviewers can ask for the missing hardware validation. I recommend sending it to peer review.

Referee Report

3 major / 3 minor

Summary. The manuscript introduces TRAM, a method that jointly optimizes approximate multiplier (AxM) structures together with AI model parameters during training to reduce power consumption in AI accelerators while incurring only small accuracy loss. Experiments on CNNs with CIFAR-10 and vision transformers with ImageNet report up to 25.05% and 27.09% AxM power reductions, respectively, relative to state-of-the-art fixed AxMs.

Significance. If the reported power savings are confirmed by hardware measurements that match the in-training estimator, TRAM would advance approximate computing by moving AxM design from a separate post-training step into the model training loop itself. This integrated approach could yield more effective power-accuracy trade-offs for multiplier-heavy workloads on edge devices.

major comments (3)

[§3.2 (Differentiable Power Model)] §3.2 (Differentiable Power Model): The central empirical claims rest on the assumption that the differentiable power estimator used to guide AxM structure search during joint optimization accurately predicts post-synthesis or FPGA power. No correlation coefficient, scatter plot, or error metric between the estimator and actual gate-level power on the final TRAM structures is reported; because the optimization directly trades accuracy against this estimated objective, any systematic mismatch directly undermines the 25%+ reduction figures.
[§5 (Experimental Evaluation)] §5 (Experimental Evaluation): The reported power reductions (25.05% on CIFAR-10 CNNs, 27.09% on ImageNet ViTs) are presented without details on power measurement methodology (synthesis tool, operating conditions), exact baseline AxM implementations and their configurations, number of independent runs, error bars, or statistical significance tests. These omissions are load-bearing for assessing whether the gains exceed those of fixed SOTA AxMs under comparable conditions.
[§4.1 (Joint Optimization)] §4.1 (Joint Optimization): The joint training procedure introduces additional structural parameters for the AxM and an extra loss term for estimated power. No ablation or analysis is provided on training stability, sensitivity to the power-accuracy trade-off hyperparameter, or the increase in wall-clock training time relative to standard fine-tuning, which is necessary to evaluate the practicality of the claimed co-optimization.

minor comments (3)

[Abstract] Abstract: The phrase 'state-of-the-art AxMs' is used without naming the specific prior designs; adding a short parenthetical or reference to the compared methods would improve immediate clarity.
[§2 (Related Work)] §2 (Related Work): Several recent hardware-aware approximate multiplier papers that include FPGA power measurements are not cited; including them would better situate the contribution.
[Notation] Notation: The symbols for AxM bit-width parameters, error metrics, and power terms are introduced piecemeal; a consolidated notation table would aid readability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough and constructive feedback on our manuscript. We have carefully considered each major comment and provide point-by-point responses below. Where appropriate, we will revise the manuscript to address the concerns raised.

read point-by-point responses

Referee: [§3.2 (Differentiable Power Model)] §3.2 (Differentiable Power Model): The central empirical claims rest on the assumption that the differentiable power estimator used to guide AxM structure search during joint optimization accurately predicts post-synthesis or FPGA power. No correlation coefficient, scatter plot, or error metric between the estimator and actual gate-level power on the final TRAM structures is reported; because the optimization directly trades accuracy against this estimated objective, any systematic mismatch directly undermines the 25%+ reduction figures.

Authors: We agree that empirical validation of the differentiable power model is essential to support our claims. In the revised manuscript, we will add a new subsection or appendix providing a correlation analysis between the estimator and post-synthesis power measurements for the TRAM-optimized multipliers. This will include scatter plots, Pearson correlation coefficients, and mean absolute error metrics to quantify the estimator's accuracy. revision: yes
Referee: [§5 (Experimental Evaluation)] §5 (Experimental Evaluation): The reported power reductions (25.05% on CIFAR-10 CNNs, 27.09% on ImageNet ViTs) are presented without details on power measurement methodology (synthesis tool, operating conditions), exact baseline AxM implementations and their configurations, number of independent runs, error bars, or statistical significance tests. These omissions are load-bearing for assessing whether the gains exceed those of fixed SOTA AxMs under comparable conditions.

Authors: We acknowledge the need for greater transparency in the experimental setup. We will revise Section 5 to include detailed descriptions of the power measurement methodology, including the synthesis tool and operating conditions. We will specify the exact configurations of the baseline approximate multipliers from prior works. Additionally, we will report results from multiple independent runs with error bars and include statistical significance tests to confirm that the improvements are significant. revision: yes
Referee: [§4.1 (Joint Optimization)] §4.1 (Joint Optimization): The joint training procedure introduces additional structural parameters for the AxM and an extra loss term for estimated power. No ablation or analysis is provided on training stability, sensitivity to the power-accuracy trade-off hyperparameter, or the increase in wall-clock training time relative to standard fine-tuning, which is necessary to evaluate the practicality of the claimed co-optimization.

Authors: We will enhance the discussion in Section 4.1 by adding an ablation study on the sensitivity of results to the trade-off hyperparameter λ, showing performance across a range of values. We will also provide analysis on training stability and report the observed increase in wall-clock training time relative to standard fine-tuning. These additions will better demonstrate the practicality of TRAM. revision: yes

Circularity Check

0 steps flagged

No significant circularity in TRAM's joint optimization and empirical results

full rationale

The paper presents TRAM as a joint optimization framework for approximate multiplier (AxM) structures and AI model parameters, with power savings reported as experimental outcomes on public datasets (CIFAR-10 for CNNs, ImageNet for vision transformers). No derivation chain reduces a claimed prediction or first-principles result to its own inputs by construction. Power estimation occurs during training as part of the optimization objective, but the final claims are empirical comparisons to SOTA AxMs rather than self-referential identities or fitted quantities renamed as predictions. Self-citations, if present, are not load-bearing for uniqueness or ansatz adoption in a way that creates circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no specific free parameters, axioms, or invented entities can be identified; the method appears to rely on standard optimization and power modeling techniques from prior literature.

pith-pipeline@v0.9.0 · 5429 in / 997 out tokens · 36967 ms · 2026-05-12T00:52:15.469923+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We parameterize the AxM structure using continuous structure parameters... θ_c^(l) ∈ [0,1] controls the approximation degree of column c... f_power(Θ^(l)) = (Power_AccMul − Σ θ_c · Power_c) / Power_AccMul
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

min_Θ,W (L_power(Θ)·λ + L_AI_model(Θ,W,X)) solved by retraining

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1]

Green AI.Commu- nications of the ACM, 63(12):54–63, 2020

Roy Schwartz, Jesse Dodge, Noah A Smith, and Oren Etzioni. Green AI.Commu- nications of the ACM, 63(12):54–63, 2020

work page 2020
[2]

Approximate computing survey, part II: Application-specific & architectural approximation techniques and applications.ACM Computing Surveys, 57(7):1–36, 2025

Vasileios Leon, Muhammad Abdullah Hanif, Giorgos Armeniakos, Xun Jiao, Muhammad Shafique, Kiamal Pekmestzi, and Dimitrios Soudris. Approximate computing survey, part II: Application-specific & architectural approximation techniques and applications.ACM Computing Surveys, 57(7):1–36, 2025

work page 2025
[3]

Hardware approximate techniques for deep neural network accelerators: A survey

Giorgos Armeniakos, Georgios Zervakis, Dimitrios Soudris, and Jörg Henkel. Hardware approximate techniques for deep neural network accelerators: A survey. ACM Computing Surveys (CSUR), 55(4):1–36, 2022

work page 2022
[4]

A survey on approximate multiplier designs for energy efficiency: From algorithms to circuits.ACM Transactions on Design Automation of Electronic Systems (TODAES), 29(1):1–37, 2024

Ying Wu, Chuangtao Chen, Weihua Xiao, Xuan Wang, Chenyi Wen, Jie Han, Xun- zhao Yin, Weikang Qian, and Cheng Zhuo. A survey on approximate multiplier designs for energy efficiency: From algorithms to circuits.ACM Transactions on Design Automation of Electronic Systems (TODAES), 29(1):1–37, 2024

work page 2024
[5]

EvoAp- prox8b: Library of approximate adders and multipliers for circuit design and benchmarking of approximation methods

Vojtech Mrazek, Radek Hrbacek, Zdenek Vasicek, and Lukas Sekanina. EvoAp- prox8b: Library of approximate adders and multipliers for circuit design and benchmarking of approximation methods. InDesign, Automation & Test in Europe Conference & Exhibition (DATE), pages 258–261, 2017

work page 2017
[6]

Libraries of approximate circuits: Automated design and application in CNN accelerators.IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 10(4):406–418, 2020

Vojtech Mrazek, Lukas Sekanina, and Zdenek Vasicek. Libraries of approximate circuits: Automated design and application in CNN accelerators.IEEE Journal on Emerging and Selected Topics in Circuits and Systems, 10(4):406–418, 2020

work page 2020
[7]

OPACT: Optimization of approxi- mate compressor tree for approximate multiplier

Weihua Xiao, Cheng Zhuo, and Weikang Qian. OPACT: Optimization of approxi- mate compressor tree for approximate multiplier. InDesign, Automation & Test in Europe Conference & Exhibition (DATE), pages 178–183, 2022

work page 2022
[8]

A configurable approximate multiplier for CNNs using partial product speculation

Xiaolu Hu, Ao Liu, Xinkuang Geng, Zizhong Wei, Kai Jiang, and Honglan Jiang. A configurable approximate multiplier for CNNs using partial product speculation. InDesign, Automation & Test in Europe Conference & Exhibition (DATE), pages 1–6, 2024

work page 2024
[9]

DASALS: Differentiable architecture search-driven approximate logic synthesis

Xuan Wang, Zheyu Yan, Chang Meng, Yiyu Shi, and Weikang Qian. DASALS: Differentiable architecture search-driven approximate logic synthesis. InInter- national Conference on Computer Aided Design (ICCAD), pages 1–9, 2023

work page 2023
[10]

Approximate logic synthesis using Boolean matrix factorization.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 41(1):15–28, 2021

Jingxiao Ma, Soheil Hashemi, and Sherief Reda. Approximate logic synthesis using Boolean matrix factorization.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 41(1):15–28, 2021

work page 2021
[11]

Simulation-guided ap- proximate logic synthesis under the maximum error constraint.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2026

Chang Meng, Weikang Qian, and Giovanni De Micheli. Simulation-guided ap- proximate logic synthesis under the maximum error constraint.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2026

work page 2026
[12]

Exact neural networks from inexact multi- pliers via Fibonacci weight encoding

William Andrew Simon, Valérian Ray, Alexandre Levisse, Giovanni Ansaloni, Marina Zapater, and David Atienza. Exact neural networks from inexact multi- pliers via Fibonacci weight encoding. InDesign Automation Conference (DAC), pages 805–810, 2021

work page 2021
[13]

Learning to design accurate deep learning accelerators with inaccu- rate multipliers

Paras Jain, Safeen Huda, Martin Maas, Joseph E Gonzalez, Ion Stoical, and Azalia Mirhoseini. Learning to design accurate deep learning accelerators with inaccu- rate multipliers. InDesign, Automation & Test in Europe Conference & Exhibition (DATE), pages 184–189, 2022

work page 2022
[14]

Gradient approximation of approximate multipliers for high-accuracy deep neural network retraining

Chang Meng, Wayne Burleson, Weikang Qian, and Giovanni De Micheli. Gradient approximation of approximate multipliers for high-accuracy deep neural network retraining. InDesign, Automation & Test in Europe Conference & Exhibition (DATE), pages 1–7, 2025

work page 2025
[15]

HEAM: High-efficiency approximate multiplier optimization for deep neural networks

Su Zheng, Zhen Li, Yao Lu, Jingbo Gao, Jide Zhang, and Lingli Wang. HEAM: High-efficiency approximate multiplier optimization for deep neural networks. In IEEE International Symposium on Circuits and Systems (ISCAS), pages 3359–3363, 2022

work page 2022
[16]

Approximate arithmetic circuits: A survey, characterization, and recent applications.Proceedings of the IEEE, 108(12):2108–2135, 2020

Honglan Jiang, Francisco Javier Hernandez Santiago, Hai Mo, Leibo Liu, and Jie Han. Approximate arithmetic circuits: A survey, characterization, and recent applications.Proceedings of the IEEE, 108(12):2108–2135, 2020

work page 2020
[17]

Quantization and training of neural networks for efficient integer-arithmetic-only inference

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, An- drew Howard, Hartwig Adam, and Dmitry Kalenichenko. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 2704– 2713, 2018

work page 2018
[18]

OmniQuant: Omnidirectionally calibrated quantization for large language models.International Conference on Learning Representations (ICLR), pages 1–25, 2024

Wenqi Shao, Mengzhao Chen, Zhaoyang Zhang, Peng Xu, Lirui Zhao, Zhiqian Li, Kaipeng Zhang, Peng Gao, Yu Qiao, and Ping Luo. OmniQuant: Omnidirectionally calibrated quantization for large language models.International Conference on Learning Representations (ICLR), pages 1–25, 2024

work page 2024
[19]

ALSRAC: Approximate logic synthesis by resubstitution with approximate care set

Chang Meng, Weikang Qian, and Alan Mishchenko. ALSRAC: Approximate logic synthesis by resubstitution with approximate care set. InDesign Automation Conference (DAC), pages 1–6, 2020

work page 2020
[20]

HEDALS: Highly efficient delay-driven approximate logic synthesis.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 42(11):3491–3504, 2023

Chang Meng, Zhuangzhuang Zhou, Yue Yao, Shuyang Huang, Yuhang Chen, and Weikang Qian. HEDALS: Highly efficient delay-driven approximate logic synthesis.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 42(11):3491–3504, 2023

work page 2023
[21]

BLASYS: Approximate logic synthesis using boolean matrix factorization

Soheil Hashemi, Hokchhay Tann, and Sherief Reda. BLASYS: Approximate logic synthesis using boolean matrix factorization. InDesign Automation Conference (DAC), pages 1–6, 2018

work page 2018
[22]

Ef- ficient resubstitution-based approximate logic synthesis.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 44(6):2040–2053, 2025

Chang Meng, Alan Mishchenko, Weikang Qian, and Giovanni De Micheli. Ef- ficient resubstitution-based approximate logic synthesis.IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 44(6):2040–2053, 2025

work page 2040
[23]

PyTorch: An imperative style, high-performance deep learning library

Adam Paszke, Sam Gross, et al. PyTorch: An imperative style, high-performance deep learning library. InInternational Conference on Neural Information Processing Systems (NeurIPS), pages 8026–8037, 2019

work page 2019
[24]

Learning multiple layers of features from tiny images.Technical Report, University of Toronto, 2009

Alex Krizhevsky and Geoffrey Hinton. Learning multiple layers of features from tiny images.Technical Report, University of Toronto, 2009

work page 2009
[25]

ImageNet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. InIEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009

work page 2009
[26]

ASAP7: A 7-nm FinFET predictive process design kit.Microelectronics Journal, 53:105–115, 2016

Lawrence T Clark, Vinay Vashishtha, Lucian Shifren, Aditya Gujja, Saurabh Sinha, Brian Cline, Chandarasekaran Ramamurthy, and Greg Yeric. ASAP7: A 7-nm FinFET predictive process design kit.Microelectronics Journal, 53:105–115, 2016

work page 2016
[27]

VECSEM: Verifying average errors in approximate circuits using simulation- enhanced model counting

Chang Meng, Hanyu Wang, Yuqi Mai, Weikang Qian, and Giovanni De Micheli. VECSEM: Verifying average errors in approximate circuits using simulation- enhanced model counting. InDesign, Automation & Test in Europe Conference & Exhibition (DATE), pages 1–6, 2024

work page 2024
[28]

AdaPT: Fast emulation of approximate DNN accelerators in PyTorch

Dimitrios Danopoulos, Georgios Zervakis, Kostas Siozios, Dimitrios Soudris, and Jörg Henkel. AdaPT: Fast emulation of approximate DNN accelerators in PyTorch. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 42(6):2074–2078, 2022

work page 2074
[29]

TransAxx: Efficient transformers with approximate computing.IEEE Transactions on Circuits and Systems for Artificial Intelligence, 2(4):288–301, 2025

Dimitrios Danopoulos, Georgios Zervakis, Dimitrios Soudris, and Jörg Henkel. TransAxx: Efficient transformers with approximate computing.IEEE Transactions on Circuits and Systems for Artificial Intelligence, 2(4):288–301, 2025

work page 2025
[30]

Flavia Guella, Emanuele Valpreda, Michele Caon, Guido Masera, and Maurizio Martina. MARLIN: A co-design methodology for approximate reconfigurable inference of neural networks at the edge.IEEE Transactions on Circuits and Systems I: Regular Papers (TCAS-I), 71(5):2105–2118, 2024

work page 2024
[31]

ALWANN: Automatic layer-wise approximation of deep neural network accelerators without retraining

Vojtech Mrazek et al. ALWANN: Automatic layer-wise approximation of deep neural network accelerators without retraining. InInternational Conference on Computer Aided Design (ICCAD), pages 1–8, 2019

work page 2019