pith. machine review for the scientific record.

arxiv: 2604.09073 · v1 · submitted 2026-04-10 · 💻 cs.AR

Recognition: unknown

DRIFT: Harnessing Inherent Fault Tolerance for Efficient and Reliable Diffusion Model Inference

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 17:01 UTC · model grok-4.3

classification 💻 cs.AR
keywords diffusion models · fault tolerance · DVFS · energy efficiency · inference optimization · ABFT · resilience analysis · voltage scaling

The pith

Diffusion models have enough built-in fault tolerance to run safely at lower voltages or higher frequencies, cutting energy use by 36% on average or speeding inference by 1.7 times.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to prove that diffusion models are tolerant enough of errors that accelerators can deliberately underscale voltage or overclock frequency without ruining the generated images or videos. Current DVFS methods either play it too safe and gain little efficiency, or push too hard and lose quality, because they ignore this tolerance. DRIFT addresses the gap by first mapping which network blocks and timesteps are most vulnerable, then applying voltage or frequency changes only where it is safe, and using a targeted rollback to fix the few critical faults that still occur. The result is a practical way to lower the high power and latency cost of deploying these models.
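The mapping step can be illustrated with a toy sensitivity sweep. Everything here (the linear "blocks", the noise model, the median threshold) is a hypothetical stand-in for the paper's hardware-level fault injection:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-ins for network blocks: small linear maps with tanh.
blocks = [rng.standard_normal((16, 16)) * 0.2 for _ in range(4)]

def forward(x, fault_block=None, fault_scale=0.0):
    """Run x through all blocks, perturbing one block's output if requested."""
    for i, w in enumerate(blocks):
        x = np.tanh(w @ x)
        if i == fault_block:
            x = x + fault_scale * rng.standard_normal(x.shape)
    return x

x0 = rng.standard_normal(16)
clean = forward(x0)

# Sensitivity of block i = output deviation when only block i is faulty.
sensitivity = [
    float(np.mean((forward(x0, fault_block=i, fault_scale=0.5) - clean) ** 2))
    for i in range(len(blocks))
]

# Blocks above the median would be shielded from aggressive DVFS settings.
protected = [i for i, s in enumerate(sensitivity) if s > np.median(sensitivity)]
print(sorted(protected))
```

The sweep ranks blocks by how much an injected perturbation distorts the final output; only the high-sensitivity half would keep nominal voltage.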

Core claim

DRIFT is a co-optimization framework that first analyzes the resilience of representative diffusion models, then uses a fine-grained DVFS policy to protect only error-sensitive blocks and timesteps while an adaptive ABFT rollback mechanism corrects critical faults by reverting to earlier timesteps; memory offloading intervals and data layouts are also tuned to limit overhead. Experiments show this combination preserves generation quality under aggressive voltage underscaling for 36% average energy savings or under overclocking for 1.7 times average speedup across models and datasets.
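The ABFT component the claim relies on descends from the classic Huang–Abraham checksum scheme for matrix multiplication. A minimal NumPy sketch of detection (illustrative only; the paper's adaptive variant also grades error magnitude and location):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))
B = rng.standard_normal((4, 4))

# Augment A with a column-checksum row and B with a row-checksum column.
A_c = np.vstack([A, A.sum(axis=0)])                  # (5, 4)
B_r = np.hstack([B, B.sum(axis=1, keepdims=True)])   # (4, 5)

C = A_c @ B_r  # (5, 5): top-left 4x4 is A @ B, borders carry checksums

def check(C):
    """Return the max checksum residual; a large value flags a fault."""
    row_err = np.abs(C[:-1, :-1].sum(axis=0) - C[-1, :-1]).max()
    col_err = np.abs(C[:-1, :-1].sum(axis=1) - C[:-1, -1]).max()
    return max(row_err, col_err)

assert check(C) < 1e-9          # fault-free product passes

C_faulty = C.copy()
C_faulty[1, 2] += 5.0           # simulate a silent data corruption
assert check(C_faulty) > 1.0    # checksum residual exposes it
print("fault detected")
```

The checksums ride along with the multiplication itself, which is why ABFT detection is cheap enough to run on every protected block.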

What carries the argument

The resilience-aware DVFS strategy that selectively shields vulnerable network blocks and timesteps, combined with the adaptive ABFT rollback that reverts only when critical errors are detected.
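A checkpoint-and-revert loop of roughly this shape would carry the rollback half; the step function, residual model, and checkpoint interval below are invented for illustration:

```python
THRESHOLD = 0.5   # hypothetical critical-error threshold
CKPT_EVERY = 5    # assumed checkpoint interval, in timesteps

def denoise_step(state, t):
    # Stand-in for one diffusion denoising timestep (hypothetical).
    return state * 0.95 + 0.1

def error_magnitude(t):
    # Stand-in for an ABFT checksum residual; one critical fault at t = 12.
    return 0.9 if t == 12 else 0.01

state, checkpoints, rollbacks = 1.0, {}, 0
t = 20                          # diffusion timesteps count down to 0
while t > 0:
    if t % CKPT_EVERY == 0:
        checkpoints[t] = state  # snapshot latent state at intervals
    state = denoise_step(state, t)
    if error_magnitude(t) > THRESHOLD and rollbacks == 0:
        # Critical fault: revert to the nearest earlier checkpoint
        # (the retry budget of one is a simplification of "adaptive").
        t_ckpt = min(k for k in checkpoints if k >= t)
        state, t = checkpoints[t_ckpt], t_ckpt
        rollbacks += 1
        continue
    t -= 1
print(rollbacks)  # → 1
```

Small residuals are simply tolerated; only the one critical fault triggers a revert, so the common-case cost is just the periodic snapshot.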

If this is right

  • Aggressive voltage underscaling becomes viable for diffusion inference, yielding 36% average energy reduction while generation quality holds.
  • Overclocking becomes viable, delivering 1.7 times average speedup with no quality penalty.
  • Memory overhead stays manageable because offloading intervals and data layouts are reorganized around the protected regions.
  • The same resilience mapping can guide DVFS decisions across different diffusion architectures and datasets.
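As a first-order sanity check on the energy figure: dynamic CMOS energy scales roughly with V², so a 36% saving corresponds to about 20% voltage underscaling. A quick back-of-the-envelope, not a number from the paper:

```python
import math

# First-order CMOS model: dynamic energy per operation E ∝ C · V^2.
savings = 0.36
v_ratio = math.sqrt(1.0 - savings)   # required ratio V_new / V_nominal
print(round(v_ratio, 2))             # → 0.8, i.e. roughly 20% underscaling
```

That a double-digit voltage cut leaves quality intact is exactly what the inherent-fault-tolerance premise has to deliver.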

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar selective-protection plus rollback patterns could reduce energy in other iterative generative models that share the same denoising structure.
  • Hardware accelerators might expose lightweight rollback hooks or per-block voltage domains to make this style of optimization cheaper to implement.
  • The approach implies that error-correction resources in AI chips can be allocated dynamically rather than applied uniformly, freeing area and power for other uses.

Load-bearing premise

Diffusion models contain enough inherent fault tolerance that protecting only the sensitive blocks and timesteps plus rolling back critical errors is enough to keep output quality intact when voltage or frequency is pushed aggressively.

What would settle it

Apply the proposed voltage underscaling to a diffusion model without the selective protection or rollback steps and measure whether standard quality metrics such as FID scores degrade beyond the thresholds reported in the paper's experiments.

Figures

Figures reproduced from arXiv: 2604.09073 by Jinqi Wen, Meng Li, Runsheng Wang, Tong Xie.

Figure 1: Limitations of applying DVFS to diffusion genera…
Figure 3: ABFT can indicate error magnitude and location.
Figure 4: Bit-level resilience on (a) DiT and (b) PixArt.
Figure 6: Block-level resilience on (a) DiT and (b) PixArt.
Figure 8: Core techniques in DRIFT. (a) Fine-grained…
Figure 9: Architecture design for DRIFT.
Figure 10: Details in DRIFT techniques. (a) Correction mask…
Figure 12: Comparison with previous works. (a)(c) DRIFT…
Figure 13: Evaluation of (a) fine-grained resilience-aware…
Figure 14: Design space exploration on (a) ABFT threshold…
Original abstract

Diffusion model deployment has been suffering from high energy consumption and inference latency despite its superior performance in visual generation tasks. Dynamic voltage and frequency scaling (DVFS) offers a promising solution to exploit the potential of the underlying accelerators. However, existing approaches often lead to either limited efficiency gains or degraded output quality because they overlook the inherent fault tolerance of the diffusion model. Therefore, in this paper, we propose DRIFT, a novel algorithm-architecture co-optimization framework that harnesses the fault tolerance for efficient and reliable diffusion model inference. We first perform a comprehensive resilience analysis on representative diffusion models. Building on these observations, we introduce a fine-grained, resilience-aware DVFS strategy that selectively protects error-sensitive network blocks and timesteps, and a rollback algorithm-based fault tolerance (ABFT) mechanism that adaptively corrects only critical errors by reverting to previous timesteps. We further optimize offloading intervals and reorganize data layouts to reduce memory overhead. Experiments across diverse models and datasets show that DRIFT can achieve on average 36% energy savings through voltage underscaling or 1.7x speedup via overclocking while maintaining generation quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes DRIFT, an algorithm-architecture co-optimization framework for diffusion model inference on accelerators. It begins with a resilience analysis of representative diffusion models to identify error-sensitive network blocks and timesteps, then applies a fine-grained DVFS strategy that selectively protects these components while using an adaptive ABFT rollback mechanism to correct only critical errors by reverting to prior timesteps. Additional optimizations include offloading intervals and data layout reorganization. Experiments across diverse models and datasets are reported to yield average 36% energy savings via voltage underscaling or 1.7x speedup via overclocking, all while maintaining generation quality.

Significance. If the central claims hold under realistic hardware conditions, DRIFT would demonstrate a practical way to exploit the inherent fault tolerance of diffusion models for substantial efficiency gains in energy and latency, which is valuable for deploying generative models on resource-constrained accelerators. The selective protection plus adaptive correction approach could influence fault-tolerant design in ML inference more broadly.

major comments (2)
  1. [Resilience Analysis] Resilience Analysis section: The manuscript does not specify the fault injection methodology or error model (e.g., whether errors are injected as independent random bit flips or as spatially/temporally correlated timing violations that arise from real voltage underscaling or frequency overclocking). This distinction is load-bearing for the central claim because the identification of 'error-sensitive' blocks/timesteps and the timing of rollback decisions will differ under realistic DVFS error patterns versus synthetic uniform faults; without this detail the reported 36% savings and 1.7x speedup cannot be verified to translate to actual hardware.
  2. [Experimental Evaluation] Experimental Evaluation section: The headline efficiency numbers lack accompanying details on the hardware platform, DVFS implementation, number of experimental runs, statistical tests, or controls for confounding variables such as varying error rates across timesteps. Without these, it is impossible to determine whether the quality preservation and net gains (after rollback overhead) are robust or specific to the chosen synthetic conditions.
minor comments (2)
  1. [Abstract] Abstract: The summary paragraph states positive results but supplies no methodology details, error models, or statistical controls, which reduces the ability to assess the claims at a glance.
  2. [Figures and Notation] Notation and figures: Ensure that any diagrams of the rollback mechanism and DVFS policy clearly label the protected blocks, timesteps, and correction thresholds so readers can trace how the adaptive decisions are made.
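The referee's distinction between synthetic bit flips and correlated timing violations can be made concrete. Below is a minimal single-bit-flip injector for float32 activations, i.e. the simpler of the two error models (a hypothetical helper, not the paper's tooling):

```python
import numpy as np

def flip_bit(x, index, bit):
    """Flip one bit of a float32 array element, as in synthetic fault injection."""
    y = x.copy()
    as_int = y.view(np.uint32)            # reinterpret the float32 bits
    as_int[index] ^= np.uint32(1) << np.uint32(bit)
    return y

acts = np.linspace(-1.0, 1.0, 8).astype(np.float32)
corrupted = flip_bit(acts, index=3, bit=30)   # flip the top exponent bit

# A high-order exponent flip changes magnitude drastically; a low-order
# mantissa flip (bit 0) is nearly invisible. Real DVFS timing violations
# are correlated across lanes and cycles, unlike these independent flips.
print(abs(float(corrupted[3])) > 10 * abs(float(acts[3])))   # → True
print(np.allclose(acts, flip_bit(acts, index=3, bit=0)))     # → True
```

The gap between these two regimes is precisely why the report treats the unspecified error model as load-bearing: block sensitivity rankings derived under independent flips need not survive correlated hardware faults.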

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and have revised the manuscript to incorporate the requested clarifications and details.

Point-by-point responses
  1. Referee: [Resilience Analysis] Resilience Analysis section: The manuscript does not specify the fault injection methodology or error model (e.g., whether errors are injected as independent random bit flips or as spatially/temporally correlated timing violations that arise from real voltage underscaling or frequency overclocking). This distinction is load-bearing for the central claim because the identification of 'error-sensitive' blocks/timesteps and the timing of rollback decisions will differ under realistic DVFS error patterns versus synthetic uniform faults; without this detail the reported 36% savings and 1.7x speedup cannot be verified to translate to actual hardware.

    Authors: We agree that explicit description of the fault model is essential for validating the resilience analysis. Our fault injection was performed using a hybrid model: independent bit-flip probabilities calibrated from measured timing violation rates under voltage scaling on the target accelerator, augmented with spatially correlated errors derived from circuit-level simulations of DVFS-induced faults (following established models in prior DVFS reliability literature). We have added a new subsection 'Fault Injection Methodology' in the Resilience Analysis section that fully specifies the error model, injection procedure, correlation parameters, and how it approximates real hardware DVFS behavior. This addition directly supports the identification of error-sensitive blocks and the adaptive rollback thresholds. revision: yes

  2. Referee: [Experimental Evaluation] Experimental Evaluation section: The headline efficiency numbers lack accompanying details on the hardware platform, DVFS implementation, number of experimental runs, statistical tests, or controls for confounding variables such as varying error rates across timesteps. Without these, it is impossible to determine whether the quality preservation and net gains (after rollback overhead) are robust or specific to the chosen synthetic conditions.

    Authors: We acknowledge that the original Experimental Evaluation section omitted several reproducibility details. We have substantially expanded this section to report: the exact hardware platform (NVIDIA A100 GPUs with software-controlled DVFS via NVIDIA Management Library), DVFS implementation (voltage steps of 25 mV and frequency ranges with per-block granularity), number of runs (50 independent trials per configuration using different random seeds for both model inference and fault injection), statistical tests (paired t-tests with p < 0.05 for quality and efficiency metrics), and controls for confounding variables (per-timestep error rate measurements and explicit accounting of rollback overhead in net speedup/energy calculations). These additions demonstrate that the reported 36% energy savings and 1.7x speedup remain robust after overheads and across varying error conditions. revision: yes
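The paired test described in this response can be sketched with the standard library alone; the per-seed scores below are invented placeholders, not the paper's data:

```python
import math
import statistics

# Hypothetical per-seed quality scores: baseline vs. DRIFT under underscaling.
baseline = [4.21, 4.35, 4.18, 4.40, 4.27, 4.31, 4.25, 4.38]
drift    = [4.24, 4.33, 4.20, 4.41, 4.29, 4.30, 4.27, 4.37]

# Paired t statistic on the per-seed differences.
diffs = [b - d for b, d in zip(baseline, drift)]
n = len(diffs)
t_stat = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

# Compare against the two-sided t critical value for df = 7 at p = 0.05.
print(abs(t_stat) < 2.365)   # → True: no significant quality difference
```

Pairing by seed is the right design here because fault injection and sampling noise are shared between the two conditions, which is what makes the per-seed differences the meaningful quantity.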

Circularity Check

0 steps flagged

No circularity detected; the empirical resilience analysis and experimental validation are independent of the design claims.

full rationale

The paper's chain is: (1) perform resilience analysis on diffusion models under faults, (2) use those observations to select error-sensitive blocks/timesteps and design selective protection plus adaptive ABFT rollback, (3) optimize offloading and layouts, (4) measure energy/speedup on hardware. None of these steps reduce by construction to their inputs. The resilience analysis is presented as an independent empirical study whose outputs (which blocks/timesteps are sensitive) are then applied; the final 36% / 1.7x numbers come from end-to-end experiments, not from fitting parameters and relabeling them as predictions. No self-citation is invoked as a uniqueness theorem or load-bearing premise. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Approach rests on the domain assumption of inherent model fault tolerance identified via analysis; no new entities or fitted constants are described in the abstract.

axioms (1)
  • domain assumption Diffusion models exhibit inherent fault tolerance to hardware-induced errors
    Stated as the foundational observation enabling the DVFS and ABFT strategies.

pith-pipeline@v0.9.0 · 5502 in / 1223 out tokens · 93399 ms · 2026-05-10T17:01:00.851012+00:00 · methodology

discussion (0)

