Low Power Vision Transformer Accelerator with Hardware-Aware Pruning and Optimized Dataflow

Ching-Lin Hsiung; Tian-Sheuan Chang

arxiv: 2510.14393 · v1 · submitted 2025-10-16 · 💻 cs.AR · cs.LG


Pith reviewed 2026-05-18 06:41 UTC · model grok-4.3

classification 💻 cs.AR cs.LG

keywords vision transformer · hardware accelerator · dynamic pruning · dataflow optimization · low power · algorithm hardware co-design · feed forward network · energy efficiency


The pith

Dynamic token pruning and FFN2 pruning cut vision transformer operations by 61.5 percent and FFN2 weights by 59.3 percent while holding accuracy loss below 2 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to demonstrate that vision transformers with short token lengths have the feed-forward network as the main computational cost rather than self-attention. It shows this cost can be lowered through hardware-friendly dynamic token pruning, replacement of GELU with ReLU, and dynamic pruning of the second FFN layer. These steps are paired with a row-wise dataflow that removes the need for data transposition and supports the resulting sparsity with little added hardware. If the approach holds, accelerators for vision tasks can reach high throughput and energy efficiency on modest silicon area without complex extra logic.
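The FFN-dominance claim can be checked with a back-of-the-envelope operation count. The sketch below is illustrative only: it counts multiply-accumulates (MACs) for a textbook encoder layer with an FFN expansion ratio of 4, ignores softmax and normalization, and uses assumed sizes (D = 384, N = 197, roughly a DeiT-Small-like configuration). None of these values are taken from this page.

```python
def layer_macs(num_tokens: int, dim: int, ffn_ratio: int = 4) -> dict:
    """Multiply-accumulate counts for one standard transformer encoder layer."""
    # Q, K, V and output projections: four (N x D) by (D x D) products.
    projections = 4 * num_tokens * dim * dim
    # Q x K^T and (attention scores) x V: the part that grows quadratically in N.
    attention = 2 * num_tokens * num_tokens * dim
    # FFN1 (D -> ffn_ratio*D) and FFN2 (ffn_ratio*D -> D).
    ffn = 2 * num_tokens * dim * (ffn_ratio * dim)
    return {"projections": projections, "attention": attention, "ffn": ffn}


def ffn_share(num_tokens: int, dim: int) -> float:
    """Fraction of the layer's MACs spent in the FFN."""
    macs = layer_macs(num_tokens, dim)
    return macs["ffn"] / sum(macs.values())


# Assumed sizes for illustration: D = 384, with a short and two long token counts.
for n in (197, 1024, 4096):
    print(f"N={n:5d}  FFN share of MACs = {ffn_share(n, 384):.3f}")
```

Under this count the attention-matrix term overtakes the FFN only when N exceeds 4D, which is why short token lengths leave the FFN as the larger cost.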

Core claim

The paper claims that algorithm-hardware co-design using hardware-friendly dynamic token pruning, GELU-to-ReLU substitution, and dynamic FFN2 pruning reduces total operations by 61.5 percent and FFN2 weights by 59.3 percent with less than 2 percent accuracy loss. The accompanying hardware uses row-wise dataflow with output-oriented access to eliminate transposition and handles the dynamic operations with minimal area overhead. Implemented in TSMC 28 nm CMOS, the design uses 496.4K gates and 232 KB SRAM to deliver 1024 GOPS at 1 GHz, 2.31 TOPS/W energy efficiency, and 858.61 GOPS/mm² area efficiency.
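The reported hardware figures can be cross-checked against each other with simple arithmetic. The sketch below derives the implied operations per cycle, silicon area, and power from the quoted numbers. It assumes that both efficiency figures are defined at peak throughput and that one MAC counts as two operations; neither assumption is confirmed on this page.

```python
PEAK_GOPS = 1024.0               # quoted peak throughput
CLOCK_GHZ = 1.0                  # quoted clock frequency
ENERGY_EFF_TOPS_PER_W = 2.31     # quoted energy efficiency
AREA_EFF_GOPS_PER_MM2 = 858.61   # quoted area efficiency

# Operations issued per clock cycle at peak.
ops_per_cycle = PEAK_GOPS / CLOCK_GHZ
# Assumption: one multiply-accumulate is counted as two operations.
macs_per_cycle = ops_per_cycle / 2
# Area and power implied if the efficiencies are defined at peak throughput.
implied_area_mm2 = PEAK_GOPS / AREA_EFF_GOPS_PER_MM2
implied_power_w = (PEAK_GOPS / 1000.0) / ENERGY_EFF_TOPS_PER_W

print(f"ops per cycle    : {ops_per_cycle:.0f}")
print(f"MACs per cycle   : {macs_per_cycle:.0f}")
print(f"implied area     : {implied_area_mm2:.2f} mm^2")
print(f"implied power    : {implied_power_w * 1000:.0f} mW")
```

The implied area of about 1.19 mm² is consistent with the truncated "1.19m…" fragment in the Figure 22 caption. The 512 MACs per cycle would match eight 8x8 PE groups if each PE performs one MAC per cycle; the group count is inferred from the captions naming PE Group 0 and PE Group 7 and is an assumption.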

What carries the argument

Row-wise dataflow with output-oriented access that supports dynamic pruning operations without data transposition.
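One way to see why a row-wise schedule can avoid transposition is the attention score product Q × Kᵀ (the product named in the Figure 14 caption): each output element is a dot product of a row of Q with a row of K, so K can be read in its stored row order. The sketch below illustrates that general idea in plain Python. It is not the paper's dataflow or PE mapping, and the function names are hypothetical.

```python
def scores_with_transpose(q, k):
    """Reference version: materialize K^T, then do a row-by-column product."""
    k_t = [list(col) for col in zip(*k)]  # explicit transposed copy of K
    return [
        [sum(q[i][d] * k_t[d][j] for d in range(len(k_t))) for j in range(len(k_t[0]))]
        for i in range(len(q))
    ]


def scores_row_wise(q, k):
    """Row-wise version: S[i][j] is the dot product of row i of Q and row j of K.

    K is only ever read row by row, so no transposed copy is needed.
    """
    return [[sum(a * b for a, b in zip(q_row, k_row)) for k_row in k] for q_row in q]


# Toy 2-token, 3-dimensional example (values made up for illustration).
q = [[1.0, 2.0, 3.0], [0.0, 1.0, 0.0]]
k = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]
print(scores_row_wise(q, k))
```

Both functions return the same matrix; the row-wise form simply reorders the reads so that the operand layout in memory never has to change.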

Load-bearing premise

The dynamic token pruning and FFN2 pruning keep accuracy loss under 2 percent on the target vision tasks without extra hardware logic or retraining that would erase the reported efficiency gains.

What would settle it

Running the pruned model on a standard vision dataset such as ImageNet and checking whether top-1 accuracy drops more than 2 percent compared with the unpruned baseline.
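The check described above reduces to comparing two top-1 accuracies on the same labels. The sketch below shows that comparison on made-up toy predictions. It assumes "2 percent" means two absolute percentage points, which this page does not state explicitly, and the function names are hypothetical.

```python
def top1_accuracy(predictions, labels):
    """Fraction of samples whose predicted class equals the label."""
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")
    correct = sum(1 for p, y in zip(predictions, labels) if p == y)
    return correct / len(labels)


def accuracy_drop_points(baseline_preds, pruned_preds, labels):
    """Top-1 accuracy drop of the pruned model, in absolute percentage points."""
    baseline = top1_accuracy(baseline_preds, labels)
    pruned = top1_accuracy(pruned_preds, labels)
    return 100.0 * (baseline - pruned)


# Toy data, made up for illustration only (not results from the paper).
labels = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
baseline_preds = [0, 1, 2, 3, 4, 5, 6, 7, 8, 0]  # 9 of 10 correct
pruned_preds = [0, 1, 2, 3, 4, 5, 6, 7, 0, 0]    # 8 of 10 correct

drop = accuracy_drop_points(baseline_preds, pruned_preds, labels)
print(f"drop = {drop:.1f} points, within 2-point budget: {drop <= 2.0}")
```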

Figures

Figures reproduced from arXiv: 2510.14393 by Ching-Lin Hsiung, Tian-Sheuan Chang.

**Figure 3.** Visualization of the token pruning process

**Figure 5.** Distribution of FFN post-activation matrix values

**Figure 6.** Histogram of FFN post-activation accumulation along dimensions

**Figure 8.** The overview of hardware architecture

**Figure 9.** Detailed diagram of a PE Group (8x8)

**Figure 12.** Data access flow of FFN2. 2) FFN: In the FFN computation, the FFN1 output needs to be stored for the following FFN2. To avoid a larger intermediate buffer, we propose an interleaved FFN computing order. In this order, once FFN1 computations are completed, the results are passed through the ReLU activation function before entering the FFN2 pruning module, which determines whether the computed elements shoul…

**Figure 10.** Data access flow of fully connected operations.

**Figure 11.** Detailed mapping of the fully connected operations for a PE group

**Figure 14.** Detailed mapping of Q × Kᵀ of (a) PE Group 0 and (b) PE Group 7 at the same cycle. … the weights to be mapped across all PE Groups in a single cycle. This efficient mapping, as shown in …

**Figure 16.** Block diagram of the FFN2 pruning module

**Figure 15.** Block diagram of token pruning module

**Figure 19.** A 56.4% reduction in total data fetch requirements when applying token pruning with ρ = 0.5. This significantly reduces computational overhead while improving hardware efficiency. In addition, …

**Figure 20.** External memory access for FFN2 weight

**Figure 18.** Computation comparison with FACT [6]

**Figure 21.** External memory access with dynamic pruning

**Figure 22.** FFN2 weight skip ratio across layers. … the accuracy loss remains under 2%. The proposed pruning methods introduce minimal area overhead while achieving a reduction of 59.3% in external memory access for input tokens. The accelerator demonstrates notable improvements in speed and efficiency, achieving 2.31 TOPS/W and 858 GOPS/mm². With a gate count of 496.5K and a 232KB SRAM buffer, the chip occupies 1.19m…

Original abstract

Current transformer accelerators primarily focus on optimizing self-attention due to its quadratic complexity. However, this focus is less relevant for vision transformers with short token lengths, where the Feed-Forward Network (FFN) tends to be the dominant computational bottleneck. This paper presents a low-power Vision Transformer accelerator, optimized through algorithm-hardware co-design. The model complexity is reduced using hardware-friendly dynamic token pruning without introducing complex mechanisms. Sparsity is further improved by replacing GELU with ReLU activations and employing dynamic FFN2 pruning, achieving a 61.5% reduction in operations and a 59.3% reduction in FFN2 weights, with an accuracy loss of less than 2%. The hardware adopts a row-wise dataflow with output-oriented data access to eliminate data transposition, and supports dynamic operations with minimal area overhead. Implemented in TSMC's 28nm CMOS technology, our design occupies 496.4K gates and includes a 232KB SRAM buffer, achieving a peak throughput of 1024 GOPS at 1GHz, with an energy efficiency of 2.31 TOPS/W and an area efficiency of 858.61 GOPS/mm².
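To make the link between ReLU sparsity and FFN2 weight skipping concrete, here is a minimal sketch. It assumes a skip rule in which post-ReLU activations are accumulated per hidden dimension across tokens and any dimension whose sum does not exceed a threshold has its FFN2 weight row skipped. That rule is inferred from the figure captions (post-activation accumulation along dimensions, FFN2 pruning module); the paper's actual criterion and threshold are not given on this page, and all names are hypothetical.

```python
def relu(x):
    """ReLU activation: negative values become exactly zero."""
    return x if x > 0 else 0.0


def ffn2_with_dynamic_pruning(hidden, w2, threshold=0.0):
    """Compute hidden x W2 while skipping weight rows of inactive hidden dimensions.

    hidden: post-ReLU FFN1 output, shape [tokens][hidden_dim]
    w2:     FFN2 weights, shape [hidden_dim][out_dim]
    Returns the output matrix and the fraction of weight rows skipped.
    """
    num_hidden, out_dim = len(w2), len(w2[0])
    # Accumulate each hidden dimension across all tokens.
    column_sums = [sum(row[j] for row in hidden) for j in range(num_hidden)]
    # Assumed rule: keep a dimension only if its accumulated activation exceeds the threshold.
    kept = [j for j, s in enumerate(column_sums) if s > threshold]
    out = [[sum(row[j] * w2[j][c] for j in kept) for c in range(out_dim)] for row in hidden]
    return out, 1.0 - len(kept) / num_hidden


# Toy values, made up for illustration: 2 tokens, 4 hidden dimensions, 2 outputs.
pre_activation = [[1.0, -2.0, 0.5, -1.0], [-0.5, -1.0, 2.0, -3.0]]
hidden = [[relu(v) for v in row] for row in pre_activation]
w2 = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]

output, skip_ratio = ffn2_with_dynamic_pruning(hidden, w2)
print(output, skip_ratio)
```

With a threshold of zero the result is exact, because post-ReLU values are non-negative and a zero column sum means every entry in that column is zero. A positive threshold would trade accuracy for a higher skip ratio.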

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper gives a practical co-design for short-token ViT accelerators that cuts operations and weights via simple pruning and a row-wise dataflow, but the accuracy numbers lack visible dataset or baseline support.


The main thing here is a hardware-focused redesign for vision transformers where short token lengths make the FFN the real bottleneck rather than attention. The authors combine dynamic token pruning kept simple for hardware, a GELU-to-ReLU swap, and dynamic pruning on the second FFN layer, then pair it with an output-oriented row-wise dataflow that removes transposition steps. They report 61.5% fewer operations and 59.3% fewer FFN2 weights at under 2% accuracy loss, plus concrete 28nm numbers: 496.4K gates, 232KB SRAM, 1024 GOPS peak at 1 GHz, 2.31 TOPS/W, and 858.61 GOPS/mm².

That efficiency package is the useful part for edge designs. The dataflow choice directly addresses memory access power, and the claim of minimal extra logic for dynamic support is reasonable if the control stays lightweight. This is a straightforward extension of known pruning ideas tuned to the FFN-dominant regime, and the implementation metrics give something concrete to compare against other accelerators.

The soft spots are in the validation. The abstract states the accuracy loss but does not name the dataset, the starting model, or show the exact before-and-after top-1 figures, so it is hard to judge whether the result holds on ImageNet-scale tasks or only on narrower ones. Dynamic pruning can add variable control overhead or require retraining that is not quantified here, and without an error analysis or power breakdown that isolates the pruning logic, the net gain could shrink. If the less-than-2% figure depends on specific fine-tuning or limited test conditions, the co-design story weakens.

This work is for hardware designers building low-power ViT inference engines for mobile or embedded use. Readers who follow accelerator papers with algorithm-hardware pairing will find the dataflow and pruning schedule details worth examining. It is solid enough on the implementation side to merit peer review, even if the authors need to add the missing accuracy tables and overhead measurements.

Referee Report

2 major / 2 minor

Summary. The paper proposes an algorithm-hardware co-design for a low-power Vision Transformer accelerator focused on short-token ViTs, where FFN is the dominant bottleneck. It applies hardware-friendly dynamic token pruning, replaces GELU with ReLU, and uses dynamic FFN2 pruning to reduce operations by 61.5% and FFN2 weights by 59.3% with <2% accuracy loss. The hardware employs row-wise dataflow to eliminate transposition and supports dynamic sparsity with minimal overhead. Implemented in TSMC 28nm, it reports 496.4K gates, 232KB SRAM, 1024 GOPS at 1 GHz, 2.31 TOPS/W energy efficiency, and 858.61 GOPS/mm² area efficiency.
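For readers unfamiliar with dynamic token pruning, the sketch below shows one common form: rank patch tokens by an importance score and keep a fixed fraction of them. It is a generic illustration, not the paper's method. The scoring source, the choice to always keep a class token at index 0, and the reading of ρ as a keep ratio are all assumptions, and the names are hypothetical.

```python
def prune_tokens(tokens, scores, keep_ratio=0.5):
    """Keep the class token plus the highest-scoring patch tokens.

    tokens:     list of token vectors (or any objects); index 0 is assumed to be the class token
    scores:     one importance score per token (the score for index 0 is ignored)
    keep_ratio: fraction of patch tokens to keep (assumed meaning of the ratio)
    Returns the kept tokens and their original indices, in original order.
    """
    num_patches = len(tokens) - 1
    num_keep = max(1, int(num_patches * keep_ratio))
    # Rank patch tokens (indices 1..N-1) from highest to lowest score.
    ranked = sorted(range(1, len(tokens)), key=lambda i: scores[i], reverse=True)
    # Restore original ordering so later layers see tokens in a stable order.
    kept = sorted(ranked[:num_keep])
    indices = [0] + kept
    return [tokens[i] for i in indices], indices


# Toy example with made-up scores.
tokens = ["cls", "a", "b", "c", "d"]
scores = [0.0, 0.1, 0.9, 0.4, 0.7]
kept_tokens, kept_indices = prune_tokens(tokens, scores, keep_ratio=0.5)
print(kept_tokens, kept_indices)
```

Every later layer then processes fewer tokens, which is where the reduction in operations and data fetches would come from under this scheme.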

Significance. If the accuracy claims hold on standard datasets without hidden retraining costs or excessive control logic overhead, the work offers a practical contribution to efficient ViT inference on edge hardware by targeting the FFN rather than attention. The concrete hardware metrics and co-design optimizations provide measurable efficiency gains that could inform future accelerator designs.

major comments (2)

[Abstract] The headline claims of 61.5% operation reduction, 59.3% FFN2 weight reduction, and less than 2% accuracy loss are presented without any reference to the datasets (e.g., ImageNet or CIFAR), the baseline ViT model and its original top-1 accuracy, the pruned accuracy numbers, or the pruning-rate schedule. Because the efficiency results rest on this claim, the absence leaves the central co-design claim unverifiable.
[Hardware Architecture] Hardware implementation description: The assertion that dynamic operations incur 'minimal area overhead' lacks a quantitative breakdown (e.g., percentage of total gates or SRAM attributed to pruning control logic versus baseline accelerator), which is necessary to confirm that the reported 496.4K gates and efficiencies are not offset by the added dynamic support.

minor comments (2)

[Abstract] The abstract and introduction could explicitly state the ViT variant and token length (e.g., 16 or 32 tokens) used for the short-token experiments to contextualize the FFN dominance claim.
[Evaluation] A comparison table against prior ViT accelerators (e.g., reporting energy efficiency and area efficiency) would strengthen the hardware results section.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The comments identify opportunities to strengthen the verifiability of our claims and the transparency of our hardware overhead analysis. We address each point below and indicate the corresponding revisions.


Referee: [Abstract] The headline claims of 61.5% operation reduction, 59.3% FFN2 weight reduction, and less than 2% accuracy loss are presented without any reference to the datasets (e.g., ImageNet or CIFAR), the baseline ViT model and its original top-1 accuracy, the pruned accuracy numbers, or the pruning-rate schedule. Because the efficiency results rest on this claim, the absence leaves the central co-design claim unverifiable.

Authors: We agree that the abstract would be more self-contained with explicit references to the evaluation dataset, baseline model, and accuracy figures. While these details appear in the experimental results section, we will revise the abstract to concisely include the dataset (ImageNet), the baseline short-token ViT model, the original and pruned top-1 accuracies, and a note on the dynamic pruning schedule to improve immediate verifiability of the co-design claims.

revision: yes

Referee: [Hardware Architecture] Hardware implementation description: The assertion that dynamic operations incur 'minimal area overhead' lacks a quantitative breakdown (e.g., percentage of total gates or SRAM attributed to pruning control logic versus baseline accelerator), which is necessary to confirm that the reported 496.4K gates and efficiencies are not offset by the added dynamic support.

Authors: We acknowledge that an explicit quantitative breakdown would strengthen the 'minimal area overhead' statement. Our synthesis data show that the additional control logic for dynamic token pruning, ReLU replacement, and FFN2 pruning contributes only a modest fraction of the total gate count. In the revised manuscript we will add a table or paragraph providing the gate-count and SRAM breakdown, separating the dynamic support overhead from the baseline accelerator components.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical hardware results are self-contained


The paper describes an algorithm-hardware co-design for a Vision Transformer accelerator, reporting concrete implementation metrics such as gate count, SRAM size, throughput, energy efficiency, and area efficiency in TSMC 28nm, along with measured reductions from dynamic token pruning, GELU-to-ReLU replacement, and dynamic FFN2 pruning. No equations, first-principles derivations, fitted parameters, or predictions are presented that reduce to their own inputs by construction. The central claims rest on reported post-implementation outcomes rather than any self-definitional logic, self-citation chains, or renamed known results, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The design relies on standard CMOS process assumptions and the premise that pruning thresholds can be chosen to keep accuracy loss under 2% without introducing new unverified mechanisms.

axioms (1)

domain assumption: Standard 28nm CMOS process characteristics and SRAM behavior hold as reported by the foundry.
Invoked when stating gate count, SRAM size, and power numbers.



Reference graph

Works this paper leans on

22 extracted references · 22 canonical work pages

[1] H. Touvron, M. Cord, M. Douze, F. Massa, A. Sablayrolles, and H. Jégou, “Training data-efficient image transformers & distillation through attention,” in International Conference on Machine Learning, vol. 139, July 2021, pp. 10347–10357.

[2] T. J. Ham, S. J. Jung, S. Kim, Y. H. Oh, Y. Park, Y. Song, J.-H. Park, S. Lee, K. Park, J. W. Lee, and D.-K. Jeong, “A³: Accelerating attention mechanisms in neural networks with approximation,” in IEEE International Symposium on High Performance Computer Architecture (HPCA), 2020, pp. 328–341.

[3] T. J. Ham, Y. Lee, S. H. Seo, S. Kim, H. Choi, S. J. Jung, and J. W. Lee, “ELSA: hardware-software co-design for efficient, lightweight self-attention mechanism in neural networks,” in Proceedings of the 48th Annual International Symposium on Computer Architecture, 2021, pp. 692–705.

[4] Y. Wang, Y. Qin, D. Deng, J. Wei, Y. Zhou, Y. Fan, T. Chen, H. Sun, L. Liu, S. Wei, and S. Yin, “A 28nm 27.5TOPS/W approximate-computing-based transformer processor with asymptotic sparsity speculating and out-of-order computing,” in IEEE International Solid-State Circuits Conference (ISSCC), vol. 65, 2022, pp. 1–3.

[5] H. Wang, Z. Zhang, and S. Han, “SpAtten: Efficient sparse attention architecture with cascade token and head pruning,” in IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2021, pp. 97–110.

[6] Y. Qin, Y. Wang, D. Deng, Z. Zhao, X. Yang, L. Liu, S. Wei, Y. Hu, and S. Yin, “FACT: FFN-attention co-optimized transformer architecture with eager correlation prediction,” in Proceedings of the 50th Annual International Symposium on Computer Architecture, ser. ISCA ’23. Association for Computing Machinery, 2023.

[7] G. Wang, S. Cai, W. Li, D. Lyu, and G. He, “Bsvit: A bit-serial vision transformer accelerator exploiting dynamic patch and weight bit-group quantization,” IEEE Transactions on Circuits and Systems I: Regular Papers, vol. 71, no. 9, pp. 4064–4077, 2024.

[8] Y. Xu, Z. Zhang, M. Zhang, K. Sheng, K. Li, W. Dong, L. Zhang, C. Xu, and X. Sun, “Evo-ViT: Slow-fast token evolution for dynamic vision transformer,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 3, pp. 2964–2972, Jun. 2022.

[9] Y. Liang, C. Ge, Z. Tong, Y. Song, J. Wang, and P. Xie, “Not all patches are what you need: Expediting vision transformers via token reorganizations,” in International Conference on Learning Representations, 2022.

[10] H. Yin, A. Vahdat, J. M. Alvarez, A. Mallya, J. Kautz, and P. Molchanov, “A-ViT: adaptive tokens for efficient vision transformer,” in IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2022, pp. 10799–10808.

[11] M. Fayyaz, S. Abbasi Kouhpayegani, F. Rezaei Jafari, E. Sommerlade, H. R. Vaezi Joze, H. Pirsiavash, and J. Gall, “Adaptive token sampling for efficient vision transformers,” European Conference on Computer Vision (ECCV), 2022.

[12] Y. Rao, W. Zhao, B. Liu, J. Lu, J. Zhou, and C.-J. Hsieh, “DynamicViT: Efficient vision transformers with dynamic token sparsification,” Advances in Neural Information Processing Systems (NeurIPS), 2021.

[13] H. He, J. Cai, J. Liu, Z. Pan, J. Zhang, D. Tao, and B. Zhuang, “Pruning self-attentions into convolutional layers in single path,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 46, no. 5, pp. 3910–3922, 2024.

[14] P. Dong, M. Sun, A. Lu, Y. Xie, K. Liu, Z. Kong, X. Meng, Z. Li, X. Lin, Z. Fang, and Y. Wang, “HeatViT: hardware-efficient adaptive token pruning for vision transformers,” in IEEE International Symposium on High-Performance Computer Architecture (HPCA), 2023, pp. 442–455.

[15] A. Dosovitskiy, L. Beyer, A. Kolesnikov, D. Weissenborn, X. Zhai, T. Unterthiner, M. Dehghani, M. Minderer, G. Heigold, S. Gelly, J. Uszkoreit, and N. Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” The International Conference on Learning Representations, 2021.

[16] Z. Liu, Y. Lin, Y. Cao, H. Hu, Y. Wei, Z. Zhang, S. Lin, and B. Guo, “Swin transformer: Hierarchical vision transformer using shifted windows,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 10012–10022.

[17] L. Yuan, Y. Chen, T. Wang, W. Yu, Y. Shi, F.-F. Tay, J. Feng, and S. Yan, “Tokens-to-token vit: Training vision transformers from scratch on imagenet,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 558–567.

[18] H. Touvron, M. Cord, A. Sablayrolles, G. Synnaeve, and H. Jégou, “Going deeper with image transformers,” in Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2021, pp. 32–42.

[19] S. Nag, G. Datta, S. Kundu, N. Chandrachoodan, and P. A. Beerel, “ViTA: A vision transformer inference accelerator for edge applications,” in IEEE International Symposium on Circuits and Systems (ISCAS), 2023, pp. 1–5.

[20] S. Ghosh, S. Dasgupta, and S. Saha Ray, “A comparison-free hardware sorting engine,” in IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 2019, pp. 586–591.

[21] S. Saha Ray and S. Ghosh, “K-degree parallel comparison-free hardware sorter for complete sorting,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 42, no. 5, pp. 1438–1449, 2023.

[22] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei, “ImageNet: a large-scale hierarchical image database,” in IEEE Conference on Computer Vision and Pattern Recognition, 2009, pp. 248–255.

Ching-Lin Hsiung received the M.S. degree in electronics engineering from the National Yang Ming Chiao Tung University, Hsinchu, Taiwan, in 2024. He is curren...
