FTerViT: Fully Ternary Vision Transformer
Pith reviewed 2026-05-21 05:56 UTC · model grok-4.3
The pith
Vision Transformers can be made fully ternary by ternarizing every weight matrix and normalization parameter, enabling roughly 15x compression for microcontroller deployment.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
FTerViT ternarizes every weight matrix and normalization parameter inside a Vision Transformer. The method introduces TernaryBitConv2d with per-channel scaling for the patch embedding and TernaryLayerNorm, then trains via knowledge distillation plus a lightweight quantization-aware recovery phase. The resulting W2A8 DeiT-III-S model at 384x384 resolution reaches 82.43 percent ImageNet-1K top-1 accuracy at 6.09 MB, a roughly 15x compression from FP32 with a 2.42 percentage point drop, and outperforms earlier ternary ViT methods by up to 8 points. The same approach yields the first working ternary Vision Transformer on the dual-core XTensa LX7 inside the ESP32-S3, delivering 79.64 percent top-1 accuracy with the 5.81 MB FTerViT-Small (DeiT-III-Small at 224x224 resolution).
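To make the core claim concrete, here is a minimal sketch of what per-channel ternarization could look like. It assumes absmean scaling (one scale per output channel, then round and clip to {-1, 0, +1}); the function names and the scaling rule are illustrative assumptions, not the paper's definition of TernaryBitConv2d.

```python
def ternarize_per_channel(weights, eps=1e-8):
    """Ternarize a 2-D weight list, one row per output channel.

    Assumption: absmean scaling; the paper's exact rule may differ.
    Returns (codes, scales) with codes in {-1, 0, +1}.
    """
    codes, scales = [], []
    for row in weights:
        # One scale per channel: mean absolute value of that channel.
        scale = sum(abs(w) for w in row) / len(row) + eps
        # Round to the nearest integer, then clip to the ternary set.
        codes.append([max(-1, min(1, round(w / scale))) for w in row])
        scales.append(scale)
    return codes, scales


def dequantize(codes, scales):
    """Rebuild approximate weights as scale * code."""
    return [[s * c for c in row] for row, s in zip(codes, scales)]


# Hypothetical toy weights: channel 0 is large, channel 1 is small.
weights = [[0.9, -0.05, -1.1, 0.4], [0.02, 0.03, -0.01, 0.04]]
codes, scales = ternarize_per_channel(weights)
approx = dequantize(codes, scales)
```

In this toy input the second channel has much smaller weights than the first. A single global scale (the absmean over all eight weights, about 0.319) would round that whole channel to zero, whereas the per-channel scale keeps three of its four weights nonzero.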
What carries the argument
The TernaryBitConv2d operator (with per-channel scaling, for the patch embedding) and the TernaryLayerNorm operator, which together replace the last full-precision blocks so that the entire model runs with ternary weights and 8-bit activations (W2A8).
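The paper's TernaryLayerNorm is not specified in the material summarized here, so the following is only a hypothetical sketch of one way a LayerNorm with ternary affine parameters could work: the learned gain and bias are each stored as ternary codes plus a single scale. The names `ternarize` and `ternary_layer_norm`, and the choice of one scale per parameter vector, are assumptions made for illustration.

```python
import math


def ternarize(values, eps=1e-8):
    """Absmean ternarization of a 1-D list (an assumed rule)."""
    scale = sum(abs(v) for v in values) / len(values) + eps
    return [max(-1, min(1, round(v / scale))) for v in values], scale


def ternary_layer_norm(x, gamma_codes, gamma_scale, beta_codes, beta_scale,
                       eps=1e-6):
    """Standard LayerNorm statistics, with ternary gain and bias.

    Hypothetical: gain = gamma_scale * code, bias = beta_scale * code.
    """
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [gamma_scale * g * (v - mean) * inv_std + beta_scale * b
            for v, g, b in zip(x, gamma_codes, beta_codes)]


# Hypothetical full-precision LayerNorm parameters to be ternarized.
gamma = [1.0, 0.9, 1.1, 0.05]
beta = [0.1, -0.1, 0.0, 0.1]
g_codes, g_scale = ternarize(gamma)
b_codes, b_scale = ternarize(beta)

out = ternary_layer_norm([1.0, 2.0, 3.0, 4.0],
                         g_codes, g_scale, b_codes, b_scale)
```

The sketch also shows why the referee's question about extra free parameters matters: even this simple variant carries two scalar scales per layer that are not themselves ternary.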
If this is right
- The complete removal of full-precision parameters reduces model size to 5.81-6.09 MB, making on-chip storage feasible on microcontrollers.
- Ternary ViTs can now execute inference directly on the ESP32-S3 without external memory for floating-point values.
- Accuracy stays within 2.5 points of the floating-point baseline while exceeding previous partial-ternary ViT methods by up to 8 points.
- The same full-ternarization recipe can be applied to other compact ViT backbones to produce variants suitable for real-time edge vision.
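The size figures above can be sanity-checked with simple arithmetic. The sketch below uses only numbers stated in the paper summary (2-bit weights, 6.09 MB, roughly 15x); because the 15x figure is rounded, the derived values are indicative rather than exact, and the variable names are illustrative.

```python
FP32_BITS = 32
TERNARY_STORAGE_BITS = 2      # the "W2" in W2A8

# Best case if every parameter shrank from 32 bits to 2 bits.
ideal_ratio = FP32_BITS / TERNARY_STORAGE_BITS

reported_size_mb = 6.09       # W2A8 DeiT-III-S at 384x384
reported_ratio = 15           # "roughly 15x", a rounded figure

# FP32 size implied by the reported numbers.
implied_fp32_mb = reported_size_mb * reported_ratio

# Size the model would have at the ideal 16x ratio, and the gap to it.
ideal_size_mb = implied_fp32_mb / ideal_ratio
overhead_mb = reported_size_mb - ideal_size_mb

print(f"ideal ratio: {ideal_ratio:.0f}x")
print(f"implied FP32 size: {implied_fp32_mb:.2f} MB")
print(f"gap to ideal 2-bit packing: {overhead_mb:.2f} MB")
```

A reported ratio slightly below the ideal 16x is consistent with a small amount of non-ternary data such as scaling factors, though the summary does not break the footprint down.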
Where Pith is reading between the lines
- The technique could extend to other transformer families, allowing similar memory reductions for language models on embedded hardware.
- Combining full ternarization with further optimizations such as structured pruning might push sizes below 5 MB while preserving accuracy.
- Successful microcontroller deployment suggests that ternary ViTs could support always-on camera applications in battery-powered sensors without cloud offload.
Load-bearing premise
That knowledge distillation followed by a short quantization-aware recovery phase can restore usable accuracy after every weight and normalization parameter has been forced into ternary form without any full-precision components remaining.
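For readers unfamiliar with the first training stage, the sketch below shows a generic temperature-scaled distillation loss (KL divergence between softened teacher and student distributions). This is the textbook formulation, offered as an assumption; the summary does not state which distillation loss, temperature, or teacher the authors use.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * (math.log(pt) - math.log(ps))
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return temperature ** 2 * kl


# Hypothetical logits for a 3-class toy problem.
teacher = [4.0, 1.0, 0.5]
student_close = [3.8, 1.1, 0.4]
student_far = [0.5, 4.0, 1.0]

loss_close = distillation_loss(student_close, teacher)
loss_far = distillation_loss(student_far, teacher)
```

The loss is zero when the student reproduces the teacher's logits and grows as the two distributions diverge, which is the signal a ternary student would be trained against in this kind of setup.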
What would settle it
Train the identical architecture with only the distillation stage and no recovery phase, then measure whether ImageNet top-1 accuracy falls below 79 percent when the model runs on the ESP32-S3.
Original abstract
Ternary Vision Transformers offer substantial model compression; however, state-of-the-art methods only ternarize the encoder layers, leaving patch embeddings, LayerNorm parameters, and classifier heads in full precision. In compact models targeting resource-constrained processors, such as microcontrollers, these remaining full-precision components determine the total memory footprint, severely limiting deployment efficiency and on-device feasibility. In this work, we introduce a fully ternarized Vision Transformer in which all weight matrices and normalization parameters are ternarized (FTerViT). To this end, we introduce two novel operators: TernaryBitConv2d with per-channel scaling for patch embedding and TernaryLayerNorm. FTerViT is trained using knowledge distillation, followed by a lightweight quantization-aware recovery phase. Our ternary W2A8 DeiT-III-S at 384×384 resolution achieves 82.43% ImageNet-1K top-1 at 6.09 MB (~15× compression, −2.42 pp vs. FP32), outperforming prior ternary ViT methods by up to 8 pp. Finally, we demonstrate the first implementation of ternary vision transformers on a dual-core XTensa LX7 microcontroller inside the ESP32-S3 system-on-chip. By deploying FTerViT-Small (based on DeiT-III-Small at 224×224 resolution, 5.81 MB), we achieve 79.64% ImageNet-1K top-1 accuracy.
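The "W2" in W2A8 means each ternary weight occupies two bits of storage. As an illustration of how that storage could be realized, the sketch below packs four ternary codes into one byte and unpacks them again. The bit layout and function names are assumptions; the abstract does not describe the on-device format.

```python
def pack_ternary(codes):
    """Pack ternary codes (-1, 0, +1) into bytes, four codes per byte.

    Assumed layout: each code is stored as code + 1 (0, 1, or 2) in two
    bits, with the first code in the lowest bits of the byte.
    """
    packed = bytearray()
    for i in range(0, len(codes), 4):
        byte = 0
        for j, code in enumerate(codes[i:i + 4]):
            byte |= (code + 1) << (2 * j)
        packed.append(byte)
    return bytes(packed)


def unpack_ternary(packed, count):
    """Inverse of pack_ternary; `count` drops padding in the last byte."""
    codes = []
    for byte in packed:
        for j in range(4):
            codes.append(((byte >> (2 * j)) & 0b11) - 1)
    return codes[:count]


codes = [1, 0, -1, 1, -1]
packed = pack_ternary(codes)
restored = unpack_ternary(packed, len(codes))
```

Five codes fit in two bytes here, compared with twenty bytes as FP32 values. Two bits can encode four states while ternary weights need only three, so denser encodings exist, but they are outside what the abstract describes.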
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces FTerViT, a fully ternarized Vision Transformer in which all weight matrices and normalization parameters (including patch embeddings, LayerNorm, and classifier heads) are forced into ternary representation, unlike prior methods that retain full-precision components for these elements. Novel operators TernaryBitConv2d (with per-channel scaling) and TernaryLayerNorm are proposed. Training uses knowledge distillation followed by a lightweight quantization-aware recovery phase. The central empirical result is a W2A8 DeiT-III-S model at 384×384 resolution achieving 82.43% ImageNet-1K top-1 accuracy at 6.09 MB (~15× compression, −2.42 pp vs. FP32), outperforming prior ternary ViT methods by up to 8 pp, plus the first reported deployment of a ternary ViT on an ESP32-S3 microcontroller (79.64% for the 224×224 Small variant at 5.81 MB).
Significance. If the accuracy numbers and hardware demonstration hold under scrutiny, this work would be significant for enabling Vision Transformers on severely resource-constrained microcontrollers by eliminating the memory footprint of retained full-precision components. The full-ternarization approach directly targets a practical deployment bottleneck and includes a concrete on-device implementation, which strengthens its engineering contribution if the training procedure proves robust.
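The W2A8 configuration named in the summary pairs ternary weights with 8-bit activations. The sketch below illustrates, under assumptions, why that pairing suits a microcontroller: with symmetric absmax int8 activations and ternary weight codes, a dot product reduces to integer additions and subtractions followed by one rescale. The quantization rule and function names are illustrative and are not taken from the paper.

```python
def quantize_activations_int8(x, eps=1e-8):
    """Symmetric per-tensor int8 quantization (an assumed absmax rule)."""
    scale = max(abs(v) for v in x) / 127 + eps
    q = [max(-128, min(127, round(v / scale))) for v in x]
    return q, scale


def ternary_dot(q_act, act_scale, w_codes, w_scale):
    """Dot product using only integer add/subtract in the inner loop."""
    acc = 0
    for a, w in zip(q_act, w_codes):
        if w == 1:
            acc += a
        elif w == -1:
            acc -= a
        # w == 0 contributes nothing and is skipped.
    # A single floating-point rescale recovers the real-valued result.
    return acc * act_scale * w_scale


# Hypothetical activations and one ternary weight row with scale 0.5.
x = [0.4, -1.0, 0.2, 1.0]
w_codes, w_scale = [1, -1, 0, 1], 0.5

q_act, act_scale = quantize_activations_int8(x)
approx = ternary_dot(q_act, act_scale, w_codes, w_scale)
exact = sum(v * w_scale * w for v, w in zip(x, w_codes))
```

The approximate and exact results differ only by activation rounding error, which is the kind of gap the referee's requested ablations would need to quantify at full model scale.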
major comments (3)
- [Training Procedure / Experimental Setup] The central accuracy claim (82.43% top-1 for the 384×384 W2A8 model) rests on the two-stage procedure (knowledge distillation followed by lightweight quantization-aware recovery) being sufficient to offset the drop from complete ternarization of weights, embeddings, and all LayerNorm parameters. No ablations are shown on recovery-phase length, learning rates, or accuracy immediately before versus after the recovery stage; this is load-bearing because ternarizing LayerNorm and patch embeddings is more aggressive than in prior partial-ternary ViT works.
- [Method / TernaryLayerNorm definition] TernaryLayerNorm is introduced as a novel operator to handle normalization parameters in ternary form, yet the manuscript provides insufficient detail on its forward/backward pass, how per-channel scaling factors interact with it, and whether it introduces additional free parameters beyond those already listed. This directly affects both reproducibility and the strength of the 'fully ternary' claim.
- [Hardware Implementation / Results] Hardware demonstration on the dual-core XTensa LX7 inside the ESP32-S3 is presented as a first, but the manuscript lacks quantitative details on inference latency, power, or memory usage during on-device execution, which are necessary to substantiate the deployment feasibility claim alongside the accuracy numbers.
minor comments (2)
- [Abstract and §4] The abstract and method sections use 'lightweight quantization-aware recovery phase' without specifying its duration or hyper-parameters; adding these would improve clarity without altering the core claims.
- [Table of results] Ensure all tables comparing against prior ternary ViT methods explicitly state the bit-widths for weights, activations, and any retained full-precision components so that the 'up to 8 pp' improvement is directly verifiable.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate planned revisions to improve the manuscript's clarity, reproducibility, and completeness.
- Referee: [Training Procedure / Experimental Setup] The central accuracy claim (82.43% top-1 for the 384×384 W2A8 model) rests on the two-stage procedure (knowledge distillation followed by lightweight quantization-aware recovery) being sufficient to offset the drop from complete ternarization of weights, embeddings, and all LayerNorm parameters. No ablations are shown on recovery-phase length, learning rates, or accuracy immediately before versus after the recovery stage; this is load-bearing because ternarizing LayerNorm and patch embeddings is more aggressive than in prior partial-ternary ViT works.
  Authors: We agree that additional ablations would strengthen the validation of the two-stage training procedure. In the revised manuscript, we will include new experiments ablating recovery-phase length and learning rates, along with accuracy metrics immediately before and after the recovery stage. These additions will better illustrate how the lightweight quantization-aware recovery offsets performance drops from full ternarization of LayerNorm and embeddings.
  Revision: yes
- Referee: [Method / TernaryLayerNorm definition] TernaryLayerNorm is introduced as a novel operator to handle normalization parameters in ternary form, yet the manuscript provides insufficient detail on its forward/backward pass, how per-channel scaling factors interact with it, and whether it introduces additional free parameters beyond those already listed. This directly affects both reproducibility and the strength of the 'fully ternary' claim.
  Authors: We appreciate this observation on the need for greater methodological detail. We will revise the manuscript to provide explicit mathematical formulations for the forward and backward passes of TernaryLayerNorm, clarify how per-channel scaling factors from TernaryBitConv2d interact with it during normalization, and explicitly confirm that no additional free parameters are introduced beyond the ternary weights and scales already described.
  Revision: yes
- Referee: [Hardware Implementation / Results] Hardware demonstration on the dual-core XTensa LX7 inside the ESP32-S3 is presented as a first, but the manuscript lacks quantitative details on inference latency, power, or memory usage during on-device execution, which are necessary to substantiate the deployment feasibility claim alongside the accuracy numbers.
  Authors: We acknowledge the value of these metrics for fully substantiating on-device feasibility. The model size (5.81 MB for the Small variant) is already reported and directly addresses the memory footprint, which is the primary constraint on microcontrollers. We will expand the hardware section in revision to include any available runtime memory measurements and implementation notes. Comprehensive latency and power figures were not obtained in our initial deployment experiments due to hardware profiling limitations; we will note this limitation and identify it as future work while highlighting the achieved on-device accuracy.
  Revision: partial
Circularity Check
No circularity: empirical results from measured accuracies
This is an empirical engineering paper whose central claims consist of measured ImageNet-1K top-1 accuracies (e.g., 82.43% for W2A8 DeiT-III-S at 384×384) obtained after applying the proposed TernaryBitConv2d and TernaryLayerNorm operators plus a two-stage training procedure. No mathematical derivation, first-principles prediction, or uniqueness theorem is presented that reduces by construction to fitted inputs, self-citations, or renamed empirical patterns. The reported compression ratios and hardware deployment results are direct experimental outcomes rather than self-referential definitions.
Axiom & Free-Parameter Ledger
free parameters (1)
- per-channel scaling factors
axioms (1)
- domain assumption: Knowledge distillation followed by quantization-aware recovery can restore accuracy after full ternarization of all parameters
invented entities (1)
- TernaryLayerNorm (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "We introduce two novel operators: TernaryBitConv2d with per-channel scaling for patch embedding and TernaryLayerNorm... trained using knowledge distillation, followed by a lightweight quantization-aware recovery phase."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Paper passage: "FTerViT is the first to show that the most fragile components of ViT: patch embedding, LayerNorms, and classifier head can be ternarized to {-1,0,+1}"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
A Experiments Appendix
A.1 Benchmark Results on CIFAR-10 and CIFAR-100
As shown in Table 6, our ternary model achieves 97.43% top-1 accuracy on CIFAR-10 and 86.01% on CIFAR-100. These results are within 0.09...