pith. sign in

arxiv: 2503.03088 · v4 · submitted 2025-03-05 · 💻 cs.CV · cs.AR· cs.LG

AHCQ-SAM: Toward Accurate and Hardware-Compatible Post-Training Segment Anything Model Quantization

Pith reviewed 2026-05-23 01:43 UTC · model grok-4.3

classification 💻 cs.CV cs.ARcs.LG
keywords post-training quantizationSegment Anything ModelSAM4-bit quantizationhardware-compatible quantizationmodel compressionFPGA implementationedge deployment
0
0 comments X

The pith

AHCQ-SAM applies four targeted techniques to overcome specific quantization barriers in SAM, enabling higher-accuracy 4-bit models for segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to demonstrate that post-training quantization of the Segment Anything Model can be made accurate and hardware-friendly by directly addressing four distinct problems in its weights and activations. It introduces a framework with four matching components that regularize weights, handle skewed values, group channels, and rescale attention scores. If these steps work as described, 4-bit versions of SAM and SAM2 would run with substantially less accuracy loss than prior methods while fitting edge hardware constraints. The reported gains appear on standard detection and segmentation benchmarks plus an FPGA test. This would matter for moving large zero-shot segmentation models from cloud servers to local devices.

Core claim

The central claim is that the four challenges—ill-conditioned weights, skewed post-GELU activations, inter-channel variance in linear layers, and heterogeneous attention scores—limit existing PTQ for SAM, and that the proposed Activation-aware Condition Number Reduction, Hybrid Log-Uniform Quantization, Channel-Aware Grouping, and Logarithmic Nonlinear Quantization together remove those limits, producing 15.2 percent higher mAP on COCO for 4-bit SAM-B and 14.01 percent higher J&F on SA-V for 4-bit SAM2-Tiny, plus measured FPGA speed and power gains.

What carries the argument

The AHCQ-SAM PTQ framework built from four synergistic components that each target one listed quantization challenge in SAM's architecture.

If this is right

  • 4-bit SAM-B with Faster R-CNN reaches 15.2 percent higher mAP on COCO than the previous best PTQ method.
  • 4-bit SAM2-Tiny reaches 14.01 percent higher J&F on the SA-V Test set than prior methods.
  • An FPGA implementation delivers 7.12 times speedup and 6.62 times better power efficiency versus the floating-point baseline.
  • The work supplies the first reported PTQ benchmark numbers for SAM2 models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same component design could be tested on other large vision transformers that share similar activation and attention patterns.
  • Direct integration of the channel-grouping and log-scale rules into hardware accelerators might reduce memory traffic further.
  • If the accuracy holds at 4 bits, on-device zero-shot segmentation becomes feasible on mobile or embedded processors without retraining.

Load-bearing premise

The four listed quantization challenges are the dominant performance limiters for existing PTQ on SAM and that the four proposed components mitigate them without introducing offsetting accuracy or hardware costs.

What would settle it

Measuring 4-bit SAM-B with Faster R-CNN on the COCO dataset and finding that mAP does not exceed the prior SOTA method by a clear margin would falsify the accuracy claim.

Figures

Figures reproduced from arXiv: 2503.03088 by Kentaro Yoshioka, Shengchuan Zhang, Shimpei Ando, Weiqi Yan, Wenlun Zhang, Yunshan Zhong.

Figure 1
Figure 1. Figure 1: Challenges in SAM quantization (Data obtained from [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: AHCPTQ framework: HLUQ refines quantization resolution for post-GELU activations, while CAG effectively groups parame [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Cosine similarity of normalized quantization parameter [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Hardware cost analysis of the linear projection layer in [PITH_FULL_IMAGE:figures/full_fig_p006_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Ablation study comparing the effectiveness of HLUQ [PITH_FULL_IMAGE:figures/full_fig_p007_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Dependence of SAM performance on group number in [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
Figure 8
Figure 8. Figure 8: Range distribution and cosine similarity of Linear-1 ac [PITH_FULL_IMAGE:figures/full_fig_p011_8.png] view at source ↗
Figure 7
Figure 7. Figure 7: Range distribution and cosine similarity of QKV projec [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗
Figure 10
Figure 10. Figure 10: Range distribution and cosine similarity of pre [PITH_FULL_IMAGE:figures/full_fig_p012_10.png] view at source ↗
Figure 11
Figure 11. Figure 11: Range distribution and cosine similarity of pre [PITH_FULL_IMAGE:figures/full_fig_p012_11.png] view at source ↗
Figure 13
Figure 13. Figure 13: Overview of the evaluation system and the accelerator [PITH_FULL_IMAGE:figures/full_fig_p013_13.png] view at source ↗
Figure 14
Figure 14. Figure 14: FPGA validation environment. uration: (1) a standard FP32 implementation, (2) a default INT8 implementation. In all designs, floating-point opera￾tions such as quantization and dequantization are handled by an IP generator utilizing on-chip DSP resources. The detailed experimental results are presented in Sec. 4.4. D. Experiment on Vision Transformers To ensure that AHCPTQ generalizes to other vision mode… view at source ↗
Figure 15
Figure 15. Figure 15: Qualitative comparison of segmentation masks generated by different quantization methods on SAM-B with YOLOX. Our [PITH_FULL_IMAGE:figures/full_fig_p015_15.png] view at source ↗
read the original abstract

The Segment Anything Model (SAM) has revolutionized image and video segmentation with its powerful zero-shot capabilities. However, its massive parameter scale and high computational demands hinder efficient deployment on resource-constrained edge devices. While Post-Training Quantization (PTQ) offers a practical solution, existing methods still fail to handle four critical quantization challenges: (1) ill-conditioned weights; (2) skewed and long-tailed post-GELU activations; (3) pronounced inter-channel variance in linear projections; and (4) exponentially scaled and heterogeneous attention scores. To mitigate these bottlenecks, we propose AHCQ-SAM, an accurate and hardware-compatible PTQ framework featuring four synergistic components: (1) Activation-aware Condition Number Reduction (ACNR), which regularizes weight matrices via a proximal point algorithm to suppress ill-conditioning; (2) Hybrid Log-Uniform Quantization (HLUQ), which combines power-of-two and uniform quantizers to capture skewed post-GELU activations; (3) Channel-Aware Grouping (CAG), which clusters channels with homogeneous statistics to achieve high accuracy with minimal hardware overhead; and (4) Logarithmic Nonlinear Quantization (LNQ), which utilizes logarithmic transformations to adaptively adjust quantization resolution for exponential and heterogeneous attention scores. Experimental results demonstrate that AHCQ-SAM outperforms current methods on SAM. Compared with the SOTA method, it achieves a 15.2% improvement in mAP for 4-bit SAM-B with Faster R-CNN on the COCO dataset. Furthermore, we establish a PTQ benchmark for SAM2, where AHCQ-SAM yields a 14.01% improvement in J&F for 4-bit SAM2-Tiny on the SA-V Test dataset. Finally, FPGA-based implementation validates the practical utility of AHCQ-SAM, delivering a 7.12x speedup and a 6.62x power efficiency improvement over the floating-point baseline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper presents AHCQ-SAM, a post-training quantization (PTQ) framework for SAM and SAM2 that identifies four quantization challenges (ill-conditioned weights, skewed post-GELU activations, inter-channel variance, heterogeneous attention scores) and proposes four components (ACNR via proximal point algorithm, HLUQ combining power-of-two and uniform quantizers, CAG for homogeneous channel clustering, LNQ with logarithmic transformations) to address them. It reports 15.2% mAP gain for 4-bit SAM-B on COCO with Faster R-CNN and 14.01% J&F gain for 4-bit SAM2-Tiny on SA-V Test, plus 7.12x FPGA speedup.

Significance. If substantiated, the work would advance practical PTQ for large vision transformers by targeting SAM-specific issues and establishing a SAM2 PTQ benchmark. The FPGA implementation provides a concrete hardware demonstration. Strengths include the attempt to link specific model properties (condition numbers, activation tails, attention heterogeneity) to quantization design.

major comments (3)
  1. [Method (ACNR)] Method section (ACNR): no pre/post condition-number diagnostics or matrix-norm measurements are supplied to verify that the proximal-point regularization actually suppresses ill-conditioning rather than merely altering the loss landscape.
  2. [Experiments] Experiments section: ablation tables isolating each of ACNR/HLUQ/CAG/LNQ versus a common PTQ baseline are absent, so the 15.2% mAP and 14.01% J&F gains cannot be attributed to the claimed mechanisms versus calibration-set choice or hyper-parameter search.
  3. [Hardware evaluation] Hardware evaluation: the single FPGA result reports 7.12x speedup and 6.62x power efficiency but provides no cycle-accurate or LUT/BRAM overhead figures for the logarithmic and grouping operations introduced by HLUQ and CAG.
minor comments (2)
  1. [Introduction] Abstract and §1: the four challenges are asserted as dominant without a supporting citation or preliminary measurement showing they exceed other known PTQ error sources for SAM.
  2. [Method (HLUQ/LNQ)] Notation in HLUQ and LNQ: the hybrid quantizer and logarithmic mapping lack explicit equations defining the scale factors and breakpoints, hindering reproducibility.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and indicate planned revisions.

read point-by-point responses
  1. Referee: [Method (ACNR)] Method section (ACNR): no pre/post condition-number diagnostics or matrix-norm measurements are supplied to verify that the proximal-point regularization actually suppresses ill-conditioning rather than merely altering the loss landscape.

    Authors: We agree that direct pre- and post-ACNR condition number and matrix norm measurements would strengthen validation of the proximal point algorithm's effect. The ACNR component is motivated by the ill-conditioning analysis in Section 3.1. In the revised manuscript we will add these diagnostics for representative weight matrices to confirm suppression of ill-conditioning. revision: yes

  2. Referee: [Experiments] Experiments section: ablation tables isolating each of ACNR/HLUQ/CAG/LNQ versus a common PTQ baseline are absent, so the 15.2% mAP and 14.01% J&F gains cannot be attributed to the claimed mechanisms versus calibration-set choice or hyper-parameter search.

    Authors: We acknowledge that component-wise ablations are needed to isolate contributions. While overall comparisons to prior PTQ methods are provided, we will add an ablation table in the revised manuscript that evaluates each technique (ACNR, HLUQ, CAG, LNQ) individually against a shared baseline to clarify attribution of the reported gains. revision: yes

  3. Referee: [Hardware evaluation] Hardware evaluation: the single FPGA result reports 7.12x speedup and 6.62x power efficiency but provides no cycle-accurate or LUT/BRAM overhead figures for the logarithmic and grouping operations introduced by HLUQ and CAG.

    Authors: We recognize that detailed overhead metrics for the logarithmic and grouping operations would better demonstrate hardware compatibility. Our FPGA results report end-to-end gains; we will add available LUT/BRAM utilization data for these operations in the revision, though cycle-accurate per-operation breakdowns may be limited by our current implementation setup. revision: partial

Circularity Check

0 steps flagged

No significant circularity; empirical results are independently measured

full rationale

The paper identifies four quantization challenges and proposes four corresponding modules (ACNR, HLUQ, CAG, LNQ) to address them, then reports measured accuracy gains on COCO and SA-V datasets. No equations appear in the provided text that would define any reported prediction or improvement as equivalent to a fitted parameter or input by construction. No self-citations are invoked to justify uniqueness theorems, ansatzes, or load-bearing premises. The central claims rest on external benchmark comparisons rather than reducing to self-referential definitions or statistical forcing from the same calibration data. This qualifies as a self-contained empirical contribution with no detectable circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; free parameters likely exist inside the proximal algorithm, grouping, and log transforms but are not enumerated. No invented entities or domain axioms are stated.

pith-pipeline@v0.9.0 · 5913 in / 1032 out tokens · 51901 ms · 2026-05-23T01:43:31.166214+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. When W4A4 Breaks Camouflaged Object Detection: Token-Group Dual-Constraint Activation Quantization

    cs.CV 2026-04 unverdicted novelty 7.0

    COD-TDQ uses token-group scaling and dual-constraint projection to fix 4-bit activation quantization for camouflaged object detection, delivering more than 0.12 higher Sα scores than prior methods on four benchmarks w...

Reference graph

Works this paper leans on

49 extracted references · 49 canonical work pages · cited by 1 Pith paper · 3 internal anchors

  1. [1]

    Post train- ing 4-bit quantization of convolutional networks for rapid- deployment

    Ron Banner, Yury Nahshan, Daniel Soudry, et al. Post train- ing 4-bit quantization of convolutional networks for rapid- deployment. In Proceedings of the Advances in Neural In- formation Processing Systems, pages 7950–7958, 2019. 4

  2. [2]

    Sam-med2d

    Junlong Cheng, Jin Ye, Zhongying Deng, Jianpin Chen, Tianbin Li, Haoyu Wang, Yanzhou Su, Ziyan Huang, Ji- long Chen, Lei Jiang, et al. Sam-med2d. arXiv preprint arXiv:2308.16184, 2023. 1

  3. [3]

    Data-free network compression via parametric non-uniform mixed precision quantization

    Vladimir Chikin and Mikhail Antiukh. Data-free network compression via parametric non-uniform mixed precision quantization. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition , pages 450– 459, 2022. 2, 4

  4. [4]

    Towards accurate post- training quantization for vision transformer

    Yifu Ding, Haotong Qin, Qinghua Yan, Zhenhua Chai, Junjie Liu, Xiaolin Wei, and Xianglong Liu. Towards accurate post- training quantization for vision transformer. In Proceedings of the 30th ACM international conference on multimedia , pages 5380–5388, 2022. 2, 4

  5. [5]

    K., McKinstry, J

    Steven K Esser, Jeffrey L McKinstry, Deepika Bablani, Rathinakumar Appuswamy, and Dharmendra S Modha. Learned step size quantization. arXiv preprint arXiv:1902.08153, 2019. 1

  6. [6]

    Jumping through local minima: Quantization in the loss landscape of vision transformers

    Natalia Frumkin, Dibakar Gope, and Diana Marculescu. Jumping through local minima: Quantization in the loss landscape of vision transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision , pages 16978–16988, 2023. 2

  7. [7]

    YOLOX: Exceeding YOLO Series in 2021

    Z Ge. Yolox: Exceeding yolo series in 2021. arXiv preprint arXiv:2107.08430, 2021. 6

  8. [8]

    Differ- entiable soft quantization: Bridging full-precision and low- bit neural networks

    Ruihao Gong, Xianglong Liu, Shenghu Jiang, Tianxiang Li, Peng Hu, Jiazhen Lin, Fengwei Yu, and Junjie Yan. Differ- entiable soft quantization: Bridging full-precision and low- bit neural networks. In Proceedings of the IEEE/CVF inter- national conference on computer vision , pages 4852–4861,

  9. [9]

    Daq: distribution-aware quantization for deep image super-resolution networks

    Cheeun Hong, Heewon Kim, Junghun Oh, and Ky- oung Mu Lee. Daq: distribution-aware quantization for deep image super-resolution networks. arXiv preprint arXiv:2012.11230, 2020. 2, 4

  10. [10]

    1.1 computing’s energy problem (and what we can do about it)

    Mark Horowitz. 1.1 computing’s energy problem (and what we can do about it). In 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC) , pages 10–14, 2014. 4, 2

  11. [11]

    Multi-dimensional vi- sion transformer compression via dependency guided gaus- sian process search

    Zejiang Hou and Sun-Yuan Kung. Multi-dimensional vi- sion transformer compression via dependency guided gaus- sian process search. In Proceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 3669–3678, 2022. 1

  12. [12]

    Detrs with hybrid matching

    Ding Jia, Yuhui Yuan, Haodi He, Xiaopei Wu, Haojun Yu, Weihong Lin, Lei Sun, Chao Zhang, and Han Hu. Detrs with hybrid matching. In Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition , pages 19702–19712, 2023. 6

  13. [13]

    Berg, Wan-Yen Lo, Piotr Doll ´ar, and Ross B

    Alexander Kirillov, Eric Mintun, Nikhila Ravi, Hanzi Mao, Chlo´e Rolland, Laura Gustafson, Tete Xiao, Spencer White- head, Alexander C. Berg, Wan-Yen Lo, Piotr Doll ´ar, and Ross B. Girshick. Segment anything. 2023 IEEE/CVF In- ternational Conference on Computer Vision (ICCV) , pages 3992–4003, 2023. 1, 2

  14. [14]

    Quantizing deep convolutional networks for efficient inference: A whitepaper

    Raghuraman Krishnamoorthi. Quantizing deep convolu- tional networks for efficient inference: A whitepaper. arXiv preprint arXiv:1806.08342, 2018. 1

  15. [16]

    Additive powers-of- two quantization: An efficient non-uniform discretization for neural networks

    Yuhang Li, Xin Dong, and Wei Wang. Additive powers-of- two quantization: An efficient non-uniform discretization for neural networks. arXiv preprint arXiv:1909.13144, 2019. 1

  16. [17]

    Brecq: Pushing the limit of post-training quantization by block reconstruction.arXiv preprint arXiv:2102.05426,

    Yuhang Li, Ruihao Gong, Xu Tan, Yang Yang, Peng Hu, Qi Zhang, Fengwei Yu, Wei Wang, and Shi Gu. Brecq: Pushing the limit of post-training quantization by block reconstruc- tion. arXiv preprint arXiv:2102.05426, 2021. 2, 6, 4, 5

  17. [18]

    Q-vit: Accurate and fully quantized low-bit vision transformer

    Yanjing Li, Sheng Xu, Baochang Zhang, Xianbin Cao, Peng Gao, and Guodong Guo. Q-vit: Accurate and fully quantized low-bit vision transformer. Advances in neural information processing systems, 35:34451–34463, 2022. 1

  18. [19]

    I-vit: Integer-only quantization for efficient vision transformer inference

    Zhikai Li and Qingyi Gu. I-vit: Integer-only quantization for efficient vision transformer inference. In Proceedings of the IEEE/CVF International Conference on Computer Vi- sion, pages 17065–17075, 2023. 1

  19. [20]

    Patch similarity aware data-free quantization for vision transformers

    Zhikai Li, Liping Ma, Mengjuan Chen, Junrui Xiao, and Qingyi Gu. Patch similarity aware data-free quantization for vision transformers. In European conference on computer vision, pages 154–170. Springer, 2022. 2

  20. [21]

    Repq- vit: Scale reparameterization for post-training quantization of vision transformers

    Zhikai Li, Junrui Xiao, Lianwei Yang, and Qingyi Gu. Repq- vit: Scale reparameterization for post-training quantization of vision transformers. In Proceedings of the IEEE/CVF In- ternational Conference on Computer Vision , pages 17227– 17236, 2023. 2, 4

  21. [22]

    Microsoft coco: Common objects in context

    Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Doll´ar, and C Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014. 6

  22. [23]

    Fq-vit: Post-training quantization for fully quantized vision transformer

    Yang Lin, Tianyu Zhang, Peiqin Sun, Zheng Li, and Shuchang Zhou. Fq-vit: Post-training quantization for fully quantized vision transformer. arXiv preprint arXiv:2111.13824, 2021. 2, 4

  23. [24]

    Pd-quant: Post-training quantiza- tion based on prediction difference metric

    Jiawei Liu, Lin Niu, Zhihang Yuan, Dawei Yang, Xinggang Wang, and Wenyu Liu. Pd-quant: Post-training quantiza- tion based on prediction difference metric. InProceedings of 9 the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24427–24437, 2023. 6, 4

  24. [25]

    Oscillation-free quantization for low-bit vision transformers

    Shih-Yang Liu, Zechun Liu, and Kwang-Ting Cheng. Oscillation-free quantization for low-bit vision transformers. In International Conference on Machine Learning , pages 21813–21824. PMLR, 2023. 1

  25. [26]

    Pq-sam: Post-training quantization for segment any- thing model

    Xiaoyu Liu, Xin Ding, Lei Yu, Yuanyuan Xi, Wei Li, Zhi- jun Tu, Jie Hu, Hanting Chen, Baoqun Yin, and Zhiwei Xiong. Pq-sam: Post-training quantization for segment any- thing model. In European Conference on Computer Vision, pages 420–437. Springer, 2024. 2

  26. [27]

    Noisyquant: Noisy bias-enhanced post-training activation quantization for vision transformers

    Yijiang Liu, Huanrui Yang, Zhen Dong, Kurt Keutzer, Li Du, and Shanghang Zhang. Noisyquant: Noisy bias-enhanced post-training activation quantization for vision transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20321–20330, 2023. 2

  27. [28]

    Post-training quantization for vision trans- former

    Zhenhua Liu, Yunhe Wang, Kai Han, Wei Zhang, Siwei Ma, and Wen Gao. Post-training quantization for vision trans- former. Advances in Neural Information Processing Systems, 34:28092–28103, 2021. 1

  28. [29]

    Ptq4sam: Post-training quantization for seg- ment anything

    Chengtao Lv, Hong Chen, Jinyang Guo, Yifu Ding, and Xi- anglong Liu. Ptq4sam: Post-training quantization for seg- ment anything. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 15941– 15951, 2024. 2, 3, 6, 7, 5

  29. [30]

    Follow anything: Open- set detection, tracking, and following in real-time

    Alaa Maalouf, Ninad Jadhav, Krishna Murthy Jatavallab- hula, Makram Chahine, Daniel M V ogt, Robert J Wood, An- tonio Torralba, and Daniela Rus. Follow anything: Open- set detection, tracking, and following in real-time. IEEE Robotics and Automation Letters, 9(4):3283–3290, 2024. 1

  30. [31]

    Up or down? adap- tive rounding for post-training quantization

    Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Chris- tos Louizos, and Tijmen Blankevoort. Up or down? adap- tive rounding for post-training quantization. In International Conference on Machine Learning, pages 7197–7206. PMLR,

  31. [32]

    Faster r-cnn: Towards real-time object detection with region proposal networks

    Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun. Faster r-cnn: Towards real-time object detection with region proposal networks. IEEE transactions on pattern analysis and machine intelligence, 39(6):1137–1149, 2016. 6

  32. [33]

    arXiv preprint arXiv:2304.10261 (2023)

    Qiuhong Shen, Xingyi Yang, and Xinchao Wang. Anything- 3d: Towards single-view anything reconstruction in the wild. arXiv preprint arXiv:2304.10261, 2023. 1

  33. [34]

    Trio-vit: Post-training quantization and acceleration for softmax-free efficient vision transformer

    Huihong Shi, Haikuo Shao, Wendong Mao, and Zhongfeng Wang. Trio-vit: Post-training quantization and acceleration for softmax-free efficient vision transformer. arXiv preprint arXiv:2405.03882, 2024. 2

  34. [35]

    Tinysam: Pushing the envelope for efficient segment any- thing model

    Han Shu, Wenshuo Li, Yehui Tang, Yiman Zhang, Yi- hao Chen, Houqiang Li, Yunhe Wang, and Xinghao Chen. Tinysam: Pushing the envelope for efficient segment any- thing model. arXiv preprint arXiv:2312.13789, 2023. 2

  35. [36]

    Learnable lookup table for neural network quantization

    Longguang Wang, Xiaoyu Dong, Yingqian Wang, Li Liu, Wei An, and Yu Kuen Guo. Learnable lookup table for neural network quantization. 2022 IEEE/CVF Conference on Com- puter Vision and Pattern Recognition (CVPR), pages 12413– 12423, 2022. 2, 4

  36. [37]

    Towards accurate post-training network quantization via bit- split and stitching

    Peisong Wang, Qiang Chen, Xiangyu He, and Jian Cheng. Towards accurate post-training network quantization via bit- split and stitching. In Proceedings of the International Con- ference on Machine Learning, pages 9847–9856, 2020. 4

  37. [38]

    Detect any shadow: Segment anything for video shadow detection

    Yonghui Wang, Wengang Zhou, Yunyao Mao, and Houqiang Li. Detect any shadow: Segment anything for video shadow detection. IEEE Transactions on Circuits and Systems for Video Technology, 34(5):3782–3794, 2024. 1

  38. [39]

    Qdrop: Randomly dropping quantization for extremely low-bit post-training quantization

    Xiuying Wei, Ruihao Gong, Yuhang Li, Xianglong Liu, and Fengwei Yu. Qdrop: Randomly dropping quantization for extremely low-bit post-training quantization. arXiv preprint arXiv:2203.05740, 2022. 2, 3, 6, 4, 5

  39. [40]

    An energy-and-area- efficient cnn accelerator for universal powers-of-two quan- tization

    Tian Xia, Boran Zhao, Jian Ma, Gelin Fu, Wenzhe Zhao, Nanning Zheng, and Pengju Ren. An energy-and-area- efficient cnn accelerator for universal powers-of-two quan- tization. IEEE Transactions on Circuits and Systems I: Reg- ular Papers, 70(3):1242–1255, 2022. 4

  40. [41]

    Ptq4vit: Post-training quantization for vision transformers with twin uniform quantization

    Zhihang Yuan, Chenhao Xue, Yiqi Chen, Qiang Wu, and Guangyu Sun. Ptq4vit: Post-training quantization for vision transformers with twin uniform quantization. In European conference on computer vision , pages 191–207. Springer,

  41. [42]

    RPTQ: reorder-based post-training quantization for large language models

    Zhihang Yuan, Lin Niu, Jiawei Liu, Wenyu Liu, Xinggang Wang, Yuzhang Shang, Guangyu Sun, Qiang Wu, Jiaxiang Wu, and Bingzhe Wu. Rptq: Reorder-based post-training quantization for large language models. arXiv preprint arXiv:2304.01089, 2023. 5

  42. [43]

    DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

    Hao Zhang, Feng Li, Shilong Liu, Lei Zhang, Hang Su, Jun Zhu, Lionel M Ni, and Heung-Yeung Shum. Dino: Detr with improved denoising anchor boxes for end-to-end object detection. arXiv preprint arXiv:2203.03605, 2022. 6

  43. [44]

    Personalize segment anything model with one shot

    Renrui Zhang, Zhengkai Jiang, Ziyu Guo, Shilin Yan, Junt- ing Pan, Xianzheng Ma, Hao Dong, Peng Gao, and Hong- sheng Li. Personalize segment anything model with one shot. arXiv preprint arXiv:2305.03048, 2023. 1

  44. [45]

    Less is more: Focus attention for efficient detr

    Dehua Zheng, Wenhui Dong, Hailin Hu, Xinghao Chen, and Yunhe Wang. Less is more: Focus attention for efficient detr. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6674–6683, 2023. 1

  45. [46]

    Dy- namic dual trainable bounds for ultra-low precision super- resolution networks

    Yunshan Zhong, Mingbao Lin, Xunchao Li, Ke Li, Yun- hang Shen, Fei Chao, Yongjian Wu, and Rongrong Ji. Dy- namic dual trainable bounds for ultra-low precision super- resolution networks. In European Conference on Computer Vision, pages 1–18. Springer, 2022. 4

  46. [47]

    I&s-vit: An inclusive & stable method for pushing the limit of post-training vits quantization

    Yunshan Zhong, Jiawei Hu, Mingbao Lin, Mengzhao Chen, and Rongrong Ji. I&s-vit: An inclusive & stable method for pushing the limit of post-training vits quantization. arXiv preprint arXiv:2311.10126, 2023. 2, 4

  47. [48]

    Erq: Error reduction for post-training quanti- zation of vision transformers

    Yunshan Zhong, Jiawei Hu, You Huang, Yuxin Zhang, and Rongrong Ji. Erq: Error reduction for post-training quanti- zation of vision transformers. In Proceedings of the Interna- tional Conference on Machine Learning (ICML), 2024. 2

  48. [49]

    Towards accurate post-training quantization of vision transformers via error reduction

    Yunshan Zhong, You Huang, Jiawei Hu, Yuxin Zhang, and Rongrong Ji. Towards accurate post-training quantization of vision transformers via error reduction. IEEE Transactions on Pattern Analysis and Machine Intelligence , pages 1–18,

  49. [50]

    2 10 AHCPTQ: Accurate and Hardware-Compatible Post-Training Quantization for Segment Anything Model Supplementary Material A. Analysis of Inter-Channel Variation and Inter-Sample Similarity in SAM Model In this section, we provide an in-depth analysis of inter- channel variation and inter-sample similarity using the SAM-B model with YOLOX as the prompt de...