pith. machine review for the scientific record.

arxiv: 2604.16855 · v1 · submitted 2026-04-18 · 💻 cs.CV

Recognition: unknown

When W4A4 Breaks Camouflaged Object Detection: Token-Group Dual-Constraint Activation Quantization

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 06:44 UTC · model grok-4.3

classification 💻 cs.CV
keywords camouflaged object detection · post-training quantization · W4A4 quantization · transformer models · activation range · token grouping · boundary cues

The pith

Token-group quantization recovers camouflaged object detection performance under aggressive 4-bit constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Camouflaged object detection must find objects that deliberately blend into their background, so predictions hinge on faint texture and boundary signals. Standard post-training W4A4 quantization of transformer models collapses on this task because heavy-tailed background tokens stretch the shared activation range and force those weak signals into the zero bin. The paper isolates this token-local bottleneck and shows it can be removed by grouping tokens for separate scaling and then projecting each group's clip range under two constraints. The resulting method raises Sα scores by more than 0.12 over the state-of-the-art quantization method on four benchmarks and two models, with no retraining or fine-tuning. This matters because it makes accurate low-memory inference feasible for real-world COD applications that must run on edge devices.
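To make the diagnosed failure concrete, here is a minimal sketch in NumPy, assuming illustrative token counts and magnitudes rather than anything measured in the paper: heavy-tailed background tokens set the shared clip range, and the weak boundary rows collapse into the zero bin.

    # Minimal sketch (not the paper's code): heavy-tailed background tokens
    # stretch a shared W4A4 activation range until weak boundary cues round to zero.
    import numpy as np

    rng = np.random.default_rng(0)
    q_max = 7                                   # signed 4-bit grid: q in [-8, 7]

    boundary = rng.normal(0.0, 0.05, (8, 64))   # weak, structured boundary cues
    background = rng.standard_t(2, (56, 64))    # heavy-tailed background tokens
    x = np.concatenate([boundary, background])  # (tokens, channels)

    # Naive per-tensor scale: one clip range shared by every token.
    c_shared = np.abs(x).max()
    step = c_shared / q_max                     # step size inflated by outliers
    q = np.clip(np.round(x / step), -8, q_max)

    # Fraction of boundary activations collapsed into the zero bin.
    print("zeroed boundary fraction:", (q[:8] == 0).mean())

Giving the boundary rows their own scale, which is the lever DSTG pulls, shrinks the step for exactly those tokens.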

Core claim

In post-training W4A4 quantization of Transformer-based camouflaged object detection, heavy-tailed background tokens dominate the shared activation range, inflating the quantization step size and pushing weak boundary cues into the zero bin. COD-TDQ addresses this with two coupled steps: Direct-Sum Token-Group (DSTG) scaling, which assigns scales per token group to remove cross-token domination, and Dual-Constraint Range Projection (DCRP), which projects each group's clip range to bound the step-to-dispersion ratio and the zero-bin mass. The result is consistent Sα improvements exceeding 0.12 across four benchmarks and two models without retraining.

What carries the argument

COD-TDQ, a token-group dual-constraint activation quantization method that pairs Direct-Sum Token-Group (DSTG) scale assignment with Dual-Constraint Range Projection (DCRP) to suppress cross-token range domination, formalized below.
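In symbols, following the notation of the supplementary procedure reproduced at the end of this page (σ_g is the per-group standard deviation, Q_p a quantile, and x̃ the clipped activation), the per-group clip radius and step are:

    \[
    c_{\mathrm{base}} = \lVert x_g \rVert_\infty \ \text{or}\ Q_p(\lvert x_g \rvert),
    \qquad
    c^{(\tau)} = q_{\max}\,\tau\,\sigma_g,
    \qquad
    c^{(\mathrm{zr})} = 2\,q_{\max}\,Q_{\mathrm{zr}}(\lvert x_g \rvert)
    \]
    \[
    c_g = \min\bigl(c_{\mathrm{base}},\, c^{(\tau)},\, c^{(\mathrm{zr})}\bigr),
    \qquad
    \Delta_g = \max\bigl(c_g/q_{\max},\, 10^{-8}\bigr),
    \qquad
    q = \operatorname{clip}\bigl(\operatorname{round}(\tilde{x}/\Delta_g),\, q_{\min},\, q_{\max}\bigr)
    \]

The τ constraint caps the step relative to each group's dispersion; the zr constraint caps how much of the group's mass can land inside the zero bin.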

If this is right

  • COD-TDQ preserves subtle texture and boundary cues that standard W4A4 quantization erases.
  • Performance gains hold across CFRN and ESCNet baselines on four standard COD benchmarks.
  • The approach requires no retraining or task-specific fine-tuning after quantization.
  • Both the step-to-dispersion ratio and zero-bin mass remain bounded under 4-bit activations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same token-group scaling could help quantization in other vision tasks where background statistics overwhelm foreground signals.
  • Measuring zero-bin occupancy before and after the two steps on new transformer models would test how generally the bottleneck applies.
  • If token grouping proves central, similar local constraints might improve low-bit inference for other dense prediction problems.

Load-bearing premise

That the shared activation bottleneck created by heavy-tailed background tokens is the main reason W4A4 fails for COD, and that token-group scaling plus range projection can remove it without retraining.

What would settle it

If applying the method to the same models and benchmarks produces Sα scores no better than prior quantization techniques, or if zero-bin mass stays high after DSTG and DCRP, the claimed solution to the token-local bottleneck would be refuted.
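The zero-bin half of that test is directly computable. A minimal sketch of the diagnostic, with function and variable names of our own invention rather than the authors' API:

    # Zero-bin mass rho_0: the fraction of activations in a token group
    # that round to the zero bin, i.e. |x| < step / 2.
    import numpy as np

    def zero_bin_mass(x_group, clip_radius, q_max=7):
        step = max(float(clip_radius) / q_max, 1e-8)
        return float((np.abs(x_group) < step / 2).mean())

    # An outlier-inflated radius zeroes nearly everything; a tight,
    # group-local radius does not.
    x = np.random.default_rng(1).normal(0.0, 0.05, 256)
    print(zero_bin_mass(x, clip_radius=8.0))              # near 1.0
    print(zero_bin_mass(x, clip_radius=np.abs(x).max()))  # far lower

Running this on boundary-region token groups before and after DSTG and DCRP is exactly the refutation check described above.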

Figures

Figures reproduced from arXiv: 2604.16855 by Tianqi Li, Wenyu Fang, Xin He, Xu Cheng, Xue Geng, Yun Liu.

Figure 1
Figure 1. COD-specific W4A4 failure. Naive W4A4 inflates a shared clipping range, producing a coarse step size and high zero-bin mass that erases weak boundary evidence. The inset summarizes representative diagnostics (c_g, Δ, ρ₀) and the associated Sα collapse/recovery on CFRN/NC4K (rounded to three decimals). …two Transformer COD models (CFRN and ESCNet [45]), we perform comprehensive and extensive… view at source ↗
Figure 2
Figure 2. Reduces cross-token scale interference. (a–c) Token-wise range disparity under FP32, naive W4A4, and DSTG: token-group scaling mitigates background-dominated range inflation. (d–e) Boundary-region activation magnitudes before/after quantization: naive W4A4 collapses many small responses to zero, while COD-TDQ preserves them, reducing the zeroed-activation fraction from 41.6% to 14.2%. Step-to-dispersion r… view at source ↗
Figure 3
Figure 3. DCRP prevents zero-bin mass collapse. DCRP projects each token-group clip radius to satisfy a step-to-dispersion bound and a zero-bin mass bound. The fraction of non-boundary token-groups exceeding the step-to-std threshold drops from 72.60% (pre-projection) to 0.00% after C1, and the fraction with pre-projection ρ₀ > zr drops from 98.36% (naive W4A4) to 20.87% under COD-TDQ statistics. … view at source ↗
Figure 4
Figure 4. COD-TDQ (DSTG and DCRP) pipeline overview. view at source ↗
Figure 5
Figure 5. Qualitative comparison. The first column shows the input image and the GT mask. The remaining columns present the prediction masks produced by different quantization methods (RepQ-ViT, IGQ-ViT, PTQ4SAM). For each example, the two rows correspond to results obtained with the CFRN and ESCNet baselines, respectively. view at source ↗
read the original abstract

Camouflaged object detection (COD) segments objects that intentionally blend with the background, so predictions depend on subtle texture and boundary cues. COD is often needed under tight on-device memory and latency budgets, making low-bit inference highly desirable. However, COD is unusually hard to quantize aggressively. We study post-training W4A4 quantization of Transformer-based COD and find a task-specific cliff: heavy-tailed background tokens dominate a shared activation range, inflating the step size and pushing weak-but-structured boundary cues into the zero bin. This exposes a token-local bottleneck -- remove cross-token range domination and bound the zero-bin mass under 4-bit activations. To address this, we introduce COD-TDQ, a COD-aware Token-group Dual-constraint activation Quantization method. COD-TDQ addresses this token-local bottleneck with two coupled steps: Direct-Sum Token-Group (DSTG) assigns token-group scales to suppress cross-token range domination, and Dual-Constraint Range Projection (DCRP) projects each token-group clip range to keep the step-to-dispersion ratio and the zero-bin mass bounded. Across four COD benchmarks and two baseline models (CFRN and ESCNet), COD-TDQ consistently achieves an Sα score more than 0.12 higher than that of the state-of-the-art quantization method without retraining. The code will be released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper identifies a task-specific failure mode in post-training W4A4 quantization for camouflaged object detection (COD): heavy-tailed background tokens dominate the shared activation range, inflating step size and pushing subtle boundary cues into the zero bin. It proposes COD-TDQ, which uses Direct-Sum Token-Group (DSTG) scaling to suppress cross-token range domination and Dual-Constraint Range Projection (DCRP) to bound zero-bin mass and step-to-dispersion ratio. On four COD benchmarks with CFRN and ESCNet, COD-TDQ reports Sα gains exceeding 0.12 over prior quantization methods without retraining or fine-tuning.

Significance. If the gains prove robust and causally attributable to the DSTG+DCRP bounds rather than unisolated implementation details, the result would be significant for enabling low-bit on-device COD inference. The work usefully diagnoses why standard quantization cliffs are more severe for tasks relying on weak, structured cues and offers a post-training fix that avoids retraining costs.

major comments (2)
  1. [Abstract] The central claim that DSTG and DCRP 'keep the step-to-dispersion ratio and the zero-bin mass bounded' is asserted without equations, pseudocode, or quantitative verification on real activation tensors showing that the bounds are actually achieved or that they correlate with the observed Sα lift; this causal link is load-bearing for both the bottleneck diagnosis and the 'without retraining' guarantee.
  2. [Experiments] The manuscript provides no ablation isolating DSTG from DCRP, no error-bar or statistical-significance analysis on the reported >0.12 Sα margin, and no verification that the zero-bin mass bound (rather than other factors) drives the improvement; these omissions prevent confirmation that the proposed constraints are the operative mechanism.
minor comments (1)
  1. The abstract states that code will be released; this is a positive step for reproducibility that should be retained.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These have helped us identify areas where the manuscript can be strengthened in terms of clarity and empirical support. We address each major comment point by point below, indicating the revisions we will incorporate in the next version of the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The central claim that DSTG and DCRP 'keep the step-to-dispersion ratio and the zero-bin mass bounded' is asserted without equations, pseudocode, or quantitative verification on real activation tensors showing that the bounds are actually achieved or that they correlate with the observed Sα lift; this causal link is load-bearing for both the bottleneck diagnosis and the 'without retraining' guarantee.

    Authors: We agree that the abstract is a high-level summary and does not contain equations or direct verification. The full manuscript presents the mathematical definitions of DSTG and DCRP, including the step-to-dispersion ratio and zero-bin mass, in Section 3 with accompanying pseudocode. To address the concern, we have revised the abstract to include a concise reference to these bounding mechanisms. We have also added a new quantitative analysis subsection in the experiments, with figures showing the step-to-dispersion ratio and zero-bin mass computed on real activation tensors from the four COD benchmarks. These demonstrate that the bounds are achieved post-quantization and correlate with the reported Sα gains, supporting the post-training nature of the method. revision: yes

  2. Referee: [Experiments] The manuscript provides no ablation isolating DSTG from DCRP, no error-bar or statistical-significance analysis on the reported >0.12 Sα margin, and no verification that the zero-bin mass bound (rather than other factors) drives the improvement; these omissions prevent confirmation that the proposed constraints are the operative mechanism.

    Authors: We concur that isolating the individual contributions and providing statistical support would strengthen the causal attribution. In the revised manuscript, we have added a dedicated ablation study evaluating DSTG alone, DCRP alone, and their combination on all four benchmarks and both baseline models (CFRN and ESCNet). This shows that the full gains require both components. We now report Sα scores with standard deviations computed over five independent runs to provide error bars and context for the >0.12 margin. We have further included a targeted verification experiment that isolates the zero-bin mass constraint, comparing performance against variants without it to confirm its role in preserving subtle boundary cues over other potential factors. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is analysis-driven and empirically measured

full rationale

The provided abstract and description present a diagnostic analysis of a token-local bottleneck in W4A4 quantization for COD, followed by two proposed operations (DSTG and DCRP) whose effects are asserted to bound zero-bin mass and step-to-dispersion ratio. No equations, derivations, or self-citations are exhibited that reduce the claimed >0.12 Sα gain to a fitted parameter, renamed input, or self-referential quantity by construction. The performance improvement is stated as an empirical result across benchmarks without retraining, and the central claim remains independent of any tautological reduction. This is the common honest case of a method paper whose load-bearing step is experimental verification rather than algebraic identity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach rests on the domain assumption that activation outliers are primarily background-driven and that per-group scaling plus bounded zero-bin mass will preserve boundary cues; no explicit free parameters or new invented entities are named in the abstract.

pith-pipeline@v0.9.0 · 5560 in / 1288 out tokens · 51963 ms · 2026-05-10T06:44:59.953683+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

56 extracted references · 11 canonical work pages · 3 internal anchors

  [1] Abecassis, F., Agrusa, A., Ahn, D., Alben, J., Alborghetti, S., Andersch, M., Arayandi, S., Bjorlin, A., Blakeman, A., Briones, E., et al.: Pretraining large language models with NVFP4. arXiv preprint arXiv:2509.25149 (2025)
  [2] Banner, R., Nahshan, Y., Soudry, D.: Post training 4-bit quantization of convolutional networks for rapid-deployment. In: Neural Information Processing Systems (2018)
  [3] Chen, X., Ren, G., Dai, T., Stathaki, T., Liu, H.: Enhancing prompt generation with adaptive refinement for camouflaged object detection. In: ICCV. pp. 20672–20682 (2025)
  [4] Das, B., Gopalakrishnan, V.: Camouflage anything: Learning to hide using controlled out-painting and representation engineering. In: CVPR. pp. 3603–3613 (2025)
  [5] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
  [6] Du, J., Hao, F., Yu, M., Kong, D., Wu, J., Wang, B., Xu, J., Li, P.: Shift the lens: Environment-aware unsupervised camouflaged object detection. In: CVPR. pp. 19271–19282 (2025)
  [7] Fan, D.P., Ji, G.P., Sun, G., Cheng, M.M., Shen, J., Shao, L.: Camouflaged object detection. In: CVPR. pp. 2774–2784 (2020)
  [8] Frantar, E., Ashkboos, S., Hoefler, T., Alistarh, D.: GPTQ: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323 (2022)
  [9] Gong, R., Liu, X., Li, Y., Fan, Y., Wei, X., Guo, J.: Pushing the limit of post-training quantization. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(7), 5556–5570 (2025)
  [10] Hao, C., Yu, Z., Liu, X., Xu, J., Yue, H., Yang, J.: A simple yet effective network based on vision transformer for camouflaged object and salient object detection. IEEE Transactions on Image Processing 34, 608–622 (2025)
  [11] He, X., Lu, Y., Liu, H., Gong, C., He, W.: ORQ-ViT: Outlier resilient post-training quantization for vision transformers via outlier decomposition. Journal of Systems Architecture 168, 103530 (2025)
  [12] Jiang, Y., Sun, N., Xie, X., Yang, F., Li, T.: ADFQ-ViT: Activation-distribution-friendly post-training quantization for vision transformers. Neural Networks 186, 107289 (2025)
  [13] Kim, D., Moon, J., Lee, J., Lee, G., Jeon, J., Ham, B.: Token-based dynamic bit-width assignment for ViT quantization. Pattern Recognition 171, 112269 (2026)
  [14] Kim, D., Lee, D., Chang, I.J., Bae, S.H.: Post-training quantization via residual truncation and zero suppression for diffusion models (2025)
  [15] Le, T.N., Nguyen, T.V., Nie, Z., Tran, M.T., Sugimoto, A.: Anabranch network for camouflaged object segmentation. Computer Vision and Image Understanding 184, 45–56 (2019)
  [16] Lei, C., Fan, J., Li, X., Xiang, T.Z., Li, A., Zhu, C., Zhang, L.: Towards real zero-shot camouflaged object segmentation without camouflaged annotations. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(12), 11990–12004 (2025)
  [17] Li, T., Guo, T., Xiang, D.: LERSGAN: A GAN-based model for low-light remote sensing image enhancement. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18, 26489–26504 (2025)
  [18] Li, Z., Xiao, J., Yang, L., Gu, Q.: RepQ-ViT: Scale reparameterization for post-training quantization of vision transformers. In: ICCV. pp. 17181–17190 (2023)
  [20] Liu, J., Kong, L., Chen, G.: Improving SAM for camouflaged object detection via dual stream adapters. arXiv preprint arXiv:2503.06042 (2025)
  [21] Liu, X., Ding, X., Yu, L., Xi, Y., Li, W., Tu, Z., Hu, J., Chen, H., Yin, B., Xiong, Z.: PQ-SAM: Post-training quantization for segment anything model. In: Computer Vision – ECCV 2024. pp. 420–437. Springer (2025)
  [22] Liu, Y., Yang, H., Dong, Z., Keutzer, K., Du, L., Zhang, S.: NoisyQuant: Noisy bias-enhanced post-training activation quantization for vision transformers. In: CVPR. pp. 20321–20330 (2022)
  [23] Lv, C., Chen, H., Guo, J., Ding, Y., Liu, X.: PTQ4SAM: Post-training quantization for segment anything. In: CVPR. pp. 15941–15951 (2024)
  [24] Lv, Y., Zhang, J., Dai, Y., Li, A., Liu, B., Barnes, N., Fan, D.P.: Simultaneously localize, segment and rank the camouflaged objects. In: CVPR. pp. 11586–11596 (2021)
  [25] Meng, H., Luo, Y., Zhao, Y., Liu, W., Zhang, P., Ma, X.: ARCQuant: Boosting NVFP4 quantization with augmented residual channels for LLMs (2026)
  [26] Moon, J., Kim, D., Cheon, J., Ham, B.: Instance-aware group quantization for vision transformers. In: CVPR. pp. 16132–16141 (2024)
  [27] Nagel, M., Amjad, R.A., Van Baalen, M., Louizos, C., Blankevoort, T.: Up or down? Adaptive rounding for post-training quantization. In: ICML (2020)
  [28] Pang, Y., Zhao, X., Zuo, J., Zhang, L., Lu, H.: Open-vocabulary camouflaged object segmentation. In: Computer Vision – ECCV 2024. pp. 476–495. Springer (2025)
  [29] Portmann, A.: Animal camouflage. University of Michigan Press (1959)
  [30] Ranjan, N., Savakis, A.: LRP-QViT: Mixed-precision vision transformer quantization using layer importance score. In: 25th International Conference on Digital Signal Processing (DSP). pp. 1–5 (2025)
  [31] Ren, G., Liu, H., Lazarou, M., Stathaki, T.: Multi-modal segment anything model for camouflaged scene segmentation. In: ICCV (2025)
  [32] Ren, G., Liu, H., Lazarou, M., Stathaki, T.: Multi-modal segment anything model for camouflaged scene segmentation. In: ICCV. pp. 19882–19892 (2025)
  [33] Ren, P., Bai, T., Sun, F.: ESNet: An efficient skeleton-guided network for camouflaged object detection. Knowledge-Based Systems 311, 113056 (2025)
  [34] Song, Z., Kang, X., Wei, X., Liu, J., Lin, Z., Li, S.: Continuous feature representation for camouflaged object detection. IEEE Transactions on Image Processing 34, 5672–5685 (2025)
  [35] Sun, G., An, Z., Liu, Y., Liu, C., Sakaridis, C., Fan, D.P., Van Gool, L.: Indiscernible object counting in underwater scenes. In: CVPR. pp. 13791–13801 (2023)
  [36] Sun, K., Chen, Z., Lin, X., Sun, X., Liu, H., Ji, R.: Conditional diffusion models for camouflaged and salient object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(4), 2833–2848 (2025)
  [37] Sun, Y., Lian, J., Yang, J., Luo, L.: Controllable-LPMoE: Adapting to challenging object segmentation via dynamic local priors from mixture-of-experts. In: ICCV. pp. 22327–22337 (2025)
  [38] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems 30 (2017)
  [39] Wei, X., Gong, R., Li, Y., Liu, X., Yu, F.: QDrop: Randomly dropping quantization for extremely low-bit post-training quantization. In: ICLR (2022)
  [40] Wu, D., Wang, M., Sun, J., Jia, X.: Knowledge-guided and collaborative learning network for camouflaged object detection. Engineering Applications of Artificial Intelligence 153, 110771 (2025)
  [41] Wu, Z., Wang, S., Zhang, J., Chen, J., Wang, Y.: FIMA-Q: Post-training quantization for vision transformers by Fisher information matrix approximation. arXiv preprint arXiv:2506.11543 (2025)
  [42] Xiao, G., Lin, J., Seznec, M., Wu, H., Demouth, J., Han, S.: SmoothQuant: Accurate and efficient post-training quantization for large language models. In: ICML (2023)
  [43] Xu, L., Xie, H., Qin, S.J., Tao, X., Wang, F.L.: Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence (2026)
  [44] Yan, F., Jiang, X., Lu, Y., Cao, J., Chen, D., Xu, M.: Wavelet and prototype augmented query-based transformer for pixel-level surface defect detection. In: CVPR. pp. 23860–23869 (2025)
  [45] Ye, S., Chen, X., Zhang, Y., Lin, X., Cao, L.: ESCNet: Edge-semantic collaborative network for camouflaged object detection. In: ICCV. pp. 20053–20063 (2025)
  [46] Yin, B., Zhang, X., Fan, D.P., Jiao, S., Cheng, M.M., Van Gool, L., Hou, Q.: CamoFormer: Masked separable attention for camouflaged object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), 10362–10374 (2024)
  [47] Yuan, Z., Xue, C., Chen, Y., Wu, Q., Sun, G.: PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization. In: ECCV. pp. 191–207 (2022)
  [48] Zhang, W., Ando, S., Yoshioka, K.: AHCPTQ: Accurate and hardware-compatible post-training quantization for segment anything model. In: ICCV (2025)
  [49] Zhou, Y., Sun, G., Li, Y., Xie, G.S., Benini, L., Konukoglu, E.: When SAM2 meets video camouflaged object segmentation: A comprehensive evaluation and adaptation. Visual Intelligence 3(1), 10 (2025)
  [50] Zhou, Z., Li, Y., Zhong, C., Huang, J., Pei, J., Li, H., Tang, H.: Rethinking detecting salient and camouflaged objects in unconstrained scenes. In: ICCV. pp. 22372–22382 (2025)

Internal anchors [51]–[57] (per-group quantization procedure and defaults, extracted from the supplementary material):

  1. Pad x on the last dimension to C_pad such that C_pad mod g = 0.
  2. Reshape x_g ← view(x, …, C_pad/g, g).
  3. DSTG: compute c_base ← ‖x_g‖∞ or Q_p(|x_g|) along the last dimension.
  4. DCRP: if both τ and zr are provided, then (i) σ ← Std(x_g) with unbiased=False, and c^(τ) ← q_max · τ · σ; (ii) thr ← Q_zr(|x_g|) via a kth-value quantile, and c^(zr) ← 2 · q_max · thr; (iii) c ← min(c_base, c^(τ), c^(zr)). Otherwise, c ← c_base.
  5. Clip x̃ ← clip(x_g, −c, c).
  6. Step Δ ← max(c/q_max, 10⁻⁸).
  7. Quantize q ← clip(round(x̃/Δ), q_min, q_max).
  8. Reshape x̂_g back and unpad to obtain x̂.
  9. Compute y = Linear(x̂, Ŵ), where Ŵ is dequantized from static w-bit weights and the operator runs in floating-point compute_dtype.

S1.3 Default Hyperparameters. The main paper uses a single shared setting across all datasets and both backbones, namely DSTG group size g = 32 (Eq. (5)), DCRP resolution bound τ = 1.0 (Eq. (10)), and zero-bin mass bound zr = 0.2 (Eq. …
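For readers who want to trace the anchored steps end to end, here is a NumPy reconstruction of the per-group activation quantizer using the S1.3 defaults (g = 32, τ = 1.0, zr = 0.2). It is a sketch of the listed procedure, not the authors' released code, and the names are ours:

    import numpy as np

    def cod_tdq_quantize(x, g=32, tau=1.0, zr=0.2, bits=4, p=None):
        q_max = 2 ** (bits - 1) - 1    # 7 for 4-bit activations
        q_min = -(2 ** (bits - 1))     # -8

        # Steps 1-2: pad the last dim to a multiple of g, view as token groups.
        C = x.shape[-1]
        C_pad = -(-C // g) * g
        xp = np.pad(x, [(0, 0)] * (x.ndim - 1) + [(0, C_pad - C)])
        xg = xp.reshape(*xp.shape[:-1], C_pad // g, g)

        # Step 3 (DSTG): per-group base radius via max-abs or a p-quantile.
        a = np.abs(xg)
        c = a.max(axis=-1, keepdims=True) if p is None else \
            np.quantile(a, p, axis=-1, keepdims=True)

        # Step 4 (DCRP): project the radius under the two constraints.
        sigma = xg.std(axis=-1, keepdims=True)       # unbiased=False
        c_tau = q_max * tau * sigma                  # step-to-dispersion bound
        c_zr = 2 * q_max * np.quantile(a, zr, axis=-1, keepdims=True)
        c = np.minimum(c, np.minimum(c_tau, c_zr))

        # Steps 5-7: clip, compute the step, quantize.
        step = np.maximum(c / q_max, 1e-8)
        qv = np.clip(np.round(np.clip(xg, -c, c) / step), q_min, q_max)

        # Step 8: dequantize, fold groups back, drop the padding.
        return (qv * step).reshape(*xp.shape[:-1], C_pad)[..., :C]

Per step 9, this fake-quantized x̂ would then feed a Linear whose weights are dequantized separately from static w-bit storage.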