pith. machine review for the scientific record.

arxiv: 2604.16855 · v1 · submitted 2026-04-18 · 💻 cs.CV

Recognition: unknown

When W4A4 Breaks Camouflaged Object Detection: Token-Group Dual-Constraint Activation Quantization

Authors on Pith · no claims yet

Pith reviewed 2026-05-10 06:44 UTC · model grok-4.3

classification 💻 cs.CV
keywords camouflaged object detection · post-training quantization · W4A4 quantization · transformer models · activation range · token grouping · boundary cues

The pith

Token-group quantization recovers camouflaged object detection performance under aggressive 4-bit constraints.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Camouflaged object detection must find objects that deliberately blend into their background, so predictions hinge on faint texture and boundary signals. Standard post-training W4A4 quantization of transformer models collapses on this task because heavy-tailed background tokens stretch the shared activation range and force those weak signals into the zero bin. The paper isolates this token-local bottleneck and shows it can be removed by grouping tokens for separate scaling and then projecting each group's clip range under two constraints. The resulting method raises Sα scores by more than 0.12 over the state-of-the-art quantization method on four benchmarks and two models, with no retraining or fine-tuning. This matters because it makes accurate low-memory inference feasible for real-world COD applications that must run on edge devices.
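To make the diagnosed failure concrete, here is a minimal sketch in NumPy, assuming illustrative token counts and magnitudes rather than anything measured in the paper: heavy-tailed background tokens set the shared clip range, and the weak boundary rows collapse into the zero bin.

    # Minimal sketch (not the paper's code): heavy-tailed background tokens
    # stretch a shared W4A4 activation range until weak boundary cues round to zero.
    import numpy as np

    rng = np.random.default_rng(0)
    q_max = 7                                   # signed 4-bit grid: q in [-8, 7]

    boundary = rng.normal(0.0, 0.05, (8, 64))   # weak, structured boundary cues
    background = rng.standard_t(2, (56, 64))    # heavy-tailed background tokens
    x = np.concatenate([boundary, background])  # (tokens, channels)

    # Naive per-tensor scale: one clip range shared by every token.
    c_shared = np.abs(x).max()
    step = c_shared / q_max                     # step size inflated by outliers
    q = np.clip(np.round(x / step), -8, q_max)

    # Fraction of boundary activations collapsed into the zero bin.
    print("zeroed boundary fraction:", (q[:8] == 0).mean())

Giving the boundary rows their own scale, which is the lever DSTG pulls, shrinks the step for exactly those tokens.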

Core claim

In post-training W4A4 quantization of Transformer-based camouflaged object detection, heavy-tailed background tokens dominate the shared activation range, inflating the quantization step size and pushing weak boundary cues into the zero bin. COD-TDQ addresses this with two coupled steps: Direct-Sum Token-Group (DSTG) scaling, which assigns scales per token group to remove cross-token domination, and Dual-Constraint Range Projection (DCRP), which projects each group's clip range to bound the step-to-dispersion ratio and the zero-bin mass. The result is consistent Sα improvements exceeding 0.12 across four benchmarks and two models without retraining.

What carries the argument

COD-TDQ, a token-group dual-constraint activation quantization method that pairs Direct-Sum Token-Group (DSTG) scale assignment with Dual-Constraint Range Projection (DCRP) to suppress cross-token range domination, formalized below.
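In symbols, following the notation of the supplementary procedure reproduced at the end of this page (σ_g is the per-group standard deviation, Q_p a quantile, and x̃ the clipped activation), the per-group clip radius and step are:

    \[
    c_{\mathrm{base}} = \lVert x_g \rVert_\infty \ \text{or}\ Q_p(\lvert x_g \rvert),
    \qquad
    c^{(\tau)} = q_{\max}\,\tau\,\sigma_g,
    \qquad
    c^{(\mathrm{zr})} = 2\,q_{\max}\,Q_{\mathrm{zr}}(\lvert x_g \rvert)
    \]
    \[
    c_g = \min\bigl(c_{\mathrm{base}},\, c^{(\tau)},\, c^{(\mathrm{zr})}\bigr),
    \qquad
    \Delta_g = \max\bigl(c_g/q_{\max},\, 10^{-8}\bigr),
    \qquad
    q = \operatorname{clip}\bigl(\operatorname{round}(\tilde{x}/\Delta_g),\, q_{\min},\, q_{\max}\bigr)
    \]

The τ constraint caps the step relative to each group's dispersion; the zr constraint caps how much of the group's mass can land inside the zero bin.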

If this is right

  • COD-TDQ preserves subtle texture and boundary cues that standard W4A4 quantization erases.
  • Performance gains hold across CFRN and ESCNet baselines on four standard COD benchmarks.
  • The approach requires no retraining or task-specific fine-tuning after quantization.
  • Both the step-to-dispersion ratio and zero-bin mass remain bounded under 4-bit activations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same token-group scaling could help quantization in other vision tasks where background statistics overwhelm foreground signals.
  • Measuring zero-bin occupancy before and after the two steps on new transformer models would test how generally the bottleneck applies.
  • If token grouping proves central, similar local constraints might improve low-bit inference for other dense prediction problems.

Load-bearing premise

That the shared activation bottleneck created by heavy-tailed background tokens is the main reason W4A4 fails for COD, and that token-group scaling plus range projection can remove it without retraining.

What would settle it

If applying the method to the same models and benchmarks produces Sα scores no better than prior quantization techniques, or if zero-bin mass stays high after DSTG and DCRP, the claimed solution to the token-local bottleneck would be refuted.
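The zero-bin half of that test is directly computable. A minimal sketch of the diagnostic, with function and variable names of our own invention rather than the authors' API:

    # Zero-bin mass rho_0: the fraction of activations in a token group
    # that round to the zero bin, i.e. |x| < step / 2.
    import numpy as np

    def zero_bin_mass(x_group, clip_radius, q_max=7):
        step = max(float(clip_radius) / q_max, 1e-8)
        return float((np.abs(x_group) < step / 2).mean())

    # An outlier-inflated radius zeroes nearly everything; a tight,
    # group-local radius does not.
    x = np.random.default_rng(1).normal(0.0, 0.05, 256)
    print(zero_bin_mass(x, clip_radius=8.0))              # near 1.0
    print(zero_bin_mass(x, clip_radius=np.abs(x).max()))  # far lower

Running this on boundary-region token groups before and after DSTG and DCRP is exactly the refutation check described above.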

Figures

Figures reproduced from arXiv: 2604.16855 by Tianqi Li, Wenyu Fang, Xin He, Xu Cheng, Xue Geng, Yun Liu.

Figure 1
Figure 1. COD-specific W4A4 failure. Naive W4A4 inflates a shared clipping range, producing a coarse step size and high zero-bin mass that erases weak boundary evidence. The inset summarizes representative diagnostics (c_g, Δ, ρ₀) and the associated Sα collapse/recovery on CFRN/NC4K (rounded to three decimals). …two Transformer COD models (CFRN and ESCNet [45]), we perform comprehensive and extensive… view at source ↗
Figure 2
Figure 2. Reduces cross-token scale interference. (a–c) Token-wise range disparity under FP32, naive W4A4, and DSTG: token-group scaling mitigates background-dominated range inflation. (d–e) Boundary-region activation magnitudes before/after quantization: naive W4A4 collapses many small responses to zero, while COD-TDQ preserves them, reducing the zeroed-activation fraction from 41.6% to 14.2%. Step-to-dispersion r… view at source ↗
Figure 3
Figure 3. DCRP prevents zero-bin mass collapse. DCRP projects each token-group clip radius to satisfy a step-to-dispersion bound and a zero-bin mass bound. The fraction of non-boundary token-groups exceeding the step-to-std threshold drops from 72.60% (pre-projection) to 0.00% after C1, and the fraction with pre-projection ρ₀ > zr drops from 98.36% (naive W4A4) to 20.87% under COD-TDQ statistics. … view at source ↗
Figure 4
Figure 4. COD-TDQ (DSTG and DCRP) pipeline overview. view at source ↗
Figure 5
Figure 5. Qualitative comparison. The first column shows the input image and the GT mask. The remaining columns present the prediction masks produced by different quantization methods (RepQ-ViT, IGQ-ViT, PTQ4SAM). For each example, the two rows correspond to results obtained with the CFRN and ESCNet baselines, respectively. view at source ↗
read the original abstract

Camouflaged object detection (COD) segments objects that intentionally blend with the background, so predictions depend on subtle texture and boundary cues. COD is often needed under tight on-device memory and latency budgets, making low-bit inference highly desirable. However, COD is unusually hard to quantize aggressively. We study post-training W4A4 quantization of Transformer-based COD and find a task-specific cliff: heavy-tailed background tokens dominate a shared activation range, inflating the step size and pushing weak-but-structured boundary cues into the zero bin. This exposes a token-local bottleneck -- remove cross-token range domination and bound the zero-bin mass under 4-bit activations. To address this, we introduce COD-TDQ, a COD-aware Token-group Dual-constraint activation Quantization method. COD-TDQ addresses this token-local bottleneck with two coupled steps: Direct-Sum Token-Group (DSTG) assigns token-group scales to suppress cross-token range domination, and Dual-Constraint Range Projection (DCRP) projects each token-group clip range to keep the step-to-dispersion ratio and the zero-bin mass bounded. Across four COD benchmarks and two baseline models (CFRN and ESCNet), COD-TDQ consistently achieves an Sα score more than 0.12 higher than that of the state-of-the-art quantization method without retraining. The code will be released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper identifies a task-specific failure mode in post-training W4A4 quantization for camouflaged object detection (COD): heavy-tailed background tokens dominate the shared activation range, inflating step size and pushing subtle boundary cues into the zero bin. It proposes COD-TDQ, which uses Direct-Sum Token-Group (DSTG) scaling to suppress cross-token range domination and Dual-Constraint Range Projection (DCRP) to bound zero-bin mass and step-to-dispersion ratio. On four COD benchmarks with CFRN and ESCNet, COD-TDQ reports Sα gains exceeding 0.12 over prior quantization methods without retraining or fine-tuning.

Significance. If the gains prove robust and causally attributable to the DSTG+DCRP bounds rather than unisolated implementation details, the result would be significant for enabling low-bit on-device COD inference. The work usefully diagnoses why standard quantization cliffs are more severe for tasks relying on weak, structured cues and offers a post-training fix that avoids retraining costs.

major comments (2)
  1. [Abstract] The central claim that DSTG and DCRP 'keep the step-to-dispersion ratio and the zero-bin mass bounded' is asserted without equations, pseudocode, or quantitative verification on real activation tensors showing that the bounds are actually achieved or that they correlate with the observed Sα lift; this causal link is load-bearing for both the bottleneck diagnosis and the 'without retraining' guarantee.
  2. [Experiments] The manuscript provides no ablation isolating DSTG from DCRP, no error-bar or statistical-significance analysis on the reported >0.12 Sα margin, and no verification that the zero-bin mass bound (rather than other factors) drives the improvement; these omissions prevent confirmation that the proposed constraints are the operative mechanism.
minor comments (1)
  1. The abstract states that code will be released; this is a positive step for reproducibility that should be retained.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. These have helped us identify areas where the manuscript can be strengthened in terms of clarity and empirical support. We address each major comment point by point below, indicating the revisions we will incorporate in the next version of the manuscript.

read point-by-point responses
  1. Referee: [Abstract] The central claim that DSTG and DCRP 'keep the step-to-dispersion ratio and the zero-bin mass bounded' is asserted without equations, pseudocode, or quantitative verification on real activation tensors showing that the bounds are actually achieved or that they correlate with the observed Sα lift; this causal link is load-bearing for both the bottleneck diagnosis and the 'without retraining' guarantee.

    Authors: We agree that the abstract is a high-level summary and does not contain equations or direct verification. The full manuscript presents the mathematical definitions of DSTG and DCRP, including the step-to-dispersion ratio and zero-bin mass, in Section 3 with accompanying pseudocode. To address the concern, we have revised the abstract to include a concise reference to these bounding mechanisms. We have also added a new quantitative analysis subsection in the experiments, with figures showing the step-to-dispersion ratio and zero-bin mass computed on real activation tensors from the four COD benchmarks. These demonstrate that the bounds are achieved post-quantization and correlate with the reported Sα gains, supporting the post-training nature of the method. revision: yes

  2. Referee: [Experiments] The manuscript provides no ablation isolating DSTG from DCRP, no error-bar or statistical-significance analysis on the reported >0.12 Sα margin, and no verification that the zero-bin mass bound (rather than other factors) drives the improvement; these omissions prevent confirmation that the proposed constraints are the operative mechanism.

    Authors: We concur that isolating the individual contributions and providing statistical support would strengthen the causal attribution. In the revised manuscript, we have added a dedicated ablation study evaluating DSTG alone, DCRP alone, and their combination on all four benchmarks and both baseline models (CFRN and ESCNet). This shows that the full gains require both components. We now report Sα scores with standard deviations computed over five independent runs to provide error bars and context for the >0.12 margin. We have further included a targeted verification experiment that isolates the zero-bin mass constraint, comparing performance against variants without it to confirm its role in preserving subtle boundary cues over other potential factors. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is analysis-driven and empirically measured

full rationale

The provided abstract and description present a diagnostic analysis of a token-local bottleneck in W4A4 quantization for COD, followed by two proposed operations (DSTG and DCRP) whose effects are asserted to bound zero-bin mass and step-to-dispersion ratio. No equations, derivations, or self-citations are exhibited that reduce the claimed >0.12 Sα gain to a fitted parameter, renamed input, or self-referential quantity by construction. The performance improvement is stated as an empirical result across benchmarks without retraining, and the central claim remains independent of any tautological reduction. This is the common honest case of a method paper whose load-bearing step is experimental verification rather than algebraic identity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The approach rests on the domain assumption that activation outliers are primarily background-driven and that per-group scaling plus bounded zero-bin mass will preserve boundary cues; no explicit free parameters or new invented entities are named in the abstract.

pith-pipeline@v0.9.0 · 5560 in / 1288 out tokens · 51963 ms · 2026-05-10T06:44:59.953683+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

56 extracted references · 11 canonical work pages · 3 internal anchors

  [1] Abecassis, F., Agrusa, A., Ahn, D., Alben, J., Alborghetti, S., Andersch, M., Arayandi, S., Bjorlin, A., Blakeman, A., Briones, E., et al.: Pretraining large language models with NVFP4. arXiv preprint arXiv:2509.25149 (2025)
  [2] Banner, R., Nahshan, Y., Soudry, D.: Post training 4-bit quantization of convolutional networks for rapid-deployment. In: Neural Information Processing Systems (2018)
  [3] Chen, X., Ren, G., Dai, T., Stathaki, T., Liu, H.: Enhancing prompt generation with adaptive refinement for camouflaged object detection. In: ICCV. pp. 20672–20682 (2025)
  [4] Das, B., Gopalakrishnan, V.: Camouflage anything: Learning to hide using controlled out-painting and representation engineering. In: CVPR. pp. 3603–3613 (2025)
  [5] Dosovitskiy, A., Beyer, L., Kolesnikov, A., Weissenborn, D., Zhai, X., Unterthiner, T., Dehghani, M., Minderer, M., Heigold, G., Gelly, S., et al.: An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929 (2020)
  [6] Du, J., Hao, F., Yu, M., Kong, D., Wu, J., Wang, B., Xu, J., Li, P.: Shift the lens: Environment-aware unsupervised camouflaged object detection. In: CVPR. pp. 19271–19282 (2025)
  [7] Fan, D.P., Ji, G.P., Sun, G., Cheng, M.M., Shen, J., Shao, L.: Camouflaged object detection. In: CVPR. pp. 2774–2784 (2020)
  [8] Frantar, E., Ashkboos, S., Hoefler, T., Alistarh, D.: GPTQ: Accurate post-training quantization for generative pre-trained transformers. arXiv preprint arXiv:2210.17323 (2022)
  [9] Gong, R., Liu, X., Li, Y., Fan, Y., Wei, X., Guo, J.: Pushing the limit of post-training quantization. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(7), 5556–5570 (2025)
  [10] Hao, C., Yu, Z., Liu, X., Xu, J., Yue, H., Yang, J.: A simple yet effective network based on vision transformer for camouflaged object and salient object detection. IEEE Transactions on Image Processing 34, 608–622 (2025)
  [11] He, X., Lu, Y., Liu, H., Gong, C., He, W.: ORQ-ViT: Outlier resilient post-training quantization for vision transformers via outlier decomposition. Journal of Systems Architecture 168, 103530 (2025)
  [12] Jiang, Y., Sun, N., Xie, X., Yang, F., Li, T.: ADFQ-ViT: Activation-distribution-friendly post-training quantization for vision transformers. Neural Networks 186, 107289 (2025)
  [13] Kim, D., Moon, J., Lee, J., Lee, G., Jeon, J., Ham, B.: Token-based dynamic bit-width assignment for ViT quantization. Pattern Recognition 171, 112269 (2026)
  [14] Kim, D., Lee, D., Chang, I.J., Bae, S.H.: Post-training quantization via residual truncation and zero suppression for diffusion models (2025)
  [15] Le, T.N., Nguyen, T.V., Nie, Z., Tran, M.T., Sugimoto, A.: Anabranch network for camouflaged object segmentation. Computer Vision and Image Understanding 184, 45–56 (2019)
  [16] Lei, C., Fan, J., Li, X., Xiang, T.Z., Li, A., Zhu, C., Zhang, L.: Towards real zero-shot camouflaged object segmentation without camouflaged annotations. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(12), 11990–12004 (2025)
  [17] Li, T., Guo, T., Xiang, D.: LERSGAN: A GAN-based model for low-light remote sensing image enhancement. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 18, 26489–26504 (2025)
  [18] Li, Z., Xiao, J., Yang, L., Gu, Q.: RepQ-ViT: Scale reparameterization for post-training quantization of vision transformers. In: ICCV. pp. 17181–17190 (2023)
  [20] Liu, J., Kong, L., Chen, G.: Improving SAM for camouflaged object detection via dual stream adapters. arXiv preprint arXiv:2503.06042 (2025)
  [21] Liu, X., Ding, X., Yu, L., Xi, Y., Li, W., Tu, Z., Hu, J., Chen, H., Yin, B., Xiong, Z.: PQ-SAM: Post-training quantization for segment anything model. In: Computer Vision – ECCV 2024. pp. 420–437. Springer (2025)
  [22] Liu, Y., Yang, H., Dong, Z., Keutzer, K., Du, L., Zhang, S.: NoisyQuant: Noisy bias-enhanced post-training activation quantization for vision transformers. In: CVPR. pp. 20321–20330 (2022)
  [23] Lv, C., Chen, H., Guo, J., Ding, Y., Liu, X.: PTQ4SAM: Post-training quantization for segment anything. In: CVPR. pp. 15941–15951 (2024)
  [24] Lv, Y., Zhang, J., Dai, Y., Li, A., Liu, B., Barnes, N., Fan, D.P.: Simultaneously localize, segment and rank the camouflaged objects. In: CVPR. pp. 11586–11596 (2021)
  [25] Meng, H., Luo, Y., Zhao, Y., Liu, W., Zhang, P., Ma, X.: ARCQuant: Boosting NVFP4 quantization with augmented residual channels for LLMs (2026)
  [26] Moon, J., Kim, D., Cheon, J., Ham, B.: Instance-aware group quantization for vision transformers. In: CVPR. pp. 16132–16141 (2024)
  [27] Nagel, M., Amjad, R.A., Van Baalen, M., Louizos, C., Blankevoort, T.: Up or down? Adaptive rounding for post-training quantization. In: ICML (2020)
  [28] Pang, Y., Zhao, X., Zuo, J., Zhang, L., Lu, H.: Open-vocabulary camouflaged object segmentation. In: Computer Vision – ECCV 2024. pp. 476–495. Springer (2025)
  [29] Portmann, A.: Animal camouflage. University of Michigan Press (1959)
  [30] Ranjan, N., Savakis, A.: LRP-QViT: Mixed-precision vision transformer quantization using layer importance score. In: 25th International Conference on Digital Signal Processing (DSP). pp. 1–5 (2025)
  [31] Ren, G., Liu, H., Lazarou, M., Stathaki, T.: Multi-modal segment anything model for camouflaged scene segmentation. In: ICCV (2025)
  [32] Ren, G., Liu, H., Lazarou, M., Stathaki, T.: Multi-modal segment anything model for camouflaged scene segmentation. In: ICCV. pp. 19882–19892 (2025)
  [33] Ren, P., Bai, T., Sun, F.: ESNet: An efficient skeleton-guided network for camouflaged object detection. Knowledge-Based Systems 311, 113056 (2025)
  [34] Song, Z., Kang, X., Wei, X., Liu, J., Lin, Z., Li, S.: Continuous feature representation for camouflaged object detection. IEEE Transactions on Image Processing 34, 5672–5685 (2025)
  [35] Sun, G., An, Z., Liu, Y., Liu, C., Sakaridis, C., Fan, D.P., Van Gool, L.: Indiscernible object counting in underwater scenes. In: CVPR. pp. 13791–13801 (2023)
  [36] Sun, K., Chen, Z., Lin, X., Sun, X., Liu, H., Ji, R.: Conditional diffusion models for camouflaged and salient object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 47(4), 2833–2848 (2025)
  [37] Sun, Y., Lian, J., Yang, J., Luo, L.: Controllable-LPMoE: Adapting to challenging object segmentation via dynamic local priors from mixture-of-experts. In: ICCV. pp. 22327–22337 (2025)
  [38] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems 30 (2017)
  [39] Wei, X., Gong, R., Li, Y., Liu, X., Yu, F.: QDrop: Randomly dropping quantization for extremely low-bit post-training quantization. In: ICLR (2022)
  [40] Wu, D., Wang, M., Sun, J., Jia, X.: Knowledge-guided and collaborative learning network for camouflaged object detection. Engineering Applications of Artificial Intelligence 153, 110771 (2025)
  [41] Wu, Z., Wang, S., Zhang, J., Chen, J., Wang, Y.: FIMA-Q: Post-training quantization for vision transformers by Fisher information matrix approximation. arXiv preprint arXiv:2506.11543 (2025)
  [42] Xiao, G., Lin, J., Seznec, M., Wu, H., Demouth, J., Han, S.: SmoothQuant: Accurate and efficient post-training quantization for large language models. In: ICML (2023)
  [43] Xu, L., Xie, H., Qin, S.J., Tao, X., Wang, F.L.: Parameter-efficient fine-tuning methods for pretrained language models: A critical review and assessment. IEEE Transactions on Pattern Analysis and Machine Intelligence (2026)
  [44] Yan, F., Jiang, X., Lu, Y., Cao, J., Chen, D., Xu, M.: Wavelet and prototype augmented query-based transformer for pixel-level surface defect detection. In: CVPR. pp. 23860–23869 (2025)
  [45] Ye, S., Chen, X., Zhang, Y., Lin, X., Cao, L.: ESCNet: Edge-semantic collaborative network for camouflaged object detection. In: ICCV. pp. 20053–20063 (2025)
  [46] Yin, B., Zhang, X., Fan, D.P., Jiao, S., Cheng, M.M., Van Gool, L., Hou, Q.: CamoFormer: Masked separable attention for camouflaged object detection. IEEE Transactions on Pattern Analysis and Machine Intelligence 46(12), 10362–10374 (2024)
  [47] Yuan, Z., Xue, C., Chen, Y., Wu, Q., Sun, G.: PTQ4ViT: Post-training quantization for vision transformers with twin uniform quantization. In: ECCV. pp. 191–207 (2022)
  [48] Zhang, W., Ando, S., Yoshioka, K.: AHCPTQ: Accurate and hardware-compatible post-training quantization for segment anything model. In: ICCV (2025)
  [49] Zhou, Y., Sun, G., Li, Y., Xie, G.S., Benini, L., Konukoglu, E.: When SAM2 meets video camouflaged object segmentation: A comprehensive evaluation and adaptation. Visual Intelligence 3(1), 10 (2025)
  [50] Zhou, Z., Li, Y., Zhong, C., Huang, J., Pei, J., Li, H., Tang, H.: Rethinking detecting salient and camouflaged objects in unconstrained scenes. In: ICCV. pp. 22372–22382 (2025)

Internal anchors [51]–[57] (per-group quantization procedure and defaults, extracted from the supplementary material):

  1. Pad x on the last dimension to C_pad such that C_pad mod g = 0.
  2. Reshape x_g ← view(x, …, C_pad/g, g).
  3. DSTG: compute c_base ← ‖x_g‖∞ or Q_p(|x_g|) along the last dimension.
  4. DCRP: if both τ and zr are provided, then (i) σ ← Std(x_g) with unbiased=False, and c^(τ) ← q_max · τ · σ; (ii) thr ← Q_zr(|x_g|) via a kth-value quantile, and c^(zr) ← 2 · q_max · thr; (iii) c ← min(c_base, c^(τ), c^(zr)). Otherwise, c ← c_base.
  5. Clip x̃ ← clip(x_g, −c, c).
  6. Step Δ ← max(c/q_max, 10⁻⁸).
  7. Quantize q ← clip(round(x̃/Δ), q_min, q_max).
  8. Reshape x̂_g back and unpad to obtain x̂.
  9. Compute y = Linear(x̂, Ŵ), where Ŵ is dequantized from static w-bit weights and the operator runs in floating-point compute_dtype.

S1.3 Default Hyperparameters. The main paper uses a single shared setting across all datasets and both backbones, namely DSTG group size g = 32 (Eq. (5)), DCRP resolution bound τ = 1.0 (Eq. (10)), and zero-bin mass bound zr = 0.2 (Eq. …
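For readers who want to trace the anchored steps end to end, here is a NumPy reconstruction of the per-group activation quantizer using the S1.3 defaults (g = 32, τ = 1.0, zr = 0.2). It is a sketch of the listed procedure, not the authors' released code, and the names are ours:

    import numpy as np

    def cod_tdq_quantize(x, g=32, tau=1.0, zr=0.2, bits=4, p=None):
        q_max = 2 ** (bits - 1) - 1    # 7 for 4-bit activations
        q_min = -(2 ** (bits - 1))     # -8

        # Steps 1-2: pad the last dim to a multiple of g, view as token groups.
        C = x.shape[-1]
        C_pad = -(-C // g) * g
        xp = np.pad(x, [(0, 0)] * (x.ndim - 1) + [(0, C_pad - C)])
        xg = xp.reshape(*xp.shape[:-1], C_pad // g, g)

        # Step 3 (DSTG): per-group base radius via max-abs or a p-quantile.
        a = np.abs(xg)
        c = a.max(axis=-1, keepdims=True) if p is None else \
            np.quantile(a, p, axis=-1, keepdims=True)

        # Step 4 (DCRP): project the radius under the two constraints.
        sigma = xg.std(axis=-1, keepdims=True)       # unbiased=False
        c_tau = q_max * tau * sigma                  # step-to-dispersion bound
        c_zr = 2 * q_max * np.quantile(a, zr, axis=-1, keepdims=True)
        c = np.minimum(c, np.minimum(c_tau, c_zr))

        # Steps 5-7: clip, compute the step, quantize.
        step = np.maximum(c / q_max, 1e-8)
        qv = np.clip(np.round(np.clip(xg, -c, c) / step), q_min, q_max)

        # Step 8: dequantize, fold groups back, drop the padding.
        return (qv * step).reshape(*xp.shape[:-1], C_pad)[..., :C]

Per step 9, this fake-quantized x̂ would then feed a Linear whose weights are dequantized separately from static w-bit storage.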