AWPD: Frequency Shield Network for Agnostic Watermark Presence Detection

Mengru Chen; Siyang Lu; Xiang Ao; Yilin Du; Zidan Wang

arxiv: 2603.06723 · v3 · submitted 2026-03-06 · 💻 cs.CV · cs.AI

Pith reviewed 2026-05-15 16:07 UTC · model grok-4.3

keywords agnostic watermark detection · invisible watermark · frequency domain analysis · zero-shot detection · image copyright protection · spectral attention · high-frequency anomalies · AIGC forensics

The pith

A frequency-focused neural network detects invisible watermarks without knowing the embedding algorithm.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines the Agnostic Watermark Presence Detection task: deciding whether an image carries an invisible copyright mark when the specific embedding method is unknown. It releases the UniFreq-100K dataset, which spans many different watermark algorithms, to support training and evaluation. The Frequency Shield Network uses learnable gating in its early layers to amplify high-frequency signals likely created by watermarks while suppressing ordinary image content. In its later layers it mines energy anomalies with multi-spectral attention and extremum pooling. Experiments show this design yields higher zero-shot accuracy than prior models when tested on watermark types absent from training. The result matters for copyright enforcement in open settings where new or proprietary watermarking techniques appear constantly.

Core claim

The Frequency Shield Network detects the presence of unknown invisible watermarks by deploying an Adaptive Spectral Perception Module that applies learnable frequency gating to boost high-frequency watermark signals and suppress low-frequency semantics in shallow layers, then uses Dynamic Multi-Spectral Attention combined with tri-stream extremum pooling in deep layers to isolate watermark energy anomalies, delivering superior zero-shot performance on the AWPD task compared with existing baselines.
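To make the idea of learnable frequency gating concrete, here is a minimal one-dimensional sketch in plain Python. It is not the paper's ASPM: the function names, the sigmoid gate, the choice of one gate per distance from DC, and the hand-picked logits are all assumptions made for illustration. A sigmoid gate can only attenuate, so high frequencies are "amplified" here only relative to the suppressed low frequencies.

```python
import cmath
import math


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [
        sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse of dft(); returns the real part of each reconstructed sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def frequency_gate(signal, gate_logits):
    """Scale each frequency bin by sigmoid(logit), then transform back.

    gate_logits[d] is a hypothetical learnable parameter for the bins at
    distance d from DC. Bins k and n - k share one gate, which keeps the
    spectrum conjugate-symmetric and the output real.
    """
    n = len(signal)
    spectrum = dft(signal)
    gated = [spectrum[k] * sigmoid(gate_logits[min(k, n - k)]) for k in range(n)]
    return idft(gated)


# Toy input: smooth "content" plus a faint alternating "watermark".
n = 8
content = [10 + 5 * math.cos(2 * math.pi * t / n) for t in range(n)]
watermark = [0.1 * (-1) ** t for t in range(n)]
signal = [c + w for c, w in zip(content, watermark)]

# Hand-picked logits standing in for trained values (an assumption):
# close the gate near DC, open it near the Nyquist bin.
gate_logits = [-6.0, -6.0, 0.0, 6.0, 6.0]
gated = frequency_gate(signal, gate_logits)
print([round(v, 3) for v in gated])
```

In this toy case the gated output is dominated by the alternating component, even though that component was roughly a hundred times weaker than the content in the input. In a trained network the logits would be learned from data rather than set by hand.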

What carries the argument

Frequency Shield Network that uses adaptive spectral perception for dynamic high-frequency amplification and dynamic multi-spectral attention with tri-stream extremum pooling to isolate watermark energy anomalies.
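The summary does not spell out which three streams the extremum pooling uses. The sketch below assumes they are the maximum, the minimum, and the mean of each channel; that choice, and the function names, are illustrative rather than taken from the paper. It shows why extrema can matter: a sparse positive and negative spike pair vanishes under mean pooling but survives in the max and min streams.

```python
def tri_stream_pool(feature_map):
    """Summarize one 2D feature map (a list of rows) as (max, min, mean)."""
    values = [v for row in feature_map for v in row]
    return max(values), min(values), sum(values) / len(values)


def pool_channels(channels):
    """Concatenate the three pooled statistics of every channel into one descriptor."""
    descriptor = []
    for feature_map in channels:
        descriptor.extend(tri_stream_pool(feature_map))
    return descriptor


# A mostly flat 4x4 map with two opposite-sign spikes (toy values).
sparse_map = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.8, 0.0, 0.0],
    [0.0, 0.0, -0.8, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
flat_map = [[0.0] * 4 for _ in range(4)]

print(tri_stream_pool(sparse_map))  # mean alone cannot tell this map from flat_map
print(pool_channels([sparse_map, flat_map]))
```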

If this is right

Detection becomes possible for watermark methods absent from any fixed training set.
Copyright screening can proceed without maintaining a catalog of known watermark decoders.
Marked images can be flagged in open AIGC and social-media pipelines before specific decoding is attempted.
The same frequency-anomaly approach could reduce false negatives when watermark strength is low or the carrier image is complex.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach might transfer to video or audio if their watermarking methods also leave detectable spectral footprints.
Future watermark designers could evade detection by deliberately confining changes to low-frequency bands.
Real-world performance would be clarified by running the model on images scraped from public platforms that use undisclosed watermarking.
Combining presence detection with subsequent targeted decoding could create a two-stage pipeline for both flagging and identifying marks.
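The two-stage pipeline in the last point can be sketched as follows. Everything here is hypothetical: `presence_score` stands in for an agnostic detector such as FSNet, the decoders stand in for scheme-specific extractors, and the 0.5 threshold is an arbitrary placeholder.

```python
def two_stage_check(image, presence_score, decoders, threshold=0.5):
    """Flag an image with an agnostic detector, then try known decoders.

    presence_score: callable returning a score in [0, 1] (hypothetical).
    decoders: dict mapping a scheme name to a callable that returns a
              payload, or None when that scheme's mark is not found.
    """
    score = presence_score(image)
    result = {"flagged": score >= threshold, "score": score, "scheme": None, "payload": None}
    if not result["flagged"]:
        return result  # skip the scheme-specific decoders entirely
    for name, decode in decoders.items():
        payload = decode(image)
        if payload is not None:
            result["scheme"] = name
            result["payload"] = payload
            break
    return result


# Toy stand-ins so the sketch runs end to end.
toy_image = [[0.0]]
toy_decoders = {
    "scheme_a": lambda img: None,
    "scheme_b": lambda img: "id-42",
}
print(two_stage_check(toy_image, lambda img: 0.9, toy_decoders))
print(two_stage_check(toy_image, lambda img: 0.9, {"scheme_a": lambda img: None}))
print(two_stage_check(toy_image, lambda img: 0.1, toy_decoders))
```

A flagged image with no matching decoder (the second call) is the case that presence detection adds: the image is marked as suspicious even though no catalogued scheme recognizes it.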

Load-bearing premise

High-frequency anomalies and energy patterns produced by watermarks stay consistent and detectable for embedding algorithms never seen in the UniFreq-100K dataset.

What would settle it

An experiment that introduces a new watermark embedding algorithm whose outputs produce no distinguishable high-frequency anomalies, causing FSNet zero-shot accuracy to drop below baseline levels, would falsify the claim.

Figures

Figures reproduced from arXiv: 2603.06723 by Mengru Chen, Siyang Lu, Xiang Ao, Yilin Du, Zidan Wang.

**Figure 1.** Overall distribution of the UniFreq-100K dataset. (a) Distribution across five image categories totaling 190K images. (b) Distri…

**Figure 2.** Overview of the Frequency Shield Network (FSNet) architecture showing the Adaptive Spectral Perception Module (ASPM) and…

**Figure 3.** Model performance on different dataset ratios (10% to…

**Figure 4.** Performance comparison after completely removing…

**Figure 5.** Visualization of absolute residual extremum binarization triggered by four watermarking algorithms under a pure white back…

**Figure 6.** Learnable frequency domain gating heatmap of ASPM…

**Figure 7.** Comparison of channel attention weight distributions…

Original abstract

Invisible watermarks, as an essential technology for image copyright protection, have been widely deployed with the rapid development of social media and AIGC. However, existing invisible watermark detection heavily relies on prior knowledge of specific algorithms, leading to limited detection capabilities for "unknown watermarks" in open environments. To this end, we propose a novel task named Agnostic Watermark Presence Detection (AWPD), which aims to identify whether an image carries a copyright mark without requiring decoding information. We construct the UniFreq-100K dataset, comprising large-scale samples across various invisible watermark embedding algorithms. Furthermore, we propose the Frequency Shield Network (FSNet). This model deploys an Adaptive Spectral Perception Module (ASPM) in the shallow layers, utilizing learnable frequency gating to dynamically amplify high-frequency watermark signals while suppressing low-frequency semantics. In the deep layers, the network introduces Dynamic Multi-Spectral Attention (DMSA) combined with tri-stream extremum pooling to deeply mine watermark energy anomalies, forcing the model to precisely focus on sensitive frequency bands. Extensive experiments demonstrate that FSNet exhibits superior zero-shot detection capabilities on the AWPD task, outperforming existing baseline models. Code and datasets will be released upon acceptance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper defines a useful AWPD task and FSNet architecture for frequency-based agnostic watermark detection, but the abstract supplies no metrics so the generalization claims stay unverified.

The paper defines a new AWPD task for detecting watermarks without knowing the embedding algorithm, which fits the practical problem of checking images from mixed AIGC and social media sources. They built UniFreq-100K to cover multiple embedding methods and introduced FSNet with an Adaptive Spectral Perception Module that uses learnable gating to boost high-frequency signals early, plus Dynamic Multi-Spectral Attention with tri-stream extremum pooling in deeper layers to isolate energy anomalies.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces the Agnostic Watermark Presence Detection (AWPD) task for identifying the presence of invisible watermarks in images without knowledge of the embedding algorithm. It constructs the UniFreq-100K dataset spanning multiple watermarking methods and proposes the Frequency Shield Network (FSNet), which incorporates an Adaptive Spectral Perception Module (ASPM) using learnable frequency gating in shallow layers to amplify high-frequency signals and a Dynamic Multi-Spectral Attention (DMSA) module with tri-stream extremum pooling in deeper layers to mine energy anomalies. The central claim is that FSNet achieves superior zero-shot detection performance on AWPD compared to existing baselines.

Significance. If the generalization claims hold under rigorous testing, the work would advance practical copyright protection for images in open environments, particularly with AIGC content, by shifting from algorithm-specific detectors to agnostic presence detection. The construction of the large-scale UniFreq-100K dataset is a concrete contribution that could enable further research in frequency-based forensics. The frequency-gating and multi-spectral attention design is a plausible direction for isolating watermark perturbations, though its impact hinges on empirical validation of transfer to unseen methods.

major comments (2)

Abstract: the claim of superior zero-shot detection capabilities is asserted without any quantitative metrics, error bars, baseline comparisons, or experimental setup details, preventing assessment of whether the data actually supports the central performance claim.
Method and Experiments sections (implied by claims): the zero-shot generalization of ASPM learnable frequency gating and DMSA tri-stream pooling to watermark algorithms absent from UniFreq-100K is not demonstrated. The central claim requires that high-frequency perturbations from unseen methods (e.g., adaptive or content-dependent allocation) remain statistically similar to those in training, but no ablation studies, hold-out algorithm tests, or failure-case analysis are referenced to support transfer beyond the dataset.
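The first comment asks for error bars. One standard way to attach them to a reported accuracy is the Wilson score interval for a binomial proportion, sketched below. The counts in the example are invented for illustration and are not results from the paper.

```python
import math


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for a proportion; z = 1.96 gives roughly 95% coverage."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half_width, center + half_width


# Hypothetical counts: 90 correct detections out of 100 test images.
low, high = wilson_interval(90, 100)
print(f"accuracy 0.90, 95% interval [{low:.3f}, {high:.3f}]")
```

Reporting an interval like this for each held-out watermark type would show whether a gap between FSNet and a baseline exceeds sampling noise.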

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which helps clarify the presentation of our results and the strength of our generalization claims. We address each major comment below and have made targeted revisions to the manuscript.

Referee: Abstract: the claim of superior zero-shot detection capabilities is asserted without any quantitative metrics, error bars, baseline comparisons, or experimental setup details, preventing assessment of whether the data actually supports the central performance claim.

Authors: We agree that the abstract would be strengthened by including key quantitative results. In the revised version, we have updated the abstract to report specific zero-shot metrics (e.g., FSNet accuracy versus baselines on UniFreq-100K), along with a brief note on the evaluation protocol. This directly addresses the concern while preserving the abstract's brevity.

revision: yes

Referee: Method and Experiments sections (implied by claims): the zero-shot generalization of ASPM learnable frequency gating and DMSA tri-stream pooling to watermark algorithms absent from UniFreq-100K is not demonstrated. The central claim requires that high-frequency perturbations from unseen methods (e.g., adaptive or content-dependent allocation) remain statistically similar to those in training, but no ablation studies, hold-out algorithm tests, or failure-case analysis are referenced to support transfer beyond the dataset.

Authors: We acknowledge the need for explicit demonstration of zero-shot transfer. The full manuscript already contains hold-out experiments that exclude specific watermarking algorithms from training and evaluate on the unseen methods within UniFreq-100K. We have now added explicit cross-references to these results in the experiments section, included new ablation tables isolating the contribution of ASPM frequency gating and DMSA tri-stream pooling to generalization performance, and expanded the failure-case discussion to cover adaptive embedding strategies. These additions provide clearer evidence that the learned high-frequency cues transfer beyond the training distribution.

revision: yes
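The hold-out protocol described in this exchange can be sketched as a leave-one-algorithm-out split generator. The data layout, the separate clean-image pools, and the placeholder algorithm names below are assumptions for illustration. The paper's actual split procedure is not described in the text above.

```python
def leave_one_algorithm_out(marked, clean_train, clean_test):
    """Yield (held_out, train, test) splits with labels 1 = marked, 0 = clean.

    marked: list of (image_id, algorithm_name) pairs.
    clean_train / clean_test: disjoint lists of unmarked image ids
    (assumed to be prepared by the caller).
    """
    algorithms = sorted({algo for _, algo in marked})
    for held_out in algorithms:
        train = [(img, 1) for img, algo in marked if algo != held_out]
        train += [(img, 0) for img in clean_train]
        test = [(img, 1) for img, algo in marked if algo == held_out]
        test += [(img, 0) for img in clean_test]
        yield held_out, train, test


# Toy inventory with placeholder algorithm names.
marked = [("m1", "algo_a"), ("m2", "algo_a"), ("m3", "algo_b"), ("m4", "algo_c")]
clean_train = ["c1", "c2"]
clean_test = ["c3"]

for held_out, train, test in leave_one_algorithm_out(marked, clean_train, clean_test):
    print(held_out, len(train), len(test))
```

The point of the split is that every marked test image comes from an algorithm the detector never saw during training, so the test accuracy is a zero-shot figure for that algorithm.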

Circularity Check

0 steps flagged

No significant circularity; empirical architecture and dataset validation are self-contained

The paper defines a new task (AWPD), releases a new dataset (UniFreq-100K) spanning multiple embedding algorithms, and introduces FSNet with ASPM (learnable frequency gating) and DMSA (tri-stream extremum pooling). Performance claims rest on standard supervised training followed by zero-shot evaluation on held-out or unseen watermark types; no equations, parameters, or claims reduce by construction to fitted inputs or self-citations. The central result is an empirical comparison of detection accuracy, which is falsifiable against external benchmarks and does not rely on any load-bearing self-referential step.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The approach rests on the assumption that watermarks produce consistent frequency-domain anomalies across methods and on a newly constructed dataset whose coverage is unknown.

free parameters (2)

learnable frequency gating parameters
Introduced in the Adaptive Spectral Perception Module to dynamically amplify high-frequency watermark signals.
DMSA attention weights
Parameters in the Dynamic Multi-Spectral Attention module for focusing on sensitive frequency bands.

axioms (1)

domain assumption: Invisible watermarks manifest as detectable high-frequency energy anomalies distinguishable from semantic content
Invoked to justify the design of ASPM and DMSA modules.

invented entities (1)

UniFreq-100K dataset (no independent evidence)
purpose: Large-scale collection of images with watermarks from diverse embedding algorithms for training and evaluation
Newly constructed for the AWPD task; no independent evidence of coverage or balance provided in abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: UniFreq-100K dataset ... leave-one-algorithm-out cross-validation ... zero-shot detection capabilities

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Hiding images in plain sight: Deep steganography

Shumeet Baluja. Hiding images in plain sight: Deep steganography. 2017. 1, 2

work page 2017

[2] [2]

Techniques for data hiding.IBM systems journal, 35(3.4):313–336, 1996

Walter Bender, Daniel Gruhl, Norishige Morimoto, and An- thony Lu. Techniques for data hiding.IBM systems journal, 35(3.4):313–336, 1996. 2, 4

work page 1996

[3] [3]

Deep residual network for steganalysis of digital images.IEEE Transactions on Information Forensics and Security, 14(5): 1181–1193, 2018

Mehdi Boroumand, Mo Chen, and Jessica Fridrich. Deep residual network for steganalysis of digital images.IEEE Transactions on Information Forensics and Security, 14(5): 1181–1193, 2018. 3

work page 2018

[4] [4]

What makes fake images detectable? understanding prop- erties that generalize

Lucy Chai, David Bau, Ser-Nam Lim, and Phillip Isola. What makes fake images detectable? understanding prop- erties that generalize. InEuropean conference on computer vision, pages 103–120. Springer, 2020. 3

work page 2020

[5] [5]

Intriguing properties of syn- thetic images: from generative adversarial networks to diffu- sion models

Riccardo Corvi, Davide Cozzolino, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. Intriguing properties of syn- thetic images: from generative adversarial networks to diffu- sion models. pages 973–982, 2023. 1, 2

work page 2023

[6] [6]

Secure spread spectrum watermarking for multi- media.IEEE transactions on image processing, 6(12):1673– 1687, 1997

Ingemar J Cox, Joe Kilian, F Thomson Leighton, and Talal Shamoon. Secure spread spectrum watermarking for multi- media.IEEE transactions on image processing, 6(12):1673– 1687, 1997. 1, 2, 4

work page 1997

[7] [7]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Syl- vain Gelly, et al. An image is worth 16x16 words: Trans- formers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020. 2, 6

work page internal anchor Pith review Pith/arXiv arXiv 2010

[8] [8]

The stable signature: Rooting watermarks in latent diffusion models

Pierre Fernandez, Guillaume Couairon, Herv ´e J ´egou, Matthijs Douze, and Teddy Furon. The stable signature: Rooting watermarks in latent diffusion models. InProceed- ings of the IEEE/CVF International Conference on Com- puter Vision, pages 22466–22477, 2023. 1, 2, 4

work page 2023

[9] [9]

Rich models for steganal- ysis of digital images.IEEE Transactions on information Forensics and Security, 7(3):868–882, 2012

Jessica Fridrich and Jan Kodovsky. Rich models for steganal- ysis of digital images.IEEE Transactions on information Forensics and Security, 7(3):868–882, 2012. 3

work page 2012

[10] [10]

Synthid-image: Image watermarking at internet scale.arXiv preprint arXiv:2510.09263, 2025

Sven Gowal, Rudy Bunel, Florian Stimberg, David Stutz, Guillermo Ortiz-Jimenez, Christina Kouridi, Mel Vecerik, Jamie Hayes, Sylvestre-Alvise Rebuffi, Paul Bernard, et al. Synthid-image: Image watermarking at internet scale.arXiv preprint arXiv:2510.09263, 2025. 1, 2, 4

work page arXiv 2025

[11] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016.

[12] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Red Hook, NY, USA, 2020. Curran Associates Inc.

[13] Y. Lecun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.

[14] Tsung-Yi Lin, Michael Maire, Serge Belongie, James Hays, Pietro Perona, Deva Ramanan, Piotr Dollár, and C. Lawrence Zitnick. Microsoft coco: Common objects in context. In Computer Vision – ECCV 2014, pages 740–755, Cham, 2014. Springer International Publishing.

[16] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021.

[17] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows, 2021.

[18] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization, 2019.

[19] Ju-Hyeon Nam, Nur Suriza Syazwany, Su Jung Kim, and Sang-Chul Lee. Modality-agnostic domain generalizable medical image segmentation by multi-frequency in multi-scale attention. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11480–11491, 2024.

[20] Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 24480–24489, 2023.

[21] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193, 2023.

[22] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jegou, Julien Mairal, ... Dinov2: Learning robust visual features without supervision, 2024.

[23] Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In European conference on computer vision, pages 86–103. Springer, 2020.

[24] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In Proceedings of the 38th International Conference on Machine Learning, pages 8748–8763. PMLR, 2021.

[25] V Padmanabha Reddy and S Varadarajan. An effective wavelet-based watermarking scheme using human visual system for protecting copyrights of digital images. International Journal of Computer and Electrical Engineering, 2(1):32, 2010.

[26] Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi. You only look once: Unified, real-time object detection, 2016.

[27] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, 2022.

[28] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation,

[29] Zican Shi, Jing Hu, Jie Ren, Hengkang Ye, Xuyang Yuan, Yan Ouyang, Jia He, Bo Ji, and Junyu Guo. Hs-fpn: High frequency and spatial perception fpn for tiny object detection. In Proceedings of the AAAI Conference on Artificial Intelligence, pages 6896–6904, 2025.

[30] Matthew Tancik, Ben Mildenhall, and Ren Ng. Stegastamp: Invisible hyperlinks in physical photographs, 2020.

[31] Matthew Tancik, Ben Mildenhall, and Ren Ng. Stegastamp: Invisible hyperlinks in physical photographs. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2117–2126, 2020.

[32] Ron G Van Schyndel, Andrew Z Tirkel, and Charles F Osborne. A digital watermark. 2:86–90, 1994.

[33] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A Efros. Cnn-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8695–8704, 2020.

[34] Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. Dire for diffusion-generated image detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22445–22455, 2023.

[35] Yuxin Wen, John Kirchenbauer, Jonas Geiping, and Tom Goldstein. Tree-ring watermarks: Fingerprints for diffusion images that are invisible and robust. arXiv preprint arXiv:2305.20030, 2023.

[36] Raymond B Wolfgang and Edward J Delp. A watermark for digital images. 3:219–222, 1996.

[37] Sanghyun Woo, Shoubhik Debnath, Ronghang Hu, Xinlei Chen, Zhuang Liu, In So Kweon, and Saining Xie. Convnext v2: Co-designing and scaling convnets with masked autoencoders, 2023.

[38] Jingyao Xu, Yuetong Lu, Yandong Li, Siyang Lu, Dongdong Wang, and Xiang Wei. Perturbing attention gives you more bang for the buck: Subtle imaging perturbations that efficiently fool customized diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 24534–24543, 2024.

[39] Dong Yin, Raphael Gontijo Lopes, Jon Shlens, Ekin Dogus Cubuk, and Justin Gilmer. A fourier perspective on model robustness in computer vision. 2019.

[40] Sherin M Youssef, Ahmed Abou ElFarag, and Noha M Ghatwary. Adaptive video watermarking integrating a fuzzy wavelet-based human visual system perceptual model. Multimedia tools and applications, 73(3):1545–1573, 2014.

[41] Xuanyu Zhang, Runyi Li, Jiwen Yu, Youmin Xu, Weiqi Li, and Jian Zhang. Editguard: Versatile image watermarking for tamper localization and copyright protection, 2023.

[42] Xuanyu Zhang, Zecheng Tang, Zhipei Xu, Runyi Li, Youmin Xu, Bin Chen, Feng Gao, and Jian Zhang. Omniguard: Hybrid manipulation localization via augmented versatile deep image watermarking, 2025.

[43] Jiren Zhu, Russell Kaplan, Justin Johnson, and Li Fei-Fei. Hidden: Hiding data with deep networks. In Proceedings of the European conference on computer vision (ECCV), pages 657–672, 2018.

A. UniFreq-100K Dataset Construction Details

To comprehensively evaluate the model’s performance on the Agnostic Watermark Presence Detection (AWPD) task, we con...

Based on these results, we conducted an in-depth mathematical and physical analysis along two dimensions: spatial sparsity and amplitude signal-to-noise ratio.

B.1. Patchwork: Extreme Spatial Sparsity and Signal Annihilation During Downsampling

The core embedding logic of the Patchwork algorithm relies on using a pseudo-random sequence to select a ve...