DINOv3 Beats Specialized Detectors: A Simple Foundation Model Baseline for Image Forensics
Pith reviewed 2026-05-10 09:00 UTC · model grok-4.3
The pith
DINOv3 with LoRA adaptation and a lightweight convolutional decoder achieves higher average pixel-level F1 scores than previous state-of-the-art specialized methods for image manipulation localization on multiple benchmarks.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under the CAT-Net protocol, our best model improves average pixel-level F1 by 17.0 points over the previous state of the art on four standard benchmarks using only 9.1M trainable parameters on top of a frozen ViT-L backbone, and even our smallest variant surpasses all prior specialized methods.
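Pixel-level F1, the metric behind the 17.0-point figure, compares a binarized predicted mask against the ground-truth tampering mask. The sketch below is minimal and rests on three assumptions: a fixed 0.5 threshold, per-image F1 averaged within each benchmark, and an unweighted mean across benchmarks. The exact thresholding and averaging rules of the CAT-Net protocol are not given here and may differ. Masks are flat Python lists so that the example needs no dependencies.

```python
from statistics import mean


def pixel_f1(pred_probs, gt_mask, threshold=0.5):
    """F1 of the tampered (positive) class for one image.

    pred_probs: flat list of per-pixel tampering probabilities.
    gt_mask: flat list of 0/1 ground-truth labels (1 = tampered).
    The 0.5 threshold is an assumption, not the paper's stated setting.
    """
    tp = fp = fn = 0
    for p, g in zip(pred_probs, gt_mask):
        pred = 1 if p >= threshold else 0
        if pred and g:
            tp += 1
        elif pred and not g:
            fp += 1
        elif not pred and g:
            fn += 1
    denom = 2 * tp + fp + fn
    # Assumed convention: an image with no predicted and no true tampered pixels scores 0.
    return 2 * tp / denom if denom else 0.0


def dataset_f1(pairs, threshold=0.5):
    """Mean per-image F1 over one benchmark (hypothetical averaging scheme)."""
    return mean(pixel_f1(p, g, threshold) for p, g in pairs)


def average_over_benchmarks(scores_by_benchmark):
    """Unweighted mean of per-benchmark F1 scores (assumed aggregation)."""
    return mean(scores_by_benchmark.values())
```

For example, `pixel_f1([0.9, 0.8, 0.2, 0.1], [1, 0, 1, 0])` yields one true positive, one false positive and one false negative, so F1 = 2/4 = 0.5.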
Load-bearing premise
That the general visual representations learned by DINOv3 on natural images already encode sufficient forensic traces of manipulations, so that minimal adaptation suffices without domain-specific pre-training or architectural priors tailored to image forensics.
Original abstract
With the rapid advancement of deep generative models, realistic fake images have become increasingly accessible, yet existing localization methods rely on complex designs and still struggle to generalize across manipulation types and imaging conditions. We present a simple but strong baseline based on DINOv3 with LoRA adaptation and a lightweight convolutional decoder. Under the CAT-Net protocol, our best model improves average pixel-level F1 by 17.0 points over the previous state of the art on four standard benchmarks using only 9.1M trainable parameters on top of a frozen ViT-L backbone, and even our smallest variant surpasses all prior specialized methods. LoRA consistently outperforms full fine-tuning across all backbone scales. Under the data-scarce MVSS-Net protocol, LoRA reaches an average F1 of 0.774 versus 0.530 for the strongest prior method, while full fine-tuning becomes highly unstable, suggesting that pre-trained representations encode forensic information that is better preserved than overwritten. The baseline also exhibits strong robustness to Gaussian noise, JPEG re-compression, and Gaussian blur. We hope this work can serve as a reliable baseline for the research community and a practical starting point for future image-forensic applications. Code is available at https://github.com/Irennnne/DINOv3-IML.
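The robustness claim refers to three test-time corruptions. The minimal sketch below covers two of them, Gaussian noise and Gaussian blur, applied to a grayscale image stored as a list of rows of 0-255 floats. The sigma values, the 3σ kernel radius and the edge-replication boundary rule are assumptions, not the paper's evaluation settings. JPEG re-compression needs a real codec, for example re-saving through Pillow at a chosen quality, so it is omitted here.

```python
import math
import random


def add_gaussian_noise(img, sigma, seed=0):
    """Add zero-mean Gaussian noise with std `sigma`, clipping to [0, 255]."""
    rng = random.Random(seed)
    return [[min(255.0, max(0.0, px + rng.gauss(0.0, sigma))) for px in row] for row in img]


def gaussian_kernel_1d(sigma, radius=None):
    """Normalized 1-D Gaussian weights (sigma > 0); radius defaults to ceil(3*sigma)."""
    if radius is None:
        radius = max(1, math.ceil(3 * sigma))
    weights = [math.exp(-(x * x) / (2 * sigma * sigma)) for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def gaussian_blur(img, sigma):
    """Separable Gaussian blur; out-of-range indices are clamped (edge replication)."""
    kernel = gaussian_kernel_1d(sigma)
    r = len(kernel) // 2
    h, w = len(img), len(img[0])

    def clamp(i, n):
        return min(n - 1, max(0, i))

    # Horizontal pass over each row.
    tmp = [[sum(kernel[k + r] * img[y][clamp(x + k, w)] for k in range(-r, r + 1))
            for x in range(w)] for y in range(h)]
    # Vertical pass over each column of the intermediate result.
    return [[sum(kernel[k + r] * tmp[clamp(y + k, h)][x] for k in range(-r, r + 1))
             for x in range(w)] for y in range(h)]
```

Because the kernel is normalized, blurring leaves a constant image unchanged and only spreads local detail. This matters for forensics because high-frequency manipulation traces are exactly what blur and noise tend to suppress.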
Editorial analysis
A structured set of objections, weighed in public.
Axiom & Free-Parameter Ledger
free parameters (1)
- LoRA rank and scaling
axioms (1)
- Domain assumption: frozen DINOv3 ViT-L features contain transferable information relevant to pixel-level manipulation localization.
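The two free parameters above, LoRA rank r and scaling α, enter through the LoRA update of Hu et al. [9]. An adapted layer computes y = Wx + (α/r)·B(Ax). Here W stays frozen, and only A (r × d_in) and B (d_out × r) are trained. The sketch below is a minimal pure-Python version. Its std-0.01 Gaussian initialization of A is an arbitrary choice, and the `vit_lora_params` helper is hypothetical. That helper assumes adapters on the fused qkv and attention-output projections of every block of a ViT-L-sized encoder (24 blocks, width 1024). Which DINOv3 layers the paper adapts, and which r and α it uses, are not stated here.

```python
import random


def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class LoRALinear:
    """y = W x + (alpha / rank) * B (A x), with W frozen and only A, B trainable.

    A is random and B starts at zero, so the adapted layer initially
    reproduces the frozen layer exactly.
    """

    def __init__(self, weight, rank, alpha, seed=0):
        self.weight = weight  # frozen, shape (d_out, d_in)
        d_out, d_in = len(weight), len(weight[0])
        rng = random.Random(seed)
        self.rank = rank
        self.scaling = alpha / rank
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def trainable_params(self):
        # rank * (d_in + d_out): the entries of A plus the entries of B.
        return self.rank * (len(self.weight[0]) + len(self.weight))

    def __call__(self, x):
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scaling * d for b, d in zip(base, delta)]


def vit_lora_params(rank, depth=24, width=1024):
    """Hypothetical LoRA budget: qkv (width -> 3*width) plus output
    projection (width -> width) adapted in every transformer block."""
    per_block = rank * (width + 3 * width) + rank * (width + width)
    return depth * per_block
```

Under these assumptions, `vit_lora_params(16)` gives 2,359,296 trainable weights, under 1% of a roughly 300M-parameter ViT-L. This illustrates how a LoRA-plus-decoder setup can stay in the single-digit millions of trainable parameters. The zero-initialized B also means training starts from the pre-trained function rather than overwriting it. That is one way to read the abstract's observation that LoRA is more stable than full fine-tuning.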
Reference graph
Works this paper leans on
- [1] Mathilde Caron, Hugo Touvron, Ishan Misra, Hervé Jégou, Julien Mairal, Piotr Bojanowski, and Armand Joulin. Emerging properties in self-supervised vision transformers, 2021.
- [2] Davide Cozzolino and Luisa Verdoliva. Noiseprint: a CNN-based camera model fingerprint, 2018.
- [3] Chengbo Dong, Xinru Chen, Ruohan Hu, Juan Cao, and Xirong Li. MVSS-Net: Multi-view multi-scale supervised networks for image manipulation detection. IEEE Transactions on Pattern Analysis and Machine Intelligence, 45(3):3539–3553, 2023.
- [4] Jing Dong, Wei Wang, and Tieniu Tan. CASIA image tampering detection evaluation database. In 2013 IEEE China Summit and International Conference on Signal and Information Processing. IEEE, 2013.
- [5] Bo Du, Xuekang Zhu, Xiaochen Ma, Chenfan Qu, Kaiwen Feng, Zhe Yang, Chi-Man Pun, Jian Liu, and Ji-Zhe Zhou. ForensicHub: A unified benchmark & codebase for all-domain fake image detection and localization, 2026.
- [6] Haiying Guan, Mark Kozak, Eric Robertson, Yooyoung Lee, Amy N. Yates, Andrew Delgado, Daniel Zhou, Timothee Kheyrkhah, Jeff Smith, and Jonathan Fiscus. MFC datasets: Large-scale benchmark datasets for media forensic challenge evaluation. In 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), pages 63–72, 2019.
- [7] Fabrizio Guillaro, Davide Cozzolino, Avneesh Sud, Nicholas Dufour, and Luisa Verdoliva. TruFor: Leveraging all-round clues for trustworthy image forgery detection and localization, 2023.
- [8] Yu-feng Hsu and Shih-fu Chang. Detecting image splicing using geometry invariants and camera characteristics consistency. In 2006 IEEE International Conference on Multimedia and Expo, pages 549–552, 2006.
- [9] Edward J. Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, and Weizhu Chen. LoRA: Low-rank adaptation of large language models, 2021.
- [10] Vladimir V. Kniaz, Vladimir Knyaz, and Fabio Remondino. The point where reality meets fantasy: Mixed adversarial generators for image splice detection. In NeurIPS, 2019.
- [11] Myung-Joon Kwon, Seung-Hun Nam, In-Jae Yu, Heung-Kyu Lee, and Changick Kim. Learning JPEG compression artifacts for image manipulation detection and localization. International Journal of Computer Vision, 130(8):1875–1895,
- [12] Xiaohong Liu, Yaojie Liu, Jun Chen, and Xiaoming Liu. PSCC-Net: Progressive spatio-channel correlation network for image manipulation detection and localization, 2022.
- [13] Xiaochen Ma, Bo Du, Zhuohang Jiang, Xia Du, Ahmed Y. Al Hammadi, and Jizhe Zhou. IML-ViT: Benchmarking image manipulation localization by vision transformer, 2024.
- [14] Xiaochen Ma, Xuekang Zhu, Lei Su, Bo Du, Zhuohang Jiang, Bingkui Tong, Zeyu Lei, Xinyu Yang, Chi-Man Pun, Jiancheng Lv, and Jizhe Zhou. IMDL-BenCo: A comprehensive benchmark and codebase for image manipulation detection & localization, 2024.
- [15] Ju-Hyeon Nam, Dong-Hyun Moon, and Sang-Chul Lee. M2SFormer: Multi-spectral and multi-scale attention with edge-aware difficulty guidance for image forgery localization, 2025.
- [16] Adam Novozámský, Babak Mahdian, and Stanislav Saic. IMD2020: A large-scale annotated dataset tailored for detecting manipulated images. In 2020 IEEE Winter Applications of Computer Vision Workshops (WACVW), pages 71–80, 2020.
- [17] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mahmoud Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Hervé Jégou, Julien Mairal, ... DINOv2: Learning robust visual features without supervision, 2024.
- [18] Oriane Siméoni, Huy V. Vo, Maximilian Seitzer, Federico Baldassarre, Maxime Oquab, Cijo Jose, Vasil Khalidov, Marc Szafraniec, Seungeun Yi, Michaël Ramamonjisoa, Francisco Massa, Daniel Haziza, Luca Wehrstedt, Jianyuan Wang, Timothée Darcet, Théo Moutakanni, Leonel Sentana, Claire Roberts, Andrea Vedaldi, Jamie Tolan, John Brandt, Camille Couprie, ... 2025.
- [19] Luisa Verdoliva. Media forensics and deepfakes: An overview. IEEE Journal of Selected Topics in Signal Processing, 14(5):910–932, 2020.
- [20] Junke Wang, Zuxuan Wu, Jingjing Chen, Xintong Han, Abhinav Shrivastava, Ser-Nam Lim, and Yu-Gang Jiang. ObjectFormer for image manipulation detection and localization, 2022.
- [21] Bihan Wen, Ye Zhu, Ramanathan Subramanian, Tian-Tsong Ng, Xuanjing Shen, and Stefan Winkler. COVERAGE: A novel database for copy-move forgery detection. In 2016 IEEE International Conference on Image Processing (ICIP), pages 161–165, 2016.
- [22] Yue Wu, Wael Abd-Almageed, and Premkumar Natarajan. ManTra-Net: Manipulation tracing network for detection and localization of image forgeries with anomalous features. In CVPR, 2019.
- [23] Jizhe Zhou, Xiaochen Ma, Xia Du, Ahmed Y. Alhammadi, and Wentao Feng. Pre-training-free image manipulation localization through non-mutually exclusive contrastive learning, 2023.
- [24] Xuekang Zhu, Xiaochen Ma, Lei Su, Zhuohang Jiang, Bo Du, Xiwen Wang, Zeyu Lei, Wentao Feng, Chi-Man Pun, and Jizhe Zhou. Mesoscopic insights: Orchestrating multi-scale & hybrid architecture for image manipulation localization, 2024.