ForgeryGPT: A Multimodal LLM for Interpretable Image Forgery Detection and Localization
Pith reviewed 2026-05-23 19:03 UTC · model grok-4.3
The pith
ForgeryGPT integrates a mask-aware extractor into a multimodal LLM to enable explainable image forgery detection and localization.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ForgeryGPT advances the Image Forgery Detection and Localization (IFDL) task by capturing high-order forensics knowledge correlations of forged images from diverse linguistic feature spaces, while enabling explainable generation and interactive dialogue through a newly customized Large Language Model architecture. Specifically, it enhances traditional LLMs by integrating the Mask-Aware Forgery Extractor, which extracts precise forgery mask information from input images and facilitates pixel-level understanding of tampering artifacts. The extractor consists of a Forgery Localization Expert, augmented with an Object-agnostic Forgery Prompt and a Vocabulary-enhanced Vision Encoder, and a Mask Encoder.
What carries the argument
Mask-Aware Forgery Extractor that excavates precise forgery mask information from input images for pixel-level understanding of tampering artifacts.
If this is right
- Supports explainable generation of detection results beyond single judgments.
- Enables interactive dialogue about the forgery analysis.
- Captures multi-scale fine-grained forgery details for improved accuracy.
- Aligns vision and language modalities through dedicated datasets.
- Improves instruction-following capabilities for IFDL tasks.
Where Pith is reading between the lines
- Such a system could be adapted for detecting forgeries in video or other media types.
- The use of linguistic feature spaces might reveal patterns not visible in purely visual methods.
- Interactive features could facilitate collaboration between AI and human experts in forensic analysis.
- Testing on more diverse real-world datasets would validate its robustness beyond controlled experiments.
Load-bearing premise
The Mask-Aware Forgery Extractor can excavate precise forgery mask information from input images to enable pixel-level understanding of tampering artifacts.
What would settle it
An experiment where the model fails to produce accurate pixel-level forgery masks on a benchmark dataset with varied tampering techniques would disprove the central claim.
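To make "accurate pixel-level forgery masks" concrete, the sketch below scores a predicted binary mask against a ground-truth mask with two common localization metrics, intersection-over-union (IoU) and pixel-level F1. It is an illustration only: the function names, the nested-list mask format, and the convention of returning 1.0 when both masks are empty are assumptions made here, not the paper's evaluation protocol.

```python
from typing import Sequence, Tuple

# Hypothetical mask format: rows of 0/1 pixels, where 1 marks a tampered pixel.
Mask = Sequence[Sequence[int]]


def confusion_counts(pred: Mask, truth: Mask) -> Tuple[int, int, int]:
    """Return (true positives, false positives, false negatives) over all pixels."""
    if len(pred) != len(truth) or any(len(p) != len(t) for p, t in zip(pred, truth)):
        raise ValueError("masks must have the same shape")
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p and not t:
                fp += 1
            elif t and not p:
                fn += 1
    return tp, fp, fn


def mask_iou(pred: Mask, truth: Mask) -> float:
    """IoU = TP / (TP + FP + FN). Assumed convention: two empty masks score 1.0."""
    tp, fp, fn = confusion_counts(pred, truth)
    union = tp + fp + fn
    return 1.0 if union == 0 else tp / union


def mask_f1(pred: Mask, truth: Mask) -> float:
    """Pixel-level F1 = 2TP / (2TP + FP + FN). Same empty-mask convention as IoU."""
    tp, fp, fn = confusion_counts(pred, truth)
    denom = 2 * tp + fp + fn
    return 1.0 if denom == 0 else 2 * tp / denom


# Toy 3x3 example: 4 tampered pixels, 3 found, 1 missed, 1 false alarm.
truth = [[0, 1, 1], [0, 1, 1], [0, 0, 0]]
pred = [[0, 0, 1], [0, 1, 1], [0, 0, 1]]
iou = mask_iou(pred, truth)  # 3 / 5
f1 = mask_f1(pred, truth)    # 6 / 8
```

A falsifying experiment of the kind described above would report scores like these per tampering type (for example splicing, copy-move, inpainting) and check whether they degrade on techniques absent from training.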
Original abstract
Multimodal Large Language Models (MLLMs), such as GPT4o, have shown strong capabilities in visual reasoning and explanation generation. However, despite these strengths, they face significant challenges in the increasingly critical task of Image Forgery Detection and Localization (IFDL). Moreover, existing IFDL methods are typically limited to the learning of low-level semantic-agnostic clues and merely provide a single outcome judgment. To tackle these issues, we propose ForgeryGPT, a novel framework that advances the IFDL task by capturing high-order forensics knowledge correlations of forged images from diverse linguistic feature spaces, while enabling explainable generation and interactive dialogue through a newly customized Large Language Model (LLM) architecture. Specifically, ForgeryGPT enhances traditional LLMs by integrating the Mask-Aware Forgery Extractor, which enables the excavating of precise forgery mask information from input images and facilitating pixel-level understanding of tampering artifacts. The Mask-Aware Forgery Extractor consists of a Forgery Localization Expert (FL-Expert) and a Mask Encoder, where the FL-Expert is augmented with an Object-agnostic Forgery Prompt and a Vocabulary-enhanced Vision Encoder, allowing for effectively capturing of multi-scale fine-grained forgery details. To enhance its performance, we implement a three-stage training strategy, supported by our designed Mask-Text Alignment and IFDL Task-Specific Instruction Tuning datasets, which align vision-language modalities and improve forgery detection and instruction-following capabilities. Extensive experiments demonstrate the effectiveness of the proposed method.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes ForgeryGPT, a multimodal LLM framework for image forgery detection and localization (IFDL). It augments an LLM with a Mask-Aware Forgery Extractor (Forgery Localization Expert or FL-Expert, augmented by an Object-agnostic Forgery Prompt and Vocabulary-enhanced Vision Encoder, plus a Mask Encoder) to extract precise forgery masks for pixel-level tampering understanding. A three-stage training strategy uses custom Mask-Text Alignment and IFDL Task-Specific Instruction Tuning datasets to align modalities and improve detection/instruction-following. The work claims to capture high-order forensics knowledge correlations across linguistic spaces while enabling explainable outputs and interactive dialogue, with the abstract asserting that extensive experiments demonstrate effectiveness over prior low-level IFDL methods.
Significance. If the central claims hold, the integration of specialized forgery extraction with MLLM reasoning could advance IFDL by adding interpretability and interactivity beyond binary or low-level outputs. The three-stage training and custom alignment datasets represent a structured effort to bridge vision and language for forensics, which is a potentially useful direction if the extractor delivers on pixel-level precision.
major comments (2)
- [Abstract] Abstract: the claim that 'extensive experiments demonstrate the effectiveness' is unsupported because the abstract (and the provided description) supplies no quantitative results, ablation studies, baseline comparisons, or error analysis. Without these, it is impossible to verify whether the architecture supports the performance claims on high-order correlations or localization.
- [Method (Mask-Aware Forgery Extractor)] Method description of the Mask-Aware Forgery Extractor: the central claim that this module 'enables the excavating of precise forgery mask information from input images and facilitating pixel-level understanding of tampering artifacts' is load-bearing, yet the description provides no concrete mechanism (e.g., mask-prediction loss, supervision signal, or architectural difference from standard vision encoders) that would guarantee focus on tampering artifacts rather than generic object boundaries. This directly affects whether the subsequent LLM stages can be shown to advance IFDL.
minor comments (1)
- [Abstract] The abstract is lengthy and could be condensed while retaining the core technical contributions.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each major comment below and will make revisions to improve clarity and support for the claims.
- Referee: [Abstract] Abstract: the claim that 'extensive experiments demonstrate the effectiveness' is unsupported because the abstract (and the provided description) supplies no quantitative results, ablation studies, baseline comparisons, or error analysis. Without these, it is impossible to verify whether the architecture supports the performance claims on high-order correlations or localization.
  Authors: We agree that the abstract would be strengthened by including key quantitative results. The full manuscript contains sections detailing experiments with baseline comparisons, ablation studies on the FL-Expert components, and metrics for pixel-level localization and detection. In revision we will update the abstract to reference specific performance gains, such as improved localization IoU over prior IFDL methods.
  Revision: yes
- Referee: [Method (Mask-Aware Forgery Extractor)] Method description of the Mask-Aware Forgery Extractor: the central claim that this module 'enables the excavating of precise forgery mask information from input images and facilitating pixel-level understanding of tampering artifacts' is load-bearing, yet the description provides no concrete mechanism (e.g., mask-prediction loss, supervision signal, or architectural difference from standard vision encoders) that would guarantee focus on tampering artifacts rather than generic object boundaries. This directly affects whether the subsequent LLM stages can be shown to advance IFDL.
  Authors: The abstract provides a high-level overview. The full method section describes the Object-agnostic Forgery Prompt and Vocabulary-enhanced Vision Encoder as mechanisms to prioritize forgery artifacts over object boundaries, with the three-stage training using the Mask-Text Alignment dataset for supervision. We acknowledge the need for explicit details on the mask-prediction loss and supervision signals. We will expand the method description to include these elements and clarify the architectural differences from standard encoders.
  Revision: yes
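For readers unfamiliar with what a "mask-prediction loss" typically looks like, here is a minimal sketch of one widely used choice in binary segmentation: per-pixel binary cross-entropy plus a soft Dice term. This is not taken from the paper, which (per the exchange above) does not yet state its loss. The function name `bce_dice_loss`, the flat-list input format, and the `dice_weight` and `eps` parameters are all assumptions for illustration.

```python
import math
from typing import Sequence


def bce_dice_loss(
    probs: Sequence[float],    # predicted tamper probability per pixel, in [0, 1]
    targets: Sequence[int],    # ground-truth mask per pixel, 0 or 1
    dice_weight: float = 1.0,  # hypothetical weighting between the two terms
    eps: float = 1e-7,         # numerical guard for log(0) and empty masks
) -> float:
    """Illustrative supervision signal: mean BCE + dice_weight * soft Dice loss."""
    if len(probs) != len(targets) or not probs:
        raise ValueError("probs and targets must be non-empty and equally long")

    # Binary cross-entropy, averaged over pixels; probabilities are clamped
    # so that log() never receives 0.
    bce = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        bce -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    bce /= len(probs)

    # Soft Dice loss: 1 - 2|P.T| / (|P| + |T|), computed on raw probabilities.
    intersection = sum(p * t for p, t in zip(probs, targets))
    dice = 1.0 - (2.0 * intersection + eps) / (sum(probs) + sum(targets) + eps)

    return bce + dice_weight * dice


targets = [1, 0, 1, 0]
perfect = bce_dice_loss([1.0, 0.0, 1.0, 0.0], targets)
uncertain = bce_dice_loss([0.5, 0.5, 0.5, 0.5], targets)
inverted = bce_dice_loss([0.0, 1.0, 0.0, 1.0], targets)
```

The loss rises from `perfect` through `uncertain` to `inverted`. Note that a loss of this kind only rewards overlap with the annotated mask; whether the model attends to tampering artifacts rather than object boundaries depends on the training data and architecture, which is the referee's point.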
Circularity Check
No circularity detected; the derivation is a self-contained architectural proposal.
The paper proposes ForgeryGPT as a novel MLLM framework integrating a Mask-Aware Forgery Extractor (FL-Expert with Object-agnostic Forgery Prompt, Vocabulary-enhanced Vision Encoder, and Mask Encoder) plus three-stage training on custom Mask-Text Alignment and IFDL datasets. No equations, parameter-fitting steps, or self-citation chains appear in the provided text that reduce any claimed prediction or result to the inputs by construction. The central claims rest on the described modules and experimental validation rather than any definitional equivalence or fitted-input renaming, making the derivation self-contained.
Forward citations
Cited by 4 Pith papers
- ReAlign: Generalizable Image Forgery Detection via Reasoning-Aligned Representation
  ReAlign distills LLM-generated reasoning texts into a lightweight AIGI forgery detector via contrastive image-text alignment to improve generalization on complex forgeries.
- Code-in-the-Loop Forensics: Agentic Tool Use for Image Forgery Detection
  ForenAgent lets MLLMs create and iteratively improve low-level Python tools for image forgery detection via a two-stage training pipeline and a new 100k-image benchmark dataset.
- Venus-DeFakerOne: Unified Fake Image Detection & Localization
  DeFakerOne integrates InternVL2 and SAM2 into a single model that achieves state-of-the-art results on 39 detection and 9 localization benchmarks for unified fake image detection and localization.
- UniGenDet: A Unified Generative-Discriminative Framework for Co-Evolutionary Image Generation and Generated Image Detection
  UniGenDet unifies generative and discriminative models through symbiotic self-attention and detector-guided alignment to co-evolve image generation and authenticity detection.