Multi-axis Analysis of Image Manipulation Localization

Anna Rohrbach; Bryan A. Plummer; Dina Bashkirova; Divya Appapogu; Giscard Biamby; Keanu Nichols

arxiv: 2605.20174 · v1 · pith:5QJ4F74Xnew · submitted 2026-05-19 · 💻 cs.CV · cs.LG

Keanu Nichols, Divya Appapogu, Giscard Biamby, Dina Bashkirova, Anna Rohrbach, Bryan A. Plummer

Pith reviewed 2026-05-20 05:18 UTC · model grok-4.3

keywords: image manipulation detection, benchmark, domain shift, diffusion inpainting, localization, robustness, generative editing

The pith

The AUDITS benchmark enables multi-axis testing of image manipulation detectors with over 530K diffusion-inpainted user and news photos.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces AUDITS as a large-scale benchmark built from more than 530,000 images drawn from user and news photographs and altered via diffusion-based inpainting. The dataset is organized to let researchers examine detector behavior along several axes at once, including domain shifts, image quality, manipulation type, and manipulation size. Experiments measure how well existing detection methods maintain performance when these conditions change. A reader would care because current detectors often degrade under realistic shifts, limiting their usefulness against AI-generated misinformation.

Core claim

We introduce Analysis Under Domain-shifts, qualIty, Type, and Size (AUDITS), a comprehensive benchmark designed for studying axes of analysis in image manipulation detection. AUDITS comprises over 530K images from two distinct sources (user and news photos). We curate our dataset to support analysis across multiple axes using recent diffusion-based inpaintings, spanning a diverse range of manipulation types and sizes. We conduct experiments under different types of domain shift to evaluate robustness of existing image manipulation detection methods.

What carries the argument

AUDITS benchmark, which curates diffusion-based inpaintings on user and news photos to support structured evaluation across domain shifts, quality, manipulation type, and size.

If this is right

Existing detection methods can be ranked by how their accuracy changes under controlled domain shifts using the AUDITS splits.
Performance differences can be isolated to specific manipulation sizes or types within the same benchmark.
Results from the multi-axis tests can guide the design of detectors intended to work across varied visual domains.
The benchmark supplies a common testbed that future methods can use to demonstrate improved generalization.
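The per-axis comparison described above can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: the record fields (`domain`, `size`, `score`, `label`), the toy scores, and the helper names `auc` and `auc_by_axis` are all hypothetical. It assumes each record is one scored unit with a binary label (1 = manipulated) and computes a rank-based AUC separately for each value of a chosen axis.

```python
from collections import defaultdict


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U statistic); ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        return None  # AUC is undefined without both classes
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def auc_by_axis(records, axis):
    """Group records by the value of one axis and compute AUC per group."""
    groups = defaultdict(list)
    for r in records:
        groups[r[axis]].append(r)
    return {
        value: auc([r["score"] for r in g], [r["label"] for r in g])
        for value, g in groups.items()
    }


# Hypothetical detector outputs; the values are made up for illustration.
records = [
    {"domain": "news", "size": "small", "score": 0.9, "label": 1},
    {"domain": "news", "size": "small", "score": 0.2, "label": 0},
    {"domain": "news", "size": "large", "score": 0.8, "label": 1},
    {"domain": "news", "size": "large", "score": 0.3, "label": 0},
    {"domain": "coco", "size": "small", "score": 0.4, "label": 1},
    {"domain": "coco", "size": "small", "score": 0.6, "label": 0},
    {"domain": "coco", "size": "large", "score": 0.7, "label": 1},
    {"domain": "coco", "size": "large", "score": 0.5, "label": 0},
]

by_domain = auc_by_axis(records, "domain")
by_size = auc_by_axis(records, "size")
```

Reporting one number per axis value, rather than a single pooled score, is what lets a drop be attributed to a particular domain or manipulation size.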

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If detectors prove brittle on certain axes, training procedures that explicitly simulate those shifts during learning may become necessary.
The same multi-axis structure could later be applied to other generative editing techniques such as full-image synthesis or face swaps.
Public release of the benchmark may encourage standardized reporting of robustness metrics rather than single-number accuracy on narrow test sets.

Load-bearing premise

The chosen diffusion-based inpaintings on user and news photos capture enough of the variety and realism found in advanced real-world manipulations to test detector robustness under domain shifts.

What would settle it

A detector that scores high across all AUDITS axes yet fails to detect manipulations in an independent collection of real social-media or news images that were not generated by the same diffusion process.

Figures

Figures reproduced from arXiv: 2605.20174 by Anna Rohrbach, Bryan A. Plummer, Dina Bashkirova, Divya Appapogu, Giscard Biamby, Keanu Nichols.

Figure 2: Examples from our benchmark AUDITS, which contains 530,640 images from two visually ...

Figure 3: Statistics on manipulation types in AUDITS-News and AUDITS-COCO.

Figure 4: Comparing the AUC/Precision/Recall performance of models trained on the AUDITS-News and ...

Figure 5: Qualitative comparison of EVP and MMFusion across object categories from AUDITS-NEWS (top) ...

Figure 6: An example of an Adobe Firefly inpainted image from COCO described in Section 3.1.1 in the main ...

Figure 7: An example of an image from our dataset which human evaluators were given to answer questions ...

Figure 8: Visualization of manipulation types across different editing techniques from Figure 3 of the main ...

Figure 9: Distribution of different editing techniques.

Figure 10: Visualization of manipulation sizes across different image sources from Figure 3 of the main paper.

Figure 11: Evaluation of mean AUC Object Categories using MMFusion and EVP

Original abstract

Advanced image editing software enables easy creation of highly convincing image manipulations, which has been made even more accessible in recent years due to advances in generative AI. Manipulated images, while often harmless, could spread misinformation, create false narratives, and influence people's opinions on important issues. Despite this growing threat, there is limited research on detecting advanced manipulations across different visual domains. Thus, we introduce Analysis Under Domain-shifts, qualIty, Type, and Size (AUDITS), a comprehensive benchmark designed for studying axes of analysis in image manipulation detection. AUDITS comprises over 530K images from two distinct sources (user and news photos). We curate our dataset to support analysis across multiple axes using recent diffusion-based inpaintings, spanning a diverse range of manipulation types and sizes. We conduct experiments under different types of domain shift to evaluate robustness of existing image manipulation detection methods. Our goal is to drive further research in this area by offering new insights that would help develop more reliable and generalizable image manipulation detection methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

AUDITS is a new large-scale benchmark for multi-axis evaluation of image manipulation localization that could organize testing practices, but its synthetic diffusion inpaintings need checks against real manipulation artifacts.

The main thing here is that the authors have released AUDITS, a benchmark of over 530K images from user and news photos, built on diffusion inpaintings and set up for analysis across domain shifts, quality, manipulation type, and size. This gives a structured way to test how well localization detectors hold up under different conditions. The scale and the dual sources stand out compared to earlier datasets, and the plan to run experiments on domain shifts is a practical step toward more generalizable methods. The paper does a solid job defining the axes and curating the data to support that kind of breakdown. The field could use more resources like this to move past narrow evaluations.

The soft spot is the representativeness of the inpaintings. The stress-test concern holds: without quantitative comparisons of artifact patterns, such as frequency or edge statistics, against real splicing or copy-move operations, the robustness conclusions under domain shift may not transfer. The abstract lays out the construction but leaves the actual results and any validation metrics for the full paper. If those sections show clear baselines and some artifact analysis, the work is stronger.

This paper is for researchers in image forensics who build or test detectors and want a new testbed for multi-axis checks. A reader focused on evaluation frameworks or misinformation tools would find the dataset and analysis setup useful. It shows honest engagement with the evaluation gaps in the area. I would send it for peer review so referees can examine the curation details and experimental outcomes directly.

Referee Report

1 major / 1 minor

Summary. The paper introduces AUDITS, a benchmark of over 530K images from user and news photos curated via diffusion-based inpaintings. It supports multi-axis analysis of image manipulation localization detectors along domain shifts, quality, type, and size, and reports experiments evaluating robustness of existing methods under different domain shifts to drive development of more reliable detectors.

Significance. A large-scale, multi-axis benchmark could help identify failure modes in manipulation localization under realistic shifts if the synthetic artifacts are representative. The work's value lies in its external utility for the community rather than internal derivations or proofs.

major comments (1)

Section 3 (dataset curation): The construction of the 530K-image set relies on diffusion-based inpaintings applied to user and news photos, but provides no quantitative comparison of artifact distributions (e.g., frequency spectra, edge statistics, or semantic consistency) against real-world manipulations such as splicing or copy-move. This is load-bearing for the central claim that AUDITS enables valid robustness analysis under domain shifts; if the synthetic artifacts occupy a narrow region of the manipulation space, conclusions about detector generalizability will not transfer.
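As a rough illustration of the kind of edge statistic the referee has in mind, the sketch below compares mean gradient magnitude inside and outside a manipulation mask. It is an assumption-laden toy, not an analysis from the paper: the function names, the forward-difference gradient, and the tiny list-of-lists image are all hypothetical, and a real comparison would aggregate such statistics over many images from AUDITS and from splicing or copy-move datasets.

```python
def gradient_magnitude(img):
    """Forward-difference gradient magnitude for a 2D list of grayscale values.

    The last row and last column are left at 0.0 because they have no
    forward neighbour.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = img[y][x + 1] - img[y][x]
            gy = img[y + 1][x] - img[y][x]
            out[y][x] = (gx * gx + gy * gy) ** 0.5
    return out


def _mean(values):
    return sum(values) / len(values) if values else 0.0


def mean_edge_strength(img, mask):
    """Return (mean gradient inside mask, mean gradient outside mask)."""
    grad = gradient_magnitude(img)
    inside, outside = [], []
    for grad_row, mask_row in zip(grad, mask):
        for g, m in zip(grad_row, mask_row):
            (inside if m else outside).append(g)
    return _mean(inside), _mean(outside)


# Toy 3x3 horizontal ramp; the mask marks the top-left 2x2 block as edited.
img = [[0, 1, 2], [0, 1, 2], [0, 1, 2]]
mask = [[1, 1, 0], [1, 1, 0], [0, 0, 0]]
inside_mean, outside_mean = mean_edge_strength(img, mask)
```

A large gap between the two means on manipulated images, and how that gap differs between inpainting and classical splicing, is one simple way to check whether the two kinds of edits leave comparable edge artifacts.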

minor comments (1)

Abstract: While the abstract outlines the dataset and experiment plan, it contains no quantitative results, baseline comparisons, or error metrics, making it difficult for readers to immediately gauge the strength of the reported robustness findings.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript introducing the AUDITS benchmark. We address the major comment point by point below and outline the revisions we will make.

Referee: Section 3 (dataset curation): The construction of the 530K-image set relies on diffusion-based inpaintings applied to user and news photos, but provides no quantitative comparison of artifact distributions (e.g., frequency spectra, edge statistics, or semantic consistency) against real-world manipulations such as splicing or copy-move. This is load-bearing for the central claim that AUDITS enables valid robustness analysis under domain shifts; if the synthetic artifacts occupy a narrow region of the manipulation space, conclusions about detector generalizability will not transfer.

Authors: We agree that explicitly demonstrating the representativeness of the diffusion-based inpainting artifacts is important for supporting the benchmark's use in robustness analysis. The manuscript describes the curation from real user and news photographs with diverse manipulation types and sizes to approximate realistic conditions, but does not include the requested quantitative comparisons. To address this, the revised version will add a dedicated analysis in Section 3 comparing frequency spectra, edge statistics, and semantic consistency metrics between the AUDITS manipulations and real-world examples of splicing and copy-move from public datasets. This addition will help substantiate that the synthetic artifacts are sufficiently broad to enable meaningful conclusions about detector generalizability under domain shifts.

Revision: yes

Circularity Check

0 steps flagged

No circularity: benchmark dataset and empirical evaluation are self-contained

The paper introduces the AUDITS benchmark comprising over 530K curated images using diffusion-based inpaintings on user and news photos, then evaluates existing manipulation localization methods under domain shifts, quality, type, and size axes. No mathematical derivations, equations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text or abstract. The central claims rest on dataset curation and external detector performance rather than any internal reduction to inputs by construction, making the work a standard empirical benchmark contribution with independent value.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical benchmark introduction paper. No mathematical free parameters, axioms, or invented entities are described in the abstract.

pith-pipeline@v0.9.0 · 5721 in / 1028 out tokens · 38471 ms · 2026-05-20T05:18:06.624103+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: We introduce Analysis Under Domain-shifts, qualIty, Type, and Size (AUDITS), a comprehensive benchmark... using recent diffusion-based inpaintings... evaluate robustness of existing image manipulation detection methods.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Paper passage: AUDITS comprises over 530K images from two distinct sources (user and news photos)... 11 image manipulation techniques.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

[1]

Accessed: 2024-09-20

URL https://www.adobe.com/products/firefly.html. Accessed: 2024-09-20. Omri Avrahami, Dani Lischinski, and Ohad Fried. Blended diffusion for text-driven editing of natural images. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 18208–18218, June

work page 2024
[2]

URL https://doi.org/10.1109/chinasip.2013.6625374

doi: 10.1109/chinasip.2013.6625374. URL https://doi.org/10.1109/chinasip.2013.6625374. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in neural information processing systems, 27,

work page doi:10.1109/chinasip.2013.6625374 2013
[3]

Span: Spatial pyramid attention network for image manipulation localization

Xuefeng Hu, Zhihan Zhang, Zhenye Jiang, Syomantak Chaudhuri, Zhenheng Yang, and Ram Nevatia. Span: Spatial pyramid attention network for image manipulation localization. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXI 16, pp. 312–328. Springer,

work page 2020
[4]

2024 , url =

doi: 10.1109/CVPR52733.2024.02135. Shan Jia, Mingzhen Huang, Zhou Zhou, Yan Ju, Jialing Cai, and Siwei Lyu. Autosplice: A text-prompt manipulated image dataset for media forensics. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pp. 893–903,

work page doi:10.1109/cvpr52733.2024.02135 2024
[5]

Visual news: Benchmark and challenges in news image captioning

Fuxiao Liu, Yinghan Wang, Tianlu Wang, and Vicente Ordonez. Visual news: Benchmark and challenges in news image captioning. InProceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 6761–6771. Association for Computational Linguistics, November 2021a. Weihuang Liu, Xi Shen, Chi-Man Pun, and Xiaodong Cun. Explicit visual...

work page 2021
[6]

Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, and Humphrey Shi

doi: 10.23919/EUSIPCO.2019.8903181. Hayk Manukyan, Andranik Sargsyan, Barsegh Atanyan, Zhangyang Wang, Shant Navasardyan, and Humphrey Shi. HD-painter: High-resolution and prompt-faithful text-guided image inpainting with diffusion models. In The Thirteenth International Conference on Learning Representations,

work page doi:10.23919/eusipco.2019.8903181 2019
[7]

net/forum?id=6lB5qtdYAg

URL https://openreview.net/forum?id=6lB5qtdYAg. Hannes Mareen, Dimitrios Karageorgiou, Glenn Van Wallendael, Peter Lambert, and Symeon Papadopoulos. Tgif: Text-guided inpainting forgery dataset. In Proc. Int. Workshop on Information Forensics and Security (WIFS) 2024,

work page 2024
[8]

Exploring multi-modal fusion for image manipulation detection and localization

Konstantinos Triaridis and Vasileios Mezaris. Exploring multi-modal fusion for image manipulation detection and localization. In Proc. 30th Int. Conf. on MultiMedia Modeling (MMM 2024), Jan.-Feb

work page 2024
[9]

COCO-Inpaint: A Benchmark for Detecting and Localizing Inpainting-Based Image Manipulations

Haozhen Yan, Yan Hong, Jiahui Zhan, Yikun Ji, Jun Lan, Huijia Zhu, Weiqiang Wang, and Jianfu Zhang. Coco-inpaint: A benchmark for image inpainting detection and manipulation localization. arXiv preprint arXiv:2504.18361,

work page internal anchor Pith review Pith/arXiv arXiv
[10]

A task is worth one word: Learning with task prompts for high-quality versatile image inpainting

Junhao Zhuang, Yanhong Zeng, Wenran Liu, Chun Yuan, and Kai Chen. A task is worth one word: Learning with task prompts for high-quality versatile image inpainting. In Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, and Gül Varol (eds.), Computer Vision – ECCV 2024, pp. 195–211, Cham,

work page 2024
[11]

Additionally, we include an example of what the manually create Adobe Firefly manipulations looked like before and after and discuss the ethical considerations of our work

We include an example image of what a human annotator would have seen during the human evaluation. Additionally, we include an example of what the manually create Adobe Firefly manipulations looked like before and after and discuss the ethical considerations of our work. 6 Ethical Considerations Our work focuses on benchmarking and advancing the methods f...

work page 2023
[12]

EVP+MIRO (Cha et al., 2022). Trained on: AUDITS-News AUDITS-COCO AUDITS-COCO AUDITS-News Tested on: AUDITS-News AUDITS-COCO MT-ID MT-OOD MT-ID MT-OOD MT-ID MT-OOD MT-ID MT-OOD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD Mean SD EVP 81.1 0.54 79.3 0.97 71.7 0.44 64.5 1.2 79.8 0.14 81.5 0.81 71.8 1.50 61.5 2.77 MMFusion 84.0 0.08 79.5 1.92 76.9 ...

work page 2022
[13]

The results, summarized in Table 13, show that PSCC-Net outperforms HiFi in most cases despite their similar architecture

and HiFi (Guo et al., 2023), as both models include a classification head. The results, summarized in Table 13, show that PSCC-Net outperforms HiFi in most cases despite their similar architecture. Interestingly, PSCC-Net performs particularly well when trained and tested on AUDITS-COCO, likely due to its HRNet (Wang et al.,

work page 2023
[14]

However, both models experience a significant drop in performance when tested on OOD images

backbone, which is pre-trained on ImageNet (Deng et al., 2009). However, both models experience a significant drop in performance when tested on OOD images. This highlights the importance of our work in exposing these generalization challenges. 10 Qualitative Analysis of Object Categories To further illustrate our qualitative findings, we plotted the aver...

work page 2009
[15]

model trained on data from either DEFACTO (MAHFOUDI et al., 2019), AUDITS-COCO or both and testing on classic image manipulation datasets that contain Copymove (CM) and Splicing (SP) images, namely CASIAv1, CASIAv2 (Dong et al.,

work page 2019
[16]

model trained on data from either DEFACTO (MAHFOUDI et al., 2019), AUDITS-COCO or both and testing on our AUDITS dataset to determine the diffusion based inpainting performance Tested on: AUDITS-News AUDITS-COCO MT-ID MT-OOD MT-ID MT-OOD AUC F1 AUC F1 AUC F1 AUC F1 Trained on: AUDITS+DEFACTO 72.6 50.6 65.0 44.4 88.8 51.3 82.1 41.8 DEFACTO 55.4 29.3 55.4 29.5...

work page 2019
