pith. machine review for the scientific record.

arxiv: 2604.10522 · v1 · submitted 2026-04-12 · 💻 cs.CR

Recognition: unknown

SEED: A Large-Scale Benchmark for Provenance Tracing in Sequential Deepfake Facial Edits

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:07 UTC · model grok-4.3

classification 💻 cs.CR
keywords: sequential deepfake · provenance tracing · frequency analysis · wavelet components · benchmark dataset · forgery detection · diffusion editing · edit ordering

The pith

A new benchmark of sequentially edited deepfakes shows that high-frequency wavelet signals can reveal the order of multiple latent manipulations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper builds SEED, a collection of more than 90,000 facial images produced by applying one to four sequential attribute edits through diffusion-based pipelines, along with detailed labels for edit order, textual prompts, masks, and source models. Standard spatial-only detection methods lose effectiveness once artifacts from successive edits begin to overlap and distribute across the image. The authors therefore present FAITH, a transformer architecture that combines spatial features with frequency-domain information, especially wavelet high-frequency components, to recover both the presence and the sequence of edits. This combination remains useful even when the final images have undergone degradation. Recovering edit histories matters for tracing the origin of AI-generated faces in forensic, moderation, and misinformation contexts where single-step assumptions no longer hold.
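
To make the frequency cue concrete, here is a minimal sketch, assuming a grayscale face crop and the PyWavelets library; the function and the fusion comment are illustrative, not the paper's implementation of FAITH.

    # Hypothetical sketch of the wavelet cue: one level of 2D DWT splits an
    # image into a low-frequency approximation and three high-frequency
    # detail subbands, where editing artifacts tend to concentrate.
    import numpy as np
    import pywt

    def highfreq_energy(gray: np.ndarray) -> dict:
        """Per-subband mean energy of the high-frequency DWT components."""
        approx, (horiz, vert, diag) = pywt.dwt2(gray.astype(np.float32), "haar")
        return {name: float(np.mean(band ** 2))
                for name, band in (("horizontal", horiz),
                                   ("vertical", vert),
                                   ("diagonal", diag))}

    # These scalars (or the full subbands) could be concatenated with spatial
    # features before a Transformer encoder, in the spirit of FAITH's fusion.
    face = np.random.rand(256, 256)  # stand-in for a grayscale face crop
    print(highfreq_energy(face))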

Core claim

Sequential deepfake edits create cumulative, distributed artifacts that spatial methods alone cannot reliably disentangle; SEED supplies the annotated data needed to study this problem at scale, while FAITH shows that frequency-aware aggregation, particularly of high-frequency wavelet signals, supplies effective cues for identifying and ordering the latent editing events.

What carries the argument

The SEED benchmark dataset of sequentially edited images with fine-grained provenance metadata, together with the FAITH frequency-aware Transformer that fuses spatial and frequency-domain cues to trace edit order.
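
The annotation fields named above suggest a per-image record roughly like the following sketch; the field names are assumptions for illustration, not the released SEED schema.

    # Hypothetical record layout for SEED-style provenance metadata; field
    # names are illustrative assumptions, not the dataset's actual schema.
    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class EditStep:
        order: int        # position within the one-to-four-step sequence
        attribute: str    # manipulated attribute category, e.g. "hair color"
        prompt: str       # textual instruction that drove the edit
        mask_path: str    # manipulation mask recorded for this step
        model: str        # diffusion editing model used for this step

    @dataclass
    class SeedRecord:
        image_path: str
        steps: List[EditStep] = field(default_factory=list)  # empty = real image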

Load-bearing premise

The synthetic one-to-four-step diffusion edits used to build the dataset produce the same cumulative artifacts, degradation patterns, and editing distributions that appear in real-world multi-step deepfake pipelines.

What would settle it

Run FAITH on a separate collection of real-world facial images that carry documented multi-step deepfake histories and measure whether the predicted edit order matches the known sequence.
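
Scoring such a test is straightforward; below is a sketch of two plausible metrics, assuming edits are represented as attribute tokens (the metric choices are illustrative, not taken from the paper).

    # Hedged sketch of edit-order scoring for the settling experiment.
    def exact_match(pred: list, truth: list) -> bool:
        """Full-sequence accuracy: identity and order must both match."""
        return pred == truth

    def prefix_accuracy(pred: list, truth: list) -> float:
        """Fraction of leading steps recovered in the correct position."""
        hits = 0
        for p, t in zip(pred, truth):
            if p != t:
                break
            hits += 1
        return hits / max(len(truth), 1)

    # Example: the first edit is recovered, the last two are swapped.
    print(prefix_accuracy(["bangs", "smile", "age"], ["bangs", "age", "smile"]))  # ~0.33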

Figures

Figures reproduced from arXiv: 2604.10522 by Mengieong Hoi, Ping Liu, Wei Liu, Zhedong Zheng.

Figure 1: Overview of SEED and supported provenance analysis tasks. SEED targets diffusion-based sequential facial editing and provides supervision for three complementary tasks: Authenticity Analysis, which distinguishes real images from sequentially edited ones; Editing Trace Analysis, which characterizes the edited attributes and their sequential patterns; and Spatial Evidence Analysis, which analyzes manipulated…

Figure 2: Overview of the SEED dataset construction pipeline.

Figure 3: Sequential edit examples and edit order. Each row shows a multi-step editing trajectory in SEED.

Figure 4: SEED dataset distributions. (a) Distribution of sequence length. (b) Distribution of diffusion editing models. (c) Distribution of manipulated attribute categories.

Figure 5: Overview of the proposed FAITH architecture. The model explicitly integrates high-frequency domain information extracted by Discrete Wavelet Transform (DWT) and spatial-domain features via a Transformer-based encoder-decoder structure, significantly enhancing sequential facial editing detection accuracy.

Figure 6: Qualitative robustness examples under Compress-50% and Noise intensity-15%. Tokens are replaced with semantic attributes. EOS denotes early stopping. Correct tokens are highlighted in light green and incorrect tokens in light red. Ground truth is shown in olive-green text.
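
The robustness settings in Figure 6 correspond to standard post-processing perturbations; here is a minimal sketch, assuming 'Compress-50%' means JPEG quality 50 and 'Noise intensity-15%' means additive Gaussian noise with sigma at 15% of the pixel range (the paper's exact parameter mapping may differ).

    # Hedged sketch of the two perturbations named in Figure 6; the parameter
    # interpretation (JPEG quality 50, sigma = 0.15 * 255) is an assumption.
    import io
    import numpy as np
    from PIL import Image

    def jpeg_compress(img: Image.Image, quality: int = 50) -> Image.Image:
        """Round-trip an RGB image through JPEG at the given quality."""
        buf = io.BytesIO()
        img.convert("RGB").save(buf, format="JPEG", quality=quality)
        buf.seek(0)
        return Image.open(buf).copy()

    def add_gaussian_noise(img: Image.Image, intensity: float = 0.15) -> Image.Image:
        """Add zero-mean Gaussian noise scaled to a fraction of pixel range."""
        arr = np.asarray(img).astype(np.float32)
        noisy = arr + np.random.normal(0.0, intensity * 255.0, arr.shape)
        return Image.fromarray(np.clip(noisy, 0, 255).astype(np.uint8))
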
Original abstract

Deepfake content on social networks is increasingly produced through multiple sequential edits to biometric data such as facial imagery. Consequently, the final appearance of an image often reflects a latent chain of operations rather than a single manipulation. Recovering these editing histories is essential for visual provenance analysis, misinformation auditing, and forensic or platform moderation workflows that must trace the origin and evolution of AI-generated media. However, existing datasets predominantly focus on single-step editing and overlook the cumulative artifacts introduced by realistic multi-step pipelines. To address this gap, we introduce Sequential Editing in Diffusion (SEED), a large-scale benchmark for sequential provenance tracing in facial imagery. SEED contains over 90K images constructed via one to four sequential attribute edits using diffusion-based editing pipelines, with fine-grained annotations including edit order, textual instructions, manipulation masks, and generation models. These metadata enable step-wise evidence analysis and support forgery detection and sequence prediction. To benchmark the challenges posed by SEED, we evaluate representative analysis strategies and observe that spatial-only approaches struggle under subtle and distributed diffusion artifacts, especially when such artifacts accumulate across multiple edits. Motivated by this observation, we further establish FAITH, a frequency-aware Transformer baseline that aggregates spatial and frequency-domain cues to identify and order latent editing events. Results show that high-frequency signals, particularly wavelet components, provide effective cues even under image degradation. Overall, SEED facilitates systematic study of sequential provenance tracing and evidence aggregation for trustworthy analysis of AI-generated visual content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance; this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces SEED, a large-scale benchmark of over 90K facial images generated via 1-4 sequential diffusion-based attribute edits, annotated with edit order, textual instructions, manipulation masks, and generation models. It reports that spatial-only methods struggle with accumulated subtle diffusion artifacts and proposes FAITH, a frequency-aware Transformer baseline that aggregates spatial and frequency-domain (wavelet) cues to identify and order latent editing events, claiming that high-frequency signals provide effective cues even under image degradation.

Significance. If the results hold, SEED supplies a valuable, richly annotated resource for systematic study of multi-step deepfake provenance, addressing a clear gap in forensics and misinformation auditing. The frequency-cue observation in FAITH offers a concrete direction for more robust detectors. The work is strengthened by its emphasis on cumulative pipeline effects and the metadata enabling step-wise evidence analysis.

major comments (2)
  1. [Abstract] The claim that 'high-frequency signals, particularly wavelet components, provide effective cues even under image degradation' is load-bearing for motivating FAITH, yet the abstract supplies no quantitative metrics, ablation results, dataset splits, or statistical tests to support the improvement over spatial-only baselines.
  2. [Results / Evaluation] The reported advantage of wavelet cues in FAITH is demonstrated exclusively on synthetic 1-4 step diffusion edits; without cross-evaluation on real-world sequential deepfakes (commercial tools, user chains, or post-processed images), the claim that frequency cues help 'even under image degradation' risks being benchmark-specific rather than general.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by including one or two key numerical results (e.g., accuracy or F1 gains for FAITH) to summarize the empirical findings.
  2. [Dataset construction] Clarify the exact image counts and edit-step distribution (1-step vs. 2-step, etc.) in the dataset construction section to aid reproducibility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment of SEED's value and for the constructive major comments. We respond to each point below, indicating planned revisions where appropriate.

Point-by-point responses
  1. Referee: [Abstract] The claim that 'high-frequency signals, particularly wavelet components, provide effective cues even under image degradation' is load-bearing for motivating FAITH, yet the abstract supplies no quantitative metrics, ablation results, dataset splits, or statistical tests to support the improvement over spatial-only baselines.

    Authors: We agree that the abstract should supply concrete support for this central claim. In the revised version we will expand the abstract to report the key quantitative gains of FAITH (e.g., accuracy and ordering metrics) relative to spatial-only baselines, together with a brief note on the degradation experiments, while preserving conciseness. revision: yes

  2. Referee: [Results / Evaluation] The reported advantage of wavelet cues in FAITH is demonstrated exclusively on synthetic 1-4 step diffusion edits; without cross-evaluation on real-world sequential deepfakes (commercial tools, user chains, or post-processed images), the claim that frequency cues help 'even under image degradation' risks being benchmark-specific rather than general.

    Authors: We acknowledge the limitation on generalizability. SEED is intentionally constructed as a controlled, richly annotated synthetic benchmark to isolate cumulative diffusion artifacts; no comparably annotated real-world sequential deepfake corpus currently exists. We will revise the manuscript to state this scope explicitly, add a dedicated limitations paragraph, and outline directions for future real-world validation. The frequency-cue findings remain valid within the controlled setting where degradation is produced by successive edits and post-processing. revision: partial

Circularity Check

0 steps flagged

No significant circularity: empirical benchmark and baseline evaluation only

full rationale

The paper constructs the SEED dataset from diffusion-based sequential edits and evaluates existing strategies plus a new frequency-aware Transformer baseline (FAITH) on it. No equations, parameter fits, derivations, or self-citation chains are present in the abstract or described methodology. All results are direct empirical measurements on the synthetic data; the central claim about wavelet cues is an observation on that data rather than a reduction to prior inputs by construction. This matches the default expectation for a benchmark paper.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the construction of the SEED dataset and the empirical evaluation of FAITH; no additional free parameters, axioms, or invented entities beyond standard machine-learning components are specified.

pith-pipeline@v0.9.0 · 5580 in / 1224 out tokens · 60036 ms · 2026-05-10T16:07:03.042979+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

46 extracted references · 26 canonical work pages · 1 internal anchor

  1. [1]

    Ba, Z., Liu, Q., Liu, Z., Wu, S., Lin, F., Lu, L., Ren, K.: Exposing the deception: Uncovering more forgery clues for deepfake detection. Proceedings of the AAAI Conference on Artificial Intelligence 38(2), 719–728 (2024). https://doi.org/10.1609/aaai.v38i2.27829, https://ojs.aaai.org/index.php/AAAI/article/view/27829

  2. [2]

    Boychev, D., Cholakov, R.: Imaginet: A multi-content benchmark for synthetic image detection. arXiv preprint arXiv:2407.20020 (2024). https://arxiv.org/abs/2407.20020

  3. [3]

    Chen, B., Zeng, J., Yang, J., Yang, R.: DRCT: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. In: Proceedings of the 41st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 235, pp. 7621–7639. PMLR (2024). https://proceedings.mlr.press/v235/chen24ay.html

  4. [4]

    Chen, Z., Sun, K., Zhou, Z., Lin, X., Sun, X., Cao, L., Ji, R.: Diffusionface: Towards a comprehensive dataset for diffusion-based face forgery analysis. arXiv preprint arXiv:2403.18471 (2024)

  5. [5]

    Cocchi, F., Cornia, M., Baraldi, L., Nicolosi, A., Cucchiara, R.: Contrasting deepfakes diffusion via contrastive learning and global-local similarities. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings,...

  6. [6]

    Corvi, R., Cozzolino, D., Poggi, G., Nagano, K., Verdoliva, L.: Intriguing properties of synthetic images: From generative adversarial networks to diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) Workshops. pp. 973–982 (2023). https://openaccess.thecvf.com/content/CVPR2023W/WMF/html/Corvi_Int...

  7. [7]

    Esser, P., Kulal, S., Blattmann, A., Entezari, R., Müller, J., Saini, H., Levi, Y., Lorenz, D., Sauer, A., Boesel, F., Podell, D., Dockhorn, T., English, Z., Lacey, K., Goodwin, A., Marek, Y., Rombach, R.: Scaling rectified flow transformers for high-resolution image synthesis. In: Proceedings of the 41st International Conference on Machine Learning (2024)

  8. [8]

    Gu, Z., Yao, T., Chen, Y., Ding, S., Ma, L.: Hierarchical contrastive inconsistency learning for deepfake video detection. In: Avidan, S., Brostow, G.J., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XII. Lecture Notes in Computer Science, vol...

  9. [9]

    He, Y., Gan, B., Chen, S., Zhou, Y., Yin, G., Song, L., Sheng, L., Shao, J., Liu, Z.: Forgerynet: A versatile benchmark for comprehensive forgery analysis. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4360–4369 (2021). https://openaccess.thecvf.com/content/CVPR2021/html/He_ForgeryNet_A_Versatile_Bench...

  10. [10]

    Jiang, L., Li, R., Wu, W., Qian, C., Loy, C.C.: Deeperforensics-1.0: A large-scale dataset for real-world face forgery detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 2889–2898 (2020). https://openaccess.thecvf.com/content_CVPR_2020/html/Jiang_DeeperForensics-1.0_A_Large-Scale_Dataset_for_Rea...

  11. [11]

    Karras, T., Laine, S., Aila, T.: A style-based generator architecture for generative adversarial networks. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 4396–4405 (2019). https://doi.org/10.1109/CVPR.2019.00453

  12. [12]

    Le, T.N., Nguyen, H.H., Yamagishi, J., Echizen, I.: Openforensics: Large-scale challenging dataset for multi-face forgery detection and segmentation in-the-wild. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 10117–10127 (2021). https://openaccess.thecvf.com/content/ICCV2021/html/Le_OpenForensics_Large-Scale_Challenging_...

  13. [13]

    Lee, C.H., Liu, Z., Wu, L., Luo, P.: Maskgan: Towards diverse and interactive facial image manipulation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2020)

  14. [14]

    Li, D., Zhu, J., Fu, X., Guo, X., Liu, Y., Yang, G., Liu, J., Zha, Z.: Noise-assisted prompt learning for image forgery detection and localization. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part XI. ...

  15. [15]

    Li, Y., Yang, X., Sun, P., Qi, H., Lyu, S.: Celeb-df: A large-scale challenging dataset for deepfake forensics. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 3207–3216 (2020)

  16. [16]

    Lin, Y., Song, W., Li, B., Li, Y., Ni, J., Chen, H., Li, Q.: Fake it till you make it: Curricular dynamic forgery augmentations towards general deepfake detection. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Procee...

  17. [18]

    Liu, H., Li, C., Wu, Q., Lee, Y.J.: Visual instruction tuning. Advances in Neural Information Processing Systems 36, 34892–34916 (2023)

  18. [19]

    Liu, P., Tao, Q., Zhou, J.T.: Evolving from single-modal to multi-modal facial deepfake detection: Progress and challenges (2024). https://doi.org/10.48550/arXiv.2406.06965, https://arxiv.org/abs/2406.06965

  19. [20]

    Lu, Z., Huang, D., Bai, L., Qu, J., Wu, C., Liu, X., Ouyang, W.: Seeing is not always believing: Benchmarking human and model perception of AI-generated images (2023)

  20. [21]

    Mahara, A., Rishe, N.: Methods and trends in detecting generated images: A comprehensive review (2025). https://doi.org/10.48550/arXiv.2502.15176

  21. [22]

    Narayan, K., Agarwal, H., Thakral, K., Mittal, S., Vatsa, M., Singh, R.: Df-platter: Multi-face heterogeneous deepfake dataset. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 9739–9748 (2023). https://openaccess.thecvf.com/content/CVPR2023/html/Narayan_DF-Platter_Multi-Face_Heterogeneous_Deepfake_Data...

  22. [23]

    Nguyen, D., Mejri, N., Singh, I.P., Kuleshova, P., Astrid, M., Kacem, A., Ghorbel, E., Aouada, D.: Laa-net: Localized artifact attention network for quality-agnostic and generalizable deepfake detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 17395–17405 (2024). https://openaccess.thecvf.com/conten...

  23. [24]

    Nguyen-Le, H.H., Tran, V.T., Nguyen, D.T., Le-Khac, N.A.: Passive deepfake detection across multi-modalities: A comprehensive survey (2024). https://doi.org/10.48550/arXiv.2411.17911

  24. [25]

    Podell, D., English, Z., Lacey, K., Blattmann, A., Dockhorn, T., Müller, J., Penna, J., Rombach, R.: Sdxl: Improving latent diffusion models for high-resolution image synthesis (2023). https://arxiv.org/abs/2307.01952

  25. [27]

    Ricker, J., Damm, S., Holz, T., Fischer, A.: Towards the detection of diffusion model deepfakes. arXiv preprint arXiv:2210.14571 (2022)

  26. [28]

    Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 10684–10695 (2022). https://openaccess.thecvf.com/content/CVPR2022/html/Rombach_High-Resolution_Image_Synthesis_With_Latent...

  27. [29]

    Rössler, A., Cozzolino, D., Verdoliva, L., Riess, C., Thies, J., Niessner, M.: Faceforensics++: Learning to detect manipulated facial images. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 1–11 (2019). https://openaccess.thecvf.com/content_ICCV_2019/html/Rossler_FaceForensics_Learning_to_Detect_Manipulated_Faci...

  28. [30]

    Shao, R., Wu, T., Liu, Z.: Detecting and recovering sequential deepfake manipulation. In: Computer Vision – ECCV 2022. Lecture Notes in Computer Science, vol. 13673, pp. 712–728. Springer (2022). https://doi.org/10.1007/978-3-031-19778-9_41, https://www.ecva.net/papers/eccv_2022/papers_ECCV/papers/136730710.pdf

  29. [31]

    Shao, R., Wu, T., Liu, Z.: Robust sequential deepfake detection. International Journal of Computer Vision 133(6), 3278–3295 (2025)

  30. [32]

    Shuai, C., Zhong, J., Wu, S., Lin, F., Wang, Z., Ba, Z., Liu, Z., Cavallaro, L., Ren, K.: Locate and verify: A two-stream network for improved deepfake detection. In: Proceedings of the 31st ACM International Conference on Multimedia (MM '23). pp. 7131–7142 (2023). https://doi.org/10.1145/3581783.361...

  31. [33]

    Song, H., Huang, S., Dong, Y., Tu, W.W.: Robustness and generalizability of deepfake detection: A study with diffusion models (2023). https://doi.org/10.48550/arXiv.2309.02218

  32. [34]

    Sun, K., Liu, H., Yao, T., Sun, X., Chen, S., Ding, S., Ji, R.: An information theoretic approach for attention-driven face forgery detection. In: Avidan, S., Brostow, G.J., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Proceedings, Part XIV. Lecture Notes in C...

  33. [35]

    Tan, C., Zhao, Y., Wei, S., Gu, G., Liu, P., Wei, Y.: Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. Proceedings of the AAAI Conference on Artificial Intelligence 38(5), 5052–5060 (2024). https://doi.org/10.1609/aaai.v38i5.28310, https://ojs.aaai.org/index.php/AAAI/article/view/28310

  34. [36]

    Tan, C., Zhao, Y., Wei, S., Gu, G., Liu, P., Wei, Y.: Rethinking the up-sampling operations in CNN-based generative network for generalizable deepfake detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 28130–28139 (2024). https://openaccess.thecvf.com/content/CVPR2024/html/Tan_Rethinking_the_Up...

  35. [37]

    Tantaru, D.C., Oneata, E., Oneata, D.: Weakly-supervised deepfake localization in diffusion-generated images. In: Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV). pp. 6258–6268 (2024). https://openaccess.thecvf.com/content/WACV2024/html/Tantaru_Weakly-Supervised_Deepfake_Localization_in_Diffusion-Generated_Image...

  36. [39]

    Tsaban, L., Passos, A.: Ledits: Real image editing with DDPM inversion and semantic guidance (2023). https://doi.org/10.48550/arXiv.2307.00522, https://arxiv.org/abs/2307.00522

  37. [40]

    Wang, T., Liao, X., Chow, K.P., Lin, X., Wang, Y.: Deepfake detection: A comprehensive survey from the reliability perspective. ACM Computing Surveys (2024). https://doi.org/10.1145/3699710

  38. [41]

    Wang, Z., Bao, J., Zhou, W., Wang, W., Hu, H., Chen, H., Li, H.: Dire for diffusion-generated image detection. In: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV). pp. 22445–22455 (2023). https://openaccess.thecvf.com/content/ICCV2023/html/Wang_DIRE_for_Diffusion-Generated_Image_Detection_ICCV_2023_paper.html

  39. [42]

    Wang, Z.J., Montoya, E., Munechika, D., Yang, H., Hoover, B., Chau, D.H.: Diffusiondb: A large-scale prompt gallery dataset for text-to-image generative models (2022). https://arxiv.org/abs/2210.14896

  40. [43]

    Xie, S., Qiao, T., Li, S., Zhang, X., Zhou, J., Feng, G.: Deepfake detection in the AIGC era: A survey, benchmarks, and future perspectives. Information Fusion (2025). https://doi.org/10.1016/j.inffus.2025.103740

  41. [44]

    Zhang, D., Xiao, Z., Li, S., Lin, F., Li, J., Ge, S.: Learning natural consistency representation for face forgery video detection. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXXXIII. Lecture Not...

  42. [45]

    Zhang, Y., Colman, B., Guo, X., Shahriyari, A., Bharaj, G.: Common sense reasoning for deepfake detection. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXXXVIII. Lecture Notes in Computer Science, vol. 15146, pp. 399–415....

  43. [46]

    Zhang, Z., Li, M., Li, X., Chang, M., Hsieh, J.: Image manipulation detection with implicit neural representation and limited supervision. In: Leonardis, A., Ricci, E., Roth, S., Russakovsky, O., Sattler, T., Varol, G. (eds.) Computer Vision - ECCV 2024 - 18th European Conference, Milan, Italy, September 29-October 4, 2024, Proceedings, Part LXXXVIII. Lec...

  44. [47]

    Zhao, H., Ma, X., Chen, L., Si, S., Wu, R., An, K., Yu, P., Zhang, M., Li, Q., Chang, B.: Ultraedit: Instruction-based fine-grained image editing at scale. In: Advances in Neural Information Processing Systems (2024). https://proceedings.neurips.cc/paper_files/paper/2024/file/05a30a0fc9e6bacdd3abd4ca8508a9e6-Paper-Datasets_and_Benchmarks_Track.pdf

  45. [48]

    Zhu, M., Chen, H., Yan, Q., Huang, X., Lin, G., Li, W., Tu, Z., Hu, H., Hu, J., Wang, Y.: Genimage: A million-scale benchmark for detecting AI-generated image. Advances in Neural Information Processing Systems 36, 77771–77782 (2023)

  46. [49]

    Zhuang, W., Chu, Q., Tan, Z., Liu, Q., Yuan, H., Miao, C., Luo, Z., Yu, N.: Uia-vit: Unsupervised inconsistency-aware method based on vision transformer for face forgery detection. In: Avidan, S., Brostow, G.J., Cissé, M., Farinella, G.M., Hassner, T. (eds.) Computer Vision - ECCV 2022 - 17th European Conference, Tel Aviv, Israel, October 23-27, 2022, Pro...