arXiv: 2604.12781 · v1 · submitted 2026-04-14 · 💻 cs.CV

Fragile Reconstruction: Adversarial Vulnerability of Reconstruction-Based Detectors for Diffusion-Generated Images

Pith reviewed 2026-05-10 14:46 UTC · model grok-4.3

classification 💻 cs.CV
keywords: adversarial robustness · diffusion models · image detection · reconstruction-based detectors · AI-generated images · transfer attacks

The pith

Imperceptible adversarial perturbations can collapse the accuracy of reconstruction-based detectors for diffusion-generated images to near zero.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that detectors relying on image reconstruction to spot AI-generated content from diffusion models are vulnerable to small, hard-to-detect changes in the input. These changes cause the classifiers to misclassify generated images as real, or vice versa, with accuracy dropping sharply. The attacks work in white-box settings where the detector is known and transfer to other detectors, allowing black-box use. Common defense techniques offer little help because the perturbed inputs create low signal-to-noise conditions that the detectors cannot handle well.

Core claim

Reconstruction-based detectors for diffusion-generated images exhibit severe security vulnerabilities: adding imperceptible adversarial perturbations to input images causes detection accuracy to collapse to near zero across three representative detectors and four generative backbones. The attacks succeed in white-box scenarios, transfer between detectors to enable black-box attacks, and resist standard countermeasures, which the authors link to the low signal-to-noise ratio of the attacked samples as seen by the detectors.
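The attack behind this claim can be written compactly. The paper describes iterative updates that start from a randomized initialization δ^(0), use a step size α, and apply a projection Π onto the ℓ∞ ε-ball at each iteration t. The block below is the standard projected gradient form that matches those symbols; the names f (the full reconstruction-plus-classifier pipeline), \mathcal{L} (the loss), and y (the true label) are assumptions introduced here, and the exact loss used by the authors is not reproduced on this page.

```latex
% Constrained objective (f, \mathcal{L}, y are assumed names)
\max_{\lVert \delta \rVert_\infty \le \varepsilon} \;
  \mathcal{L}\bigl(f(x + \delta),\, y\bigr)

% Projected sign-gradient update from a randomized start \delta^{(0)}
\delta^{(t+1)} = \Pi_{\lVert \delta \rVert_\infty \le \varepsilon}
  \Bigl( \delta^{(t)} + \alpha \,
  \operatorname{sign}\bigl( \nabla_{\delta}\,
  \mathcal{L}\bigl(f(x + \delta^{(t)}),\, y\bigr) \bigr) \Bigr)
```

For the ℓ∞ ball, Π reduces to clipping every coordinate of δ to the interval [-ε, ε].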

What carries the argument

Adversarial perturbation crafting that targets the reconstruction step and exploits the resulting low signal-to-noise ratio perceived by the detector.
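To make the mechanism concrete, here is a minimal sketch of an ℓ∞-bounded projected gradient attack on a toy reconstruction-error detector. Everything in it is an assumption made for illustration: the "reconstruction" simply replaces each pixel by the image mean, the threshold `tau` is hypothetical, and the functions `score`, `is_generated`, and `pgd_linf` are not taken from the paper or from any of the evaluated detectors. Only the budget ε = 8/255 echoes a value that appears in the paper's figures.

```python
import math
import random

def score(x):
    """Toy reconstruction error: the 'reconstruction' of x is its mean value."""
    m = sum(x) / len(x)
    return sum((a - m) ** 2 for a in x) / len(x)

def score_grad(x):
    """Analytic gradient of score() with respect to each pixel."""
    m = sum(x) / len(x)
    return [2.0 * (a - m) / len(x) for a in x]

def is_generated(x, tau):
    """Flag an input as generated when it reconstructs well (low error)."""
    return score(x) < tau

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(x, eps, alpha, steps, rng):
    """Sign-gradient ascent on score(), projected onto the l-inf eps-ball."""
    delta = [rng.uniform(-eps, eps) for _ in x]           # randomized start
    for _ in range(steps):
        grad = score_grad([a + d for a, d in zip(x, delta)])
        delta = [d + alpha * sign(g) for d, g in zip(delta, grad)]
        delta = [max(-eps, min(eps, d)) for d in delta]   # projection step
    return [a + d for a, d in zip(x, delta)]

rng = random.Random(0)
# Hypothetical smooth signal standing in for a generated image.
x = [0.5 + 0.01 * math.sin(0.3 * i) for i in range(64)]
eps, tau = 8 / 255, 3e-4                                  # tau is an assumed threshold
x_adv = pgd_linf(x, eps, alpha=2 / 255, steps=10, rng=rng)
print(is_generated(x, tau), is_generated(x_adv, tau))
```

The sketch pushes the reconstruction error of a "generated" input above the decision threshold while every pixel moves by at most ε. A real attack would backpropagate through the actual reconstruction and classification modules instead of using a closed-form gradient.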

If this is right

  • All evaluated detectors lose effectiveness under white-box adversarial attacks.
  • Attacks transfer across detectors, enabling construction in black-box settings.
  • Standard adversarial defenses give only limited protection.
  • The low signal-to-noise ratio of attacked samples explains why current detectors fail.
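The transfer bullet can be illustrated with a small evaluation harness: perturbations are crafted using gradients from a surrogate detector only, then scored by a different target detector. This is a sketch under stated assumptions, not the paper's protocol. Both toy detectors, the threshold `TAU_TARGET`, and all function names are hypothetical.

```python
import math
import random

def surrogate_grad(x):
    """Gradient of a surrogate score: error against a global-mean reconstruction."""
    m = sum(x) / len(x)
    return [2.0 * (a - m) / len(x) for a in x]

def target_score(x):
    """Target detector score: error against a circular 3-tap local average."""
    n = len(x)
    recon = [(x[i - 1] + x[i] + x[(i + 1) % n]) / 3.0 for i in range(n)]
    return sum((a - r) ** 2 for a, r in zip(x, recon)) / n

def craft_on_surrogate(x, eps, alpha, steps, rng):
    """Projected sign-gradient attack that only queries surrogate gradients."""
    delta = [rng.uniform(-eps, eps) for _ in x]
    for _ in range(steps):
        g = surrogate_grad([a + d for a, d in zip(x, delta)])
        delta = [max(-eps, min(eps, d + alpha * ((gi > 0) - (gi < 0))))
                 for d, gi in zip(delta, g)]
    return [a + d for a, d in zip(x, delta)]

def accuracy_on_generated(samples, score_fn, tau):
    """Fraction of generated samples still flagged as generated (score < tau)."""
    return sum(score_fn(s) < tau for s in samples) / len(samples)

rng = random.Random(0)
# Hypothetical smooth signals standing in for generated images.
generated = [[0.5 + 0.01 * math.sin(0.3 * i + phase) for i in range(64)]
             for phase in (0.0, 1.0, 2.0, 3.0)]
adversarial = [craft_on_surrogate(s, 8 / 255, 2 / 255, 10, rng) for s in generated]

TAU_TARGET = 1e-4  # assumed decision threshold for the target detector
clean_acc = accuracy_on_generated(generated, target_score, TAU_TARGET)
transfer_acc = accuracy_on_generated(adversarial, target_score, TAU_TARGET)
print(clean_acc, transfer_acc)
```

The design point is that `craft_on_surrogate` never touches `target_score`, which is what makes the setting black-box with respect to the target.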

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Detection approaches may need to incorporate explicit robustness testing against small input changes rather than relying solely on reconstruction quality.
  • If the vulnerability stems from the reconstruction mechanism itself, similar issues could appear in other generative-image detectors that use comparable pipelines.

Load-bearing premise

The three tested detectors stand in for the wider set of reconstruction-based methods and the perturbations stay imperceptible and practical in real conditions.

What would settle it

Measure whether detection accuracy remains near zero when the same attack method is applied to a new reconstruction-based detector trained on different data or architectures.

Figures

Figures reproduced from arXiv: 2604.12781 by Haoyang Jiang, Ju Fan, Junxian Cai, Mingyang Yi, Qingbin Liu, Shaolei Zhang, Xi Chen.

Figure 1: Adversarial attack on reconstruction-based detectors. Left: The detector correctly classifies real images (top row) and generated images (bottom row). Middle: Imperceptible adversarial perturbations (×15 magnified). Right: Adversarial samples obtained by adding the perturbations; the detector's predictions are completely reversed.
Figure 2: Overview of the adversarial attack pipeline. Perturbations are optimized by backpropagating gradients through the coupled reconstruction and detection modules, treating the entire pipeline as a differentiable function.
Figure 3: Detection performance under white-box attack. Blue lines indicate baseline accuracy on benign samples; red lines denote robust accuracy on adversarial examples.
Figure 4: Adversarial transferability threat models.
Figure 5: Cross-generator transfer attack results. Rows indicate the generator used to train the surrogate detector (source); columns indicate the generator used to train the target detector.
Figure 6: Cross-method transfer attack results. Heatmaps show post-attack accuracy (%) where the source method (rows) attacks the target method (columns).
Figure 7: Impact of Diffusion Purification on defense robustness. The plots illustrate detection accuracy (%) across varying purification strengths t for four generative sources.
Figure 8: Analysis of the relative perturbation variation ρ(x). (a) and (b) illustrate the proportion distributions of ρ for real and fake images evaluated on FLUX. (c) compares the mean ρ across different generative models.
Figure 9: Detection performance under benign conditions. Each heatmap visualizes the accuracy (%) when classifiers trained on one generator (rows) are evaluated on images from another (columns).
Figure 10: Ablation of attack hyperparameters on DIRE. (a) ADM (b) FLUX (c) SDv1.5 (d) VQDM
Figure 11: Ablation of attack hyperparameters on LaRE2.
Figure 12: Ablation of attack hyperparameters on AEROBLADE. (a) ADM (b) FLUX (c) SDv1.5 (d) VQDM
Figure 13: Detection accuracy under random Gaussian noise.
Figure 14: Fully black-box transfer attack results. Each subplot shows transferability from one detection method (rows: surrogate) to another (columns: target), with generators varying across cells. Values denote post-attack accuracy (%); diagonal entries share the same generator (cross-method only), off-diagonal entries represent the fully black-box setting.
Figure 15: The fraction of pure random noise images predicted as “Real” by various classifiers.
Figure 16: Extended distributions of relative perturbation variations ρ across four generators and three detectors.
Figure 17: Mean Relative Perturbation (µρ) across detectors and datasets. The comparison covers the perturbation magnitude induced by white-box attacks (IID), cross-generator transfer attacks (OOD), and random Gaussian noise.
Figure 18: Density distributions of ρ comparing IID attacks, OOD transfer attacks, and random Gaussian noise (ε = 8/255).
Figure 19: Visualizations of Adversarial Attacks against DIRE.
Figure 20: Visualizations of Adversarial Attacks against LaRE2.
Figure 21: Visualizations of Adversarial Attacks against AEROBLADE.
Original abstract

Recently, detecting AI-generated images produced by diffusion-based models has attracted increasing attention due to their potential threat to safety. Among existing approaches, reconstruction-based methods have emerged as a prominent paradigm for this task. However, we find that such methods exhibit severe security vulnerabilities to adversarial perturbations; that is, by adding imperceptible adversarial perturbations to input images, the detection accuracy of classifiers collapses to near zero. To verify this threat, we present a systematic evaluation of the adversarial robustness of three representative detectors across four diverse generative backbone models. First, we construct adversarial attacks in white-box scenarios, which degrade the performance of all well-trained detectors. Moreover, we find that these attacks demonstrate transferability; specifically, attacks crafted against one detector can be transferred to others, indicating that adversarial attacks on detectors can also be constructed in a black-box setting. Finally, we assess common countermeasures and find that standard defense methods against adversarial attacks provide limited mitigation. We attribute these failures to the low signal-to-noise ratio (SNR) of attacked samples as perceived by the detectors. Overall, our results reveal fundamental security limitations of reconstruction-based detectors and highlight the need to rethink existing detection strategies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that reconstruction-based detectors for diffusion-generated images are severely vulnerable to adversarial perturbations. By adding imperceptible perturbations, detection accuracy collapses to near zero across three representative detectors evaluated on four generative models. White-box attacks succeed, attacks transfer across detectors enabling black-box scenarios, standard defenses offer limited mitigation, and the failures are attributed to low SNR of attacked samples as perceived by the detectors.

Significance. If the empirical results hold, the finding is significant because it identifies a practical security limitation in a prominent detection paradigm for AI-generated content, with timely implications for deployment. The systematic scope—multiple detectors, generators, white-box/transfer/defense tests—provides concrete evidence that could guide future robust designs. The work earns credit for its empirical breadth and falsifiable predictions about attack success rates.

major comments (3)
  1. [Abstract and §3] Abstract and §3 (Detector Selection): The central claim that 'such methods exhibit severe security vulnerabilities' generalizes from only three detectors. Without explicit criteria for representativeness or evaluation of additional reconstruction-based variants in §3, it remains unclear whether the observed collapse is paradigm-wide or tied to shared architectural motifs in the chosen implementations.
  2. [§5.2] §5.2 (Transferability Experiments): Transfer success is reported, but without the number of independent runs, standard deviations, or statistical tests on the accuracy drops, the reliability of the black-box transfer claim is difficult to assess and weakens support for the security-vulnerability conclusion.
  3. [§6] §6 (Defense Assessment): The statement that 'standard defense methods against adversarial attacks provide limited mitigation' is load-bearing for the final recommendation to rethink strategies, yet lacks quantitative before/after metrics or ablation on which defenses were tested and why they failed.
minor comments (2)
  1. The abstract mentions low SNR but does not define how SNR is computed for the detectors; add a brief equation or procedure in the main text or appendix.
  2. [Figures] Table captions and axis labels in experimental result figures should explicitly state the generators and attack strengths used for each row/column.
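For orientation on the first minor comment, a conventional way to express a signal-to-noise ratio in a detector's feature space is given below. This is a generic textbook form offered as an assumption, not the paper's definition: φ is a hypothetical name for whatever feature the detector consumes (for example a reconstruction residual), and the paper's own analysis is phrased in terms of a relative perturbation variation ρ(x) whose formula is not reproduced on this page.

```latex
% Generic feature-space SNR in decibels (\phi is an assumed feature map)
\mathrm{SNR}(x, \delta) \;=\; 10 \, \log_{10}
\frac{\lVert \phi(x) \rVert_2^{2}}
     {\lVert \phi(x + \delta) - \phi(x) \rVert_2^{2}}
```

Under this reading, a low SNR means the feature shift caused by δ is comparable to, or larger than, the clean feature itself.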

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the scope and rigor of our empirical claims. We address each major point below, indicating where revisions will be made to strengthen the manuscript.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (Detector Selection): The central claim that 'such methods exhibit severe security vulnerabilities' generalizes from only three detectors. Without explicit criteria for representativeness or evaluation of additional reconstruction-based variants in §3, it remains unclear whether the observed collapse is paradigm-wide or tied to shared architectural motifs in the chosen implementations.

    Authors: We agree that the manuscript would benefit from greater transparency on detector selection. In the revised version, we will expand §3 to explicitly state the criteria used: prominence in recent literature, coverage of distinct reconstruction architectures (e.g., autoencoder-based, diffusion-inversion-based, and hybrid), and public availability of trained models. We will also add a short discussion acknowledging that while the three detectors are representative of the dominant paradigms, the results may not cover every possible variant; however, the consistent failure mode across them supports our broader security concern. No additional experiments are planned at this stage due to computational constraints, but we will frame the claims more cautiously. revision: partial

  2. Referee: [§5.2] §5.2 (Transferability Experiments): Transfer success is reported, but without the number of independent runs, standard deviations, or statistical tests on the accuracy drops, the reliability of the black-box transfer claim is difficult to assess and weakens support for the security-vulnerability conclusion.

    Authors: We accept this criticism. The original experiments were run with multiple random seeds, but the details were omitted for brevity. In the revision, we will report the exact number of independent runs (five per transfer pair), include standard deviations on the reported accuracy drops, and add paired t-test results to establish statistical significance of the observed transferability. These additions will be placed in §5.2 and the corresponding tables. revision: yes

  3. Referee: [§6] §6 (Defense Assessment): The statement that 'standard defense methods against adversarial attacks provide limited mitigation' is load-bearing for the final recommendation to rethink strategies, yet lacks quantitative before/after metrics or ablation on which defenses were tested and why they failed.

    Authors: We will revise §6 to address this directly. The updated section will include a table with before-and-after detection accuracies for each tested defense (Diffusion Purification and Adversarial Training, the two techniques analyzed in §6), along with an ablation study showing the effect of defense strength hyperparameters. We will also expand the discussion of why these methods fail, linking it quantitatively to the low-SNR observation already present in the paper. This will make the limited-mitigation claim fully supported by data. revision: yes
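The statistical reporting promised in response 2 (per-seed runs, standard deviations, and a paired t-test) could look like the sketch below. It uses only the Python standard library and reports the t statistic and degrees of freedom rather than a p-value. The helper name `paired_t` and the accuracy numbers are placeholders invented for illustration; they are not results from the paper.

```python
import math
import statistics

def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired samples."""
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [b - a for b, a in zip(before, after)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)             # sample std (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("zero variance in differences; t is undefined")
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Placeholder accuracies (%) for five seeds; illustrative only, NOT from the paper.
clean = [98.0, 97.5, 98.2, 97.9, 98.1]
attacked = [12.0, 15.5, 9.8, 14.1, 11.3]

t_stat, dof = paired_t(clean, attacked)
drops = [c - a for c, a in zip(clean, attacked)]
print(f"mean drop = {statistics.mean(drops):.2f}, "
      f"std = {statistics.stdev(drops):.2f}, t = {t_stat:.2f}, df = {dof}")
```

With five runs there are four degrees of freedom, for which the two-sided 5% critical value of the t distribution is about 2.776; a statistic above that would support the claim that the accuracy drop is not a seed artifact.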

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation with independent experimental results

full rationale

The paper conducts a systematic empirical evaluation of adversarial attacks on three specific reconstruction-based detectors across four generative models. It reports white-box attack success, transferability to black-box settings, and limited effectiveness of standard defenses, attributing failures to low SNR. No derivations, equations, fitted parameters, or self-citations are used to derive the central claims; results follow directly from the described attack constructions and accuracy measurements on held-out data. The representativeness concern raised in the referee report is a question of external validity, not a reduction of the reported findings to their own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper rests on standard assumptions from adversarial machine learning; no new free parameters, axioms beyond domain norms, or invented entities are introduced in the abstract.

axioms (1)
  • [standard math] Adversarial perturbations exist that can fool neural classifiers while remaining imperceptible
    Core premise of adversarial ML literature invoked to motivate the attacks.

pith-pipeline@v0.9.0 · 5520 in / 1147 out tokens · 49406 ms · 2026-05-10T14:46:29.865560+00:00 · methodology


Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages

  1. Black Forest Labs: FLUX.1 (2024)

  2. Brooks, T., Peebles, B., Holmes, C., DePue, W., Guo, Y., Jing, L., Schnurr, D., Taylor, J., Luhman, T., Luhman, E., et al.: Video generation models as world simulators (2024)

  3. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: IEEE Computer Society (2017)

  4. Chen, R.T., Rubanova, Y., Bettencourt, J., Duvenaud, D.K.: Neural ordinary differential equations. In: NeurIPS (2018)

  5. Chu, B., Xu, X., Wang, X., Zhang, Y., You, W., Zhou, L.: FIRE: robust detection of diffusion-generated images via frequency-guided reconstruction error. In: CVPR (2025)

  6. Corvi, R., Cozzolino, D., Zingarini, G., Poggi, G., Nagano, K., Verdoliva, L.: On the detection of synthetic images generated by diffusion models. In: ICASSP (2023)

  7. Croce, F., Hein, M.: Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In: ICML (2020)

  8. Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: A large-scale hierarchical image database. In: CVPR (2009)

  9. Dhariwal, P., Nichol, A.: Diffusion models beat GANs on image synthesis. In: NeurIPS (2021)

  10. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: ICLR (2015)

  11. Gu, S., Chen, D., Bao, J., Wen, F., Zhang, B., Chen, D., Yuan, L., Guo, B.: Vector quantized diffusion model for text-to-image synthesis. In: CVPR (2022)

  12. He, S., Li, X., Yang, X., Xiong, Y., Li, K.: GRRE: leveraging g-channel removed reconstruction error for robust detection of AI-generated images. Preprint (2026)

  13. Hertz, A., Mokady, R., Tenenbaum, J., Aberman, K., Pritch, Y., Cohen-Or, D.: Prompt-to-prompt image editing with cross-attention control. In: ICLR (2023)

  14. Ho, J., Jain, A., Abbeel, P.: Denoising diffusion probabilistic models. In: NeurIPS (2020)

  15. Ho, J., Salimans, T., Gritsenko, A., Chan, W., Norouzi, M., Fleet, D.J.: Video diffusion models. In: NeurIPS (2022)

  16. Kang, J.Y., Park, J., Kim, S., Yoon, J.W., Kim, N.S.: Semantic-aware reconstruction error for detecting AI-generated images. Preprint (2025)

  17. Luo, Y., Du, J., Yan, K., Ding, S.: LaRE2: Latent reconstruction error based method for diffusion-generated image detection. In: CVPR (2024)

  18. Ma, R., Duan, J., Kong, F., Shi, X., Xu, K.: Exposing the fake: Effective diffusion-generated images detection. Preprint (2023)

  19. Ma, X., Wang, Y., Chen, X., Jia, G., Liu, Z., Li, Y.F., Chen, C., Qiao, Y.: Latte: Latent diffusion transformer for video generation. TMLR (2025)

  20. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: ICLR (2018)

  21. Mirsky, Y., Lee, W.: The creation and detection of deepfakes: A survey. ACM Comput. Surv. (2021)

  22. Mokady, R., Hertz, A., Aberman, K., Pritch, Y., Cohen-Or, D.: Null-text inversion for editing real images using guided diffusion models. In: CVPR (2023)

  23. Nie, W., Guo, B., Huang, Y., Xiao, C., Vahdat, A., Anandkumar, A.: Diffusion models for adversarial purification. In: ICML (2022)

  24. Peebles, W., Xie, S.: Scalable diffusion models with transformers. In: ICCV (2023)

  25. Ricker, J., Lukovnikov, D., Fischer, A.: AEROBLADE: training-free detection of latent diffusion images using autoencoder reconstruction error. In: CVPR (2024)

  26. Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: CVPR (2022)

  27. Somepalli, G., Singla, V., Goldblum, M., Geiping, J., Goldstein, T.: Diffusion art or digital forgery? Investigating data replication in diffusion models. In: CVPR (2023)

  28. Song, J., Meng, C., Ermon, S.: Denoising diffusion implicit models. In: ICLR (2021)

  29. Song, Y., Sohl-Dickstein, J., Kingma, D.P., Kumar, A., Ermon, S., Poole, B.: Score-based generative modeling through stochastic differential equations. In: ICLR (2021)

  30. Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. In: ICLR (2014)

  31. Tsipras, D., Santurkar, S., Engstrom, L., Turner, A., Madry, A.: Robustness may be at odds with accuracy (2019)

  32. Vaccari, C., Chadwick, A.: Deepfakes and disinformation: Exploring the impact of synthetic political video on deception, uncertainty, and trust in news. Social media + society (2020)

  33. Vasilcoiu, A., Najdenkoska, I., Geradts, Z., Worring, M.: LATTE: latent trajectory embedding for diffusion-generated image detection. Preprint (2025)

  34. Wang, R., Yi, M., Chen, Z., Zhu, S.: Out-of-distribution generalization with causal invariant transformations. In: CVPR (2022)

  35. Wang, S.Y., Wang, O., Zhang, R., Owens, A., Efros, A.A.: CNN-generated images are surprisingly easy to spot... for now. In: CVPR (2020)

  36. Wang, Y., Zou, D., Yi, J., Bailey, J., Ma, X., Gu, Q.: Improving adversarial robustness requires revisiting misclassified examples. In: ICLR (2020)

  37. Wang, Z., Yi, M., Xue, S., Li, Z., Liu, M., Qin, B., Ma, Z.M.: Improved diffusion-based generative model with better adversarial robustness. In: ICLR (2025)

  38. Wang, Z., Bao, J., Zhou, W., Wang, W., Hu, H., Chen, H., Li, H.: DIRE for diffusion-generated image detection. In: ICCV (2023)

  39. Yi, M., Hou, L., Shang, L., Jiang, X., Liu, Q., Ma, Z.M.: Reweighting augmented samples by minimizing the maximal expected loss. In: ICLR (2021)

  40. Yi, M., Hou, L., Sun, J., Shang, L., Jiang, X., Liu, Q., Ma, Z.: Improved OOD generalization via adversarial training and pre-training. In: ICML (2021)

  41. Yi, M., Li, A., Xin, Y., Li, Z.: Towards understanding the working mechanism of text-to-image diffusion model. In: NeurIPS (2024)

  42. Yi, M., Sun, J., Li, Z.: On the generalization of diffusion model. Preprint (2023)

  43. Yi, M., Wang, R., Sun, J., Li, Z., Ma, Z.M.: Breaking correlation shift via conditional invariant regularizer. In: ICLR (2023)

  44. Zhang, H., Yu, Y., Jiao, J., Xing, E., El Ghaoui, L., Jordan, M.: Theoretically principled trade-off between robustness and accuracy. In: ICML (2019)

  45. Zhang, R., Isola, P., Efros, A.A., Shechtman, E., Wang, O.: The unreasonable effectiveness of deep features as a perceptual metric. In: CVPR (2018)