Where Detectors Fail: Probing Generative Space for Generalizable AI-Generated Image Detection

Liang Lin; Pengxu Wei; Weijian Deng; Weijie Tu; Yao Xiao; Zijie Cao

arxiv: 2605.24906 · v3 · pith:RSTROAEVnew · submitted 2026-05-24 · 💻 cs.CV

Zijie Cao, Weijie Tu, Yao Xiao, Weijian Deng, Liang Lin, Pengxu Wei

Pith reviewed 2026-06-30 11:53 UTC · model grok-4.3

classification 💻 cs.CV

keywords AI-generated image detection · generalization to unseen generators · manifold exploration · boundary probing · detector robustness · generative model editing · PROBE framework

The pith

A framework uses the detector itself to steer generators via manifold modifications, creating hard samples that train detectors to generalize to unseen generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current AI-generated image detectors degrade on new generators even after training on large datasets because standard sampling misses many generative variations. The paper introduces PROBE, which treats generators as editable and lets the detector act as a critic to guide small internal changes that produce realistic but difficult images. These boundary samples reveal uncommon failure modes and are fed back to refine the detector. Experiments across benchmarks show the resulting detectors perform better on generators never seen during training. The method shifts focus from collecting more fixed data to actively exploring the generative manifold.

Core claim

PROBE improves AIGI detector generalization by actively exploring challenging regions of the generative process. Instead of treating the generator as a fixed data source, PROBE uses the detector as a critic to steer the generator through manifold-level modifications, producing realistic samples that are difficult to classify. These samples expose failure cases that are uncommon under standard data sampling strategies and are used to refine the detector.

What carries the argument

PROBE framework that uses the detector as critic to steer the generator through manifold-level modifications and generate challenging training samples.

If this is right

Detectors refined with PROBE samples achieve better performance on unseen generators across multiple benchmarks.
Failure cases uncovered by boundary exploration are uncommon under standard sampling and help close coverage gaps.
Manifold modifications allow creation of diverse realistic variations without relying solely on larger fixed datasets.
The approach reframes the generator as an editable source rather than a static data provider.
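The coverage-gap argument in the points above can be made concrete with a one-dimensional toy. This is a sketch under strong assumptions, not the paper's method: features are scalars, the detector is a midpoint threshold between the largest real and smallest fake feature, and the boundary and unseen samples are hypothetical values placed so that they share a region. That shared region is exactly the premise the paper has to establish empirically.

```python
def fit_threshold(real_feats, fake_feats):
    """1D max-margin rule: midpoint between the largest real
    and the smallest fake feature value."""
    return (max(real_feats) + min(fake_feats)) / 2.0

def is_fake(x, threshold):
    return x > threshold

real = [-1.0, -0.5, 0.0]
seen_fake = [2.0, 2.5, 3.0]
boundary_fake = [0.8, 1.0]   # hypothetical hard samples found by probing
unseen_fake = [0.7, 0.9]     # hypothetical samples from a generator never trained on

# Baseline detector: fitted on real images and seen fakes only.
t_base = fit_threshold(real, seen_fake)
# Refined detector: boundary samples are added to the fake class.
t_probe = fit_threshold(real, seen_fake + boundary_fake)

missed_base = [x for x in unseen_fake if not is_fake(x, t_base)]
missed_probe = [x for x in unseen_fake if not is_fake(x, t_probe)]
print(t_base, t_probe, missed_base, missed_probe)
```

The baseline threshold sits far from the real data, so fakes that fall between the real and seen-fake clusters are missed; adding boundary samples pulls the threshold toward the real data and closes that gap without misclassifying any of the toy real samples.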

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Detectors may need ongoing adaptation loops that re-probe generators as new models appear rather than one-time training.
The same steering idea could apply to other generative domains such as text or video where unseen models also cause detector failure.
If manifold modifications can be made even more controlled, they might serve as a diagnostic tool to map the exact boundaries of current detectors.

Load-bearing premise

Manifold-level modifications guided by the detector produce realistic images whose failure cases transfer to actual unseen generators.

What would settle it

Train a detector with PROBE samples and test whether its accuracy on images from a new generator (never involved in probing) fails to exceed a baseline trained on standard samples.
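A minimal sketch of that check, assuming binary predictions (1 = fake, 0 = real) are already available for both detectors on the held-out generator. The function names, the generator labels, and the toy predictions are illustrative assumptions, not results from the paper.

```python
def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)

def settles_against_probe(baseline_preds, probe_preds, labels,
                          held_out, probed_generators):
    """Return True if the PROBE-trained detector fails to beat the
    baseline on a generator that was never involved in probing."""
    if held_out in probed_generators:
        # The test is only meaningful if the generator is truly held out.
        raise ValueError("held-out generator was used during probing")
    return accuracy(probe_preds, labels) <= accuracy(baseline_preds, labels)

# Toy example with made-up predictions on six held-out images.
labels = [1, 1, 1, 0, 0, 0]
baseline_preds = [0, 0, 1, 0, 0, 0]   # misses two fakes
probe_preds = [1, 1, 1, 0, 0, 1]      # catches all fakes, one false alarm

falsified = settles_against_probe(
    baseline_preds, probe_preds, labels,
    held_out="FLUX", probed_generators={"SD 1.4"},
)
print(falsified)
```

The explicit held-out check matters as much as the comparison itself: if the test generator took part in probing, a gain says nothing about transfer.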

Figures

Figures reproduced from arXiv: 2605.24906 by Liang Lin, Pengxu Wei, Weijian Deng, Weijie Tu, Yao Xiao, Zijie Cao.

**Figure 1.** PROBE: Improving detector generalization via boundary-induced fake samples. (a) A detector trained on real images and samples from seen generators often fails to generalize to unseen generators, leading to misclassification of unseen fake images. (b) PROBE uses the detector as a critic to guide a seen generator toward challenging regions of the generative space. By steering generation based on detector fee…

**Figure 2.** Overall framework of PROBE. PROBE consists of two stages: 1) Generative Space Probing: treat the detector as a critic, use its outputs to directly supervise generator fine-tuning, probe the generative space, and obtain challenging samples that reflect the detector's failure modes; 2) Detector Fine-tuning: fine-tune the detector with the boundary-induced fake samples generated in Stage 1, which refines the d…

**Figure 3.** Effect of perceptual regularization. We measure image quality and text–image alignment using HPSv3 (Ma et al., 2025). Removing perceptual regularization leads to degraded visual fidelity and semantic inconsistency, while moderate regularization preserves realism and stabilizes generation.

**Figure 4.** [caption not recovered]

**Figure 5.** Generalization under high-quality generation. We evaluate multiple detectors on images generated by Stable Diffusion 1.5 and SD 1.5-Realistic-Vision. SD 1.5-Realistic-Vision is designed to produce more photorealistic images through finetuning on high-quality data. Both generators are unseen for all detectors during training. While most detectors perform well on SD 1.5, their performance drops substantial…

**Figure 6.** Visualization of boundary-induced fake samples guided by different detectors. We project boundary-induced samples generated using different detectors separately (ResNet50, UnivFD, and DRCT) into the feature space of the ResNet50 detector. Despite being guided by different critics, the resulting samples largely overlap, indicating that these detectors uncover similar challenging regions of the generative s…

**Figure 7.** (a) Feature space visualization of the baseline ResNet50 detector. PROBE boundary-induced samples exhibit substantial overlap with those from architecturally diverse unseen generators (DALL·E 3, Midjourney, and FLUX), which indicates that PROBE effectively captures shared detector-relevant hard regions. (b) Failure case visualization. BigGAN samples exhibit significant deviation from PROBE samples, resulti…

**Figure 8.** Energy spectra of denoising residuals of images from different sources. Midjourney, FLUX, DALL·E 3, and PROBE samples exhibit similar spectral patterns, while BigGAN samples show a clearly different grid-like structure. Spectral analysis. Following (Corvi et al., 2023), we extract noise patterns from generated images using a denoising network and analyze their energy spectra.

**Figure 9.** Qualitative results of boundary-induced fake samples using various detectors as critics. Each row shows images corresponding to: the original SD 1.4 images, PGD adversarial attack examples, and boundary-induced fake samples from the generator steered by ResNet50, UnivFD, CoDE, and DRCT, respectively. PROBE samples remain realistic while effectively evading detection.

**Figure 10.** Visualization of SD 1.5 and SD 1.5-Realistic-Vision images. Images from SD 1.5-Realistic-Vision (first row) are more visually realistic than SD 1.5 (second row) and thus pose greater challenges to detectors.

Abstract

Detecting AI-generated images (AIGI) remains challenging because detectors often fail to generalize to unseen generators. Although existing methods are trained on large datasets, their performance still degrades when generation settings change, indicating that data scale alone is insufficient and that limited coverage of generative variations during training is a key factor. Studies on generative model editing show that small changes in internal representations can produce diverse and meaningful image variations, many of which are not explored under standard sampling. Leveraging this insight, we propose PROBE (Probing Robustness via Boundary Exploration), a framework that improves detector generalization by actively exploring challenging regions of the generative process. Instead of treating the generator as a fixed data source, PROBE uses the detector as a critic to steer the generator through manifold-level modifications, producing realistic samples that are difficult to classify. These samples expose failure cases that are uncommon under standard data sampling strategies and are used to refine the detector. Experimental results across multiple benchmarks indicate that PROBE enhances generalization to unseen generators, resulting in more generalizable AIGI detection performance. Code and models are available at https://github.com/Amamiya-C/PROBE-AIGI-Detection

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

PROBE's detector-steered manifold exploration is a fresh angle on generating hard AIGI training samples, but the claim that these samples produce transferable failure cases to real unseen generators rests on weak validation.

The main takeaway is that this paper uses the detector as a critic to actively steer a generator's internal representations, creating samples that expose uncommon failure modes and then retrains on them to boost generalization. The active steering via manifold modifications is the distinct piece not covered in the referenced prior work.

It does a solid job explaining why data scale alone falls short and why targeted exploration of generative variations matters. The experiments across benchmarks are presented as showing gains on unseen generators, and releasing code is a practical plus for anyone who wants to test or extend the approach.

The soft spot is exactly the one the stress-test flags: the steered samples need to lie on the same manifold as real new generators for the failure cases to transfer, yet the abstract gives no controls like perceptual similarity checks, human realism ratings, or ablations comparing steered versus non-steered hard negatives. Without those, it's difficult to rule out that the improvements come from detector-specific artifacts rather than genuine coverage of new generators. The full methods would need to show this equivalence holds.

This is aimed at people working on robust AIGI detection for forensics or misinformation. A reader looking for concrete augmentation strategies would find the framework worth examining, even with the current gaps.

It deserves peer review because the problem is real, the idea is coherent on its own terms, and the code makes follow-up possible.

Referee Report

2 major / 1 minor

Summary. The paper proposes PROBE, a framework that treats the generator as steerable rather than fixed: the detector acts as a critic to perform manifold-level modifications, generating realistic samples that expose uncommon failure cases under standard sampling. These samples are then used to refine the detector, with the central claim being that this process yields improved generalization to unseen generators, as supported by experiments across multiple benchmarks.

Significance. If validated, the result would be significant for AIGI detection because it offers an active, detector-guided way to cover generative variations beyond passive dataset scaling. The public release of code and models is a clear strength that enables reproducibility.

major comments (2)

[§3] (PROBE framework): the claim that detector-steered manifold modifications produce realistic samples whose failure cases transfer to real unseen generators lacks direct validation via perceptual similarity metrics, human realism ratings, or an ablation showing that non-steered hard negatives are insufficient; this equivalence is load-bearing for the generalization improvement.
[§4] (Experiments): the reported gains in cross-generator performance are stated without accompanying details on data splits, exact metrics per unseen generator, or controls that isolate the contribution of manifold steering versus standard adversarial augmentation; without these, attribution of the result to the proposed mechanism cannot be verified.

minor comments (1)

[Abstract]: the magnitude of improvement and the specific unseen generators tested are not quantified; stating them would help readers assess practical impact.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major point below and will incorporate the suggested additions and clarifications in the revised manuscript.

Referee: [§3] (PROBE framework): the claim that detector-steered manifold modifications produce realistic samples whose failure cases transfer to real unseen generators lacks direct validation via perceptual similarity metrics, human realism ratings, or an ablation showing that non-steered hard negatives are insufficient; this equivalence is load-bearing for the generalization improvement.

Authors: We agree that direct validation of sample realism and an ablation isolating the steering mechanism would strengthen the central claim. In the revision we will add FID and LPIPS scores comparing steered samples to their unsteered counterparts, include a small-scale human realism rating study, and report an ablation that replaces manifold steering with standard hard-negative mining to quantify the contribution of the proposed mechanism.

revision: yes

Referee: [§4] (Experiments): the reported gains in cross-generator performance are stated without accompanying details on data splits, exact metrics per unseen generator, or controls that isolate the contribution of manifold steering versus standard adversarial augmentation; without these, attribution of the result to the proposed mechanism cannot be verified.

Authors: We will expand §4 to provide explicit train/validation/test split details, a table of per-generator accuracy and AUC numbers, and additional ablation experiments that directly compare PROBE against standard adversarial augmentation baselines, thereby clarifying the source of the observed generalization gains.

revision: yes
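The per-generator accuracy and AUC table mentioned in the second response could be assembled along the following lines. This is a hedged sketch, not the authors' evaluation code: the generator names and scores are placeholders, the 0.5 decision threshold is an assumption, and AUC is computed by direct pairwise comparison, which is only practical for small sets.

```python
def auc(scores, labels):
    """ROC AUC as the fraction of (fake, real) pairs ranked correctly.
    Ties count as half. labels: 1 = fake, 0 = real."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one sample of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def per_generator_report(results, threshold=0.5):
    """results maps generator name -> (scores, labels).
    Returns generator name -> (accuracy, auc)."""
    report = {}
    for name, (scores, labels) in results.items():
        preds = [1 if s >= threshold else 0 for s in scores]
        acc = sum(p == y for p, y in zip(preds, labels)) / len(labels)
        report[name] = (acc, auc(scores, labels))
    return report

# Placeholder detector scores for two hypothetical unseen generators.
results = {
    "gen_A": ([0.9, 0.8, 0.3, 0.1], [1, 1, 0, 0]),
    "gen_B": ([0.6, 0.4, 0.7, 0.2], [1, 1, 0, 0]),
}
report = per_generator_report(results)
for name, (acc, area) in report.items():
    print(f"{name}: accuracy={acc:.2f} auc={area:.2f}")
```

Reporting both numbers per generator, rather than one average, is what lets a reader see whether a gain is spread across unseen generators or concentrated in one of them.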

Circularity Check

0 steps flagged

No circularity; empirical refinement via external generators

The paper describes an empirical framework (PROBE) that steers generators using a detector to produce challenging samples for detector refinement, with generalization claims resting on experimental results across multiple benchmarks and unseen generators. No equations, derivations, or self-referential definitions are present that would reduce the claimed generalization to a fitted quantity or self-citation chain by construction. The method is presented as data-driven exploration rather than a mathematical derivation, and the approach remains self-contained against external benchmarks without invoking uniqueness theorems or ansatzes from prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that small internal generative changes produce diverse realistic variations and that detector-guided steering yields transferable failure cases; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

Domain assumption: Small changes in internal representations of generative models can produce diverse and meaningful image variations not explored under standard sampling.
Invoked as the key insight from studies on generative model editing that motivates the PROBE approach.

Reference graph

Works this paper leans on

12 extracted references · 6 canonical work pages · 2 internal anchors

[1]

Cavia, B., Horwitz, E., Reiss, T., and Hoshen, Y. Real-time deepfake detection in the real-world. arXiv preprint arXiv:2406.09398.

[2]

Madry, A., Makelov, A., Schmidt, L., Tsipras, D., and Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.

[3]

Simonyan, K. and Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.

[4]

Yan, S., Li, O., Cai, J., Hao, Y., Jiang, X., Hu, Y., and Xie, W. A sanity check for ai-generated image detection. In The Thirteenth International Conference on Learning Representations, 2025a.

Yan, Z., Wang, J., Jin, P., Zhang, K.-Y., Liu, C., Chen, S., Yao, T., Ding, S., Wu, B., and Yuan, L. Orthogonal subspace decomposition for generalizable ai-gener...

[5]

Zhong, N., Xu, Y., Li, S., Qian, Z., and Zhang, X. Patchcraft: Exploring texture patch for efficient ai-generated image detection. arXiv preprint arXiv:2311.12397.

[6]

Zhu, M., Chen, H., Huang, M., Li, W., Hu, H., Hu, J., and Wang, Y. Gendet: Towards good generalizations for ai-generated image detection. arXiv preprint arXiv:2312.08880, 2023a.

Zhu, M., Chen, H., Yan, Q., Huang, X., Lin, G., Li, W., Tu, Z., Hu, H., Hu, J., and Wang, Y. Genimage: A million-scale benchmark for detecting ai-generated image. In Advances in...
[7]

to fine-tune generator more efficiently. Denoising in the diffusion model is an iterative process: the result $x_{t-1}$ at the next time step $t-1$ is obtained by denoising the result $x_t$ at the current step $t$: $x_{t-1} = a_t x_t + b_t \epsilon_\phi(x_t, t) + c_t \epsilon$ (3), where $\epsilon \sim \mathcal{N}(0, I)$ is random Gaussian noise, and $a_t$, $b_t$, $c_t$ are coefficients determined by the sampling algorithm. Therefor...

[8]

and the Transformer-based DINOv2-ViT-L (Oquab et al., 2024). A linear classifier is attached to each backbone to perform binary classification. The input resolutions are set to 224×224 for ResNet50 and 336×336 for DINOv2. For both training and inference, we extract patches of the corresponding resolution via cropping rather than resizing; padding is appli...

[9]

• DINOv2: Trained on the Reconstruction Training Set (Guillaro et al., 2025). We utilize the AdamW optimizer with a weight decay of 1e-5, a learning rate of 1e-5, and a batch size of

[10]

as our target generators for fine-tuning, which align with the seen generators of baseline detectors. For image synthesis, we employ the DDIM sampler (Song et al.) with 35 sampling steps. The classifier-free guidance scale is set to 7.5, and the output resolution is fixed at 512×512. To achieve efficient and controllable fine-tuning, we incorporate lightw...

[11]

require 236k and 258k generated samples, respectively, whereas PROBE achieves stronger generalization with only 20k samples. Finally, the cost of PROBE is a one-time offline process. At inference time, the detector architecture remains unchanged and introduces no additional latency.

D. Generalization Mechanism of PROBE

PROBE performs boundary exploration ...

[12]

These datasets encompass a wide array of mainstream generators and exhibit high diversity in terms of both content and format, thereby minimizing evaluation bias.

F. Additional Results

Following (Wang et al., 2020; Ojha et al., 2023; Tan et al., 2024; Yan et al., 2025b; Guillaro et al., 2025), we additionally report Average Precision (AP) scores in Table
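The denoising update quoted in excerpt [7], where the previous-step sample is a weighted sum of the current sample, the predicted noise, and fresh Gaussian noise, can be written as a single function. This is a sketch under stated assumptions: the coefficient values below are arbitrary placeholders rather than any sampler's actual schedule, and `eps_pred` is a hand-written stand-in for the noise-prediction network's output.

```python
import random

def denoise_step(x_t, eps_pred, a_t, b_t, c_t, rng):
    """One update x_{t-1} = a_t * x_t + b_t * eps_pred + c_t * eps,
    applied elementwise, with eps drawn from a standard normal."""
    return [a_t * x + b_t * e + c_t * rng.gauss(0.0, 1.0)
            for x, e in zip(x_t, eps_pred)]

rng = random.Random(0)
x_t = [1.0, 2.0]
eps_pred = [0.5, -0.5]   # stand-in for the network's noise prediction

# With c_t = 0 the stochastic term vanishes and the step is deterministic.
x_prev = denoise_step(x_t, eps_pred, a_t=0.9, b_t=-0.1, c_t=0.0, rng=rng)

# With c_t > 0 the same inputs give a randomly perturbed result.
x_prev_noisy = denoise_step(x_t, eps_pred, a_t=0.9, b_t=-0.1, c_t=0.1, rng=rng)
print(x_prev, x_prev_noisy)
```

Because each step is a simple affine function of the current sample and the network output, a detector score computed on the final image can in principle be differentiated back through the sampling chain, which is what makes detector-guided generator fine-tuning possible.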
