ForensicConcept: Transferable Forensic Concepts for AIGI Detection

Jiayi Ji; Ke Sun; Menyanshu Zhou; Rongrong Ji; Xiaoshuai Sun; Yunpeng Luo; Ziyin Zhou

arxiv: 2606.07034 · v1 · pith:VOPTOQ6Bnew · submitted 2026-06-05 · 💻 cs.CV


Menyanshu Zhou, Ziyin Zhou, Ke Sun, Yunpeng Luo, Jiayi Ji, Xiaoshuai Sun, Rongrong Ji

Pith reviewed 2026-06-27 22:39 UTC · model grok-4.3

classification 💻 cs.CV

keywords AIGI detection · forensic concepts · concept transfer · diffusion features · Transformer attribution · CKNNA · generative model robustness


The pith

ForensicConcept extracts explicit forensic concepts from detectors and transfers them across backbones using diffusion alignment for better AIGI detection on unseen generators.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current AI-generated image detectors often fail on unseen generators, and that the opacity of their evidence makes this failure hard to diagnose. It proposes three steps: extract decision-critical patches via attribution, cluster them into a concept codebook, and transfer the concepts using a diffusion-based reference. A reader would care if this yields more robust detectors without full retraining. The method uses CleanDIFT features as a generation-trace reference and measures backbone alignment with CKNNA. Experiments on three benchmarks (GenImage, GAN-family, and Chameleon) report gains over prior methods, and the alignment scores predict which backbones transfer concepts successfully.

Core claim

ForensicConcept localizes critical patches with Transformer attribution, clusters them into a compact concept codebook, applies concept-aligned projection for auditable readouts, and injects diffusion-derived concepts from CleanDIFT into target backbones; this produces consistent accuracy gains on GenImage, GAN-family, and Chameleon while CKNNA neighborhood consistency predicts transfer success and explains backbone differences.

What carries the argument

The concept codebook and its injection into target backbones. CleanDIFT serves as the generation-trace reference, and CKNNA quantifies the neighborhood-structure alignment between backbone and diffusion representations.

Load-bearing premise

DINO representations can guide diffusion generation and exhibit concept-level correspondence with diffusion features. This correspondence is what allows CleanDIFT to act as a reliable reference for concept transfer.

What would settle it

An experiment testing whether CKNNA scores between backbones and CleanDIFT still predict transfer gains on generators where the DINO-diffusion concept correspondence has been shown to break down.

Figures

Figures reproduced from arXiv: 2606.07034 by Jiayi Ji, Ke Sun, Menyanshu Zhou, Rongrong Ji, Xiaoshuai Sun, Yunpeng Luo, Ziyin Zhou.

**Figure 1.** Forensic evidence is diffuse yet conceptualizable. Top: dataset-specific forensic codebooks (DINOv3 with injected AIGI prior); patches nearest to concept centers reveal recurring low-level patterns across generators. Bottom: attribution maps show a mismatch: an AIGI detector attends to diffuse cues, while a semantic classifier (cat vs. dog) focuses on object parts.

**Figure 2.** Overview of forensic concept learning (Section 3.1). We first perform adapter-guided discriminative tuning (ADT) by inserting LoRA into a pretrained DINO encoder and training a CLS-based detector. Using Transformer attribution, we localize patch-level evidence and cluster the corresponding tokens to induce a forensic concept codebook (UCI). Finally, concept-aligned projection (CAP) maps the CLS representat…
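Adapter-guided discriminative tuning, as described in this caption, inserts LoRA into a frozen pretrained encoder. Below is a minimal NumPy sketch of one LoRA-adapted linear projection. The rank `r`, the scale `alpha / r`, and the zero initialization of `B` follow the common LoRA recipe; they are assumptions, not the paper's reported settings. The class name `LoRALinear` is hypothetical.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A (hypothetical sketch)."""

    def __init__(self, W, r=8, alpha=16.0, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                      # frozen pretrained weight, shape (d_out, d_in)
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # down-projection, small random init
        self.B = np.zeros((d_out, r))                   # up-projection, zero init
        self.scale = alpha / r

    def __call__(self, x):
        # x: (n_tokens, d_in) -> (n_tokens, d_out)
        base = x @ self.W.T
        update = (x @ self.A.T) @ self.B.T
        return base + self.scale * update

    def merged_weight(self):
        # After tuning, the low-rank update can be folded into W for inference.
        return self.W + self.scale * self.B @ self.A


rng = np.random.default_rng(1)
W = rng.normal(size=(64, 64))
layer = LoRALinear(W, r=4)
tokens = rng.normal(size=(197, 64))   # e.g. one CLS token + 196 patch tokens
# With B = 0 the adapted layer starts out identical to the frozen layer.
print(np.allclose(layer(tokens), tokens @ W.T))  # True
```

Because `B` starts at zero, tuning begins from the pretrained encoder's behavior. After training, the update can be merged into `W`, so the detector runs at the original cost.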

**Figure 3.** CleanDIFT generation-trace reference (Section 3.2). We extract a 16 × 16 diffusion-token grid D^(l)(x) from a CleanDIFT U-Net layer. Evidence coordinates I_b(x) select position-aligned tokens for CKNNA computation and diffusion codebook clustering. CKNNA alignment. We compute nearest neighbors within each space separately using cosine distance after ℓ2 normalization. Let kNN denote the neighborhood size. …
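The CKNNA computation in this caption can be sketched in NumPy. This is a minimal sketch under three assumptions:

- It follows the CKNNA definition from the Platonic representation hypothesis work: centered kernel alignment restricted to pairs that are mutual k-nearest neighbors in both spaces.
- It follows the caption's choice of cosine similarity after ℓ2 normalization.
- It uses a simple biased HSIC estimator, one choice among several.

The function names and toy data are hypothetical, not the authors' code.

```python
import numpy as np

def _l2_normalize(x, eps=1e-12):
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)

def _knn_mask(K, k):
    """Row-wise k-nearest-neighbor mask from a similarity matrix, excluding self-matches."""
    n = K.shape[0]
    K = K.copy()
    np.fill_diagonal(K, -np.inf)
    idx = np.argsort(-K, axis=1)[:, :k]        # the k most similar tokens per row
    mask = np.zeros((n, n))
    mask[np.arange(n)[:, None], idx] = 1.0
    return mask

def _hsic(K, L):
    """Biased HSIC in Frobenius form <HKH, HLH>; equals trace(KHLH) for symmetric kernels."""
    n = K.shape[0]
    H = np.eye(n) - 1.0 / n
    return np.sum((H @ K @ H) * (H @ L @ H))

def cknna(feats_a, feats_b, k=10):
    """CKNNA between two token sets whose rows are position-aligned (same token in both spaces)."""
    a, b = _l2_normalize(feats_a), _l2_normalize(feats_b)
    K, L = a @ a.T, b @ b.T                     # cosine-similarity kernels
    mask_k, mask_l = _knn_mask(K, k), _knn_mask(L, k)
    kl = _hsic(mask_k * mask_l * K, mask_k * mask_l * L)  # only mutual neighbors contribute
    kk = _hsic(mask_k * K, mask_k * K)
    ll = _hsic(mask_l * L, mask_l * L)
    denom = np.sqrt(kk * ll)
    return float(kl / denom) if denom > 0 else 0.0


rng = np.random.default_rng(0)
backbone = rng.normal(size=(256, 64))           # hypothetical evidence tokens (e.g. a 16 x 16 grid)
diffusion = backbone @ rng.normal(size=(64, 32)) + 0.5 * rng.normal(size=(256, 32))
for k in (5, 10, 20, 30, 40):                   # neighborhood sizes as in the sensitivity study
    print(k, round(100 * cknna(backbone, diffusion, k), 2))   # scaled by 100
```

The centered kernel alignment is restricted to mutual nearest neighbors. This makes the score sensitive to local neighborhood structure rather than to global geometry, which is the property the caption relies on when comparing backbone evidence tokens with position-aligned diffusion tokens.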

**Figure 4.** Concept-Guided Codebook Injection (CGCI). CGCI computes normalized patch–concept similarity to a generation-trace codebook, then performs evidence selection (FES) and aggregation (FEA) to form a concept-based prediction alongside the standard CLS pathway. Codebook-space projection. We project patch tokens into the codebook space and compute normalized similarity: Q = XW_q, Q̂ = ℓ2-norm(Q), Ĉ = ℓ2-norm(C), …
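The codebook-space projection in this caption is stated explicitly, but the evidence selection (FES) and aggregation (FEA) steps are not. The sketch below fills those two steps with simple placeholders:

- FES: top-k selection by each patch's strongest concept response.
- FEA: mean pooling followed by a linear head.

The λ-weighted sum of a CLS-pathway and a concept-pathway binary cross-entropy is likewise an assumption, modeled on the paper's description of λ as balancing two loss terms. All function names and sizes are hypothetical.

```python
import numpy as np

def l2norm(x, axis=-1, eps=1e-12):
    return x / np.maximum(np.linalg.norm(x, axis=axis, keepdims=True), eps)

def codebook_similarity(X, W_q, C):
    """Patch-concept cosine similarity: Q = X W_q, then S = l2norm(Q) @ l2norm(C).T."""
    Q_hat = l2norm(X @ W_q)        # (n_patches, d_c)
    C_hat = l2norm(C)              # (K, d_c)
    return Q_hat @ C_hat.T         # (n_patches, K), entries in [-1, 1]

def concept_logit(S, w, b, top_k=20):
    """Hypothetical FES + FEA: keep the top-k patches by peak concept response,
    mean-pool their concept-response vectors, and apply a linear head."""
    response = S.max(axis=1)                   # strongest concept match per patch
    selected = np.argsort(-response)[:top_k]   # FES (assumed): top-k evidence patches
    pooled = S[selected].mean(axis=0)          # FEA (assumed): mean over selected patches, shape (K,)
    return float(pooled @ w + b), selected

def bce_with_logits(z, y):
    # Numerically stable binary cross-entropy on a single logit z with label y in {0, 1}.
    return float(np.maximum(z, 0) - z * y + np.log1p(np.exp(-abs(z))))

def total_loss(z_cls, z_concept, y, lam=0.5):
    # Assumed combination: CLS-pathway BCE plus lambda-weighted concept-pathway BCE.
    return bce_with_logits(z_cls, y) + lam * bce_with_logits(z_concept, y)


rng = np.random.default_rng(0)
X = rng.normal(size=(196, 768))                   # patch tokens from a target backbone
W_q = rng.normal(size=(768, 256)) / np.sqrt(768)  # projection into codebook space
C = rng.normal(size=(200, 256))                   # a K = 200 generation-trace codebook
S = codebook_similarity(X, W_q, C)
z_concept, evidence = concept_logit(S, w=rng.normal(size=200) / 200, b=0.0)
print(S.shape, len(evidence), total_loss(z_cls=0.3, z_concept=z_concept, y=1.0))
```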

**Figure 5.** DINOv3 concept codebook centers on GenImage (SD v1.4). We induce a K = 200 codebook from attribution-selected evidence patches and visualize concepts by nearest-neighbor patch collages, revealing coherent local cues for real/fake decisions. … where L_BCE is the binary cross-entropy loss and λ balances the two terms. Across backbones b, we compare CGCI gains against CKNNA_kNN(b, l) measured in Section 3.2. A cons…
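The K = 200 codebook in this caption can be induced by clustering attribution-selected evidence tokens. The sketch assumes plain Lloyd's k-means on ℓ2-normalized tokens with random initialization, which may differ from the paper's exact clustering setup. `kmeans_codebook` and `nearest_patches` are hypothetical names; the second helper mirrors the nearest-neighbor patch collages used for visualization.

```python
import numpy as np

def _sq_dists(X, centers):
    # Squared Euclidean distances, shape (n_tokens, K), via |x|^2 - 2 x.c + |c|^2.
    return (X ** 2).sum(1)[:, None] - 2.0 * X @ centers.T + (centers ** 2).sum(1)[None, :]

def kmeans_codebook(tokens, K=200, iters=20, seed=0):
    """Lloyd's k-means on l2-normalized evidence tokens; returns (centers, assignments)."""
    rng = np.random.default_rng(seed)
    X = tokens / np.linalg.norm(tokens, axis=1, keepdims=True)
    centers = X[rng.choice(len(X), size=K, replace=False)].copy()  # random tokens as initial centers
    for _ in range(iters):
        assign = _sq_dists(X, centers).argmin(axis=1)
        for j in range(K):
            members = X[assign == j]
            if len(members):           # an empty cluster keeps its previous center
                centers[j] = members.mean(axis=0)
    return centers, _sq_dists(X, centers).argmin(axis=1)  # assignments to the final centers

def nearest_patches(X, center, m=16):
    """Indices of the m tokens closest to one concept center (e.g. for a patch collage)."""
    return np.argsort(((X - center) ** 2).sum(axis=1))[:m]


rng = np.random.default_rng(0)
evidence_tokens = rng.normal(size=(2000, 64))   # placeholder for attribution-selected patch tokens
centers, assign = kmeans_codebook(evidence_tokens, K=200)
X = evidence_tokens / np.linalg.norm(evidence_tokens, axis=1, keepdims=True)
collage = nearest_patches(X, centers[0])        # 16 tokens to visualize for concept 0
```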

**Figure 6.** Model-attended patches before and after codebook injection. Red boxes show the top-20 patches on the input image. Left: CLIP without injection, ranked by Transformer Attribution. Right: CLIP with injection, ranked by codebook-similarity (concept-response) scores. Top panels visualize each selected patch's nearest codebook cluster center by displaying the 16 closest patches to that prototype.
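Ranking patches by Transformer attribution and keeping the top 20 can be sketched as a gradient-weighted attention rollout. The relevance matrix starts as the identity. It is then updated layer by layer with head-averaged, positive gradient-times-attention maps, and its CLS row gives per-patch relevance. This is a simplified variant in the spirit of the Transformer attribution work the paper cites, not necessarily the exact rule it uses. In practice the attention maps and gradients would come from the detector's forward and backward passes; here they are random placeholders, and the function names are hypothetical.

```python
import numpy as np

def rollout_relevance(attns, grads):
    """attns, grads: per-layer arrays of shape (heads, T, T); token 0 is CLS.
    Returns the relevance of each of the T - 1 patch tokens for the CLS prediction."""
    T = attns[0].shape[-1]
    R = np.eye(T)
    for A, G in zip(attns, grads):
        A_bar = np.clip(G * A, 0.0, None).mean(axis=0)  # positive grad-weighted attention, averaged over heads
        R = R + A_bar @ R                              # accumulate relevance through the residual path
    return R[0, 1:]

def top_k_patches(relevance, k=20):
    return np.argsort(-relevance)[:k]                  # patch indices, most relevant first


rng = np.random.default_rng(0)
n_layers, n_heads, T = 12, 12, 197                     # e.g. ViT-B/16 at 224 px: CLS + 14 x 14 patches
attns = [rng.dirichlet(np.ones(T), size=(n_heads, T)) for _ in range(n_layers)]  # rows sum to 1
grads = [rng.normal(size=(n_heads, T, T)) for _ in range(n_layers)]
top20 = top_k_patches(rollout_relevance(attns, grads), k=20)
rows, cols = np.divmod(top20, 14)                      # grid coordinates of the selected patches
```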

**Figure 7.** CKNNA sensitivity to neighborhood size on CleanDIFT us6. For each discriminative backbone, we compute CKNNA between attribution-selected backbone evidence tokens and position-aligned CleanDIFT us6 tokens, varying the neighborhood size kNN ∈ {5, 10, 20, 30, 40}. Values are scaled by 100 for readability.

**Figure 8.** Stable Diffusion 1.4, Swin-T.

**Figure 9.** Stable Diffusion 1.4, DINOv3.

**Figure 10.** Stable Diffusion 1.4, DeiT.

**Figure 11.** Stable Diffusion 1.4, ResNet.

**Figure 12.** Stable Diffusion 1.4, CLIP.

**Figure 13.** Stable Diffusion 1.4, EfficientNet.

**Figure 14.** Stable Diffusion 1.4, CleanDIFT.

**Figure 15.** ADM, DINOv3.

**Figure 16.** BigGAN, DINOv3.

**Figure 17.** GLIDE, DINOv3.

**Figure 18.** Midjourney, DINOv3.

**Figure 19.** Stable Diffusion 1.5, DINOv3.

**Figure 20.** VQDM, DINOv3.

**Figure 21.** Wukong, DINOv3.

**Figure 22.** Chameleon, DINOv3.

**Figure 23.** CycleGAN, DINOv3.

**Figure 24.** GauGAN, DINOv3.

**Figure 25.** StarGAN, DINOv3.

**Figure 26.** StyleGAN, DINOv3.

Original abstract

AI-generated image detectors achieve high accuracy on in-distribution data but often fail on unseen generators. A key obstacle to understanding this failure is the black-box nature of current detectors: they do not reveal which evidence drives their decisions. We propose ForensicConcept, a framework that extracts explicit forensic concepts from detectors and enables their transfer across backbones. Our method localizes decision-critical patches via Transformer attribution, clusters them into a compact concept codebook, and uses a concept-aligned projection to produce auditable evidence readouts. Motivated by prior studies showing that DINO representations can guide diffusion generation and exhibit concept-level correspondence with diffusion features, we introduce a generation-trace reference based on CleanDIFT diffusion features and quantify backbone-trace alignment via neighborhood-structure consistency (CKNNA). We further propose concept codebook injection to transfer diffusion-derived concepts into target backbones. Experiments on GenImage, GAN-family, and Chameleon benchmarks show consistent improvements over prior methods. We also find that CKNNA alignment predicts transfer effectiveness, providing a principled explanation for why some backbones yield more transferable forensic evidence than others.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

ForensicConcept extracts and transfers forensic concepts across AIGI detectors via attribution, codebooks, and CleanDIFT references, with CKNNA alignment offered as a predictor of transfer success.


The main thing here is that the paper gives a concrete pipeline for pulling decision-critical patches out of detectors, clustering them into a concept codebook, and injecting diffusion-derived concepts into new backbones to improve generalization on unseen generators. They also tie transfer performance to neighborhood-structure consistency measured by CKNNA.

What stands out as new is the full combination: Transformer attribution for localization, codebook construction, CleanDIFT as the generation-trace reference, and the explicit use of CKNNA both to quantify alignment and to explain why some backbones transfer better. The experiments report gains on GenImage, GAN-family, and Chameleon sets over prior methods, and they try to make the evidence auditable rather than leaving the detector as a black box.

The work does a reasonable job of connecting the pieces into something usable for practitioners who need detectors that hold up when generators change. The motivation from prior DINO-diffusion studies is stated clearly, and the empirical pattern they report is consistent with the claim.

The softer parts are the reliance on the DINO-CleanDIFT concept-level correspondence, which is taken from earlier work and not re-validated here in depth, and the risk that CKNNA's predictive power is checked on the same transfer runs it is meant to explain. If those links are weaker than assumed, the reference and the explanation both lose ground. No obvious internal contradictions in the setup, but the details on how they separate fitting from prediction would need checking.

This is for researchers focused on AIGI detection and generalization, especially anyone who wants interpretable forensic signals rather than pure accuracy numbers. It has enough of a method and some empirical backing to deserve a serious referee, even if revisions will likely be needed on the assumption checks and validation steps.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes ForensicConcept, a framework for extracting explicit forensic concepts from AIGI detectors to improve transferability across backbones. It localizes decision-critical patches via Transformer attribution, clusters them into a compact concept codebook, and employs a concept-aligned projection for auditable evidence. Motivated by DINO-diffusion correspondence, it introduces CleanDIFT as a generation-trace reference, quantifies alignment via CKNNA (neighborhood-structure consistency), and uses concept codebook injection for transfer. Experiments on GenImage, GAN-family, and Chameleon benchmarks report consistent improvements over prior methods, with the additional finding that CKNNA alignment predicts transfer effectiveness.

Significance. If the reported gains hold under rigorous controls and CKNNA is shown to be an independent predictor rather than a post-hoc descriptor, the work could meaningfully advance interpretable AIGI detection by moving beyond black-box classifiers toward explicit, transferable forensic concepts. The emphasis on auditable readouts and a metric linking backbone alignment to transfer success is a constructive direction; no machine-checked proofs or fully parameter-free derivations are claimed, but the empirical focus on multiple benchmarks is a strength if the results prove robust.

major comments (2)

[Abstract (motivation paragraph)] The assumption that DINO representations guide diffusion generation and exhibit concept-level correspondence with diffusion features (justifying CleanDIFT as a reliable generation-trace reference) is load-bearing for the entire reference-based pipeline. The manuscript should provide explicit empirical validation or direct citations to the prior studies invoked, rather than relying on the high-level motivation statement.
[Results section on CKNNA] (likely around the transfer-effectiveness experiments) CKNNA is presented both as a tool for quantifying backbone-trace alignment and as a predictor of transfer success. The authors must clarify whether the predictive relationship is validated on held-out generators or data splits that are independent of the transfer experiments used to compute the alignments; otherwise the explanatory claim may be circular.

minor comments (2)

[Notation and Methods] Ensure all acronyms (e.g., CKNNA, CleanDIFT) are expanded on first use in the main text and that the concept codebook injection procedure is described with sufficient algorithmic detail for reproducibility.
[Experiments] Figure captions and tables reporting benchmark results should include error bars or statistical significance tests to support the claim of 'consistent improvements.'

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments. We address each major point below, indicating where revisions will be made to strengthen the manuscript.


Referee: [Abstract (motivation paragraph)] The assumption that DINO representations guide diffusion generation and exhibit concept-level correspondence with diffusion features (justifying CleanDIFT as a reliable generation-trace reference) is load-bearing for the entire reference-based pipeline. The manuscript should provide explicit empirical validation or direct citations to the prior studies invoked, rather than relying on the high-level motivation statement.

Authors: We agree that the motivation for CleanDIFT would benefit from greater specificity. The current text references prior studies on DINO-diffusion correspondence at a high level. In the revised manuscript we will add direct citations to the relevant works establishing this correspondence. We will also include a short empirical validation (e.g., qualitative feature-map comparisons on a small held-out set) in the supplementary material, so that the justification is explicit rather than implicit. Revision: yes.
Referee: [Results section on CKNNA] (likely around the transfer-effectiveness experiments) CKNNA is presented both as a tool for quantifying backbone-trace alignment and as a predictor of transfer success. The authors must clarify whether the predictive relationship is validated on held-out generators or data splits that are independent of the transfer experiments used to compute the alignments; otherwise the explanatory claim may be circular.

Authors: We appreciate the concern regarding potential circularity. The CKNNA alignments were computed on a disjoint set of generators and data splits that were not used to evaluate transfer effectiveness. We will revise the results section to state the independence of the splits explicitly, report the exact partitioning procedure, and add a sentence confirming that the predictive relationship was assessed on held-out generators. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper introduces ForensicConcept as an empirical framework. It extracts concepts via attribution and clustering, adopts CKNNA (a neighborhood-consistency metric from prior work) to measure alignment, and reports that this metric correlates with observed transfer gains on held-out benchmarks. No equation or procedure is shown to define one quantity in terms of the other by construction. No central claim reduces to a self-citation chain or to a fitted parameter renamed as a prediction. The DINO-CleanDIFT correspondence is cited from prior external studies rather than derived internally. The reported improvements and correlation are therefore falsifiable against external data and do not collapse into the inputs by definition.
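Whether alignment predicts transfer comes down to a rank correlation, across backbones, between CKNNA and the gain from codebook injection. The sketch below computes Spearman's ρ, assuming no tied values. To address the circularity concern raised in the referee report, the CKNNA scores would be measured on generators and splits disjoint from those used to measure the gains. The numbers are placeholders for illustration, not results from the paper, and `spearman_rho` is a hypothetical helper.

```python
import numpy as np

def spearman_rho(x, y):
    """Spearman's rank correlation, assuming no tied values: Pearson correlation of the ranks."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    return float(np.corrcoef(rx, ry)[0, 1])


# Placeholder numbers for six hypothetical backbones (illustration only, not results from the paper).
# cknna_scores: alignment measured on one set of generators/splits.
# injection_gains: accuracy change from codebook injection, measured on a disjoint set.
cknna_scores = [0.10, 0.20, 0.30, 0.40, 0.50, 0.60]
injection_gains = [1.0, 3.0, 2.0, 4.0, 5.0, 6.0]
rho = spearman_rho(cknna_scores, injection_gains)
print(f"Spearman rho = {rho:.3f}")
```

A rank correlation is a natural choice here because only the ordering of backbones matters for the claim that better-aligned backbones transfer better. With only a handful of backbones, the estimate is coarse, so reporting it alongside the per-backbone values is more informative than the single number.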

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides insufficient detail to enumerate specific free parameters, axioms, or invented entities. The framework introduces ForensicConcept itself and builds on CleanDIFT features and the CKNNA alignment metric, both taken from prior work (references [14] and [15]). How far the new components depend on that prior literature cannot be verified here.

pith-pipeline@v0.9.1-grok · 5736 in / 1197 out tokens · 17890 ms · 2026-06-27T22:39:35.976333+00:00 · methodology


Reference graph

Works this paper leans on

66 extracted references · 4 canonical work pages · 2 internal anchors

[1] Generative adversarial nets. Advances in Neural Information Processing Systems.
[2] Analyzing and improving the image quality of StyleGAN. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[3] Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems.
[4] High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[5] GenImage: A million-scale benchmark for detecting AI-generated image. Advances in Neural Information Processing Systems.
[6] CNN-generated images are surprisingly easy to spot... for now. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[7] Towards universal fake image detectors that generalize across generative models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[8] SIDBench: A Python framework for reliably assessing synthetic image detection methods. Proceedings of the 3rd ACM International Workshop on Multimedia AI against Disinformation.
[9] DIRE for diffusion-generated image detection. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[10] Yunpeng Luo, Junlong Du, Ke Yan, Shouhong Ding. Lare^…
[11] FIRE: Robust detection of diffusion-generated images via frequency-guided reconstruction error. Proceedings of the Computer Vision and Pattern Recognition Conference.
[12] Discovering transferable forensic features for CNN-generated images detection. European Conference on Computer Vision, 2022.
[13] Emergent correspondence from image diffusion. Advances in Neural Information Processing Systems.
[14] CleanDIFT: Diffusion features without noise. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[15] Position: The Platonic representation hypothesis. Forty-first International Conference on Machine Learning.
[16] Raising the bar of AI-generated image detection with CLIP. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[17] A sanity check for AI-generated image detection. International Conference on Learning Representations.
[18] DRCT: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. Forty-first International Conference on Machine Learning.
[19] Enhancing pre-trained representation classifiability can boost its interpretability. The Thirteenth International Conference on Learning Representations.
[20] Axiomatic attribution for deep networks. International Conference on Machine Learning, 2017.
[21] Quantifying attention flow in transformers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.
[22] Transformer interpretability beyond attention visualization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[23] Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). International Conference on Machine Learning, 2018.
[24] Concept bottleneck models. International Conference on Machine Learning, 2020.
[25] This looks like that: Deep learning for interpretable image recognition. Advances in Neural Information Processing Systems.
[26] Representation alignment for generation: Training diffusion transformers is easier than you think. The Thirteenth International Conference on Learning Representations.
[27] Diffusion transformers with representation autoencoders. The Fourteenth International Conference on Learning Representations.
[28] Both semantics and reconstruction matter: Making representation encoders ready for text-to-image generation and editing. arXiv preprint arXiv:2512.17909, 2025.
[29] Similarity of neural network representations revisited. International Conference on Machine Learning, 2019.
[30] SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in Neural Information Processing Systems.
[31] Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.
[32] Training data-efficient image transformers & distillation through attention. International Conference on Machine Learning, 2021.
[33] Swin Transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[34] Detecting and simulating artifacts in GAN fake images. 2019 IEEE International Workshop on Information Forensics and Security (WIFS), 2019.
[35] Thinking in frequency: Face forgery detection by mining frequency-aware clues. European Conference on Computer Vision, 2020.
[36] Global texture enhancement for fake face detection in the wild. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[37] Rethinking the up-sampling operations in CNN-based generative network for generalizable deepfake detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[38] Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. Proceedings of the AAAI Conference on Artificial Intelligence.
[39] Forgery-aware adaptive transformer for generalizable synthetic image detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[40] Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, Li Yuan. Orthogonal subspace decomposition for generalizable…, 2025.
[41] ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.
[42] PatchCraft: Exploring texture patch for efficient AI-generated image detection. arXiv preprint arXiv:2311.12397.
[43] Visualizing and understanding patch interactions in vision transformer. IEEE Transactions on Neural Networks and Learning Systems, 2023.
[44] LoRA: Low-rank adaptation of large language models. International Conference on Learning Representations.
[45] Algorithms for clustering data. Prentice Hall.
[46] Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 2010.
[47] Learning transferable visual models from natural language supervision. International Conference on Machine Learning, 2021.
[48] EfficientNet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning, 2019.
[49] DINOv3. arXiv preprint arXiv:2508.10104.
[50] DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193.
[51] Outlier-aware slicing for post-training quantization in vision transformer. Forty-first International Conference on Machine Learning.
[52] An information theory-inspired strategy for automated network pruning. International Journal of Computer Vision, 2025.
[53] HRSeg: High-resolution visual perception and enhancement for reasoning segmentation. Proceedings of the 33rd ACM International Conference on Multimedia.
[54] Towards reliable deepfake detection from uncertainty calibration perspective. Visual Intelligence, 2025.
[55] Zooming in on fakes: A novel dataset for localized AI-generated image detection with forgery amplification approach. Proceedings of the AAAI Conference on Artificial Intelligence.
[56] DiffusionFake: Enhancing generalization in deepfake detection via guided stable diffusion. Advances in Neural Information Processing Systems.
[57] StealthDiffusion: Towards evading diffusion forensic detection through diffusion model. Proceedings of the 32nd ACM International Conference on Multimedia.
[58] Domain general face forgery detection by learning to weight. Proceedings of the AAAI Conference on Artificial Intelligence.
[59] Dual contrastive learning for general face forgery detection. Proceedings of the AAAI Conference on Artificial Intelligence.
[60] An information theoretic approach for attention-driven face forgery detection. European Conference on Computer Vision, 2022.
[61] AIGI-Holmes: Towards explainable and generalizable AI-generated image detection via multimodal large language models. Proceedings of the IEEE/CVF International Conference on Computer Vision.
[62] Easier painting than thinking: Can text-to-image models set the stage, but not direct the play? The Fourteenth International Conference on Learning Representations.
[63] Leveraging frequency analysis for deep fake image recognition. International Conference on Machine Learning, 2020.
[64] Fusing global and local features for generalized AI-synthesized image detection. 2022 IEEE International Conference on Image Processing (ICIP), 2022.
[65] Detecting generated images by real images. European Conference on Computer Vision, 2022.
[66] Learning on gradients: Generalized artifacts representation for GAN-generated images detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[1] Generative adversarial nets. Advances in Neural Information Processing Systems.

[2] Analyzing and improving the image quality of StyleGAN. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[3] Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems.

[4] High-resolution image synthesis with latent diffusion models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[5] GenImage: A million-scale benchmark for detecting AI-generated image. Advances in Neural Information Processing Systems.

[6] CNN-generated images are surprisingly easy to spot... for now. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[7] Towards universal fake image detectors that generalize across generative models. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[8] SIDBench: A Python framework for reliably assessing synthetic image detection methods. Proceedings of the 3rd ACM International Workshop on Multimedia AI against Disinformation.

[9] DIRE for diffusion-generated image detection. Proceedings of the IEEE/CVF International Conference on Computer Vision.

[10] Yunpeng Luo, Junlong Du, Ke Yan, and Shouhong Ding. LaRE^…

[11] FIRE: Robust detection of diffusion-generated images via frequency-guided reconstruction error. Proceedings of the Computer Vision and Pattern Recognition Conference.

[12] Discovering transferable forensic features for CNN-generated images detection. European Conference on Computer Vision, 2022.

[13] Emergent correspondence from image diffusion. Advances in Neural Information Processing Systems.

[14] CleanDIFT: Diffusion features without noise. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[15] Position: The platonic representation hypothesis. Forty-first International Conference on Machine Learning.

[16] Raising the bar of AI-generated image detection with CLIP. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[17] A sanity check for AI-generated image detection. International Conference on Learning Representations.

[18] DRCT: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. Forty-first International Conference on Machine Learning.

[19] Enhancing pre-trained representation classifiability can boost its interpretability. The Thirteenth International Conference on Learning Representations.

[20] Axiomatic attribution for deep networks. International Conference on Machine Learning, 2017.

[21] Quantifying attention flow in transformers. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics.

[22] Transformer interpretability beyond attention visualization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23] Interpretability beyond feature attribution: Quantitative testing with concept activation vectors (TCAV). International Conference on Machine Learning, 2018.

[24] Concept bottleneck models. International Conference on Machine Learning, 2020.

[25] This looks like that: Deep learning for interpretable image recognition. Advances in Neural Information Processing Systems.

[26] Representation alignment for generation: Training diffusion transformers is easier than you think. The Thirteenth International Conference on Learning Representations.

[27] Diffusion transformers with representation autoencoders. The Fourteenth International Conference on Learning Representations.

[28] Both semantics and reconstruction matter: Making representation encoders ready for text-to-image generation and editing. arXiv preprint arXiv:2512.17909, 2025.

[29] Similarity of neural network representations revisited. International Conference on Machine Learning, 2019.

[30] SVCCA: Singular vector canonical correlation analysis for deep learning dynamics and interpretability. Advances in Neural Information Processing Systems.

[31] Deep residual learning for image recognition. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition.

[32] Training data-efficient image transformers & distillation through attention. International Conference on Machine Learning, 2021.

[33] Swin Transformer: Hierarchical vision transformer using shifted windows. Proceedings of the IEEE/CVF International Conference on Computer Vision.

[34] Detecting and simulating artifacts in GAN fake images. 2019 IEEE International Workshop on Information Forensics and Security (WIFS), 2019.

[35] Thinking in frequency: Face forgery detection by mining frequency-aware clues. European Conference on Computer Vision, 2020.

[36] Global texture enhancement for fake face detection in the wild. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[37] Rethinking the up-sampling operations in CNN-based generative network for generalizable deepfake detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[38] Frequency-aware deepfake detection: Improving generalizability through frequency space domain learning. Proceedings of the AAAI Conference on Artificial Intelligence.

[39] Forgery-aware adaptive transformer for generalizable synthetic image detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[40] Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decomposition for generalizable… 2025.

[41] ImageNet: A large-scale hierarchical image database. 2009 IEEE Conference on Computer Vision and Pattern Recognition, 2009.

[42] PatchCraft: Exploring texture patch for efficient AI-generated image detection. arXiv preprint arXiv:2311.12397.

[43] Visualizing and understanding patch interactions in vision transformer. IEEE Transactions on Neural Networks and Learning Systems, 2023.

[44] LoRA: Low-rank adaptation of large language models. International Conference on Learning Representations.

[45] Algorithms for clustering data. Prentice Hall.

[46] Data clustering: 50 years beyond K-means. Pattern Recognition Letters, 2010.

[47] Learning transferable visual models from natural language supervision. International Conference on Machine Learning, 2021.

[48] EfficientNet: Rethinking model scaling for convolutional neural networks. International Conference on Machine Learning, 2019.

[49] DINOv3. arXiv preprint arXiv:2508.10104.

[50] DINOv2: Learning robust visual features without supervision. arXiv preprint arXiv:2304.07193.

[51] Outlier-aware slicing for post-training quantization in vision transformer. Forty-first International Conference on Machine Learning.

[52] An information theory-inspired strategy for automated network pruning. International Journal of Computer Vision, 2025.

[53] HRSeg: High-resolution visual perception and enhancement for reasoning segmentation. Proceedings of the 33rd ACM International Conference on Multimedia.

[54] Towards reliable deepfake detection from uncertainty calibration perspective. Visual Intelligence, 2025.

[55] Zooming in on fakes: A novel dataset for localized AI-generated image detection with forgery amplification approach. Proceedings of the AAAI Conference on Artificial Intelligence.

[56] DiffusionFake: Enhancing generalization in deepfake detection via guided stable diffusion. Advances in Neural Information Processing Systems.

[57] StealthDiffusion: Towards evading diffusion forensic detection through diffusion model. Proceedings of the 32nd ACM International Conference on Multimedia.

[58] Domain general face forgery detection by learning to weight. Proceedings of the AAAI Conference on Artificial Intelligence.

[59] Dual contrastive learning for general face forgery detection. Proceedings of the AAAI Conference on Artificial Intelligence.

[60] An information theoretic approach for attention-driven face forgery detection. European Conference on Computer Vision, 2022.

[61] AIGI-Holmes: Towards explainable and generalizable AI-generated image detection via multimodal large language models. Proceedings of the IEEE/CVF International Conference on Computer Vision.

[62] Easier painting than thinking: Can text-to-image models set the stage, but not direct the play? The Fourteenth International Conference on Learning Representations.

[63] Leveraging frequency analysis for deep fake image recognition. International Conference on Machine Learning, 2020.

[64] Fusing global and local features for generalized AI-synthesized image detection. 2022 IEEE International Conference on Image Processing (ICIP), 2022.

[65] Detecting generated images by real images. European Conference on Computer Vision, 2022.

[66] Learning on gradients: Generalized artifacts representation for GAN-generated images detection. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.