Recognition: 2 theorem links
Micro-Defects Expose Macro-Fakes: Detecting AI-Generated Images via Local Distributional Shifts
Pith reviewed 2026-05-12 04:26 UTC · model grok-4.3
The pith
By shifting focus from global image semantics to local patches, a detector amplifies micro-defects in AI-generated images into measurable distributional gaps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that patch-wise modeling with a learnable Patch Forensic Signature produces provably larger discrepancies via Maximum Mean Discrepancy when localized forensic signals are present in generated images, enabling more reliable separation from real images than global feature methods.
What carries the argument
The learnable Patch Forensic Signature: a projection of semantic patch embeddings into a compact forensic latent space that preserves localized statistical irregularities for MMD comparison.
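To make this concrete, here is a minimal sketch of the pipeline's shape: frozen-backbone patch embeddings, a single linear layer standing in for the learnable Patch Forensic Signature, and a Gaussian-kernel MMD over the projected patches. The class name, dimensions, and kernel choice are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class PatchForensicSignature(nn.Module):
    """Illustrative stand-in for the paper's learnable projection: map semantic
    patch embeddings (dim d_sem) into a compact forensic latent space (dim d_for)."""

    def __init__(self, d_sem: int = 768, d_for: int = 64):
        super().__init__()
        self.proj = nn.Linear(d_sem, d_for)

    def forward(self, patch_emb: torch.Tensor) -> torch.Tensor:
        # patch_emb: (batch, num_patches, d_sem) from a frozen pre-trained backbone
        return self.proj(patch_emb)  # (batch, num_patches, d_for)

def gaussian_mmd2(x: torch.Tensor, y: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    """Biased MMD^2 estimate with Gaussian kernel k(a, b) = exp(-||a - b||^2 / (2 gamma^2)).
    x: (n, d) real patch signatures; y: (m, d) suspect patch signatures."""
    k = lambda a, b: torch.exp(-torch.cdist(a, b).pow(2) / (2 * gamma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Usage: collect per-patch signatures from each set of images, then compare sets.
pfs = PatchForensicSignature()
real = pfs(torch.randn(8, 196, 768)).reshape(-1, 64)     # 8 images x 196 patches
suspect = pfs(torch.randn(8, 196, 768)).reshape(-1, 64)
print(gaussian_mmd2(real, suspect).item())
```

The point of the patch-wise layout is that the two-sample statistic sees many patch signatures per image rather than one pooled vector, so a defect confined to a few patches is not averaged away before comparison.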
If this is right
- Patch-wise modeling yields provably larger discrepancies than global aggregation when localized forensic signals exist.
- The method separates real and generated images more reliably across multiple standard benchmarks.
- Localized cues remain effective even when global semantics are realistic and hard to distinguish.
Where Pith is reading between the lines
- The same local-projection idea could be tested on video frames or audio segments if micro-artifacts appear there too.
- Different generative models might imprint distinct patterns in the forensic latent space, allowing model attribution as a side benefit.
- Hybrid detectors that combine this local signal with global checks might further reduce false positives on edge cases.
Load-bearing premise
AI-generated images consistently contain localized micro-defects that survive semantic patch embedding and are not erased or masked by the learnable projection during training.
What would settle it
An experiment showing that removing the local patch projection and MMD step yields equivalent or better detection performance on the same benchmarks, or finding that advanced generators produce images without detectable localized micro-defects after embedding.
Original abstract
Recent generative models can produce images that appear highly realistic, raising challenges in distinguishing real and AI-generated images. Yet existing detectors based on pre-trained feature extractors tend to over-rely on global semantics, limiting sensitivity to the critical micro-defects. In this work, we propose Micro-Defects expose Macro-Fakes (MDMF), a local distribution-aware detection framework that amplifies micro-scale statistical irregularities into macro-level distributional discrepancies. To avoid localized forensic cues being diluted by plain aggregation, we introduce a learnable Patch Forensic Signature that projects semantic patch embeddings into a compact forensic latent space. We then use Maximum Mean Discrepancy (MMD) to quantify distributional discrepancies between generated and real images. Our theory-grounded analysis shows that patch-wise modeling yields provably larger discrepancies when localized forensic signals are present in generated images, enabling more reliable separation from real images. Extensive experiments demonstrate that MDMF consistently outperforms baseline detectors across multiple benchmarks, validating its general effectiveness. Project page: https://zbox1005.github.io/MDMF-project/
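For readers who want the statistic itself: MMD here is the kernel two-sample quantity of Gretton et al. [11]. A standard statement of the population value and its unbiased estimator (textbook material, not the paper's contribution):

```latex
\mathrm{MMD}^2(P, Q; k) =
  \mathbb{E}_{x, x' \sim P}\,k(x, x')
  + \mathbb{E}_{y, y' \sim Q}\,k(y, y')
  - 2\,\mathbb{E}_{x \sim P,\, y \sim Q}\,k(x, y),
\qquad
\widehat{\mathrm{MMD}}_u^2 =
  \frac{1}{n(n-1)}\sum_{i \neq j} k(x_i, x_j)
  + \frac{1}{m(m-1)}\sum_{i \neq j} k(y_i, y_j)
  - \frac{2}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m} k(x_i, y_j).
```

MDMF's move is to take the samples x_i, y_j to be projected patch signatures rather than whole-image features.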
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces MDMF, a local distribution-aware framework for detecting AI-generated images. It extracts semantic patch embeddings from a pre-trained model, projects them via a learnable Patch Forensic Signature into a compact forensic space, and applies patch-wise Maximum Mean Discrepancy (MMD) to quantify distributional shifts. The central claim is that this patch-wise approach produces provably larger discrepancies than global methods when localized micro-defects are present, leading to more reliable real-vs-generated separation. Extensive experiments are said to show consistent outperformance over baselines on multiple benchmarks.
Significance. If the theoretical separation claim can be rigorously established and the method generalizes beyond the evaluated generators, MDMF could meaningfully advance image forensics by exploiting localized statistical irregularities that global semantic extractors tend to suppress. The combination of a learnable projection with MMD on patches is a plausible way to amplify subtle artifacts without requiring new architectures. However, the absence of a detailed derivation in the abstract and the reliance on an end-to-end trained projection limit the assessed significance until the proof and robustness checks are provided.
major comments (2)
- [Abstract] The assertion that 'patch-wise modeling yields provably larger discrepancies' when localized forensic signals are present is stated without derivation steps, explicit assumptions, error analysis, or comparison to global MMD. This is load-bearing for the central claim of more reliable separation; without it, the advantage reduces to an empirical observation rather than a theory-grounded result.
- [Theoretical Analysis] The key assumption that micro-defects survive both the pre-trained patch embeddings and the subsequent learnable projection (whose weights are free parameters) is not bounded or verified. If the projection is optimized end-to-end on the detection task, the reported MMD gap may be an artifact of fitting rather than an independent property of patch-wise modeling, creating circularity risk for the 'provable' claim.
minor comments (2)
- [Abstract] The acronym MMD is used without expansion on first appearance in the abstract; define it explicitly.
- The project page link is provided, but there is no mention of code or data release; consider adding a reproducibility statement if code will be made available.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which help clarify the presentation of our theoretical contributions. We address each major comment below and outline the revisions we will make to strengthen the manuscript.
Point-by-point responses
Referee: [Abstract] The assertion that 'patch-wise modeling yields provably larger discrepancies' when localized forensic signals are present is stated without derivation steps, explicit assumptions, error analysis, or comparison to global MMD. This is load-bearing for the central claim of more reliable separation; without it, the advantage reduces to an empirical observation rather than a theory-grounded result.
Authors: We agree that the abstract statement is too concise and lacks explicit pointers to the supporting derivation. In the revised version, we will expand the abstract to include a brief outline of the key steps (locality of defects implies patch-wise MMD strictly exceeds global MMD unless defects are uniformly distributed), the main assumption (micro-defects are spatially localized), and a direct comparison to global MMD. We will also add an explicit cross-reference to the full proof, assumptions, and error bounds in the Theoretical Analysis section. Revision: yes.
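One illustrative way to see the locality step (a toy simplification under the sparse defect model quoted later on this page, with a constant defect amplitude a; not the paper's actual derivation):

```latex
% Sparse defect model: e_i(y) = u_i + a s_i \mu_{\text{defect}}, s_i \in \{0,1\},
% with defect density \rho = \frac{1}{N}\sum_i s_i. Global mean pooling gives
\bar{e}(y) = \frac{1}{N}\sum_{i=1}^{N} e_i(y) = \bar{u} + a\,\rho\,\mu_{\text{defect}},
% so the pooled shift is attenuated by \rho, while each defective patch
% retains the full shift a\,\mu_{\text{defect}}.
```

Any discrepancy measure monotone in the shift magnitude, such as the closed-form MMD² quoted at entry [62] below, is then larger patch-wise than after pooling, except in the degenerate case ρ = 1, which matches the 'unless defects are uniformly distributed' caveat above.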
Referee: [Theoretical Analysis] The key assumption that micro-defects survive both the pre-trained patch embeddings and the subsequent learnable projection (whose weights are free parameters) is not bounded or verified. If the projection is optimized end-to-end on the detection task, the reported MMD gap may be an artifact of fitting rather than an independent property of patch-wise modeling, creating circularity risk for the 'provable' claim.
Authors: This concern about potential circularity is well-taken. The existing proof in Section 3 establishes the inequality for any fixed linear projection, relying only on the spatial localization of defects rather than on the specific learned weights. To address the referee's point directly, we will add two elements in revision: (1) a short lemma bounding the effect of the learned projection under a Lipschitz continuity assumption on the pre-trained embeddings, showing that the gap cannot be driven to zero when defects remain localized, and (2) an ablation study comparing MMD values obtained with the learned projection against those obtained with a fixed identity or random projection. These additions will be placed in the Theoretical Analysis section to separate the general locality argument from the learned-projection case. Revision: yes.
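A minimal sketch of ablation (2), assuming patch embeddings are precomputed; the function names, dimensions, and synthetic data below are placeholders, not the authors' code:

```python
from typing import Optional
import torch

def gaussian_mmd2(x: torch.Tensor, y: torch.Tensor, gamma: float = 1.0) -> torch.Tensor:
    # Biased Gaussian-kernel MMD^2 between two sets of patch signatures.
    k = lambda a, b: torch.exp(-torch.cdist(a, b).pow(2) / (2 * gamma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

def project(emb: torch.Tensor, mode: str, d_for: int = 64,
            learned_weight: Optional[torch.Tensor] = None) -> torch.Tensor:
    # emb: (n_patches, d_sem). The three projections the ablation would compare.
    if mode == "identity":
        return emb                    # no projection at all
    if mode == "random":
        torch.manual_seed(0)          # fixed, untrained random projection
        w = torch.randn(emb.shape[1], d_for) / emb.shape[1] ** 0.5
        return emb @ w
    if mode == "learned":
        return emb @ learned_weight   # trained PFS weights would go here
    raise ValueError(mode)

# Placeholder embeddings: real vs. suspect patches with a small synthetic shift.
real_emb = torch.randn(2000, 768)
fake_emb = torch.randn(2000, 768) + 0.05
for mode in ("identity", "random"):
    gap = gaussian_mmd2(project(real_emb, mode), project(fake_emb, mode))
    print(f"{mode}: MMD^2 = {gap.item():.4g}")
```

If the learned projection were doing all the work, the "identity" and "random" rows of such a table would show no gap; if locality alone carries the inequality, all three should show one, with the learned projection merely amplifying it.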
Circularity Check
No circularity: theory-grounded MMD analysis is independent of fitted projection
Full rationale
The paper's derivation chain centers on a learnable Patch Forensic Signature projecting pre-trained patch embeddings, followed by patch-wise MMD for discrepancy quantification. The claimed 'theory-grounded analysis' asserts provably larger discrepancies under localized forensic signals, which follows from standard properties of MMD (a kernel-based metric) applied to partitioned patches versus global aggregation, rather than from any fitted parameter or self-referential definition. No equations or steps reduce the 'provable' gap to the training objective by construction; the projection serves as an empirical amplifier while the inequality holds conditionally on signal presence in the embedding space. No self-citations are load-bearing for the core claim, no ansatz is smuggled, and no known result is merely renamed. The framework rests on external MMD theory and pre-trained extractors rather than on self-referential machinery.
Axiom & Free-Parameter Ledger
free parameters (1)
- Patch Forensic Signature projection weights
axioms (2)
- domain assumption: Localized forensic signals exist and are preserved in patch embeddings of generated images.
- standard math: MMD is an appropriate and unbiased measure of distributional difference for forensic signatures.
invented entities (1)
- Patch Forensic Signature (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (tag: unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "We then use Maximum Mean Discrepancy (MMD) to quantify distributional discrepancies between generated and real images... patch-wise modeling yields provably larger discrepancies"
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · costAlphaLog_fourth_deriv_at_zero (tag: unclear)
Relation between the paper passage and the cited Recognition theorem is unclear.
Passage: "Assumption 2.3 (Sparse Defect Model)... e_i(y) = u_i + a_i s_i μ_defect"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Andrew Brock. Large scale GAN training for high fidelity natural image synthesis. arXiv preprint arXiv:1809.11096, 2018.
- [2] Tim Brooks, Bill Peebles, Connor Holmes, Will DePue, Yufei Guo, Li Jing, David Schnurr, Joe Taylor, Troy Luhman, Eric Luhman, et al. Video generation models as world simulators. OpenAI Blog, 1(8):1, 2024.
- [3] Bar Cavia, Eliahu Horwitz, Tal Reiss, and Yedid Hoshen. Real-time deepfake detection in the real-world. arXiv preprint arXiv:2406.09398, 2024.
- [4] Lucy Chai, David Bau, Ser-Nam Lim, and Phillip Isola. What makes fake images detectable? Understanding properties that generalize. In European Conference on Computer Vision, pages 103–120. Springer, 2020.
- [5] Baoying Chen, Jishen Zeng, Jianquan Yang, and Rui Yang. DRCT: Diffusion reconstruction contrastive training towards universal detection of diffusion generated images. In Forty-first International Conference on Machine Learning, 2024.
- [6] Haoxing Chen, Yan Hong, Zizheng Huang, Zhuoer Xu, Zhangxuan Gu, Yaohui Li, Jun Lan, Huijia Zhu, Jianfu Zhang, Weiqiang Wang, et al. DeMamba: AI-generated video detection on million-scale GenVideo benchmark. arXiv preprint arXiv:2405.19707, 2024.
- [7] Sungik Choi, Hankook Lee, and Moontae Lee. Training-free detection of AI-generated images via cropping robustness. arXiv preprint arXiv:2511.14030, 2025.
- [8] Riccardo Corvi, Davide Cozzolino, Giada Zingarini, Giovanni Poggi, Koki Nagano, and Luisa Verdoliva. On the detection of synthetic images generated by diffusion models. In ICASSP 2023 IEEE International Conference on Acoustics, Speech and Signal Processing, pages 1–5. IEEE, 2023.
- [9] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pages 248–255. IEEE, 2009.
- [10] Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780–8794, 2021.
- [11] Arthur Gretton, Karsten M Borgwardt, Malte J Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. The Journal of Machine Learning Research, 13(1):723–773, 2012.
- [12] Jiaxi Gu, Xiaojun Meng, Guansong Lu, Lu Hou, Niu Minzhe, Xiaodan Liang, Lewei Yao, Runhui Huang, Wei Zhang, Xin Jiang, et al. Wukong: A 100 million large-scale Chinese cross-modal pre-training benchmark. Advances in Neural Information Processing Systems, 35:26418–26431, 2022.
- [13] Zhiyuan He, Pin-Yu Chen, and Tsung-Yi Ho. RIGID: A training-free and model-agnostic framework for robust AI-generated image detection. arXiv preprint arXiv:2405.20112, 2024.
- [14] Arash Heidari, Nima Jafari Navimipour, Hasan Dag, and Mehmet Unal. Deepfake detection using deep learning methods: A systematic and comprehensive review. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 14(2):e1520, 2024.
- [15] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.
- [16] Tero Karras, Samuli Laine, and Timo Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019.
- [17] Diederik P Kingma and Max Welling. Auto-encoding variational Bayes. arXiv preprint arXiv:1312.6114, 2013.
- [18] Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Fuli Feng. Improving synthetic image detection towards generalization: An image transformation perspective. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.1, pages 2405–2414, 2025.
- [19] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.
- [20] Feng Liu, Wenkai Xu, Jie Lu, Guangquan Zhang, Arthur Gretton, and Danica J Sutherland. Learning deep kernels for non-parametric two-sample tests. In International Conference on Machine Learning, pages 6316–6326. PMLR, 2020.
- [21] Huan Liu, Zichang Tan, Chuangchuang Tan, Yunchao Wei, Jingdong Wang, and Yao Zhao. Forgery-aware adaptive transformer for generalizable synthetic image detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10770–10780, 2024.
- [22] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022, 2021.
- [23] Zhengzhe Liu, Xiaojuan Qi, and Philip HS Torr. Global texture enhancement for fake face detection in the wild. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8060–8069, 2020.
- [24] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741, 2021.
- [25] Alexander Quinn Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. In International Conference on Machine Learning, pages 8162–8171. PMLR, 2021.
- [26] Utkarsh Ojha, Yuheng Li, and Yong Jae Lee. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24480–24489, 2023.
- [27] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy V. Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, Mido Assran, Nicolas Ballas, Wojciech Galuba, Russell Howes, Po-Yao Huang, Shang-Wen Li, Ishan Misra, Michael Rabbat, Vasu Sharma, Gabriel Synnaeve, Hu Xu, Herve Jegou, Julien Mairal, Patrick L..., 2024.
- [28] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195–4205, 2023.
- [29] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. SDXL: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952, 2023.
- [30] Yuyang Qian, Guojun Yin, Lu Sheng, Zixuan Chen, and Jing Shao. Thinking in frequency: Face forgery detection by mining frequency-aware clues. In European Conference on Computer Vision, pages 86–103. Springer, 2020.
- [31] Anirudh Sundara Rajan and Yong Jae Lee. Stay-positive: A case for ignoring real image features in fake image detection. arXiv preprint arXiv:2502.07778, 2025.
- [32] Anirudh Sundara Rajan, Utkarsh Ojha, Jedidiah Schloesser, and Yong Jae Lee. Aligned datasets improve detection of latent diffusion-generated images. arXiv preprint arXiv:2410.11835, 2024.
- [33] Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 1(2):3, 2022.
- [34] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684–10695, 2022.
- [35] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, et al. Photorealistic text-to-image diffusion models with deep language understanding. Advances in Neural Information Processing Systems, 35:36479–36494, 2022.
- [36] Kihyuk Sohn, Honglak Lee, and Xinchen Yan. Learning structured output representation using deep conditional generative models. Advances in Neural Information Processing Systems, 28, 2015.
- [37] Gowthami Somepalli, Vasu Singla, Micah Goldblum, Jonas Geiping, and Tom Goldstein. Diffusion art or digital forgery? Investigating data replication in diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6048–6058, 2023.
- [38] Chuangchuang Tan, Renshuai Tao, Huan Liu, Guanghua Gu, Baoyuan Wu, Yao Zhao, and Yunchao Wei. C2P-CLIP: Injecting category common prompt in CLIP to enhance generalization in deepfake detection. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 7184–7192, 2025.
- [39] Chuangchuang Tan, Yao Zhao, Shikui Wei, Guanghua Gu, Ping Liu, and Yunchao Wei. Rethinking the up-sampling operations in CNN-based generative network for generalizable deepfake detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 28130–28139, 2024.
- [40] Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, and Hervé Jégou. Training data-efficient image transformers & distillation through attention. In International Conference on Machine Learning, pages 10347–10357. PMLR, 2021.
- [41] Martin J Wainwright. High-dimensional statistics: A non-asymptotic viewpoint, volume 48. Cambridge University Press, 2019.
- [42] Hongsong Wang, Renxi Cheng, Yang Zhang, Chaolei Han, and Jie Gui. LoTA: Bit-planes guided AI-generated image detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 17246–17255, 2025.
- [43] Kaihong Wang, Lingzhi Zhang, and Jianming Zhang. Detecting human artifacts from text-to-image models. arXiv preprint arXiv:2411.13842, 2024.
- [44] Sheng-Yu Wang, Oliver Wang, Richard Zhang, Andrew Owens, and Alexei A Efros. CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8695–8704, 2020.
- [45] Yiming Wang, Pei Zhang, Baosong Yang, Derek Wong, Zhuosheng Zhang, and Rui Wang. Embedding trajectory for out-of-distribution detection in mathematical reasoning. Advances in Neural Information Processing Systems, 37:42965–42999, 2024.
- [46] Yiyang Wang, Xi Chen, Xiaogang Xu, Sihui Ji, Yu Liu, Yujun Shen, and Hengshuang Zhao. Diff-Doctor: Diagnosing image diffusion models before treating. arXiv preprint arXiv:2501.12382, 2025.
- [47] Zhendong Wang, Jianmin Bao, Wengang Zhou, Weilun Wang, Hezhen Hu, Hong Chen, and Houqiang Li. DIRE for diffusion-generated image detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22445–22455, 2023.
- [48] Jun Xu, Tao Mei, Ting Yao, and Yong Rui. MSR-VTT: A large video description dataset for bridging video and language. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5288–5296, 2016.
- [49] Shilin Yan, Ouxiang Li, Jiayin Cai, Yanbin Hao, Xiaolong Jiang, Yao Hu, and Weidi Xie. A sanity check for AI-generated image detection. arXiv preprint arXiv:2406.19435, 2024.
- [50] Zhiyuan Yan, Jiangming Wang, Peng Jin, Ke-Yue Zhang, Chengchun Liu, Shen Chen, Taiping Yao, Shouhong Ding, Baoyuan Wu, and Li Yuan. Orthogonal subspace decomposition for generalizable AI-generated image detection. arXiv preprint arXiv:2411.15633, 2024.
- [51] Fisher Yu, Ari Seff, Yinda Zhang, Shuran Song, Thomas Funkhouser, and Jianxiong Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
- [52] Boxuan Zhang, Jianing Zhu, Zengmao Wang, Tongliang Liu, Bo Du, and Bo Han. What if the input is expanded in OOD detection? Advances in Neural Information Processing Systems, 37:21289–21329, 2024.
- [53] Shuhai Zhang, Zihao Lian, Jiahao Yang, Daiyuan Li, Guoxuan Pang, Feng Liu, Bo Han, Shutao Li, and Mingkui Tan. Physics-driven spatiotemporal modeling for AI-generated video detection. arXiv preprint arXiv:2510.08073, 2025.
- [54] Shuhai Zhang, Yiliao Song, Jiahao Yang, Yuanqing Li, Bo Han, and Mingkui Tan. Detecting machine-generated texts by multi-population aware optimization for maximum mean discrepancy. arXiv preprint arXiv:2402.16041, 2024.
- [55] Xu Zhang, Svebor Karaman, and Shih-Fu Chang. Detecting and simulating artifacts in GAN fake images. In 2019 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6. IEEE, 2019.
- [56] Yonggang Zhang, Jun Nie, Xinmei Tian, Mingming Gong, Kun Zhang, and Bo Han. Detecting generated images by fitting natural image distributions. arXiv preprint arXiv:2511.01293, 2025.
- [57] Zangwei Zheng, Xiangyu Peng, Tianji Yang, Chenhui Shen, Shenggui Li, Hongxin Liu, Yukun Zhou, Tianyi Li, and Yang You. Open-Sora: Democratizing efficient video production for all. arXiv preprint arXiv:2412.20404, 2024.
- [58] Nan Zhong, Yiran Xu, Sheng Li, Zhenxing Qian, and Xinpeng Zhang. PatchCraft: Exploring texture patch for efficient AI-generated image detection. arXiv preprint arXiv:2311.12397, 2023.
- [59] Jiawei Zhou, Yixuan Zhang, Qianni Luo, Andrea G Parker, and Munmun De Choudhury. Synthetic lies: Understanding AI-generated misinformation and evaluating algorithmic and human solutions. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, pages 1–20, 2023.
- [60] Mingjian Zhu, Hanting Chen, Mouxiao Huang, Wei Li, Hailin Hu, Jie Hu, and Yunhe Wang. GenDet: Towards good generalizations for AI-generated image detection. arXiv preprint arXiv:2312.08880, 2023.
- [61] Mingjian Zhu, Hanting Chen, Qiangyu Yan, Xudong Huang, Guanyu Lin, Wei Li, Zhijun Tu, Hailin Hu, Jie Hu, and Yunhe Wang. GenImage: A million-scale benchmark for detecting AI-generated image. Advances in Neural Information Processing Systems, 36:77771–77782, 2023.
- [62] Excerpt from the paper's Appendix A (Theoretical Analysis):
MMD²(P, Q; k_ω) = 2·(γ²/(γ² + 2σ_z²))^(Kd/2)·(1 − exp(−K‖Δ_PFS‖₂²/(2(γ² + 2σ_z²)))), which proves (11). Positivity and monotonicity in ‖Δ_PFS‖₂: let a := (γ²/(γ² + 2σ_z²))^(Kd/2) > 0 and t := ‖Δ_PFS‖₂ ≥ 0, so that MMD²(P, Q; k_ω) = 2a·(1 − e^(−Kt²/(2(γ² + 2σ_z²)))). If t > 0, then e^(−Kt²/(2(γ² + 2σ_z²))) ∈ (0, 1) and hence MMD² > 0. Moreover, (d/dt)(1 − e^(−Kt²/(2(γ² + 2σ_z²)))) = e^(−Kt²/(2(γ² + 2σ_z²)))·Kt/(γ² + 2σ_z²) > 0 for t > 0, so MMD² is strictly increasing in ‖Δ_PFS‖₂.
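A quick numerical sanity check of the closed form quoted in [62], under arbitrary illustrative parameter values (γ = 1, σ_z = 0.5, K = 8, d = 4): it should vanish at ‖Δ_PFS‖₂ = 0 and increase strictly with the signature gap.

```python
import math

def mmd2_closed_form(t: float, gamma: float = 1.0, sigma_z: float = 0.5,
                     K: int = 8, d: int = 4) -> float:
    # Closed form quoted above: 2 * a * (1 - exp(-K t^2 / (2 (gamma^2 + 2 sigma_z^2))))
    a = (gamma**2 / (gamma**2 + 2 * sigma_z**2)) ** (K * d / 2)
    return 2 * a * (1 - math.exp(-K * t**2 / (2 * (gamma**2 + 2 * sigma_z**2))))

vals = [mmd2_closed_form(t) for t in (0.0, 0.05, 0.1, 0.2)]
assert vals[0] == 0.0                              # vanishes when Delta_PFS = 0
assert all(x < y for x, y in zip(vals, vals[1:]))  # strictly increasing in t
print([f"{v:.3e}" for v in vals])
```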
- [63] Excerpt from the paper's related-work discussion: "...learns from diffusion reconstructions and contrastive hard samples to enhance robustness, F-ConV [56] exploits manifold geometry with flow-based extrusion. Motivated by the increasing sparsity of generative artifacts, some methods shift to patch-level evidence. PatchCraft [58] enhances texture traces via smash and reconstruction, FatFormer [21] adapts C..."