Deepfakes: we need to re-think the concept of "real" images
Pith reviewed 2026-05-18 14:25 UTC · model grok-4.3
The pith
Smartphone image formation now relies on neural networks similar to those generating deepfakes, so the idea of a 'real' image needs redefinition.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Today, the vast majority of photographs are produced by smartphones that employ neural network architectures to compute images from multiple sensor inputs over time. These image formation processes are closely related to the neural networks used for generating deepfakes. Therefore, the distinction between real and fake images requires fundamental reconsideration, and the field must develop new definitions and datasets rather than focusing solely on generative models.
What carries the argument
Neural network-based image formation algorithms in modern smartphone cameras, which process multiple inputs into a single output image.
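As a concrete picture of "computing one image from multiple inputs", here is a minimal sketch of a burst merge. It deliberately uses classical averaging of already-aligned frames, not a neural network. Real smartphone pipelines add alignment, robust weighting and, per the paper, learned components. The additive Gaussian noise model, the parameter values and all function names are illustrative assumptions, not taken from the paper or from any vendor pipeline.

```python
import random
import statistics


def capture_burst(true_value, num_frames, noise_sigma, rng):
    """Simulate num_frames noisy readings of one pixel.

    Assumption: additive Gaussian sensor noise; frames are already aligned.
    """
    return [true_value + rng.gauss(0.0, noise_sigma) for _ in range(num_frames)]


def merge_burst(frames):
    """Classical merge: the plain mean of aligned frames (no learning involved)."""
    return sum(frames) / len(frames)


def residual_noise(num_frames, trials=2000, true_value=0.5, noise_sigma=0.1, seed=0):
    """Standard deviation of the merged pixel across many simulated bursts."""
    rng = random.Random(seed)
    merged = [
        merge_burst(capture_burst(true_value, num_frames, noise_sigma, rng))
        for _ in range(trials)
    ]
    return statistics.pstdev(merged)


if __name__ == "__main__":
    # Residual noise should shrink roughly as noise_sigma / sqrt(num_frames).
    for n in (1, 4, 16):
        print(n, round(residual_noise(n), 4))
```

Even this non-learned merge means the output pixel was never measured directly; it is an estimate computed from several exposures. Replacing the mean with a trained network is the step the paper argues brings image formation close to generative models.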
If this is right
- Current fake detection methods trained on old datasets may fail to distinguish modern real images from fakes.
- New benchmark datasets of contemporary real images are necessary for evaluating detectors.
- A clear technical definition of what constitutes a real image is required before detection can be reliable.
- The objective of detecting fake images may need to be re-evaluated entirely if the boundary with real images is blurred.
Where Pith is reading between the lines
- Detectors might need to analyze the specific sensor fusion processes rather than just pixel-level artifacts.
- This shift could affect legal standards for image evidence in courts.
- Future research should explore whether all digital images are now inherently processed in ways that introduce generative-like steps.
Load-bearing premise
The assumption that smartphone image-formation neural networks are similar enough to deepfake generators that they undermine current detection approaches based on outdated real-image datasets.
What would settle it
A direct comparison showing that state-of-the-art deepfake detectors perform significantly worse on high-resolution modern smartphone photographs than on ImageNet-style images would support the claim.
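One way to run such a comparison is to hold the detector and its threshold fixed and compare false-positive rates, meaning the share of genuinely real images flagged as fake, on a legacy real-image pool and on a pool of modern smartphone photos. The sketch below assumes the detector outputs a "fake" score in [0, 1]. It uses a two-proportion z-test, which is one standard choice for checking whether the difference is significant. The function names, the 0.5 threshold and the score lists in the usage block are hypothetical placeholders, not results from the paper.

```python
import math


def false_positive_rate(real_image_scores, threshold=0.5):
    """Fraction of real images whose 'fake' score reaches the threshold."""
    if not real_image_scores:
        raise ValueError("need at least one score")
    flagged = sum(1 for s in real_image_scores if s >= threshold)
    return flagged / len(real_image_scores)


def two_proportion_z(p1, n1, p2, n2):
    """Two-proportion z statistic with pooled variance (normal approximation)."""
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se if se > 0 else 0.0


def compare_real_sets(scores_legacy, scores_smartphone, threshold=0.5):
    """Return (FPR legacy, FPR smartphone, z, two-sided p-value)."""
    fpr_legacy = false_positive_rate(scores_legacy, threshold)
    fpr_phone = false_positive_rate(scores_smartphone, threshold)
    z = two_proportion_z(fpr_phone, len(scores_smartphone),
                         fpr_legacy, len(scores_legacy))
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return fpr_legacy, fpr_phone, z, p_value


if __name__ == "__main__":
    # Placeholder scores for illustration only; substitute real detector outputs.
    legacy = [0.1] * 95 + [0.9] * 5
    smartphone = [0.2] * 70 + [0.8] * 30
    print(compare_real_sets(legacy, smartphone))
```

Measuring false positives on real images isolates the claim under test: a detector whose notion of "real" was learned from legacy photos may start flagging computationally formed smartphone photos as fake.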
Original abstract
The wide availability and low barrier to use of modern image generation models have triggered the reasonable fear of criminal misconduct and negative social implications. The machine learning community has been engaging with this problem through an extensive series of publications proposing algorithmic solutions for the detection of "fake", i.e. entirely generated or partially manipulated, images. While there is undoubtedly some progress towards technical solutions to the problem, we argue that current and prior work focuses too much on generative algorithms and "fake" data samples, neglecting a clear definition and data collection of "real" images. The fundamental question "what is a real image?" might appear to be quite philosophical, but our analysis shows that the development and evaluation of virtually all current "fake"-detection methods rely on only a few, quite old, low-resolution datasets of "real" images like ImageNet. However, the technology for the acquisition of "real" images, i.e. taking photos, has evolved drastically over the last decade: today, over 90% of all photographs are produced by smartphones, which typically use algorithms to compute an image from multiple inputs (over time) from multiple sensors. Based on the fact that these image formation algorithms are typically neural network architectures which are closely related to "fake"-image generators, we state the position that today, we need to re-think the concept of "real" images. The purpose of this position paper is to raise awareness of the current shortcomings in this active field of research and to trigger an open discussion of whether the detection of "fake" images is a sound objective at all. At the very least, we need a clear technical definition of "real" images and new benchmark datasets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper is a position paper claiming that deepfake detection research over-focuses on generative algorithms and 'fake' samples while neglecting a rigorous definition of 'real' images. It notes that current detectors rely on a few old low-resolution datasets such as ImageNet, whereas over 90% of modern photographs are produced by smartphones whose computational photography pipelines use neural-network image-formation algorithms that the authors describe as closely related to those in fake-image generators; consequently the field should re-think the concept of real images, supply a clear technical definition, and create new benchmark datasets.
Significance. If the central analogy between smartphone NN pipelines and generative models can be substantiated, the position could usefully redirect attention toward dataset construction and detector assumptions that better reflect contemporary image acquisition. The manuscript correctly flags the field's continued dependence on legacy datasets as a structural limitation and may stimulate discussion on whether detection remains a well-posed task.
major comments (1)
- Abstract: the claim that smartphone image-formation algorithms are 'typically neural network architectures which are closely related to fake-image generators' is load-bearing for the argument that current detection approaches may not be conceptually sound, yet the manuscript supplies neither specific pipeline references (e.g., multi-frame fusion or learned denoising in commercial ISPs), architectural comparisons, nor training-objective overlaps to ground the asserted relation.
minor comments (2)
- The statistic that 'over 90% of all photographs are produced by smartphones' would benefit from an explicit citation or data source.
- A short section outlining what a 'clear technical definition of real images' might contain would help translate the position into actionable research guidance.
Simulated Author's Rebuttal
We thank the referee for their constructive feedback, which correctly identifies a point where our position paper would benefit from greater technical grounding. We address the major comment below and have prepared revisions to strengthen the manuscript while preserving its intent as a position piece.
- Referee: Abstract: the claim that smartphone image-formation algorithms are 'typically neural network architectures which are closely related to fake-image generators' is load-bearing for the argument that current detection approaches may not be conceptually sound, yet the manuscript supplies neither specific pipeline references (e.g., multi-frame fusion or learned denoising in commercial ISPs), architectural comparisons, nor training-objective overlaps to ground the asserted relation.
Authors: We agree that the analogy, while central, is presented at a conceptual level in the current draft and would be more persuasive with concrete illustrations. As a position paper, our primary goal is to challenge assumptions in the detection literature rather than to deliver a comparative technical survey; nevertheless, the referee's observation is fair. In the revised version we will expand the relevant paragraph to include specific examples of multi-frame and neural components in commercial smartphone pipelines (e.g., burst merging in Google’s HDR+ and machine-learning-based multi-frame fusion in Apple’s Deep Fusion) and note shared architectural traits such as convolutional feature extraction and data-driven priors. We will also cite representative computational-photography literature to make the relation explicit without claiming identity between the two classes of models.
Revision: yes
Circularity Check
Position paper contains no derivations, equations or self-referential constructions
The manuscript is a position paper that advances an argument about redefining 'real' images based on the prevalence of smartphone computational photography and the general observation that such pipelines employ neural networks. No equations, fitted parameters, predictions, or derivation chains appear in the text. The central claim rests on publicly documented technology trends (smartphone market share, multi-frame fusion) and dataset usage patterns rather than any quantity defined by the authors' own prior work or internal self-reference. No self-citations are invoked as load-bearing support, and the argument does not reduce any result to its inputs by construction. This is the expected outcome for a non-technical position piece that draws on external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- Domain assumption: Modern smartphone image formation uses neural-network architectures closely related to fake-image generators.
- Domain assumption: Development and evaluation of fake-detection methods rely on only a few old low-resolution datasets such as ImageNet.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
Relation between the paper passage and the cited Recognition theorem:
Based on the fact that these image formation algorithms are typically neural network architectures which are closely related to 'fake'-image generators, we state the position that today, we need to re-think the concept of 'real' images.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] J. J. Bird and A. Lotfi. CIFAKE: Image classification and explainable identification of AI-generated synthetic images. IEEE Access, 12:15642–15650, 2024.
- [2] B. Cavia, E. Horwitz, T. Reiss, and Y. Hoshen. Real-time deepfake detection in the real-world. arXiv preprint arXiv:2406.09398, 2024.
- [3] L. Chai, D. Bau, S.-N. Lim, and P. Isola. What makes fake images detectable? Understanding properties that generalize. In Computer Vision – ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVI 16, pages 103–120. Springer, 2020.
- [4] Y. Choi, Y. Uh, J. Yoo, and J.-W. Ha. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8188–8197, 2020.
- [5] M. V. Conde, F. Vasluianu, J. Vazquez-Corral, and R. Timofte. Perceptual image enhancement for smartphone real-time applications. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1848–1858, 2023.
- [6] D. Cozzolino, K. Nagano, L. Thomaz, A. Majumdar, and L. Verdoliva. Synthetic image detection: Highlights from the IEEE Video and Image Processing Cup 2022 student competition. arXiv preprint arXiv:2309.12428, 2023.
- [7] D.-T. Dang-Nguyen, C. Pasquini, V. Conotter, and G. Boato. RAISE: A raw images dataset for digital image forensics. In Proceedings of the 6th ACM Multimedia Systems Conference, MMSys '15, pages 219–224, New York, NY, USA, 2015. Association for Computing Machinery.
- [8] M. Delbracio, D. Kelly, M. S. Brown, and P. Milanfar. Mobile computational photography: A tour. Annual Review of Vision Science, 7(1):571–604, 2021.
- [9] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR09, 2009.
- [10] A. Dudhane, S. W. Zamir, S. Khan, F. S. Khan, and M.-H. Yang. Burst image restoration and enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5759–5768, 2022.
- [11]
- [12] D. C. Epstein, I. Jain, O. Wang, and R. Zhang. Online detection of AI-generated images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 382–392, 2023.
- [13]
- [14] Z. Fu, M. Song, C. Ma, J. Nasti, V. Tyagi, G. Lloyd, and W. Tang. An efficient hybrid model for low-light image enhancement in mobile devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3057–3066, 2022.
- [15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020.
- [16] P. Grommelt, L. Weiss, F.-J. Pfreundt, and J. Keuper. Fake or JPEG? Revealing common biases in generated image detection datasets. European Conference on Computer Vision (ECCV) Workshop Proceedings, CEGIS Workshop, 2024.
- [17] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016.
- [18] Y. Hong and J. Zhang. WildFake: A large-scale challenging dataset for AI-generated images detection. arXiv preprint arXiv:2402.11843, 2024.
- [19] A. Ignatov, A. Sycheva, R. Timofte, Y. Tseng, Y.-S. Xu, P.-H. Yu, C.-M. Chiang, H.-K. Kuo, M.-H. Chen, C.-M. Cheng, et al. MicroISP: Processing 32MP photos on mobile devices with deep learning. In European Conference on Computer Vision, pages 729–746. Springer, 2022.
- [20] T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017.
- [21]
- [22] A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images. 2009.
- [23] T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In Computer Vision – ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V 13, pages 740–755. Springer, 2014.
- [24] Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of International Conference on Computer Vision (ICCV), December 2015.
- [25] A. Mahara and N. Rishe. Methods and trends in detecting generated images: A comprehensive review. arXiv preprint arXiv:2502.15176, 2025.
- [26]
- [27] Y. Mirsky and W. Lee. The creation and detection of deepfakes: A survey. ACM Computing Surveys (CSUR), 54(1):1–41, 2021.
- [28]
- [29] C. Morikawa, M. Kobayashi, M. Satoh, Y. Kuroda, T. Inomata, H. Matsuo, T. Miura, and M. Hilaga. Image and video processing on mobile devices: A survey. The Visual Computer, 37(12):2931–2949, 2021.
- [30] L. Nataraj, T. M. Mohammed, B. Manjunath, S. Chandrasekaran, A. Flenner, J. H. Bappy, and A. K. Roy-Chowdhury. Detecting GAN generated fake images using co-occurrence matrices. Electronic Imaging, 31:1–7, 2019.
- [31] J. F. O'Brien and H. Farid. Exposing photo manipulation with inconsistent reflections. ACM Trans. Graph., 31(1):4–1, 2012.
- [32] U. Ojha, Y. Li, and Y. J. Lee. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24480–24489, 2023.
- [33] E. B. Picaro. What is Apple Deep Fusion and how does it work? https://tinyurl.com/54r64tk9, 2025-05-20.
- [34] B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE International Conference on Computer Vision, pages 2641–2649, 2015.
- [35] M. A. Rahman, B. Paul, N. H. Sarker, Z. I. A. Hakim, and S. A. Fattah. ArtiFact: A large-scale dataset with artificial and factual images for generalizable and robust synthetic image detection. In 2023 IEEE International Conference on Image Processing (ICIP), pages 2200–2204. IEEE, 2023.
- [36] C. T. M. Research. Apple takes number one spot in Q1 for first time. https://tinyurl.com/wbamz8cv, 2025-05-20.
- [37]
- [38] C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, et al. LAION-5B: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35:25278–25294, 2022.
- [39] C. Schuhmann, R. Vencu, R. Beaumont, R. Kaczmarczyk, C. Mullis, A. Katta, T. Coombes, J. Jitsev, and A. Komatsuzaki. LAION-400M: Open dataset of CLIP-filtered 400 million image-text pairs. CoRR, abs/2111.02114, 2021.
- [40] Z. Sha, Z. Li, N. Yu, and Y. Zhang. DE-FAKE: Detection and attribution of fake images generated by text-to-image generation models. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 3418–3432, 2023.
- [41] I. Skorokhodov, G. Sotnikov, and M. Elhoseiny. Aligning latent and image spaces to connect the unconnectable. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14144–14153, 2021.
- [42] S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros. CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8695–8704, 2020.
- [43] Z. Wang, J. Bao, W. Zhou, W. Wang, H. Hu, H. Chen, and H. Li. DIRE for diffusion-generated image detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22445–22455, 2023.
- [44]
- [45] M. Woolf. Top mobile photography statistics. https://tinyurl.com/4xb969wf, 2025-05-20.
- [46]
- [47]
- [48] F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015.
- [49] N. Zhong, Y. Xu, S. Li, Z. Qian, and X. Zhang. PatchCraft: Exploring texture patch for efficient AI-generated image detection. arXiv preprint arXiv:2311.12397, 2023.
- [50]