pith.

arxiv: 2509.21864 · v2 · submitted 2025-09-26 · 💻 cs.CV

Deepfakes: we need to re-think the concept of "real" images

Pith reviewed 2026-05-18 14:25 UTC · model grok-4.3

classification 💻 cs.CV
keywords deepfakes, image authenticity, smartphone photography, neural image formation, fake image detection, benchmark datasets, real vs fake images

The pith

Smartphone image formation now relies on neural networks similar to those generating deepfakes, so the idea of a 'real' image needs redefinition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current methods for detecting fake or manipulated images overlook how most real photographs are actually created today. Over 90 percent of photos come from smartphones that combine inputs from multiple sensors using neural network algorithms. These algorithms are closely related to the ones used to generate fake images. As a result, relying on old low-resolution datasets like ImageNet for training detectors is no longer adequate. The authors call for a clear technical definition of real images and new benchmark datasets to make detection meaningful.

Core claim

Today, the vast majority of photographs are produced by smartphones that employ neural network architectures to compute images from multiple sensor inputs over time. These image formation processes are closely related to the neural networks used for generating deepfakes. Therefore, the distinction between real and fake images requires fundamental reconsideration, and the field must develop new definitions and datasets rather than focusing solely on generative models.

What carries the argument

Neural network-based image formation algorithms in modern smartphone cameras, which process multiple inputs into a single output image.
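To make "computing an image from multiple inputs over time" concrete, here is a minimal sketch of the simplest possible burst merge: per-pixel averaging of several noisy exposures of a static scene. It is a non-neural stand-in chosen for clarity, assuming perfectly aligned frames and independent Gaussian noise; pipelines such as HDR+ [28] or Deep Fusion [33] add alignment, robust merging and, per the paper, learned components that are not modeled here. `merge_burst` is a hypothetical helper.

```python
import numpy as np


def merge_burst(frames: np.ndarray) -> np.ndarray:
    """Merge a burst of frames with shape (N, H, W) by per-pixel averaging.

    Hypothetical helper: assumes a static scene and frames that are already
    aligned, so independent sensor noise averages out.
    """
    return frames.mean(axis=0)


rng = np.random.default_rng(0)
scene = rng.uniform(0.0, 1.0, size=(64, 64))             # stand-in "true" scene
sigma = 0.1                                               # assumed per-frame noise std
burst = scene + rng.normal(0.0, sigma, size=(8, 64, 64))  # 8 noisy exposures

single_err = np.std(burst[0] - scene)
merged_err = np.std(merge_burst(burst) - scene)
print(f"single frame noise: {single_err:.3f}")
print(f"merged burst noise: {merged_err:.3f}")  # should be near sigma / sqrt(8)
```

Under these assumptions the residual noise falls roughly as sigma / sqrt(N). The point relevant to the argument is that the delivered photo is already an estimate computed from several captures, not a single sensor readout.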

If this is right

  • Current fake detection methods trained on old datasets may fail to distinguish modern real images from fakes.
  • New benchmark datasets of contemporary real images are necessary for evaluating detectors.
  • A clear technical definition of what constitutes a real image is required before detection can be reliable.
  • The objective of detecting fake images may need to be re-evaluated entirely if the boundary with real images is blurred.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Detectors might need to analyze the specific sensor fusion processes rather than just pixel-level artifacts.
  • This shift could affect legal standards for image evidence in courts.
  • Future research should explore whether all digital images are now inherently processed in ways that introduce generative-like steps.
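For comparison, many "pixel-level artifact" detectors rely on frequency statistics; Durall et al. [11] argue that CNN up-convolutions fail to reproduce the spectral distributions of real images. A common way to inspect this is an azimuthally averaged power spectrum. The sketch below computes one with NumPy. It is an illustration in the spirit of [11], not its exact pipeline; `radial_power_spectrum` is a hypothetical name, and the 2x2 box blur merely stands in for any processing step that suppresses high frequencies. If on-device processing reshapes the spectrum of real photos in a similar way, a detector keyed to such features would see "real" images shift as well.

```python
import numpy as np


def radial_power_spectrum(img: np.ndarray) -> np.ndarray:
    """Azimuthally averaged log power spectrum of a 2D grayscale image.

    Returns one value per integer radius (distance from the DC component),
    ordered from low to high spatial frequency.
    """
    f = np.fft.fftshift(np.fft.fft2(img))  # move DC to the center
    power = np.log1p(np.abs(f) ** 2)

    h, w = img.shape
    y, x = np.indices((h, w))
    r = np.hypot(y - h // 2, x - w // 2).astype(int)  # integer radius bin per pixel

    # Mean power over all frequencies that share the same radius bin.
    sums = np.bincount(r.ravel(), weights=power.ravel())
    counts = np.bincount(r.ravel())
    return sums / np.maximum(counts, 1)


rng = np.random.default_rng(0)
noise = rng.normal(size=(128, 128))
# Circular 2x2 box blur built from shifted copies; it attenuates high frequencies.
blurred = (noise + np.roll(noise, 1, 0) + np.roll(noise, 1, 1)
           + np.roll(np.roll(noise, 1, 0), 1, 1)) / 4.0

spec_noise = radial_power_spectrum(noise)
spec_blur = radial_power_spectrum(blurred)
print("high-frequency power, noise:  ", spec_noise[-10:].mean())
print("high-frequency power, blurred:", spec_blur[-10:].mean())
```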

Load-bearing premise

The assumption that smartphone image-formation neural networks are similar enough to deepfake generators that they undermine current detection approaches based on outdated real-image datasets.

What would settle it

A direct comparison showing that state-of-the-art deepfake detectors perform significantly worse on high-resolution modern smartphone photographs than on ImageNet-style images would support the claim.
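The proposed test amounts to a controlled comparison: hold the detector and the fake images fixed, swap only the source of "real" images, and compare a threshold-free metric such as ROC AUC. A minimal sketch follows, assuming the detector outputs a "probability of fake" score; AUC is computed directly as the Mann-Whitney statistic to avoid extra dependencies. The score distributions are arbitrary placeholders chosen only to exercise the code, not measurements, and `roc_auc`, `auc_per_real_source` and the source names are hypothetical.

```python
import numpy as np


def roc_auc(real_scores, fake_scores) -> float:
    """ROC AUC as P(fake score > real score), with ties counted as 0.5."""
    real = np.asarray(real_scores, dtype=float)[:, None]
    fake = np.asarray(fake_scores, dtype=float)[None, :]
    return float(np.mean((fake > real) + 0.5 * (fake == real)))


def auc_per_real_source(real_sets: dict, fake_scores) -> dict:
    """Score one fixed set of fakes against several 'real' image sources."""
    return {name: roc_auc(scores, fake_scores) for name, scores in real_sets.items()}


rng = np.random.default_rng(0)
# Placeholder scores only: these are NOT outputs of any real detector.
fake_scores = rng.normal(0.8, 0.1, 1000)
real_sets = {
    "imagenet_style": rng.normal(0.2, 0.1, 1000),   # hypothetical legacy real set
    "modern_phone": rng.normal(0.6, 0.15, 1000),    # hypothetical smartphone set
}
results = auc_per_real_source(real_sets, fake_scores)
for name, auc in results.items():
    print(f"{name}: AUC = {auc:.3f}")
```

With real detector outputs in place of the placeholders, a markedly lower AUC on the smartphone set than on the legacy set, everything else held constant, is the pattern that would support the claim.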

Figures

Figures reproduced from arXiv: 2509.21864 by Janis Keuper, Margret Keuper.

Figure 3: Camera array of a recent iPhone 15 Pro. On the hardware side, these challenges are usually countered by building multiple cameras with different lenses into a single phone. While selecting a single, most suitable camera for each task solves some of the focal problems, algorithmic combination of multiple images from multiple time steps and cameras allows computational solutions for a wider range of problems…
Figure 1: Novelty of photos and used imaging devices in the late …
Figure 2: Size distribution in x (width) and y (height) of all LAION-5B images taken with iPhones. The cutoff at 400 px on the smaller image dimension is a design choice of the dataset [38]. … to allow better image magnifications and color consistency. Overall, even though most phone manufacturers provide very limited technical details beyond marketing claims, there are sufficient indications that many of…
Figure 4: Overview of the results of the experimental evaluation…
Figure 5: Effects of automatic image enhancements.
Original abstract

The wide availability and low usability barrier of modern image generation models have triggered the reasonable fear of criminal misconduct and negative social implications. The machine learning community has been engaging this problem with an extensive series of publications proposing algorithmic solutions for the detection of "fake", e.g. entirely generated or partially manipulated images. While there is undoubtedly some progress towards technical solutions of the problem, we argue that current and prior work is focusing too much on generative algorithms and "fake" data-samples, neglecting a clear definition and data collection of "real" images. The fundamental question "what is a real image?" might appear to be quite philosophical, but our analysis shows that the development and evaluation of basically all current "fake"-detection methods is relying on only a few, quite old low-resolution datasets of "real" images like ImageNet. However, the technology for the acquisition of "real" images, aka taking photos, has drastically evolved over the last decade: Today, over 90% of all photographs are produced by smartphones which typically use algorithms to compute an image from multiple inputs (over time) from multiple sensors. Based on the fact that these image formation algorithms are typically neural network architectures which are closely related to "fake"-image generators, we state the position that today, we need to re-think the concept of "real" images. The purpose of this position paper is to raise the awareness of the current shortcomings in this active field of research and to trigger an open discussion whether the detection of "fake" images is a sound objective at all. At the very least, we need a clear technical definition of "real" images and new benchmark datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Referee Report

1 major / 2 minor

Summary. The paper is a position paper claiming that deepfake detection research over-focuses on generative algorithms and 'fake' samples while neglecting a rigorous definition of 'real' images. It notes that current detectors rely on a few old low-resolution datasets such as ImageNet, whereas over 90% of modern photographs are produced by smartphones whose computational photography pipelines use neural-network image-formation algorithms that the authors describe as closely related to those in fake-image generators; consequently the field should re-think the concept of real images, supply a clear technical definition, and create new benchmark datasets.

Significance. If the central analogy between smartphone NN pipelines and generative models can be substantiated, the position could usefully redirect attention toward dataset construction and detector assumptions that better reflect contemporary image acquisition. The manuscript correctly flags the field's continued dependence on legacy datasets as a structural limitation and may stimulate discussion on whether detection remains a well-posed task.

major comments (1)
  1. Abstract: the claim that smartphone image-formation algorithms are 'typically neural network architectures which are closely related to fake-image generators' is load-bearing for the argument that current detection approaches may not be conceptually sound, yet the manuscript supplies neither specific pipeline references (e.g., multi-frame fusion or learned denoising in commercial ISPs), architectural comparisons, nor training-objective overlaps to ground the asserted relation.
minor comments (2)
  1. The statistic that 'over 90% of all photographs are produced by smartphones' would benefit from an explicit citation or data source.
  2. A short section outlining what a 'clear technical definition of real images' might contain would help translate the position into actionable research guidance.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback, which correctly identifies a point where our position paper would benefit from greater technical grounding. We address the major comment below and have prepared revisions to strengthen the manuscript while preserving its intent as a position piece.

Point-by-point responses
  1. Referee: Abstract: the claim that smartphone image-formation algorithms are 'typically neural network architectures which are closely related to fake-image generators' is load-bearing for the argument that current detection approaches may not be conceptually sound, yet the manuscript supplies neither specific pipeline references (e.g., multi-frame fusion or learned denoising in commercial ISPs), architectural comparisons, nor training-objective overlaps to ground the asserted relation.

    Authors: We agree that the analogy, while central, is presented at a conceptual level in the current draft and would be more persuasive with concrete illustrations. As a position paper, our primary goal is to challenge assumptions in the detection literature rather than to deliver a comparative technical survey; nevertheless, the referee's observation is fair. In the revised version we will expand the relevant paragraph to include specific examples of neural components in commercial smartphone pipelines (e.g., learned denoising and multi-frame fusion in Google’s HDR+ and Apple’s Deep Fusion) and note shared architectural traits such as convolutional feature extraction and data-driven priors. We will also cite representative computational-photography literature to make the relation explicit without claiming identity between the two classes of models. revision: yes

Circularity Check

0 steps flagged

Position paper contains no derivations, equations or self-referential constructions

Full rationale

The manuscript is a position paper that advances an argument about redefining 'real' images based on the prevalence of smartphone computational photography and the general observation that such pipelines employ neural networks. No equations, fitted parameters, predictions, or derivation chains appear in the text. The central claim rests on publicly documented technology trends (smartphone market share, multi-frame fusion) and dataset usage patterns rather than any quantity defined by the authors' own prior work or internal self-reference. No self-citations are invoked as load-bearing support, and the argument does not reduce any result to its inputs by construction. This is the expected outcome for a non-technical position piece that draws on external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The position rests on two domain assumptions about current camera technology and research practice; no free parameters or invented entities are introduced.

axioms (2)
  • domain assumption: Modern smartphone image formation uses neural-network architectures closely related to fake-image generators.
    Invoked in the abstract to link acquisition technology with generation technology.
  • domain assumption: Development and evaluation of fake-detection methods relies on only a few old low-resolution datasets such as ImageNet.
    Stated as the basis for claiming shortcomings in prior work.

pith-pipeline@v0.9.0 · 5835 in / 1291 out tokens · 54282 ms · 2026-05-18T14:25:49.684893+00:00 · methodology

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · 3 internal anchors

  1. [1]

    J. J. Bird and A. Lotfi. CIFAKE: Image classification and explainable identification of AI-generated synthetic images. IEEE Access, 12:15642–15650, 2024

  2. [2]

    B. Cavia, E. Horwitz, T. Reiss, and Y. Hoshen. Real-time deepfake detection in the real-world. arXiv preprint arXiv:2406.09398, 2024

  3. [3]

    L. Chai, D. Bau, S.-N. Lim, and P. Isola. What makes fake images detectable? Understanding properties that generalize. In Computer Vision–ECCV 2020: 16th European Conference, Glasgow, UK, August 23–28, 2020, Proceedings, Part XXVI, pages 103–120. Springer, 2020

  4. [4]

    Y. Choi, Y. Uh, J. Yoo, and J.-W. Ha. StarGAN v2: Diverse image synthesis for multiple domains. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8188–8197, 2020

  5. [5]

    M. V. Conde, F. Vasluianu, J. Vazquez-Corral, and R. Timofte. Perceptual image enhancement for smartphone real-time applications. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 1848–1858, 2023

  6. [6]

    D. Cozzolino, K. Nagano, L. Thomaz, A. Majumdar, and L. Verdoliva. Synthetic image detection: Highlights from the IEEE Video and Image Processing Cup 2022 student competition. arXiv preprint arXiv:2309.12428, 2023

  7. [7]

    D.-T. Dang-Nguyen, C. Pasquini, V. Conotter, and G. Boato. RAISE: A raw images dataset for digital image forensics. In Proceedings of the 6th ACM Multimedia Systems Conference, MMSys ’15, pages 219–224, New York, NY, USA, 2015. Association for Computing Machinery

  8. [8]

    M. Delbracio, D. Kelly, M. S. Brown, and P. Milanfar. Mobile computational photography: A tour. Annual Review of Vision Science, 7(1):571–604, 2021

  9. [9]

    J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. ImageNet: A Large-Scale Hierarchical Image Database. In CVPR09, 2009

  10. [10]

    A. Dudhane, S. W. Zamir, S. Khan, F. S. Khan, and M.-H. Yang. Burst image restoration and enhancement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5759–5768, 2022

  11. [11]

    R. Durall, M. Keuper, and J. Keuper. Watch your up-convolution: CNN based generative deep neural networks are failing to reproduce spectral distributions. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7890–7899, 2020

  12. [12]

    D. C. Epstein, I. Jain, O. Wang, and R. Zhang. Online detection of AI-generated images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 382–392, 2023

  13. [13]

    J. Frank, T. Eisenhofer, L. Schönherr, A. Fischer, D. Kolossa, and T. Holz. Leveraging frequency analysis for deep fake image recognition. In International Conference on Machine Learning, pages 3247–3258. PMLR, 2020

  14. [14]

    Z. Fu, M. Song, C. Ma, J. Nasti, V. Tyagi, G. Lloyd, and W. Tang. An efficient hybrid model for low-light image enhancement in mobile devices. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 3057–3066, 2022

  15. [15]

    I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial networks. Communications of the ACM, 63(11):139–144, 2020

  16. [16]

    P. Grommelt, L. Weiss, F.-J. Pfreundt, and J. Keuper. Fake or JPEG? Revealing common biases in generated image detection datasets. European Conference on Computer Vision (ECCV) Workshop Proceedings, CEGIS Workshop, 2024

  17. [17]

    K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 770–778, 2016

  18. [18]

    Y. Hong and J. Zhang. WildFake: A large-scale challenging dataset for AI-generated images detection. arXiv preprint arXiv:2402.11843, 2024

  19. [19]

    A. Ignatov, A. Sycheva, R. Timofte, Y. Tseng, Y.-S. Xu, P.-H. Yu, C.-M. Chiang, H.-K. Kuo, M.-H. Chen, C.-M. Cheng, et al. MicroISP: Processing 32MP photos on mobile devices with deep learning. In European Conference on Computer Vision, pages 729–746. Springer, 2022

  20. [20]

    T. Karras, T. Aila, S. Laine, and J. Lehtinen. Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196, 2017

  21. [21]

    T. Karras, S. Laine, and T. Aila. A style-based generator architecture for generative adversarial networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4401–4410, 2019

  22. [22]

    A. Krizhevsky, G. Hinton, et al. Learning multiple layers of features from tiny images, 2009

  23. [23]

    T.-Y. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick. Microsoft COCO: Common objects in context. In Computer Vision–ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6–12, 2014, Proceedings, Part V, pages 740–755. Springer, 2014

  24. [24]

    Z. Liu, P. Luo, X. Wang, and X. Tang. Deep learning face attributes in the wild. In Proceedings of the International Conference on Computer Vision (ICCV), December 2015

  25. [25]

    A. Mahara and N. Rishe. Methods and trends in detecting generated images: A comprehensive review. arXiv preprint arXiv:2502.15176, 2025

  26. [26]

    M. Masood, M. Nawaz, K. M. Malik, A. Javed, A. Irtaza, and H. Malik. Deepfakes generation and detection: State-of-the-art, open challenges, countermeasures, and way forward. Applied Intelligence, 53(4):3974–4026, 2023

  27. [27]

    Y. Mirsky and W. Lee. The creation and detection of deepfakes: A survey. ACM Computing Surveys (CSUR), 54(1):1–41, 2021

  28. [28]

    A. Monod, J. Delon, and T. Veit. An analysis and implementation of the HDR+ burst denoising method. Image Processing On Line, 11:142–169, 2021

  29. [29]

    C. Morikawa, M. Kobayashi, M. Satoh, Y. Kuroda, T. Inomata, H. Matsuo, T. Miura, and M. Hilaga. Image and video processing on mobile devices: A survey. The Visual Computer, 37(12):2931–2949, 2021

  30. [30]

    L. Nataraj, T. M. Mohammed, B. Manjunath, S. Chandrasekaran, A. Flenner, J. H. Bappy, and A. K. Roy-Chowdhury. Detecting GAN generated fake images using co-occurrence matrices. Electronic Imaging, 31:1–7, 2019

  31. [31]

    J. F. O’Brien and H. Farid. Exposing photo manipulation with inconsistent reflections. ACM Trans. Graph., 31(1):4–1, 2012

  32. [32]

    U. Ojha, Y. Li, and Y. J. Lee. Towards universal fake image detectors that generalize across generative models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24480–24489, 2023

  33. [33]

    E. B. Picaro. What is Apple Deep Fusion and how does it work? https://tinyurl.com/54r64tk9, 2025-05-20

  34. [34]

    B. A. Plummer, L. Wang, C. M. Cervantes, J. C. Caicedo, J. Hockenmaier, and S. Lazebnik. Flickr30k Entities: Collecting region-to-phrase correspondences for richer image-to-sentence models. In Proceedings of the IEEE International Conference on Computer Vision, pages 2641–2649, 2015

  35. [35]

    M. A. Rahman, B. Paul, N. H. Sarker, Z. I. A. Hakim, and S. A. Fattah. Artifact: A large-scale dataset with artificial and factual images for generalizable and robust synthetic image detection. In 2023 IEEE International Conference on Image Processing (ICIP), pages 2200–2204. IEEE, 2023

  36. [36]

    C. T. M. Research. Apple takes number one spot in Q1 for first time. https://tinyurl.com/wbamz8cv, 2025-05-20

  37. [37]

    J. Ricker, S. Damm, T. Holz, and A. Fischer. Towards the detection of diffusion model deepfakes. arXiv preprint arXiv:2210.14571, 2022

  38. [38]

    C. Schuhmann, R. Beaumont, R. Vencu, C. Gordon, R. Wightman, M. Cherti, T. Coombes, A. Katta, C. Mullis, M. Wortsman, et al. LAION-5B: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems, 35:25278–25294, 2022

  39. [39]

    C. Schuhmann, R. Vencu, R. Beaumont, R. Kaczmarczyk, C. Mullis, A. Katta, T. Coombes, J. Jitsev, and A. Komatsuzaki. LAION-400M: Open dataset of CLIP-filtered 400 million image-text pairs. CoRR, abs/2111.02114, 2021

  40. [40]

    Z. Sha, Z. Li, N. Yu, and Y. Zhang. De-fake: Detection and attribution of fake images generated by text-to-image generation models. In Proceedings of the 2023 ACM SIGSAC Conference on Computer and Communications Security, pages 3418–3432, 2023

  41. [41]

    I. Skorokhodov, G. Sotnikov, and M. Elhoseiny. Aligning latent and image spaces to connect the unconnectable. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 14144–14153, 2021

  42. [42]

    S.-Y. Wang, O. Wang, R. Zhang, A. Owens, and A. A. Efros. CNN-generated images are surprisingly easy to spot... for now. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8695–8704, 2020

  43. [43]

    Z. Wang, J. Bao, W. Zhou, W. Wang, H. Hu, H. Chen, and H. Li. DIRE for diffusion-generated image detection. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 22445–22455, 2023

  44. [44]

    Z. J. Wang, E. Montoya, D. Munechika, H. Yang, B. Hoover, and D. H. Chau. DiffusionDB: A large-scale prompt gallery dataset for text-to-image generative models. arXiv preprint arXiv:2210.14896, 2022

  45. [45]

    M. Woolf. Top mobile photography statistics. https://tinyurl.com/4xb969wf, 2025-05-20

  46. [46]

    X. Wu, W.-S. Lai, Y. Shih, C. Herrmann, M. Krainin, D. Sun, and C.-K. Liang. Efficient hybrid zoom using camera fusion on mobile phones. ACM Transactions on Graphics (TOG), 42(6):1–12, 2023

  47. [47]

    S. Yan, O. Li, J. Cai, Y. Hao, X. Jiang, Y. Hu, and W. Xie. A sanity check for AI-generated image detection. arXiv preprint arXiv:2406.19435, 2024

  48. [48]

    F. Yu, A. Seff, Y. Zhang, S. Song, T. Funkhouser, and J. Xiao. LSUN: Construction of a large-scale image dataset using deep learning with humans in the loop. arXiv preprint arXiv:1506.03365, 2015

  49. [49]

    N. Zhong, Y. Xu, S. Li, Z. Qian, and X. Zhang. PatchCraft: Exploring texture patch for efficient AI-generated image detection. arXiv preprint arXiv:2311.12397, 2023

  50. [50]

    M. Zhu, H. Chen, M. Huang, W. Li, H. Hu, J. Hu, and Y. Wang. GenDet: Towards good generalizations for AI-generated image detection. arXiv preprint arXiv:2312.08880, 2023