arxiv: 2605.20971 · v1 · pith:64N45R2Jnew · submitted 2026-05-20 · 💻 cs.CV · cs.AI · cs.CR

Comparative Evaluation of Deep Learning Models for Fake Image Detection

Pith reviewed 2026-05-21 04:55 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.CR
keywords fake image detection · CNN models · VGG16 · GAN manipulation · image forensics · pretrained networks · class imbalance · model evaluation

The pith

VGG16 achieves 91% accuracy detecting GAN-manipulated images, ahead of three other CNNs at 90%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper compares four pretrained convolutional neural network models on the task of identifying images altered or created by generative adversarial networks. Each model goes through the same steps of resizing, normalizing, and augmenting a shared collection of real and fake pictures to reduce imbalance before training and testing. VGG16 reaches the highest accuracy of 91 percent while XceptionNet, ResNet50, and EfficientNetB0 each reach 90 percent, although EfficientNetB0 shows more sensitivity to fake images and less reliability on real ones due to data imbalance. A sympathetic reader would care because automated tools are increasingly needed to verify visual content as manipulation techniques grow more advanced and harder to spot by eye. The authors present their setup as a reproducible baseline while noting that imbalance, overfitting, and limited generalization still constrain reliability across different image sources.

Core claim

Using a unified preprocessing and training pipeline on a dataset of real and manipulated images, the evaluation finds that VGG16 reaches 91% accuracy while XceptionNet, ResNet50, and EfficientNetB0 each reach 90%. EfficientNetB0 exhibits stronger sensitivity to fake images but reduced reliability on real samples, reflecting imbalance-driven bias. The work supplies a reproducible baseline and calls attention to the requirements for balanced datasets, advanced augmentation, and fairness-aware training to build more reliable detection systems.
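
To make the imbalance point concrete, the sketch below derives accuracy and per-class recall from a confusion matrix. The counts are hypothetical and chosen only for illustration (the paper's confusion matrices are not reproduced here); `Confusion` and `report` are names introduced for this example. It shows how a model can reach 90% accuracy while recovering far fewer real images than fake ones.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # fake images predicted as fake
    fn: int  # fake images predicted as real
    tn: int  # real images predicted as real
    fp: int  # real images predicted as fake


def report(c: Confusion) -> dict:
    """Accuracy plus per-class recall and fake-class precision."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,
        "recall_fake": c.tp / (c.tp + c.fn),
        "recall_real": c.tn / (c.tn + c.fp),
        "precision_fake": c.tp / (c.tp + c.fp),
    }


# Hypothetical test set (NOT from the paper): 800 fake and 200 real images.
example = Confusion(tp=780, fn=20, tn=120, fp=80)
metrics = report(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

With these illustrative counts the headline accuracy is 0.90, yet recall on real images is only 0.60, which is the kind of gap a single accuracy figure can hide.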

What carries the argument

A unified preprocessing and training pipeline applied to four pretrained CNN architectures (VGG16, ResNet50, EfficientNetB0, XceptionNet) for binary classification of real versus GAN-manipulated images.
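
A minimal sketch of what "unified" means in practice: one preprocessing function that every model's input passes through. The paper does not state the target resolution or the exact augmentation operations, so the nearest-neighbour resize, the 0..1 scaling, the horizontal flip, and the tiny 4x4 size below are assumptions made for illustration, written in plain Python on a grayscale image for brevity.

```python
from typing import List

Image = List[List[float]]  # grayscale image: rows of pixel values in 0..255


def resize_nearest(img: Image, out_h: int, out_w: int) -> Image:
    """Nearest-neighbour resize to out_h x out_w."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def normalize(img: Image) -> Image:
    """Scale pixel values from 0..255 to 0..1."""
    return [[p / 255.0 for p in row] for row in img]


def hflip(img: Image) -> Image:
    """Horizontal flip, one common augmentation (assumed, not confirmed)."""
    return [row[::-1] for row in img]


def preprocess(img: Image, size: int = 4, augment: bool = False) -> Image:
    """Shared entry point: every model would receive images from this call."""
    out = normalize(resize_nearest(img, size, size))
    return hflip(out) if augment else out


raw = [[0, 255], [255, 0]]
plain = preprocess(raw, size=4)
flipped = preprocess(raw, size=4, augment=True)
print(plain[0], flipped[0])
```

Routing all four architectures through the same `preprocess` call is what keeps the comparison about the models rather than about differences in input handling.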

If this is right

  • The reported numbers establish a reproducible baseline for comparing CNN performance on fake image detection.
  • Dataset imbalance produces measurable bias, with some models more reliable on fake images than on real ones.
  • Better generalization would require balanced datasets and advanced augmentation to reduce overfitting.
  • Fairness-aware training methods could improve reliability for applications in digital forensics.
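
One simple way to counter imbalance during training is to weight each class inversely to its frequency. The sketch below is not the authors' method; it uses the common "balanced" heuristic, n_samples / (n_classes * class_count), and the 800/200 label split is hypothetical.

```python
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight per class = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical training labels (NOT the paper's dataset statistics).
labels = ["fake"] * 800 + ["real"] * 200
weights = inverse_frequency_weights(labels)
print(weights)
```

The minority class (here, real images) receives the larger weight, so errors on it contribute more to a weighted loss.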

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the performance ranking holds on other collections, VGG16 could be used as an initial model in content verification pipelines.
  • The comparative approach could be repeated with additional manipulation types or with hybrid systems that combine CNNs and other detection signals.
  • Applying the identical pipeline to video frames or higher-resolution images would test whether the accuracy ordering remains stable.

Load-bearing premise

The single dataset after resizing, normalization, and augmentation is sufficiently representative and balanced for the performance numbers and observed biases to generalize beyond this specific collection.

What would settle it

Evaluating the same four models on a new, independently collected dataset with equal numbers of real and manipulated images would show whether the 91% accuracy and the imbalance bias persist or shift.
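
A sketch of how an equal-count evaluation set could be drawn from a labelled pool, assuming items are (path, label) pairs. The file names, pool sizes, and the `balanced_subset` helper are hypothetical; an independently collected dataset would still be needed for the test described above.

```python
import random


def balanced_subset(items, seed=0):
    """Return a shuffled subset with the same number of items per label."""
    rng = random.Random(seed)
    by_label = {}
    for item in items:
        by_label.setdefault(item[1], []).append(item)
    # The smallest class determines how many items each class contributes.
    n = min(len(group) for group in by_label.values())
    subset = []
    for label in sorted(by_label):
        subset.extend(rng.sample(by_label[label], n))
    rng.shuffle(subset)
    return subset


# Hypothetical pool: 30 fake and 10 real images.
pool = [(f"fake_{i}.png", "fake") for i in range(30)]
pool += [(f"real_{i}.png", "real") for i in range(10)]
test_set = balanced_subset(pool, seed=42)
print(len(test_set))
```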

Figures

Figures reproduced from arXiv: 2605.20971 by Akhitha Pakala, Mohammed Mahir Rahman, Shahzad Memon, Tauseef Ahmed.

Figure 1: Dataset Summary
Figure 2: Sample Images
Figure 3: Image Count by Class
Figure 4: RGB Histogram
Figure 5: Pixel Intensity Distribution
Figure 6: Dataset Distribution by Class. Together, these figures demonstrate how preprocessing mitigates imbalance and prepares data for robust training. Each CNN architecture was trained and evaluated using ROC Curve, Accuracy, Precision, Recall, F1-score and Confusion metrics. These outcomes highlight the impact of dataset imbalance, with models favoring sensitivity to fake images at the expense of real-image rec…
Figure 7: Confusion Matrix Comparison
Figure 8: ROC Curve Comparison
Figure 10: Sample Prediction

Original abstract

The growing sophistication of GAN-based image manipulation presents significant challenges for digital forensics. This study compares the performance of four pretrained CNN architectures including VGG16, ResNet50, EfficientNetB0, and XceptionNet for fake image detection using a unified preprocessing and training pipeline. A dataset of real and manipulated images was processed through resizing, normalization, and augmentation to address class imbalance and improve generalization. Models were evaluated using Accuracy, Precision, Recall, F1-score, and ROC-AUC. VGG16 achieved the highest accuracy at 91%, with XceptionNet, ResNet50, and EfficientNetB0 each reaching 90%. EfficientNetB0 showed stronger sensitivity to fake images but reduced reliability on real samples, reflecting imbalance-driven bias. Limitations include dataset imbalance, overfitting, and limited interpretability, which affect cross-domain robustness. The study provides a reproducible baseline and underscores the need for balanced datasets, advanced augmentation, and fairness-aware training to develop reliable fake image detection systems.
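
Of the metrics listed in the abstract, ROC-AUC is the least obvious to compute by hand. The sketch below uses the rank interpretation: AUC is the probability that a randomly chosen fake image receives a higher score than a randomly chosen real one, with ties counted as half. The scores and labels are made up, and label 1 is assumed to mean "fake".

```python
def roc_auc(scores, labels):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


# Made-up classifier scores; label 1 = fake, 0 = real (an assumption).
scores = [0.9, 0.8, 0.35, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
auc = roc_auc(scores, labels)
print(f"AUC = {auc:.3f}")
```

Because it depends only on how scores are ranked, this quantity does not change with the decision threshold, which makes it a useful complement to accuracy on imbalanced data.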

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript evaluates four pretrained CNN architectures (VGG16, ResNet50, EfficientNetB0, XceptionNet) for fake image detection on a dataset of real and manipulated images. It applies a unified pipeline of resizing, normalization, and augmentation to address class imbalance, then reports performance using Accuracy, Precision, Recall, F1-score, and ROC-AUC. VGG16 is stated to reach the highest accuracy of 91%, with the other three models each at 90%; the abstract notes imbalance-driven bias, overfitting risk, and limited cross-domain robustness while positioning the work as a reproducible baseline.

Significance. If the reported ordering is shown to be statistically stable, the study supplies a practical baseline comparison of standard CNN backbones under a shared preprocessing regime for digital forensics. The explicit discussion of imbalance bias and the call for fairness-aware training add modest practical value, though the absence of variance controls restricts broader claims about model superiority.

major comments (1)
  1. [Abstract] Performance claims: VGG16 is reported at 91% accuracy and XceptionNet/ResNet50/EfficientNetB0 at 90% on a single train/test split, yet no standard deviations, multiple random seeds, k-fold results, or paired significance tests are supplied. Given the abstract's own acknowledgment of imbalance-driven bias, the 1% gap cannot be treated as a reliable ranking without these controls; the central comparative claim therefore rests on unquantified sampling variation.
minor comments (1)
  1. The abstract and methods description would benefit from explicit dataset statistics (total images, real/fake split) and the precise augmentation operations applied.
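
As an illustration of the paired significance test the major comment asks for, the sketch below implements an exact McNemar test, which compares two classifiers evaluated on the same test images using only the cases where they disagree. The disagreement counts are hypothetical (the paper does not report them), and McNemar's test is one reasonable choice, not one the referee or authors name.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: images model A classifies correctly and model B does not
    c: images model B classifies correctly and model A does not
    """
    n = b + c
    if n == 0:
        return 1.0  # the models never disagree, so there is no evidence of a difference
    k = min(b, c)
    # Under the null hypothesis each disagreement favours A or B with probability 0.5.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical counts: on 1000 test images a net difference of 10 (40 vs 30)
# corresponds to a 1-point accuracy gap.
p_value = mcnemar_exact(40, 30)
print(f"p = {p_value:.3f}")
```

With these illustrative counts the p-value is well above 0.05, so a one-point gap of this kind would not by itself support a ranking.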

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The concern regarding the lack of statistical controls for the reported performance differences is valid, and we address it directly below while committing to revisions that improve the robustness of our comparative claims without overstating the current results.

Point-by-point responses
  1. Referee: [Abstract] Performance claims: VGG16 is reported at 91% accuracy and XceptionNet/ResNet50/EfficientNetB0 at 90% on a single train/test split, yet no standard deviations, multiple random seeds, k-fold results, or paired significance tests are supplied. Given the abstract's own acknowledgment of imbalance-driven bias, the 1% gap cannot be treated as a reliable ranking without these controls; the central comparative claim therefore rests on unquantified sampling variation.

    Authors: We agree that the 1% accuracy difference observed on a single train/test split cannot be interpreted as a statistically reliable ranking, particularly in light of the class imbalance and potential sampling variation we already note in the manuscript. In the revised version, we will conduct additional experiments using five different random seeds for each model, reporting mean accuracy, precision, recall, F1-score, and ROC-AUC along with standard deviations. This will allow us to quantify variability and evaluate the stability of the observed ordering. We will also update the abstract and results section to reflect these new statistics and strengthen the discussion of limitations. Full k-fold cross-validation remains computationally prohibitive for the current dataset size and model training regime, but the multiple-seed approach provides a practical improvement. We will not claim statistical superiority in the revision but will position the work more explicitly as an initial baseline comparison.

    revision: partial
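
The multi-seed reporting proposed in the response could be aggregated along these lines. The per-seed accuracies below are invented placeholders, not results from the paper or its revision, and `summarize` is a helper name introduced for this sketch.

```python
from statistics import mean, stdev


def summarize(runs):
    """Map each model to (mean accuracy, sample standard deviation) across seeds."""
    return {model: (mean(accs), stdev(accs)) for model, accs in runs.items()}


# Placeholder accuracies for five seeds per model (NOT reported results).
runs = {
    "VGG16": [0.91, 0.90, 0.92, 0.90, 0.91],
    "ResNet50": [0.90, 0.91, 0.89, 0.90, 0.90],
}
summary = summarize(runs)
for model, (avg, sd) in summary.items():
    print(f"{model}: {avg:.3f} ± {sd:.3f}")
```

Reporting the spread alongside the mean lets a reader judge whether a one-point difference between models exceeds the seed-to-seed variation.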

Circularity Check

0 steps flagged

No significant circularity in empirical model comparison

The paper reports direct empirical accuracies from training and evaluating four pretrained CNN architectures (VGG16, ResNet50, EfficientNetB0, XceptionNet) on a preprocessed dataset of real and fake images. Results such as VGG16 reaching 91% accuracy are measured outcomes on a held-out test set using standard metrics, with no equations, derivations, fitted parameters, or mathematical chains that could reduce to inputs by construction. The abstract and description contain no self-citations, ansatzes, or uniqueness claims that bear load on the central results. This is a standard experimental ML evaluation self-contained against external benchmarks, consistent with the reader's assessment of minimal circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the representativeness of the chosen dataset and the adequacy of standard preprocessing to remove bias. No new entities or free parameters are introduced beyond routine hyperparameter choices typical in transfer learning.

axioms (1)
  • Domain assumption: The collected real and manipulated images form a representative sample of the distribution encountered in real-world digital forensics.
    Invoked when claiming that the measured accuracies indicate practical utility; the abstract itself notes imbalance as a limitation.

pith-pipeline@v0.9.0 · 5709 in / 1266 out tokens · 37368 ms · 2026-05-21T04:55:12.133268+00:00 · methodology
