arXiv: 1906.11663 · v1 · pith:7PXFRFU3 · submitted 2019-06-27 · 💻 cs.CV

SpliceRadar: A Learned Method For Blind Image Forensics

Pith reviewed 2026-05-25 14:40 UTC · model grok-4.3

classification 💻 cs.CV
keywords: splice localization · image forensics · blind detection · deep learning · camera model identification · Gaussian mixture model · manipulation detection

The pith

A deep learning method localizes image splices without knowing the camera model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops a technique for detecting and localizing spliced regions in digital images using deep learning without any information about the camera that took the image. Instead of training directly on manipulated images, the model learns to identify camera models from a large set of untouched photos. The learned features are then used during testing to separate regions that appear to come from different cameras by fitting a Gaussian mixture model. This setup allows the method to work on new images and datasets where camera details are unavailable.

Core claim

We propose a deep learning based method for splice localization without prior knowledge of a test image's camera-model. It comprises a novel approach for learning rich filters and for suppressing image-edges. Additionally, we train our model on a surrogate task of camera model identification, which allows us to leverage large and widely available, unmanipulated, camera-tagged image databases. During inference, we assume that the spliced and host regions come from different camera-models and we segment these regions using a Gaussian-mixture model.
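The abstract does not say how SpliceRadar learns its rich filters. The sketch below substitutes a related published idea: the constrained convolution of Bayar and Stamm [5]. After each weight update, that method projects a kernel so its center weight is -1 and its off-center weights sum to +1. Treat this as an illustration under that assumption, not the paper's filter-learning step.

Because the projected kernel sums to zero, a flat patch yields zero residual, which shows how such filters discard scene intensity. A step edge still produces a response, consistent with Figure 1's note that a learned rich filter still contains semantic edges. The helper names `constrain_filter` and `filter2d_valid` are hypothetical.

```python
import numpy as np


def constrain_filter(w):
    """Project a square, odd-sized kernel onto the Bayar-Stamm constraint:
    center weight = -1 and off-center weights summing to +1."""
    w = np.array(w, dtype=float)  # work on a float copy
    c = w.shape[0] // 2
    w[c, c] = 0.0
    total = w.sum()
    if abs(total) < 1e-12:
        raise ValueError("off-center weights sum to zero; cannot normalize")
    w /= total      # off-center weights now sum to +1
    w[c, c] = -1.0  # so the whole kernel sums to 0
    return w


def filter2d_valid(img, w):
    """Plain 'valid' 2-D correlation (no padding, no kernel flip)."""
    kh, kw = w.shape
    out = np.empty((img.shape[0] - kh + 1, img.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i + kh, j:j + kw] * w)
    return out


rng = np.random.default_rng(0)
w = constrain_filter(rng.normal(size=(5, 5)))  # stands in for one learned kernel

flat = np.full((16, 16), 128.0)  # textureless patch
step = np.zeros((16, 16))
step[:, 8:] = 255.0              # vertical edge

print(np.abs(filter2d_valid(flat, w)).max())  # ~0: scene intensity removed
print(np.abs(filter2d_valid(step, w)).max())  # > 0: edges still respond
```

The edge response is the motivation for a separate edge-suppression stage. A high-pass constraint alone removes flat content but not object boundaries.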

What carries the argument

Convolutional network trained on camera model identification as surrogate task, with learned rich filters and edge suppression, followed by Gaussian mixture model segmentation of feature maps at inference.
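The following minimal sketch of the inference step assumes that per-patch feature vectors have already been extracted and stacked in raster order. These could be, for example, the hundred-dimensional camera-model features mentioned in the Figure 2 caption. The sketch fits a two-component, diagonal-covariance Gaussian mixture with plain EM in NumPy, then returns the posterior probability of the smaller component as a heat map.

Three details are illustrative assumptions rather than details from the paper: treating the smaller component as the splice, the farthest-point initialization, and the names `gmm2_em` and `splice_heatmap`.

```python
import numpy as np


def gmm2_em(X, n_iter=100, seed=0):
    """Fit a 2-component, diagonal-covariance Gaussian mixture to X (N x D) by EM.
    Returns posterior responsibilities (N x 2) and mixture weights (2,)."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Farthest-point initialization: a random sample, then the sample farthest from it.
    i0 = int(rng.integers(n))
    i1 = int(np.argmax(((X - X[i0]) ** 2).sum(axis=1)))
    mu = X[[i0, i1]].copy()
    var = np.tile(X.var(axis=0) + 1e-6, (2, 1))
    pi = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E-step: log pi_k + log N(x | mu_k, diag(var_k)), normalized per row.
        log_p = np.stack(
            [np.log(pi[k]) - 0.5 * np.sum(np.log(2 * np.pi * var[k]) + (X - mu[k]) ** 2 / var[k], axis=1)
             for k in range(2)],
            axis=1,
        )
        log_p -= log_p.max(axis=1, keepdims=True)
        resp = np.exp(log_p)
        resp /= resp.sum(axis=1, keepdims=True)
        # M-step: re-estimate weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-12
        pi = nk / n
        mu = (resp.T @ X) / nk[:, None]
        var = np.stack([(resp[:, k:k + 1] * (X - mu[k]) ** 2).sum(axis=0) / nk[k] for k in range(2)]) + 1e-6
    return resp, pi


def splice_heatmap(features, grid_shape, seed=0):
    """features: (rows*cols, D) per-patch vectors in raster order.
    Returns a rows x cols map of P(patch belongs to the smaller component)."""
    resp, pi = gmm2_em(features, seed=seed)
    minority = int(np.argmin(pi))  # assumption: the splice covers less area than the host
    return resp[:, minority].reshape(grid_shape)


# Synthetic check: 12 "spliced" patches whose features are shifted away from the host's.
rng = np.random.default_rng(1)
rows, cols, dim = 10, 10, 8
mask = np.zeros((rows, cols), dtype=bool)
mask[3:6, 3:7] = True
feats = rng.normal(size=(rows * cols, dim))
feats[mask.ravel()] += 4.0  # simulate a different camera signature
heat = splice_heatmap(feats, (rows, cols))
```

A library mixture model, such as scikit-learn's `GaussianMixture` with `n_components=2` and `covariance_type="diag"`, would serve the same purpose. The hand-written EM loop only keeps the example dependency-light.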

If this is right

  • Enables splice localization on images from unknown cameras.
  • Uses abundant unmanipulated camera-tagged images for training instead of scarce manipulated examples.
  • Achieves results on par with or above the state-of-the-art on three test databases.
  • Generalizes to unknown datasets.
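The second bullet is the practical payoff of the surrogate task: training labels come from camera metadata, not from manipulation masks. The sketch below is a hypothetical data-preparation step that is not taken from the paper. It turns camera-tagged, unmanipulated images into (patch, camera-model) training pairs and defines the standard softmax cross-entropy that a camera-identification classifier would minimize. The patch size, the number of patches per image, and the function names are assumptions.

```python
import numpy as np


def sample_training_patches(images, camera_labels, patch=64, per_image=4, seed=0):
    """Cut random patches from unmanipulated, camera-tagged images.
    Each patch inherits its image's camera-model label; no manipulation masks are used."""
    rng = np.random.default_rng(seed)
    patches, labels = [], []
    for img, label in zip(images, camera_labels):
        h, w = img.shape[:2]
        for _ in range(per_image):
            y = int(rng.integers(0, h - patch + 1))
            x = int(rng.integers(0, w - patch + 1))
            patches.append(img[y:y + patch, x:x + patch])
            labels.append(label)
    return np.stack(patches), np.array(labels)


def softmax_cross_entropy(logits, labels):
    """Mean cross-entropy between softmax(logits) (N x C) and integer class labels (N,)."""
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return float(-log_probs[np.arange(len(labels)), labels].mean())


rng = np.random.default_rng(0)
images = [rng.integers(0, 256, size=(100, 120), dtype=np.uint8) for _ in range(3)]
X, y = sample_training_patches(images, camera_labels=[0, 1, 2])
print(X.shape, y.tolist())  # (12, 64, 64) [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```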

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Camera model features extracted this way could serve as a starting point for other blind forensic tasks.
  • The method would likely require a different segmentation step if more than two source cameras are present.
  • Success depends on the distinctiveness of camera signatures even after splicing operations.

Load-bearing premise

Spliced and host regions in a test image come from different camera models.

What would settle it

Performance collapse on a dataset of splices where both regions are taken from the same camera model.
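One way to probe the premise before running the full pipeline is to measure how far apart the host and spliced feature distributions are. The sketch below is a hypothetical diagnostic that the paper does not report. It uses a simple Fisher-style ratio: the squared distance between group means, divided by the sum of the groups' average per-dimension variances. It assumes that per-patch features and a ground-truth mask are available. On same-camera splices the ratio should fall toward zero, which is the regime where GMM segmentation would be expected to fail.

```python
import numpy as np


def fisher_separability(host_feats, splice_feats):
    """Squared distance between the two groups' mean feature vectors, divided by the
    sum of their average per-dimension variances. Near 0 when both groups share
    one distribution; large when their means are far apart relative to their spread."""
    host_feats = np.asarray(host_feats, dtype=float)
    splice_feats = np.asarray(splice_feats, dtype=float)
    gap = np.sum((host_feats.mean(axis=0) - splice_feats.mean(axis=0)) ** 2)
    spread = host_feats.var(axis=0).mean() + splice_feats.var(axis=0).mean()
    return float(gap / spread)


rng = np.random.default_rng(0)
host = rng.normal(size=(500, 8))
same_camera = rng.normal(size=(500, 8))         # same feature distribution as host
other_camera = rng.normal(size=(500, 8)) + 4.0  # shifted signature
print(fisher_separability(host, same_camera))   # small, close to 0
print(fisher_separability(host, other_camera))  # large (about 64 in expectation)
```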

Figures

Figures reproduced from arXiv: 1906.11663 by Aurobrata Ghosh, Maneesh Singh, Terrance E Boult, Zheng Zhong.

Figure 1: SpliceRadar is able to learn low-level features while suppressing image-specific semantic information. This allows it to generalize well to new tampered datasets. Two examples: col-1: input image, col-2: sample of a learned rich filter (contains semantic edges), col-3: final features (semantic edges suppressed), col-4: output heat map indicating tampered region. and learn low-level features of ca…
Figure 2: System architecture of SpliceRadar. the semantic contents of the training data, which would affect its generalization ability. Therefore, after learning the spatial distribution of these residuals, we further suppress the remaining semantic-edges by applying a probabilistic regularization. From these we learn a hundred-dimensional feature vector characteristic of a camera-model and independent of the ima…
Figure 3: Qualitative results from SpliceRadar. Col-1: input image, col-2: ground-truth manipulation mask, col-3: predicted probability
Figure 4: Qualitative comparison of SpliceRadar, SB and EXIF-SC. Col-1: input image, col-2: ground-truth manipulation mask, col-3:
Figure 5: Hard examples where all three algorithms, SpliceRadar, SB and EXIF-SC, fail to detect the spliced regions. Col-1: input image,
Original abstract

Detection and localization of image manipulations like splices are gaining in importance with the easy accessibility of image editing software. While detection generates a verdict for an image, it provides no insight into the manipulation. Localization helps explain a positive detection by identifying the pixels of the image that have been tampered with. We propose a deep learning based method for splice localization without prior knowledge of a test image's camera-model. It comprises a novel approach for learning rich filters and for suppressing image-edges. Additionally, we train our model on a surrogate task of camera model identification, which allows us to leverage large and widely available, unmanipulated, camera-tagged image databases. During inference, we assume that the spliced and host regions come from different camera-models and we segment these regions using a Gaussian-mixture model. Experiments on three test databases demonstrate results on par with and above the state-of-the-art and a good generalization ability to unknown datasets.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The paper proposes SpliceRadar, a CNN-based method for blind splice localization that requires no prior camera-model knowledge of the test image. A network is trained on the surrogate task of camera-model identification using large unmanipulated datasets; the architecture includes novel components for learning rich filters and suppressing image edges. At inference the learned features are clustered with a GMM under the explicit assumption that spliced and host regions originate from different camera models. Experiments on three test databases are claimed to match or exceed prior SOTA while showing good generalization to unknown data.

Significance. The surrogate-task strategy that exploits abundant camera-tagged data is a clear strength and could meaningfully advance blind forensics if the empirical claims are substantiated. However, the load-bearing inference assumption (different camera models) is unvalidated in the provided description, which limits the assessed significance until addressed.

major comments (2)
  1. [Abstract] Abstract: the localization pipeline rests on the assumption that 'the spliced and host regions come from different camera-models' followed by GMM segmentation, yet no experiment, ablation, or analysis is described that tests feature separability when this assumption is violated or quantifies how often real-world splices satisfy it.
  2. [Abstract] Abstract: the claim that experiments 'demonstrate results on par with and above the state-of-the-art' supplies no metrics, baselines, error bars, dataset sizes, or ablation results, preventing any assessment of the central empirical claim.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments highlighting the central assumption and the need for clearer empirical support in the abstract. We respond to each point below.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the localization pipeline rests on the assumption that 'the spliced and host regions come from different camera-models' followed by GMM segmentation, yet no experiment, ablation, or analysis is described that tests feature separability when this assumption is violated or quantifies how often real-world splices satisfy it.

    Authors: The assumption is stated explicitly as a design choice for blind localization. We agree that testing feature separability under violation (same-camera splices) is valuable and will add a controlled ablation on the test sets by artificially creating same-model splices to measure degradation. A full quantification of real-world splice statistics is difficult without a dedicated provenance dataset, but we will add discussion referencing prior forensics literature on cross-camera splicing prevalence. revision: yes
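The ablation proposed in this response, splicing regions between images from the same camera model, can be generated with very little code. The sketch below is hypothetical; neither the paper nor the rebuttal describes an implementation. It pairs images by camera label, copies a rectangle from a donor into a host, and returns the ground-truth mask. It assumes grayscale or H x W x C NumPy arrays. It also ignores blending, resizing, and recompression, which a realistic benchmark would likely need.

```python
import numpy as np


def same_model_pairs(camera_labels):
    """All ordered (host, donor) index pairs whose images share a camera-model label."""
    return [
        (i, j)
        for i, a in enumerate(camera_labels)
        for j, b in enumerate(camera_labels)
        if i != j and a == b
    ]


def make_splice(host, donor, src, dst, size):
    """Copy a size=(h, w) rectangle from donor at src=(y, x) into host at dst=(y, x).
    Returns the spliced image and a boolean ground-truth mask of the pasted region."""
    (sy, sx), (dy, dx), (h, w) = src, dst, size
    if sy + h > donor.shape[0] or sx + w > donor.shape[1]:
        raise ValueError("source rectangle exceeds donor bounds")
    if dy + h > host.shape[0] or dx + w > host.shape[1]:
        raise ValueError("destination rectangle exceeds host bounds")
    out = host.copy()
    out[dy:dy + h, dx:dx + w] = donor[sy:sy + h, sx:sx + w]
    mask = np.zeros(host.shape[:2], dtype=bool)
    mask[dy:dy + h, dx:dx + w] = True
    return out, mask


labels = ["A", "B", "A"]
print(same_model_pairs(labels))  # [(0, 2), (2, 0)]
host, donor = np.zeros((32, 32)), np.ones((32, 32))
spliced, mask = make_splice(host, donor, src=(0, 0), dst=(10, 12), size=(8, 6))
print(int(mask.sum()))  # 48
```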

  2. Referee: [Abstract] Abstract: the claim that experiments 'demonstrate results on par with and above the state-of-the-art' supplies no metrics, baselines, error bars, dataset sizes, or ablation results, preventing any assessment of the central empirical claim.

    Authors: Abstracts are space-limited and serve as summaries; the full Experiments section reports the metrics, baselines, dataset sizes (three test databases), and comparisons. We will revise the abstract to include key quantitative highlights (e.g., F1 scores and dataset names) while remaining within length limits. revision: yes
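Pixel-level F1, which the response proposes to add to the abstract, and the Matthews correlation coefficient (MCC) are widely used splice-localization scores. The following minimal sketch assumes binary predicted and ground-truth masks of equal shape. Thresholding a probability heat map into `pred` is left to the caller, and the function name is hypothetical.

```python
import numpy as np


def pixel_f1_mcc(pred, gt):
    """Pixel-level F1 and Matthews correlation coefficient for binary masks."""
    pred = np.asarray(pred, dtype=bool).ravel()
    gt = np.asarray(gt, dtype=bool).ravel()
    # Cast counts to float so the MCC denominator cannot overflow on large images.
    tp = float(np.sum(pred & gt))
    fp = float(np.sum(pred & ~gt))
    fn = float(np.sum(~pred & gt))
    tn = float(np.sum(~pred & ~gt))
    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0
    mcc_denom = np.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return f1, float(mcc)


gt = np.array([[1, 1], [0, 0]])
pred = np.array([[1, 0], [1, 0]])
print(pixel_f1_mcc(pred, gt))  # (0.5, 0.0)
```

MCC also counts true negatives, so it penalizes a detector that marks the whole image as tampered. F1 alone does not, which is one reason to report both.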

standing simulated objections not resolved
  • A rigorous quantification of how frequently real-world splices satisfy the different-camera-model assumption would require a large-scale study of verified manipulated images with camera metadata, which is not feasible within this work.

Circularity Check

0 steps flagged

No significant circularity; derivation relies on external data and standard clustering

full rationale

The paper trains a CNN on the surrogate task of camera-model identification using large external camera-tagged databases of unmanipulated images. At inference it applies a standard Gaussian-mixture model to the learned features under an explicitly stated assumption that spliced and host regions originate from different camera models. No equations, fitted parameters, or predictions are shown to reduce by construction to the method's own inputs. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claims therefore remain independent of the paper's own outputs and rest on external benchmarks and conventional post-processing.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input supplies no derivations, fitted constants, or new postulated entities; the approach rests on standard deep-learning assumptions and the explicit inference assumption of differing camera models for spliced regions.

pith-pipeline@v0.9.0 · 5693 in / 1177 out tokens · 36294 ms · 2026-05-25T14:40:54.850403+00:00 · methodology

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages

  1. [1]

    S. Agarwal and H. Farid. Photo forensics from JPEG dimples. In 2017 IEEE Workshop on Information Forensics and Security (WIFS), pages 1–6, 12 2017

  2. [2]

    J. H. Bappy, A. K. Roy-Chowdhury, J. Bunk, L. Nataraj, and B. S. Manjunath. Exploiting spatial structure for localizing manipulated image regions. In The IEEE International Conference on Computer Vision (ICCV), 10 2017

  3. [3]

    M. Barni, E. Nowroozi, and B. Tondi. Higher-order, adversary-aware, double JPEG-detection via selected training on attacked samples. In 25th European Signal Processing Conference (EUSIPCO), pages 281–285, 08 2017

  4. [4]

    B. Bayar and M. C. Stamm. Augmented convolutional feature maps for robust CNN-based camera model identification. In 2017 IEEE International Conference on Image Processing (ICIP), pages 4098–4102, 09 2017

  5. [5]

    B. Bayar and M. C. Stamm. Constrained convolutional neural networks: A new approach towards general purpose image manipulation detection. IEEE Transactions on Information Forensics and Security, 13(11):2691–2706, 11 2018

  6. [6]

    L. Bondi, L. Baroffio, D. Güera, P. Bestagini, E. J. Delp, and S. Tubaro. First steps toward camera model identification with convolutional neural networks. IEEE Signal Processing Letters, 24(3):259–263, 03 2017

  7. [7]

    L. Bondi, S. Lameri, D. Güera, P. Bestagini, E. Delp, and S. Tubaro. Tampering detection and localization through clustering of camera-based CNN features. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1855–1864, 07 2017

  8. [8]

    M. Chen, J. Fridrich, M. Goljan, and J. Lukáš. Determining image origin and integrity using sensor noise. IEEE Transactions on Information Forensics and Security, 3:74–90, 04 2008

  9. [9]

    D. Cozzolino, G. Poggi, and L. Verdoliva. Splicebuster: A new blind image splicing detector. In 2015 IEEE International Workshop on Information Forensics and Security (WIFS), pages 1–6, 11 2015

  10. [10]

    D. Cozzolino, G. Poggi, and L. Verdoliva. Recasting residual-based local descriptors as convolutional neural networks: An application to image forgery detection. In Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security, pages 159–164, New York, NY, USA, 2017. ACM

  11. [11]

    D. Cozzolino, J. Thies, A. Rössler, C. Riess, M. Nießner, and L. Verdoliva. ForensicTransfer: Weakly-supervised domain adaptation for forgery detection. arXiv, 2018

  12. [12]

    D. Cozzolino and L. Verdoliva. Noiseprint: a CNN-based camera model fingerprint. arXiv, 2018

  13. [13]

    T. J. d. Carvalho, C. Riess, E. Angelopoulou, H. Pedrini, and A. d. R. Rocha. Exposing digital image forgeries by illumination color classification. IEEE Transactions on Information Forensics and Security, 8(7):1182–1194, 07 2013

  14. [14]

    J. Fiscus, H. Guan, Y. Lee, A. Yates, A. Delgado, D. Zhou, D. Joy, and A. Pereira. The 2017 Nimble Challenge Evaluation: Results and Future Directions, 2017

  15. [15]

    J. Fridrich and J. Kodovsky. Rich models for steganalysis of digital images. IEEE Transactions on Information Forensics and Security, 7(3):868–882, 06 2012

  16. [16]

    T. Gloe and R. Böhme. The ‘Dresden Image Database’ for benchmarking digital image forensics. In Proceedings of the 25th Symposium On Applied Computing (ACM SAC 2010), volume 2, pages 1585–1591, 2010

  17. [17]

    K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 06 2016

  18. [18]

    M. Huh, A. Liu, A. Owens, and A. A. Efros. Fighting fake news: Image splice detection via learned self-consistency. In V. Ferrari, M. Hebert, C. Sminchisescu, and Y. Weiss, editors, Computer Vision – ECCV, pages 106–124, Cham, 2018. Springer International Publishing

  19. [19]

    J. Lukas, J. Fridrich, and M. Goljan. Digital camera identification from sensor pattern noise. IEEE Transactions on Information Forensics and Security, 1(2):205–214, 06 2006

  20. [20]

    F. Maes, D. Vandermeulen, and P. Suetens. Medical image registration using mutual information. Proceedings of the IEEE, 91(10):1699–1722, 10 2003

  21. [21]

    O. Mayer and M. C. Stamm. Learned forensic source similarity for unknown camera models. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE SigPort, 2018

  22. [22]

    A. C. Popescu and H. Farid. Exposing digital forgeries in color filter array interpolated images. IEEE Transactions on Signal Processing, 53(10):3948–3959, 10 2005

  23. [23]

    T. Qiao, F. Retraint, R. Cogranne, and T. H. Thai. Individual camera device identification from JPEG images. Signal Processing: Image Communication, 52:74–86, 2017

  24. [24]

    A. Rössler, D. Cozzolino, L. Verdoliva, C. Riess, J. Thies, and M. Nießner. FaceForensics++: Learning to detect manipulated facial images. arXiv, 2019

  25. [25]

    R. Salloum, Y. Ren, and C.-C. J. Kuo. Image splicing localization using a multi-task fully convolutional network (MFCN). Journal of Visual Communication and Image Representation, 51:201–209, 2018

  26. [26]

    K. San Choi, E. Lam, and K. Wong. Source camera identification by JPEG compression statistics for image forensics. In IEEE Region Conf. TENCON, pages 1–4, 12 2006

  27. [27]

    M. Zampoglou, S. Papadopoulos, and I. Kompatsiaris. Large-scale evaluation of splicing localization algorithms for web images. Multimedia Tools and Applications, 09 2016

  28. [28]

    P. Zhou, X. Han, V. I. Morariu, and L. S. Davis. Learning rich features for image manipulation detection. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 1053–1061, 06 2018