arxiv: 1906.08743 · v1 · pith:UN2M4Y2Jnew · submitted 2019-06-20 · 💻 cs.LG · cs.CR · cs.CV · stat.ML

We Need No Pixels: Video Manipulation Detection Using Stream Descriptors

Pith reviewed 2026-05-25 19:30 UTC · model grok-4.3

classification 💻 cs.LG · cs.CR · cs.CV · stat.ML
keywords video manipulation detection, stream descriptors, binary classifiers, video forensics, forgery detection, metadata analysis, multimedia streams

The pith

Video manipulation can be detected by analyzing stream descriptors with binary classifiers, without using pixels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes detecting forged videos by examining their multimedia stream descriptors rather than pixel data. Simple binary classifiers are used on these descriptors to identify manipulations. This method achieves high detection scores on standard datasets provided that manipulators have not carefully sanitized the descriptors. It offers a scalable alternative to pixel-based techniques. A reader would care because video manipulation is becoming easier and metadata can reveal forgeries that pixel analysis might miss.

Core claim

We propose to identify forged videos by analyzing their multimedia stream descriptors with simple binary classifiers, completely avoiding the pixel space. Using well-known datasets, this scalable approach can achieve a high manipulation detection score if the manipulators have not done a careful data sanitization of the multimedia stream descriptors.

What carries the argument

Multimedia stream descriptors processed by simple binary classifiers to detect forgeries
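
A minimal sketch of what reading stream descriptors could look like in practice. It assumes the descriptors are obtained with ffprobe (which the paper cites) through its JSON output. The helper names `probe` and `flatten`, the `video.` / `audio.` / `format.` key prefixes, and the sample report are illustrative assumptions, not the authors' actual feature set.

```python
import json
import subprocess


def probe(path):
    """Run ffprobe on `path` and return its JSON report as a dict.

    Requires the ffprobe binary on PATH; it is not called in the demo below.
    """
    cmd = ["ffprobe", "-v", "quiet", "-print_format", "json",
           "-show_format", "-show_streams", path]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return json.loads(out)


def flatten(report):
    """Flatten an ffprobe-style report into a {name: value} descriptor dict.

    Assumed naming scheme: per-stream fields are prefixed with the stream's
    codec_type, container fields with "format.". Nested objects such as tags
    are skipped; if two streams share a codec_type, the later one wins.
    """
    features = {}
    for stream in report.get("streams", []):
        prefix = stream.get("codec_type", "unknown")
        for key, value in stream.items():
            if isinstance(value, (str, int, float)):
                features[f"{prefix}.{key}"] = value
    for key, value in report.get("format", {}).items():
        if isinstance(value, (str, int, float)):
            features[f"format.{key}"] = value
    return features


# Hand-written sample shaped like ffprobe's JSON output (values are made up).
sample = {
    "streams": [
        {"codec_type": "video", "codec_name": "h264", "width": 1280,
         "height": 720, "tags": {"language": "und"}},
        {"codec_type": "audio", "codec_name": "aac", "channels": 2},
    ],
    "format": {"format_name": "mov,mp4,m4a,3gp,3g2,mj2", "nb_streams": 2},
}

descriptors = flatten(sample)
print(descriptors["video.codec_name"], descriptors["format.nb_streams"])
```

No pixel is decoded at any point: only container and stream headers are read, which is what makes the approach cheap to run at scale.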

If this is right

  • High manipulation detection scores on well-known datasets
  • Scalable detection without pixel analysis
  • Detection works unless careful sanitization of descriptors is performed by manipulators
  • Applicable to video content where metadata is harder to forge than in images

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Video editing tools may need to include automatic descriptor sanitization to evade detection
  • This method could serve as a first-pass filter before more computationally intensive pixel analysis
  • Manipulators might need to develop new techniques to sanitize stream descriptors effectively

Load-bearing premise

Manipulators have not performed careful data sanitization of the multimedia stream descriptors.

What would settle it

A dataset of manipulated videos where the stream descriptors have been carefully sanitized by the forgers, resulting in low detection scores.

Figures

Figures reproduced from arXiv: 1906.08743 by David Güera, Edward J. Delp, Paolo Bestagini, Sriram Baireddy, Stefano Tubaro.

Figure 1: Examples of some of the information extracted from the video stream descriptors. These descriptors are necessary to decode and play back a video.
Figure 2: (a) Block diagram of the training stage of our proposed method. We process a labeled database of manipulated and pristine videos to generate a feature vector for each video from its multimedia stream descriptors. These feature vectors are then used to train and select the best detector. (b) Block diagram of the testing stage of our proposed method. Given a suspect video, a feature vector is generated and pr…
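
The step of generating a feature vector from the descriptors, as described in the Figure 2 caption, can be sketched as follows. This page does not say how the authors encode descriptors, so the scheme here is an assumption: string-valued descriptors are one-hot encoded and numeric ones are passed through, similar in spirit to scikit-learn's `DictVectorizer`. The class name `DescriptorVectorizer` and the toy descriptor dicts are hypothetical.

```python
class DescriptorVectorizer:
    """One-hot encode string descriptors; pass numeric ones through."""

    @staticmethod
    def _column(key, value):
        # Each distinct string value gets its own "key=value" indicator column.
        return f"{key}={value}" if isinstance(value, str) else key

    def fit(self, rows):
        """Learn the column layout from the training descriptors only."""
        names = {self._column(k, v) for row in rows for k, v in row.items()}
        self.columns = sorted(names)
        self.index = {name: i for i, name in enumerate(self.columns)}
        return self

    def transform(self, rows):
        """Map descriptor dicts to fixed-length lists of floats."""
        vectors = []
        for row in rows:
            vec = [0.0] * len(self.columns)
            for key, value in row.items():
                name = self._column(key, value)
                if name in self.index:  # values unseen in training are dropped
                    vec[self.index[name]] = (
                        1.0 if isinstance(value, str) else float(value)
                    )
            vectors.append(vec)
        return vectors


# Training stage: fit the layout on labeled videos (toy values).
train = [
    {"video.codec_name": "h264", "video.width": 1280},
    {"video.codec_name": "mpeg4", "video.width": 640},
]
vectorizer = DescriptorVectorizer().fit(train)
train_vectors = vectorizer.transform(train)

# Testing stage: a suspect video is encoded with the same layout.
suspect = {"video.codec_name": "vp9", "video.width": 1920}
suspect_vector = vectorizer.transform([suspect])[0]
print(vectorizer.columns)
print(suspect_vector)
```

Fitting the layout on the training set alone keeps the testing stage honest: a suspect video cannot add columns the detector was never trained on. The resulting vectors could then be fed to any binary classifier.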
Figure 5: PR curves, F1 score, AUC score, and AP score on the test set for all the trained models using 50% of the available training data (339 videos). Axes: Recall (x), Precision (y). Legend: Baseline (F1=0.306, AUC=0.306, AP=0.306); SVM (F1=0.853, AUC=0.932, AP=0.932); Random Forest (F1=0.917, AUC=0.981, AP=0.981); Ensemble (F1=0.917, AUC=0.984, AP=0.984).
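
For readers unfamiliar with the scores named in the Figure 5 caption, here is a small sketch of how precision, recall, F1, and average precision (AP) are commonly computed for a binary detector. The labels, scores, and the 0.5 decision threshold are toy assumptions and do not reproduce the paper's numbers. AP follows the usual step-wise definition (mean of the precision at each correctly ranked positive), and tied scores are not treated specially.

```python
def precision_recall_f1(labels, predictions):
    """Return (precision, recall, F1) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def average_precision(labels, scores):
    """Mean of the precision measured at the rank of each true positive."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    positives = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank
    return total / positives if positives else 0.0


# Toy data: 1 = manipulated, 0 = pristine; scores are detector confidences.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

precision, recall, f1 = precision_recall_f1(labels, predictions)
ap = average_precision(labels, scores)
print(round(precision, 3), round(recall, 3), round(f1, 3), round(ap, 3))
```

F1 depends on the chosen threshold, while AP summarizes the whole ranking, which is why a figure of this kind can report both for the same model.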
Figure 6: PR curves, F1 score, AUC score, and AP score on the test set for all the trained models using 75% of the available training data (508 videos).

Original abstract

Manipulating video content is easier than ever. Due to the misuse potential of manipulated content, multiple detection techniques that analyze the pixel data from the videos have been proposed. However, clever manipulators should also carefully forge the metadata and auxiliary header information, which is harder to do for videos than images. In this paper, we propose to identify forged videos by analyzing their multimedia stream descriptors with simple binary classifiers, completely avoiding the pixel space. Using well-known datasets, our results show that this scalable approach can achieve a high manipulation detection score if the manipulators have not done a careful data sanitization of the multimedia stream descriptors.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 1 minor

Summary. The paper proposes identifying forged videos by analyzing their multimedia stream descriptors with simple binary classifiers, avoiding the pixel space. It claims that this approach can achieve a high manipulation detection score on well-known datasets if the manipulators have not performed careful data sanitization of the descriptors.

Significance. If the results hold, this provides a scalable method for video manipulation detection that leverages metadata and auxiliary header information, which the paper argues is harder to forge for videos than for images. The explicit acknowledgment of the condition under which the method works is a positive aspect. The work could complement existing pixel-based techniques.

minor comments (1)
  1. [Abstract] The claim of a 'high manipulation detection score' is not supported by any specific metrics, classifier architecture, dataset names, or baseline comparisons; adding these would strengthen the summary.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive assessment of our manuscript, the accurate summary of our approach, and the recommendation for minor revision. The referee correctly notes both the scalability of the method and the explicit condition regarding data sanitization of descriptors.

Circularity Check

0 steps flagged

No significant circularity identified

The paper presents a direct classification approach on existing multimedia stream descriptors to detect video manipulations, with results reported on well-known datasets under the explicit condition that manipulators have not performed careful sanitization. No equations, fitted parameters, or derivation steps are described that reduce to self-definition, fitted inputs renamed as predictions, or load-bearing self-citations. The central claim remains independent of any internal circular reduction and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are identifiable from the abstract alone.

pith-pipeline@v0.9.0 · 5649 in / 947 out tokens · 28171 ms · 2026-05-25T19:30:57.389388+00:00 · methodology
