
arxiv: 1907.07604 · v2 · pith:PFARE35Znew · submitted 2019-07-17 · 💻 cs.SI · cs.IR

Towards Reliable Online Clickbait Video Detection: A Content-Agnostic Approach

Pith reviewed 2026-05-24 19:51 UTC · model grok-4.3

classification 💻 cs.SI cs.IR
keywords clickbait detection · audience comments · content-agnostic · online video · YouTube · video platforms · social media analysis

The pith

Audience comments reveal clickbait videos more effectively than analysis of their titles, thumbnails, or content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OVCP, a scheme that identifies clickbait videos—those whose content deviates from their title or thumbnail—by examining comments left by viewers who watched them. This avoids direct analysis of the video or pre-click elements, making evasion by creators who craft misleading previews harder. Experiments on real YouTube data show it outperforms both existing detection models and human annotators. A reader would care because clickbait wastes time and undermines platform trust; if the approach holds, detection can occur after upload based on audience signals rather than creator-controlled previews.

Core claim

OVCP detects clickbait videos by exploring comments from the audience who watched the video rather than analyzing video content, title, or thumbnail, and experimental results on a YouTube dataset show it is effective and significantly outperforms state-of-the-art baseline models and human annotators.

What carries the argument

OVCP (Online Video Clickbait Protector), the content-agnostic scheme that uses audience comments to flag content mismatch.

If this is right

  • Creators cannot easily evade detection by optimizing only titles and thumbnails.
  • Detection can run after videos accumulate comments, catching issues missed at upload time.
  • Platforms gain a method that remains effective even when pre-click elements are crafted to mislead.
  • Automation via comments beats human judgment on the collected dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The method could apply to other comment-rich platforms where video previews are controlled by uploaders.
  • If comment manipulation becomes common, combining OVCP with comment-authenticity filters would be a natural next step.
  • Viewer comments may contain mismatch signals that pre-upload metadata lacks, suggesting broader use for post-publication quality checks.

Load-bearing premise

Comments posted by viewers who watched the video reliably indicate clickbait status without manipulation or external bias.

What would settle it

Collect a set of known clickbait videos, replace or alter their comments with neutral or fabricated ones, and measure whether OVCP accuracy falls below baseline methods.
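
A minimal Python sketch of this test follows. Everything in it is an assumption made for illustration: the record layout (`comments`, `is_clickbait`), the helper names, and in particular `toy_detector`, which is a keyword stand-in and not OVCP. It only shows the shape of the experiment: score the same labeled videos before and after their comments are swapped for neutral text.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# Hypothetical record layout: {"comments": [str, ...], "is_clickbait": bool}
Video = Dict[str, object]
Detector = Callable[[Sequence[str]], bool]


def accuracy(detector: Detector, videos: List[Video]) -> float:
    """Fraction of videos whose predicted label matches the known label."""
    correct = sum(detector(v["comments"]) == v["is_clickbait"] for v in videos)
    return correct / len(videos)


def with_neutral_comments(videos: List[Video], neutral: Sequence[str]) -> List[Video]:
    """Copy each video, replacing its comments with the same neutral text."""
    return [{**v, "comments": list(neutral)} for v in videos]


def accuracy_drop(
    detector: Detector, videos: List[Video], neutral: Sequence[str]
) -> Tuple[float, float]:
    """Return (accuracy on original comments, accuracy on replaced comments)."""
    original = accuracy(detector, videos)
    perturbed = accuracy(detector, with_neutral_comments(videos, neutral))
    return original, perturbed


def toy_detector(comments: Sequence[str]) -> bool:
    """Stand-in classifier (NOT OVCP): flags a video if any comment complains."""
    keywords = ("clickbait", "misleading")
    return any(k in c.lower() for c in comments for k in keywords)
```

With a real detector in place of `toy_detector`, the perturbed accuracy would then be compared against the baseline methods' accuracy on the same videos.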

Figures

Figures reproduced from arXiv: 1907.07604 by Daniel Zhang, Dong Wang, Lanyu Shang, Michael Wang, Shuyue Lai.

Figure 1. Examples of Clickbait and Non-clickbait Video with Similar Titles
Figure 2. Examples of Clickbait and Non-clickbait Video with Similar Thumbnails
Figure 3. Example of an Online Video and Its Components
Figure 4. An Overview of OVCP
Figure 5. Examples of the Comment Network Structure for Clickbait and Non-clickbait Videos
Figure 6. Sentiment Feature Path. The color of each node represents the sentiment attribute
Figure 7. Endorsement Feature Path. The size of each node represents the endorsement
Figure 8. Word Clouds
    Therefore, we employ a widely adopted document embedding technique, namely Doc2vec [37], to extract linguistic features from comments (i.e., comment embedding). Doc2vec, derived from the famous Word2vec framework, is designed to learn fixed-length continuous distributed vector representations for word sequences of variable length. A naïve approach is to simply embed the whole comment section …
Figure 9. Metadata Feature Correlation
    5. Evaluation. In this section, we first describe the dataset we collected from YouTube. We then evaluate the performance of the OVCP scheme in comparison with state-of-the-art baselines on the collected dataset. The results show that OVCP significantly outperforms both the compared baseline methods and human annotators in terms of accurately detecting online clickbait videos. …
Figure 10. Distribution of Comments Count per Thread
Figure 11. ROC Curve of All Schemes
    5.4. Feature Analysis. In the second experiment, we study the importance of features in each category (i.e., network, linguistic, metadata) and their combinations. The results are shown in
Figure 12. We observe that the performance of our scheme generally improves as
Figure 13. Performance (F1 Score) vs. Average Detection Time Cost (per Video)
Figure 14. Computation Time for All Modules of OVCP

Original abstract

Online video sharing platforms (e.g., YouTube, Vimeo) have become an increasingly popular paradigm for people to consume video contents. Clickbait video, whose content clearly deviates from its title/thumbnail, has emerged as a critical problem on online video sharing platforms. Current clickbait detection solutions that mainly focus on analyzing the text of the title, the image of the thumbnail, or the content of the video are shown to be suboptimal in detecting the online clickbait videos. In this paper, we develop a novel content-agnostic scheme, Online Video Clickbait Protector (OVCP), to effectively detect clickbait videos by exploring the comments from the audience who watched the video. Different from existing solutions, OVCP does not directly analyze the content of the video and its pre-click information (e.g., title and thumbnail). Therefore, it is robust against sophisticated content creators who often generate clickbait videos that can bypass the current clickbait detectors. We evaluate OVCP with a real-world dataset collected from YouTube. Experimental results demonstrate that OVCP is effective in identifying clickbait videos and significantly outperforms both state-of-the-art baseline models and human annotators.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces Online Video Clickbait Protector (OVCP), a content-agnostic detector for clickbait videos on YouTube that relies on audience comments rather than analyzing titles, thumbnails, or video content. It claims that OVCP is robust to evasion by sophisticated creators and that experiments on a real-world YouTube dataset show it significantly outperforms both state-of-the-art content-based baselines and human annotators.

Significance. If the comment-derived labels validly capture title-content mismatch without bias, the approach would offer a practical advantage over content-based methods that can be gamed. The real-world dataset is a positive element, but the load-bearing assumption about comment reliability requires explicit validation for the significance to hold.

major comments (2)
  1. [Evaluation section] Evaluation / data collection section: No description is provided of how ground-truth labels are derived from comments (e.g., the exact proxy rule mapping comments to clickbait/non-clickbait, any filtering criteria, or inter-annotator agreement). This is load-bearing because OVCP is trained and evaluated on these labels; without the procedure, the reported outperformance cannot be assessed for selection bias or manipulation.
  2. [Results section] Results section: The claim of significant outperformance over baselines and humans is stated without accompanying metrics, statistical tests, or error analysis (e.g., confusion matrices or failure cases). This undermines the central effectiveness claim because the abstract asserts superiority but supplies no quantitative evidence or controls for comment authenticity.
minor comments (1)
  1. [Abstract] The abstract would be clearer if it included at least one concrete performance number (accuracy, F1, etc.) rather than only qualitative statements of outperformance.
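
For the inter-annotator agreement raised in major comment 1, one common statistic is Cohen's kappa, sketched here in Python. This is an illustration only: the source does not say which agreement measure, if any, the authors used, and the function name and label encoding are assumptions.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohen_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labeling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each annotator's
    label frequencies.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1.0 - expected)
```

A value near 1 indicates agreement well above chance; a value near 0 indicates agreement no better than chance.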

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and additional details.

point-by-point responses
  1. Referee: [Evaluation section] Evaluation / data collection section: No description is provided of how ground-truth labels are derived from comments (e.g., the exact proxy rule mapping comments to clickbait/non-clickbait, any filtering criteria, or inter-annotator agreement). This is load-bearing because OVCP is trained and evaluated on these labels; without the procedure, the reported outperformance cannot be assessed for selection bias or manipulation.

    Authors: We agree that the current manuscript does not provide sufficient detail on the ground-truth labeling process from comments. In the revised version, we will add an explicit subsection in the Evaluation / data collection section describing the exact proxy rule used to map comments to clickbait/non-clickbait labels, any filtering criteria applied, and inter-annotator agreement statistics if multiple annotators were involved. This will allow readers to assess potential selection bias. revision: yes

  2. Referee: [Results section] Results section: The claim of significant outperformance over baselines and humans is stated without accompanying metrics, statistical tests, or error analysis (e.g., confusion matrices or failure cases). This undermines the central effectiveness claim because the abstract asserts superiority but supplies no quantitative evidence or controls for comment authenticity.

    Authors: The manuscript states that OVCP significantly outperforms baselines and human annotators on the YouTube dataset, but we acknowledge that the Results section would benefit from more granular quantitative evidence. In the revision, we will include specific performance metrics (e.g., precision, recall, F1), results of statistical significance tests, confusion matrices, and an error analysis with representative failure cases. We will also add discussion of controls or checks for comment authenticity and potential biases in the data collection process. revision: yes
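
The metrics named in this response can be sketched in a few lines of Python. The function names and the convention that a truthy label means "clickbait" are assumptions for illustration; the block does not reproduce any number from the paper.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, int]:
    """Count true/false positives and negatives (truthy label = clickbait)."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth and pred:
            counts["tp"] += 1
        elif pred:
            counts["fp"] += 1
        elif truth:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


def precision_recall_f1(
    y_true: Sequence[int], y_pred: Sequence[int]
) -> Tuple[float, float, float]:
    """Precision, recall, and F1 for the clickbait class; 0.0 when undefined."""
    c = confusion_counts(y_true, y_pred)
    predicted_pos = c["tp"] + c["fp"]
    actual_pos = c["tp"] + c["fn"]
    precision = c["tp"] / predicted_pos if predicted_pos else 0.0
    recall = c["tp"] / actual_pos if actual_pos else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1
```

Reporting the four confusion counts alongside these scores lets a reader see whether errors are mostly missed clickbait or false alarms.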

Circularity Check

0 steps flagged

No significant circularity; evaluation relies on external dataset and proxy labels

full rationale

The paper claims OVCP detects clickbait via audience comments without analyzing video content, title or thumbnail. It reports superior performance on a collected YouTube dataset versus baselines and humans. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described method. The load-bearing assumption (comments encode title-content mismatch) is an external validity concern rather than a definitional or self-referential reduction. This matches the default non-circular case for a supervised ML paper evaluated on held-out external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no details available on free parameters, axioms, or invented entities.



Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 2 internal anchors

  1. [1]

    GroupM, Groupm introduces state of digital report, available at https://www.groupm.com/news/groupm-introduces-state-digital-report, accessed 2019-01-07 (2018)

  2. [2]

    R. Molla, Next year, people will spend more time online than they will watching tv. that’s a first., available at https://www.recode.net/2018/6/8/17441288/internet-time-spent-tv-zenith-data-media, accessed 2019-02-14 (2018)

  3. [3]

    M. Bärtl, Youtube channels, uploads and views: A statistical analysis of the past 10 years, Convergence 24 (1) (2018) 16–32 (2018)

  4. [4]

    M. M. U. Rony, N. Hassan, M. Yousuf, Diving deep into clickbaits: Who use them to what extents in which topics with what effects?, in: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ACM, 2017, pp. 232–239 (2017)

  5. [5]

    D. Wang, T. Abdelzaher, L. Kaplan, Social sensing: building reliable systems on unreliable data, Morgan Kaufmann, 2015 (2015)

  6. [6]

    M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional networks, in: European conference on computer vision, Springer, 2014, pp. 818–833 (2014)

  7. [7]

    Y. Zhang, N. Vance, D. Zhang, D. Wang, On opinion characterization in social sensing: A multi-view subspace learning approach, in: 2018 14th International Conference on Distributed Computing in Sensor Systems (DCOSS), IEEE, 2018, pp. 155–162 (2018)

  8. [8]

    O. Papadopoulou, M. Zampoglou, S. Papadopoulos, Y. Kompatsiaris, Web video verification using contextual cues, in: Proceedings of the 2nd International Workshop on Multimedia Forensics and Security, ACM, 2017, pp. 6–10 (2017)

  9. [9]

    D. Y. Zhang, L. Song, Q. Li, Y. Zhang, D. Wang, Streamguard: A bayesian network approach to copyright infringement detection problem in large-scale live video sharing systems, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 901–910 (2018)

  10. [10]

    M. Potthast, S. Köpsel, B. Stein, M. Hagen, Clickbait detection, in: European Conference on Information Retrieval, Springer, 2016, pp. 810–817 (2016)

  11. [11]

    M. Huh, A. Liu, A. Owens, A. A. Efros, Fighting fake news: Image splice detection via learned self-consistency, in: Proceedings of the European Conference on Computer Vision (ECCV), 2018, pp. 101–117 (2018)

  12. [12]

    In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking

    Y. Li, M.-C. Chang, H. Farid, S. Lyu, In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking, arXiv preprint arXiv:1806.02877 (2018)

  13. [13]

    K. Fagan, Youtube’s clickbait problem might not be fixable, available at https://yr.media/tech/youtubes-clickbait-problem-is-out-of-hand-and-there-may-be-no-fixing-it/, accessed 2019-02-25 (2018)

  14. [14]

    To clickbait or not to clickbait: What you need to know about headlines and clickbaits, available at https://marketinginsidergroup.com/content-marketing/what-you-need-to-know-headlines-clickbaits/, accessed 2019-02-19 (2016)

  15. [15]

    A. Agrawal, Clickbait detection using deep learning, in: 2016 2nd International Conference on Next Generation Computing Technologies (NGCT), IEEE, 2016, pp. 268–272 (2016)

  16. [16]

    M. Potthast, T. Gollub, K. Komlossy, S. Schuster, M. Wiegmann, E. P. G. Fernandez, M. Hagen, B. Stein, Crowdsourcing a large corpus of clickbait on twitter, in: Proceedings of the 27th International Conference on Computational Linguistics, 2018, pp. 1498–1507 (2018)

  17. [17]

    A. Anand, T. Chakraborty, N. Park, We used neural networks to detect clickbaits: You won’t believe what happened next!, in: European Conference on Information Retrieval, Springer, 2017, pp. 541–547 (2017)

  18. [18]

    Clickbait Identification using Neural Networks

    P. Thomas, Clickbait identification using neural networks, arXiv preprint arXiv:1710.08721 (2017)

  19. [19]

    J. Qu, A. M. Hißbach, T. Gollub, M. Potthast, Towards crowdsourcing clickbait labels for youtube videos.

  20. [20]

    S. Zannettou, S. Chatzis, K. Papadamou, M. Sirivianos, The good, the bad and the bait: Detecting and characterizing clickbait on youtube, in: 2018 IEEE Security and Privacy Workshops (SPW), IEEE, 2018, pp. 63–69 (2018)

  21. [21]

    D. Y. Zhang, J. Badilla, Y. Zhang, D. Wang, Towards reliable missing truth discovery in online social media sensing applications, in: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2018, pp. 143–150 (2018)

  22. [22]

    N. Vo, K. Lee, The rise of guardians: Fact-checking url recommendation to combat fake news, arXiv preprint arXiv:1806.07516 (2018)

  23. [23]

    X. Yin, J. Han, P. S. Yu, Truth discovery with multiple conflicting information providers on the web, IEEE Transactions on Knowledge and Data Engineering 20 (6) (2008) 796–808 (Jun. 2008). doi:10.1109/TKDE.2007.190745

  24. [24]

    D. Wang, M. T. Amin, S. Li, T. Abdelzaher, L. Kaplan, S. Gu, C. Pan, H. Liu, C. C. Aggarwal, R. Ganti, et al., Using humans as sensors: an estimation-theoretic perspective, in: Information Processing in Sensor Networks, IPSN-14 Proceedings of the 13th International Symposium on, IEEE, 2014, pp. 35–46 (2014)

  25. [25]

    D. Wang, T. Abdelzaher, L. Kaplan, C. C. Aggarwal, Recursive fact-finding: A streaming approach to truth estimation in crowdsourcing applications, in: Distributed Computing Systems (ICDCS), 2013 IEEE 33rd International Conference on, IEEE, 2013, pp. 530–539 (2013)

  26. [26]

    D. Zhang, D. Wang, N. Vance, Y. Zhang, S. Mike, On scalable and robust truth discovery in big data social media sensing applications, IEEE Transactions on Big Data (2018)

  27. [27]

    D. Wang, L. Kaplan, H. Le, T. Abdelzaher, On truth discovery in social sensing: A maximum likelihood estimation approach, in: Proc. ACM/IEEE 11th Int Information Processing in Sensor Networks (IPSN) Conf, 2012, pp. 233–244 (Apr. 2012). doi:10.1109/IPSN.2012.6920960

  28. [28]

    D. Y. Zhang, L. Shang, B. Geng, S. Lai, K. Li, H. Zhu, M. T. Amin, D. Wang, Fauxbuster: A content-free fauxtography detector using social media comments, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 891–900 (2018)

  29. [29]

    T. Huynh-Kha, T. Le-Tien, S. Ha-Viet-Uyen, K. Huynh-Van, M. Luong, A robust algorithm of forgery detection in copy-move and spliced images, (IJACSA) International Journal of Advanced Computer Science and Applications 7 (3) (2016)

  30. [30]

    H. Wang, F. Zhang, M. Hou, X. Xie, M. Guo, Q. Liu, Shine: Signed heterogeneous information network embedding for sentiment link prediction, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 592–600 (2018)

  31. [31]

    A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2016, pp. 855–864 (2016)

  32. [32]

    B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2014, pp. 701–710 (2014)

  33. [33]

    X. Huang, J. Li, X. Hu, Label informed attributed network embedding, in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, ACM, 2017, pp. 731–739 (2017)

  34. [34]

    Y. Zhang, Y. Lu, D. Zhang, L. Shang, D. Wang, Risksens: A multi-view learning approach to identifying risky traffic locations in intelligent transportation systems using social and remote sensing, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 1544–1553 (2018)

  35. [35]

    P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and composing robust features with denoising autoencoders, in: Proceedings of the 25th international conference on Machine learning, ACM, 2008, pp. 1096–1103 (2008)

  36. [36]

    G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, science 313 (5786) (2006) 504–507 (2006)

  37. [37]

    Q. Le, T. Mikolov, Distributed representations of sentences and documents, in: Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32, ICML’14, JMLR.org, 2014 (2014)

  38. [38]

    P. Bajaj, M. Kavidayal, P. Srivastava, M. N. Akhtar, P. Kumaraguru, Disinformation in multimedia annotation: Misleading metadata detection on youtube, in: Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion, ACM, 2016, pp. 53–61 (2016)

  39. [39]

    X.-Y. Liu, J. Wu, Z.-H. Zhou, Exploratory undersampling for class-imbalance learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39 (2) (2008) 539–550 (2008)

  40. [40]

    P. M. Domingos, A few useful things to know about machine learning., Commun. acm 55 (10) (2012) 78–87 (2012)

  41. [41]

    E. Alpaydin, Introduction to machine learning, MIT press, 2014 (2014)

  42. [42]

    X. Zhang, J. Zou, K. He, J. Sun, Accelerating very deep convolutional networks for classification and detection, IEEE transactions on pattern analysis and machine intelligence 38 (10) (2015) 1943–1955 (2015)

  43. [43]

    A. Chakraborty, B. Paranjape, S. Kakarla, N. Ganguly, Stop clickbait: Detecting and preventing clickbaits in online news media, in: Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, IEEE Press, 2016, pp. 9–16 (2016)

  44. [44]

    J. A. Hertz, Introduction to the theory of neural computation, CRC Press, 2018 (2018)

  45. [45]

    E. Ferrara, O. Varol, C. Davis, F. Menczer, A. Flammini, The rise of social bots, Communications of the ACM 59 (7) (2016) 96–104 (2016)

  46. [46]

    D. Wang, B. K. Szymanski, T. Abdelzaher, H. Ji, L. Kaplan, The age of social sensing, Computer 52 (1) (2019) 36–45 (2019)