Towards Reliable Online Clickbait Video Detection: A Content-Agnostic Approach

Daniel Zhang; Dong Wang; Lanyu Shang; Michael Wang; Shuyue Lai

arxiv: 1907.07604 · v2 · pith:PFARE35Znew · submitted 2019-07-17 · 💻 cs.SI · cs.IR

Towards Reliable Online Clickbait Video Detection: A Content-Agnostic Approach

Lanyu Shang , Daniel Zhang , Michael Wang , Shuyue Lai , Dong Wang This is my paper

Pith reviewed 2026-05-24 19:51 UTC · model grok-4.3

classification 💻 cs.SI cs.IR

keywords clickbait detectionaudience commentscontent-agnosticonline videoYouTubevideo platformssocial media analysis

0 comments

The pith

Audience comments detect clickbait videos more effectively than analyzing their titles, thumbnails, or content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces OVCP, a scheme that identifies clickbait videos—those whose content deviates from their title or thumbnail—by examining comments left by viewers who watched them. This avoids direct analysis of the video or pre-click elements, making evasion by creators who craft misleading previews harder. Experiments on real YouTube data show it outperforms both existing detection models and human annotators. A reader would care because clickbait wastes time and undermines platform trust; if the approach holds, detection can occur after upload based on audience signals rather than creator-controlled previews.

Core claim

OVCP detects clickbait videos by exploring comments from the audience who watched the video rather than analyzing video content, title, or thumbnail, and experimental results on a YouTube dataset show it is effective and significantly outperforms state-of-the-art baseline models and human annotators.

What carries the argument

OVCP (Online Video Clickbait Protector), the content-agnostic scheme that uses audience comments to flag content mismatch.

If this is right

Creators cannot easily evade detection by optimizing only titles and thumbnails.
Detection can run after videos accumulate comments, catching issues missed at upload time.
Platforms gain a method that remains effective even when pre-click elements are crafted to mislead.
Automation via comments beats human judgment on the collected dataset.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The method could apply to other comment-rich platforms where video previews are controlled by uploaders.
If comment manipulation becomes common, combining OVCP with comment-authenticity filters would be a natural next step.
Viewer comments may contain mismatch signals that pre-upload metadata lacks, suggesting broader use for post-publication quality checks.

Load-bearing premise

Comments posted by viewers who watched the video reliably indicate clickbait status without manipulation or external bias.

What would settle it

Collect a set of known clickbait videos, replace or alter their comments with neutral or fabricated ones, and measure whether OVCP accuracy falls below baseline methods.

Figures

Figures reproduced from arXiv: 1907.07604 by Daniel Zhang, Dong Wang, Lanyu Shang, Michael Wang, Shuyue Lai.

**Figure 2.** Figure 2: Examples of Clickbait and Non-clickbait Video with Similar Thumbnails [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Example of an Online Video and Its Components [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: An Overview of OVCP 7 [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Examples of the Comment Network Structure for Clickbait and Non-clickbait Videos [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: Sentiment Feature Path. The color of each node represents the sentiment attribute [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: Endorsement Feature Path. The size of each node represents the endorsement [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: Word Clouds Therefore, we employ a widely adopted document embedding technique, namely Doc2vec [37], to extract linguistic features from comments (i.e., comment embedding). Doc2vec, derived from the famous Word2vec framework, is designed to learn fixed-length continuous distributed vector representations for word sequences of variable-length. A na¨ıve approach is to simply embed the whole comment section … view at source ↗

**Figure 9.** Figure 9: Metadata Feature Correlation 5. Evaluation In this section, we first describe the dataset we collected from YouTube. We then evaluate the performance of the OVCP scheme in comparison with state-of-the-art baselines on the collected dataset. The results show that OVCP significantly outperforms both the compared baseline methods and human annotators in terms of accurately detecting online clickbait videos. … view at source ↗

**Figure 10.** Figure 10: Distribution of Comments Count per Thread [PITH_FULL_IMAGE:figures/full_fig_p016_10.png] view at source ↗

**Figure 11.** Figure 11: ROC Curve of All Schemes 5.4. Feature Analysis In the second experiment, we study the importance of features in each category (i.e., network, linguistic, metadata) and their combinations. The results are shown in [PITH_FULL_IMAGE:figures/full_fig_p018_11.png] view at source ↗

**Figure 12.** Figure 12: We observe that the performance of our scheme generally improves as [PITH_FULL_IMAGE:figures/full_fig_p019_12.png] view at source ↗

**Figure 13.** Figure 13: Performance (F1 Score) v.s. Average Detection Time Cost (per Video) [PITH_FULL_IMAGE:figures/full_fig_p020_13.png] view at source ↗

**Figure 14.** Figure 14: Computation Time for All Modules of OVCP [PITH_FULL_IMAGE:figures/full_fig_p021_14.png] view at source ↗

read the original abstract

Online video sharing platforms (e.g., YouTube, Vimeo) have become an increasingly popular paradigm for people to consume video contents. Clickbait video, whose content clearly deviates from its title/thumbnail, has emerged as a critical problem on online video sharing platforms. Current clickbait detection solutions that mainly focus on analyzing the text of the title, the image of the thumbnail, or the content of the video are shown to be suboptimal in detecting the online clickbait videos. In this paper, we develop a novel content-agnostic scheme, Online Video Clickbait Protector (OVCP), to effectively detect clickbait videos by exploring the comments from the audience who watched the video. Different from existing solutions, OVCP does not directly analyze the content of the video and its pre-click information (e.g., title and thumbnail). Therefore, it is robust against sophisticated content creators who often generate clickbait videos that can bypass the current clickbait detectors. We evaluate OVCP with a real-world dataset collected from YouTube. Experimental results demonstrate that OVCP is effective in identifying clickbait videos and significantly outperforms both state-of-the-art baseline models and human annotators.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper's new angle is using viewer comments as a post-watch signal for clickbait detection instead of title or video content, but the evaluation leaves selection bias and labeling details unaddressed.

read the letter

The paper introduces OVCP, a detector that skips the video, title, and thumbnail entirely and instead pulls features from audience comments. This is the actual new piece: it treats comments as a proxy for whether the delivered content matched the click promise, which is a shift from the content-analysis baselines the authors criticize as easy to game. The motivation is sound on its face—sophisticated uploaders can optimize thumbnails and titles to fool pre-click models, so looking after the fact makes sense in principle. The abstract also notes that the method is tested on a real YouTube dataset and reports better results than both prior models and human annotators, which at least shows they ran a comparison rather than just claiming the idea is better in theory. That is the credit the work earns. The soft spot is exactly the one the stress-test flags. The central claim rests on comments being an unbiased signal of mismatch, yet the abstract supplies no description of how clickbait labels were obtained, whether comments were filtered for authenticity, or how the authors handled the fact that only viewers who already clicked can comment. If the dataset was assembled in a way that favors videos with obvious comment signals, or if seeded comments are common, the reported gains could be an artifact of the collection process rather than genuine detection power. Without those controls the outperformance numbers are difficult to interpret. This paper is aimed at researchers working on platform moderation and social-signal detection. A reader already thinking about post-engagement data would get the conceptual point and might want to see the feature set, but anyone planning to replicate or extend it would need the full methods to judge whether the bias issues were handled. I would send it to peer review so the data pipeline and label quality can be examined directly; the idea is worth testing even if the current write-up leaves the key assumption untested.

Referee Report

2 major / 1 minor

Summary. The paper introduces Online Video Clickbait Protector (OVCP), a content-agnostic detector for clickbait videos on YouTube that relies on audience comments rather than analyzing titles, thumbnails, or video content. It claims that OVCP is robust to evasion by sophisticated creators and that experiments on a real-world YouTube dataset show it significantly outperforms both state-of-the-art content-based baselines and human annotators.

Significance. If the comment-derived labels validly capture title-content mismatch without bias, the approach would offer a practical advantage over content-based methods that can be gamed. The real-world dataset is a positive element, but the load-bearing assumption about comment reliability requires explicit validation for the significance to hold.

major comments (2)

[Evaluation section] Evaluation / data collection section: No description is provided of how ground-truth labels are derived from comments (e.g., the exact proxy rule mapping comments to clickbait/non-clickbait, any filtering criteria, or inter-annotator agreement). This is load-bearing because OVCP is trained and evaluated on these labels; without the procedure, the reported outperformance cannot be assessed for selection bias or manipulation.
[Results section] Results section: The claim of significant outperformance over baselines and humans is stated without accompanying metrics, statistical tests, or error analysis (e.g., confusion matrices or failure cases). This undermines the central effectiveness claim because the abstract asserts superiority but supplies no quantitative evidence or controls for comment authenticity.

minor comments (1)

[Abstract] The abstract would be clearer if it included at least one concrete performance number (accuracy, F1, etc.) rather than only qualitative statements of outperformance.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. We address each major comment below and will revise the manuscript to incorporate the requested clarifications and additional details.

read point-by-point responses

Referee: [Evaluation section] Evaluation / data collection section: No description is provided of how ground-truth labels are derived from comments (e.g., the exact proxy rule mapping comments to clickbait/non-clickbait, any filtering criteria, or inter-annotator agreement). This is load-bearing because OVCP is trained and evaluated on these labels; without the procedure, the reported outperformance cannot be assessed for selection bias or manipulation.

Authors: We agree that the current manuscript does not provide sufficient detail on the ground-truth labeling process from comments. In the revised version, we will add an explicit subsection in the Evaluation / data collection section describing the exact proxy rule used to map comments to clickbait/non-clickbait labels, any filtering criteria applied, and inter-annotator agreement statistics if multiple annotators were involved. This will allow readers to assess potential selection bias. revision: yes
Referee: [Results section] Results section: The claim of significant outperformance over baselines and humans is stated without accompanying metrics, statistical tests, or error analysis (e.g., confusion matrices or failure cases). This undermines the central effectiveness claim because the abstract asserts superiority but supplies no quantitative evidence or controls for comment authenticity.

Authors: The manuscript states that OVCP significantly outperforms baselines and human annotators on the YouTube dataset, but we acknowledge that the Results section would benefit from more granular quantitative evidence. In the revision, we will include specific performance metrics (e.g., precision, recall, F1), results of statistical significance tests, confusion matrices, and an error analysis with representative failure cases. We will also add discussion of controls or checks for comment authenticity and potential biases in the data collection process. revision: yes

Circularity Check

0 steps flagged

No significant circularity; evaluation relies on external dataset and proxy labels

full rationale

The paper claims OVCP detects clickbait via audience comments without analyzing video content, title or thumbnail. It reports superior performance on a collected YouTube dataset versus baselines and humans. No equations, fitted parameters renamed as predictions, or self-citation chains appear in the abstract or described method. The load-bearing assumption (comments encode title-content mismatch) is an external validity concern rather than a definitional or self-referential reduction. This matches the default non-circular case for a supervised ML paper evaluated on held-out external data.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based on abstract only; no details available on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5740 in / 955 out tokens · 19303 ms · 2026-05-24T19:51:06.014060+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

46 extracted references · 46 canonical work pages · 2 internal anchors

[1]

GroupM, Groupm introduces state of digital report, available at https:// www.groupm.com/news/groupm-introduces-state-digital-report, accessed 2019-01-07 (2018)

work page 2019
[2]

Molla, Next year, people will spend more time online than they will watching tv

R. Molla, Next year, people will spend more time online than they will watching tv. that’s a ﬁrst., available at https://www.recode.net/2018/6/ 8/17441288/internet-time-spent-tv-zenith-data-media, accessed 2019-02- 14 (2018)

work page 2018
[3]

B¨ artl, Youtube channels, uploads and views: A statistical analysis of the past 10 years, Convergence 24 (1) (2018) 16–32 (2018)

M. B¨ artl, Youtube channels, uploads and views: A statistical analysis of the past 10 years, Convergence 24 (1) (2018) 16–32 (2018)

work page 2018
[4]

M. M. U. Rony, N. Hassan, M. Yousuf, Diving deep into clickbaits: Who use them to what extents in which topics with what eﬀects?, in: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ACM, 2017, pp. 232–239 (2017)

work page 2017
[5]

D. Wang, T. Abdelzaher, L. Kaplan, Social sensing: building reliable sys- tems on unreliable data, Morgan Kaufmann, 2015 (2015)

work page 2015
[6]

M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional net- works, in: European conference on computer vision, Springer, 2014, pp. 818–833 (2014)

work page 2014
[7]

Zhang, N

Y. Zhang, N. Vance, D. Zhang, D. Wang, On opinion characterization in social sensing: A multi-view subspace learning approach, in: 2018 14th International Conference on Distributed Computing in Sensor Systems (DCOSS), IEEE, 2018, pp. 155–162 (2018). 23

work page 2018
[8]

Papadopoulou, M

O. Papadopoulou, M. Zampoglou, S. Papadopoulos, Y. Kompatsiaris, Web video veriﬁcation using contextual cues, in: Proceedings of the 2nd Inter- national Workshop on Multimedia Forensics and Security, ACM, 2017, pp. 6–10 (2017)

work page 2017
[9]

D. Y. Zhang, L. Song, Q. Li, Y. Zhang, D. Wang, Streamguard: A bayesian network approach to copyright infringement detection problem in large- scale live video sharing systems, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 901–910 (2018)

work page 2018
[10]

Potthast, S

M. Potthast, S. K¨ opsel, B. Stein, M. Hagen, Clickbait detection, in: Eu- ropean Conference on Information Retrieval, Springer, 2016, pp. 810–817 (2016)

work page 2016
[11]

M. Huh, A. Liu, A. Owens, A. A. Efros, Fighting fake news: Image splice detection via learned self-consistency, in: Proceedings of the European Con- ference on Computer Vision (ECCV), 2018, pp. 101–117 (2018)

work page 2018
[12]

In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking

Y. Li, M.-C. Chang, H. Farid, S. Lyu, In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking, arXiv preprint arXiv:1806.02877 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018
[13]

K. Fagan, Youtube’s clickbait problem might not be ﬁxable, available at https://yr.media/tech/ youtubes-clickbait-problem-is-out-of-hand-and-there-may-be-no-ﬁxing-it/, accessed 2019-02-25 (2018)

work page 2019
[14]

To clickbait or not to clickbait: What you need to know about headlines and clickbaits, available at https://marketinginsidergroup.com/ content-marketing/what-you-need-to-know-headlines-clickbaits/, accessed 2019-02-19 (2016)

work page 2019
[15]

Agrawal, Clickbait detection using deep learning, in: 2016 2nd Interna- tional Conference on Next Generation Computing Technologies (NGCT), IEEE, 2016, pp

A. Agrawal, Clickbait detection using deep learning, in: 2016 2nd Interna- tional Conference on Next Generation Computing Technologies (NGCT), IEEE, 2016, pp. 268–272 (2016)

work page 2016
[16]

Potthast, T

M. Potthast, T. Gollub, K. Komlossy, S. Schuster, M. Wiegmann, E. P. G. Fernandez, M. Hagen, B. Stein, Crowdsourcing a large corpus of clickbait on twitter, in: Proceedings of the 27th International Conference on Com- putational Linguistics, 2018, pp. 1498–1507 (2018)

work page 2018
[17]

Anand, T

A. Anand, T. Chakraborty, N. Park, We used neural networks to detect clickbaits: You won’t believe what happened next!, in: European Confer- ence on Information Retrieval, Springer, 2017, pp. 541–547 (2017)

work page 2017
[18]

Clickbait Identification using Neural Networks

P. Thomas, Clickbait identiﬁcation using neural networks, arXiv preprint arXiv:1710.08721 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017
[19]

J. Qu, A. M. Hißbach, T. Gollub, M. Potthast, Towards crowdsourcing clickbait labels for youtube videos. 24

work page
[20]

Zannettou, S

S. Zannettou, S. Chatzis, K. Papadamou, M. Sirivianos, The good, the bad and the bait: Detecting and characterizing clickbait on youtube, in: 2018 IEEE Security and Privacy Workshops (SPW), IEEE, 2018, pp. 63–69 (2018)

work page 2018
[21]

D. Y. Zhang, J. Badilla, Y. Zhang, D. Wang, Towards reliable missing truth discovery in online social media sensing applications, in: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2018, pp. 143–150 (2018)

work page 2018
[22]

N. Vo, K. Lee, The rise of guardians: Fact-checking url recommendation to combat fake news, arXiv preprint arXiv:1806.07516 (2018)

work page arXiv 2018
[23]

X. Yin, J. Han, P. S. Yu, Truth discovery with multiple conﬂicting infor- mation providers on the web, IEEE Transactions on Knowledge and Data Engineering 20 (6) (2008) 796–808 (Jun. 2008). doi:10.1109/TKDE.2007. 190745

work page doi:10.1109/tkde.2007 2008
[24]

D. Wang, M. T. Amin, S. Li, T. Abdelzaher, L. Kaplan, S. Gu, C. Pan, H. Liu, C. C. Aggarwal, R. Ganti, et al., Using humans as sensors: an estimation-theoretic perspective, in: Information Processing in Sensor Net- works, IPSN-14 Proceedings of the 13th International Symposium on, IEEE, 2014, pp. 35–46 (2014)

work page 2014
[25]

D. Wang, T. Abdelzaher, L. Kaplan, C. C. Aggarwal, Recursive fact- ﬁnding: A streaming approach to truth estimation in crowdsourcing ap- plications, in: Distributed Computing Systems (ICDCS), 2013 IEEE 33rd International Conference on, IEEE, 2013, pp. 530–539 (2013)

work page 2013
[26]

Zhang, D

D. Zhang, D. Wang, N. Vance, Y. Zhang, S. Mike, On scalable and ro- bust truth discovery in big data social media sensing applications, IEEE Transactions on Big Data (2018)

work page 2018
[27]

D. Wang, L. Kaplan, H. Le, T. Abdelzaher, On truth discovery in social sensing: A maximum likelihood estimation approach, in: Proc. ACM/IEEE 11th Int Information Processing in Sensor Networks (IPSN) Conf, 2012, pp. 233–244 (Apr. 2012). doi:10.1109/IPSN.2012.6920960

work page doi:10.1109/ipsn.2012.6920960 2012
[28]

D. Y. Zhang, L. Shang, B. Geng, S. Lai, K. Li, H. Zhu, M. T. Amin, D. Wang, Fauxbuster: A content-free fauxtography detector using social media comments, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 891–900 (2018)

work page 2018
[29]

Huynh-Kha, T

T. Huynh-Kha, T. Le-Tien, S. Ha-Viet-Uyen, K. Huynh-Van, M. Luong, A robust algorithm of forgery detection in copy-move and spliced images, IJACSA) International Journal of Advanced Computer Science and Appli- cations 7 (3) (2016). 25

work page 2016
[30]

H. Wang, F. Zhang, M. Hou, X. Xie, M. Guo, Q. Liu, Shine: Signed hetero- geneous information network embedding for sentiment link prediction, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 592–600 (2018)

work page 2018
[31]

Grover, J

A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2016, pp. 855–864 (2016)

work page 2016
[32]

Perozzi, R

B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2014, pp. 701– 710 (2014)

work page 2014
[33]

Huang, J

X. Huang, J. Li, X. Hu, Label informed attributed network embedding, in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, ACM, 2017, pp. 731–739 (2017)

work page 2017
[34]

Zhang, Y

Y. Zhang, Y. Lu, D. Zhang, L. Shang, D. Wang, Risksens: A multi-view learning approach to identifying risky traﬃc locations in intelligent trans- portation systems using social and remote sensing, in: 2018 IEEE Inter- national Conference on Big Data (Big Data), IEEE, 2018, pp. 1544–1553 (2018)

work page 2018
[35]

Vincent, H

P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and com- posing robust features with denoising autoencoders, in: Proceedings of the 25th international conference on Machine learning, ACM, 2008, pp. 1096– 1103 (2008)

work page 2008
[36]

G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, science 313 (5786) (2006) 504–507 (2006)

work page 2006
[37]

Q. Le, T. Mikolov, Distributed representations of sentences and docu- ments, in: Proceedings of the 31st International Conference on Interna- tional Conference on Machine Learning - Volume 32, ICML’14, JMLR.org, 2014 (2014)

work page 2014
[38]

Bajaj, M

P. Bajaj, M. Kavidayal, P. Srivastava, M. N. Akhtar, P. Kumaraguru, Disinformation in multimedia annotation: Misleading metadata detection on youtube, in: Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion, ACM, 2016, pp. 53–61 (2016)

work page 2016
[39]

X.-Y. Liu, J. Wu, Z.-H. Zhou, Exploratory undersampling for class- imbalance learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39 (2) (2008) 539–550 (2008)

work page 2008
[40]

P. M. Domingos, A few useful things to know about machine learning., Commun. acm 55 (10) (2012) 78–87 (2012). 26

work page 2012
[41]

Alpaydin, Introduction to machine learning, MIT press, 2014 (2014)

E. Alpaydin, Introduction to machine learning, MIT press, 2014 (2014)

work page 2014
[42]

Zhang, J

X. Zhang, J. Zou, K. He, J. Sun, Accelerating very deep convolutional networks for classiﬁcation and detection, IEEE transactions on pattern analysis and machine intelligence 38 (10) (2015) 1943–1955 (2015)

work page 2015
[43]

Chakraborty, B

A. Chakraborty, B. Paranjape, S. Kakarla, N. Ganguly, Stop clickbait: Detecting and preventing clickbaits in online news media, in: Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, IEEE Press, 2016, pp. 9–16 (2016)

work page 2016
[44]

J. A. Hertz, Introduction to the theory of neural computation, CRC Press, 2018 (2018)

work page 2018
[45]

Ferrara, O

E. Ferrara, O. Varol, C. Davis, F. Menczer, A. Flammini, The rise of social bots, Communications of the ACM 59 (7) (2016) 96–104 (2016)

work page 2016
[46]

D. Wang, B. K. Szymanski, T. Abdelzaher, H. Ji, L. Kaplan, The age of social sensing, Computer 52 (1) (2019) 36–45 (2019). 27

work page 2019

[1] [1]

GroupM, Groupm introduces state of digital report, available at https:// www.groupm.com/news/groupm-introduces-state-digital-report, accessed 2019-01-07 (2018)

work page 2019

[2] [2]

Molla, Next year, people will spend more time online than they will watching tv

R. Molla, Next year, people will spend more time online than they will watching tv. that’s a ﬁrst., available at https://www.recode.net/2018/6/ 8/17441288/internet-time-spent-tv-zenith-data-media, accessed 2019-02- 14 (2018)

work page 2018

[3] [3]

B¨ artl, Youtube channels, uploads and views: A statistical analysis of the past 10 years, Convergence 24 (1) (2018) 16–32 (2018)

M. B¨ artl, Youtube channels, uploads and views: A statistical analysis of the past 10 years, Convergence 24 (1) (2018) 16–32 (2018)

work page 2018

[4] [4]

M. M. U. Rony, N. Hassan, M. Yousuf, Diving deep into clickbaits: Who use them to what extents in which topics with what eﬀects?, in: Proceedings of the 2017 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining 2017, ACM, 2017, pp. 232–239 (2017)

work page 2017

[5] [5]

D. Wang, T. Abdelzaher, L. Kaplan, Social sensing: building reliable sys- tems on unreliable data, Morgan Kaufmann, 2015 (2015)

work page 2015

[6] [6]

M. D. Zeiler, R. Fergus, Visualizing and understanding convolutional net- works, in: European conference on computer vision, Springer, 2014, pp. 818–833 (2014)

work page 2014

[7] [7]

Zhang, N

Y. Zhang, N. Vance, D. Zhang, D. Wang, On opinion characterization in social sensing: A multi-view subspace learning approach, in: 2018 14th International Conference on Distributed Computing in Sensor Systems (DCOSS), IEEE, 2018, pp. 155–162 (2018). 23

work page 2018

[8] [8]

Papadopoulou, M

O. Papadopoulou, M. Zampoglou, S. Papadopoulos, Y. Kompatsiaris, Web video veriﬁcation using contextual cues, in: Proceedings of the 2nd Inter- national Workshop on Multimedia Forensics and Security, ACM, 2017, pp. 6–10 (2017)

work page 2017

[9] [9]

D. Y. Zhang, L. Song, Q. Li, Y. Zhang, D. Wang, Streamguard: A bayesian network approach to copyright infringement detection problem in large- scale live video sharing systems, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 901–910 (2018)

work page 2018

[10] [10]

Potthast, S

M. Potthast, S. K¨ opsel, B. Stein, M. Hagen, Clickbait detection, in: Eu- ropean Conference on Information Retrieval, Springer, 2016, pp. 810–817 (2016)

work page 2016

[11] [11]

M. Huh, A. Liu, A. Owens, A. A. Efros, Fighting fake news: Image splice detection via learned self-consistency, in: Proceedings of the European Con- ference on Computer Vision (ECCV), 2018, pp. 101–117 (2018)

work page 2018

[12] [12]

In Ictu Oculi: Exposing AI Generated Fake Face Videos by Detecting Eye Blinking

Y. Li, M.-C. Chang, H. Farid, S. Lyu, In ictu oculi: Exposing ai generated fake face videos by detecting eye blinking, arXiv preprint arXiv:1806.02877 (2018)

work page internal anchor Pith review Pith/arXiv arXiv 2018

[13] [13]

K. Fagan, Youtube’s clickbait problem might not be ﬁxable, available at https://yr.media/tech/ youtubes-clickbait-problem-is-out-of-hand-and-there-may-be-no-ﬁxing-it/, accessed 2019-02-25 (2018)

work page 2019

[14] [14]

To clickbait or not to clickbait: What you need to know about headlines and clickbaits, available at https://marketinginsidergroup.com/ content-marketing/what-you-need-to-know-headlines-clickbaits/, accessed 2019-02-19 (2016)

work page 2019

[15] [15]

Agrawal, Clickbait detection using deep learning, in: 2016 2nd Interna- tional Conference on Next Generation Computing Technologies (NGCT), IEEE, 2016, pp

A. Agrawal, Clickbait detection using deep learning, in: 2016 2nd Interna- tional Conference on Next Generation Computing Technologies (NGCT), IEEE, 2016, pp. 268–272 (2016)

work page 2016

[16] [16]

Potthast, T

M. Potthast, T. Gollub, K. Komlossy, S. Schuster, M. Wiegmann, E. P. G. Fernandez, M. Hagen, B. Stein, Crowdsourcing a large corpus of clickbait on twitter, in: Proceedings of the 27th International Conference on Com- putational Linguistics, 2018, pp. 1498–1507 (2018)

work page 2018

[17] [17]

Anand, T

A. Anand, T. Chakraborty, N. Park, We used neural networks to detect clickbaits: You won’t believe what happened next!, in: European Confer- ence on Information Retrieval, Springer, 2017, pp. 541–547 (2017)

work page 2017

[18] [18]

Clickbait Identification using Neural Networks

P. Thomas, Clickbait identiﬁcation using neural networks, arXiv preprint arXiv:1710.08721 (2017)

work page internal anchor Pith review Pith/arXiv arXiv 2017

[19] [19]

J. Qu, A. M. Hißbach, T. Gollub, M. Potthast, Towards crowdsourcing clickbait labels for youtube videos. 24

work page

[20] [20]

Zannettou, S

S. Zannettou, S. Chatzis, K. Papadamou, M. Sirivianos, The good, the bad and the bait: Detecting and characterizing clickbait on youtube, in: 2018 IEEE Security and Privacy Workshops (SPW), IEEE, 2018, pp. 63–69 (2018)

work page 2018

[21] [21]

D. Y. Zhang, J. Badilla, Y. Zhang, D. Wang, Towards reliable missing truth discovery in online social media sensing applications, in: 2018 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2018, pp. 143–150 (2018)

work page 2018

[22] [22]

N. Vo, K. Lee, The rise of guardians: Fact-checking url recommendation to combat fake news, arXiv preprint arXiv:1806.07516 (2018)

work page arXiv 2018

[23] [23]

X. Yin, J. Han, P. S. Yu, Truth discovery with multiple conﬂicting infor- mation providers on the web, IEEE Transactions on Knowledge and Data Engineering 20 (6) (2008) 796–808 (Jun. 2008). doi:10.1109/TKDE.2007. 190745

work page doi:10.1109/tkde.2007 2008

[24] [24]

D. Wang, M. T. Amin, S. Li, T. Abdelzaher, L. Kaplan, S. Gu, C. Pan, H. Liu, C. C. Aggarwal, R. Ganti, et al., Using humans as sensors: an estimation-theoretic perspective, in: Information Processing in Sensor Net- works, IPSN-14 Proceedings of the 13th International Symposium on, IEEE, 2014, pp. 35–46 (2014)

work page 2014

[25] [25]

D. Wang, T. Abdelzaher, L. Kaplan, C. C. Aggarwal, Recursive fact- ﬁnding: A streaming approach to truth estimation in crowdsourcing ap- plications, in: Distributed Computing Systems (ICDCS), 2013 IEEE 33rd International Conference on, IEEE, 2013, pp. 530–539 (2013)

work page 2013

[26] [26]

Zhang, D

D. Zhang, D. Wang, N. Vance, Y. Zhang, S. Mike, On scalable and ro- bust truth discovery in big data social media sensing applications, IEEE Transactions on Big Data (2018)

work page 2018

[27] [27]

D. Wang, L. Kaplan, H. Le, T. Abdelzaher, On truth discovery in social sensing: A maximum likelihood estimation approach, in: Proc. ACM/IEEE 11th Int Information Processing in Sensor Networks (IPSN) Conf, 2012, pp. 233–244 (Apr. 2012). doi:10.1109/IPSN.2012.6920960

work page doi:10.1109/ipsn.2012.6920960 2012

[28] [28]

D. Y. Zhang, L. Shang, B. Geng, S. Lai, K. Li, H. Zhu, M. T. Amin, D. Wang, Fauxbuster: A content-free fauxtography detector using social media comments, in: 2018 IEEE International Conference on Big Data (Big Data), IEEE, 2018, pp. 891–900 (2018)

work page 2018

[29] [29]

Huynh-Kha, T

T. Huynh-Kha, T. Le-Tien, S. Ha-Viet-Uyen, K. Huynh-Van, M. Luong, A robust algorithm of forgery detection in copy-move and spliced images, IJACSA) International Journal of Advanced Computer Science and Appli- cations 7 (3) (2016). 25

work page 2016

[30] [30]

H. Wang, F. Zhang, M. Hou, X. Xie, M. Guo, Q. Liu, Shine: Signed hetero- geneous information network embedding for sentiment link prediction, in: Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining, ACM, 2018, pp. 592–600 (2018)

work page 2018

[31] [31]

Grover, J

A. Grover, J. Leskovec, node2vec: Scalable feature learning for networks, in: Proceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2016, pp. 855–864 (2016)

work page 2016

[32] [32]

Perozzi, R

B. Perozzi, R. Al-Rfou, S. Skiena, Deepwalk: Online learning of social representations, in: Proceedings of the 20th ACM SIGKDD international conference on Knowledge discovery and data mining, ACM, 2014, pp. 701– 710 (2014)

work page 2014

[33] [33]

Huang, J

X. Huang, J. Li, X. Hu, Label informed attributed network embedding, in: Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, ACM, 2017, pp. 731–739 (2017)

work page 2017

[34] [34]

Zhang, Y

Y. Zhang, Y. Lu, D. Zhang, L. Shang, D. Wang, Risksens: A multi-view learning approach to identifying risky traﬃc locations in intelligent trans- portation systems using social and remote sensing, in: 2018 IEEE Inter- national Conference on Big Data (Big Data), IEEE, 2018, pp. 1544–1553 (2018)

work page 2018

[35] [35]

Vincent, H

P. Vincent, H. Larochelle, Y. Bengio, P.-A. Manzagol, Extracting and com- posing robust features with denoising autoencoders, in: Proceedings of the 25th international conference on Machine learning, ACM, 2008, pp. 1096– 1103 (2008)

work page 2008

[36] [36]

G. E. Hinton, R. R. Salakhutdinov, Reducing the dimensionality of data with neural networks, science 313 (5786) (2006) 504–507 (2006)

work page 2006

[37] [37]

Q. Le, T. Mikolov, Distributed representations of sentences and docu- ments, in: Proceedings of the 31st International Conference on Interna- tional Conference on Machine Learning - Volume 32, ICML’14, JMLR.org, 2014 (2014)

work page 2014

[38] [38]

Bajaj, M

P. Bajaj, M. Kavidayal, P. Srivastava, M. N. Akhtar, P. Kumaraguru, Disinformation in multimedia annotation: Misleading metadata detection on youtube, in: Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion, ACM, 2016, pp. 53–61 (2016)

work page 2016

[39] [39]

X.-Y. Liu, J. Wu, Z.-H. Zhou, Exploratory undersampling for class- imbalance learning, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 39 (2) (2008) 539–550 (2008)

work page 2008

[40] [40]

P. M. Domingos, A few useful things to know about machine learning., Commun. acm 55 (10) (2012) 78–87 (2012). 26

work page 2012

[41] [41]

Alpaydin, Introduction to machine learning, MIT press, 2014 (2014)

E. Alpaydin, Introduction to machine learning, MIT press, 2014 (2014)

work page 2014

[42] [42]

Zhang, J

X. Zhang, J. Zou, K. He, J. Sun, Accelerating very deep convolutional networks for classiﬁcation and detection, IEEE transactions on pattern analysis and machine intelligence 38 (10) (2015) 1943–1955 (2015)

work page 2015

[43] [43]

Chakraborty, B

A. Chakraborty, B. Paranjape, S. Kakarla, N. Ganguly, Stop clickbait: Detecting and preventing clickbaits in online news media, in: Proceedings of the 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, IEEE Press, 2016, pp. 9–16 (2016)

work page 2016

[44] [44]

J. A. Hertz, Introduction to the theory of neural computation, CRC Press, 2018 (2018)

work page 2018

[45] [45]

Ferrara, O

E. Ferrara, O. Varol, C. Davis, F. Menczer, A. Flammini, The rise of social bots, Communications of the ACM 59 (7) (2016) 96–104 (2016)

work page 2016

[46] [46]

D. Wang, B. K. Szymanski, T. Abdelzaher, H. Ji, L. Kaplan, The age of social sensing, Computer 52 (1) (2019) 36–45 (2019). 27

work page 2019