When Youth Enter the Algorithmic Wild: Discovering and Understanding Potentially Harmful Teen Videos on Douyin and Kwai

Jing Zhang; Shaoxuan Zhou; Xianghang Mi; Yafei Sun

arxiv: 2605.23598 · v1 · pith:FEPOCXY2new · submitted 2026-05-22 · 💻 cs.CR · cs.HC


Pith reviewed 2026-05-25 04:18 UTC · model grok-4.3

classification 💻 cs.CR cs.HC

keywords: potentially harmful teen videos · short-video platforms · recommendation algorithms · Youth Mode · Douyin · Kwai · adolescent safety · content moderation


The pith

Youth Mode blocks every potentially harmful teen video on Douyin and Kwai, yet only 30 to 41 percent of teens activate it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops PHTV-Scout to measure how often teens encounter potentially harmful videos on popular short-video apps. It combines a survey of 683 adolescents with simulated accounts that gather real recommendation feeds, then applies a multimodal classifier and analysis tools to detect and categorize the content. Results indicate a 6.11 percent prevalence of such videos, just over half of them child sexual exploitation imagery, and show that this content spreads through evasion tactics and covert comments. Youth Mode blocks all of these videos when enabled, but adoption is low, and the paper attributes exposure to regulation, platform algorithms, and passive browsing rather than to user identity. A sympathetic reader would care because the work shows how platform designs shape teen safety in algorithmic spaces.

Core claim

The PHTV-Scout framework, built from an offline survey of 683 adolescents and a tri-module pipeline of simulated accounts, a LoRA-finetuned classifier, and fine-grained analysis, examined 186,727 videos and found 6.11 percent to be potentially harmful teen videos, with 53.2 percent of those involving child sexual exploitation imagery. Harmful content persists via semantic camouflage, noise injection, and grooming comments. Youth Mode blocks 100 percent of these videos, yet adoption stands at only 30-41 percent, and exposure arises from platform regulation, algorithms, and passive browsing rather than user identity.
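
As a rough worked example of what these headline percentages imply when multiplied out, the sketch below computes the share of all collected videos that would be child sexual exploitation imagery. It assumes the 6.11 percent figure is taken over all collected videos and the 53.2 percent share is taken over the detected PHTVs; the function name and the derived values are illustrative and are not reported in the paper.

```python
def share_of_feed(prevalence: float, category_share: float) -> float:
    """Fraction of all collected videos that fall into one PHTV category."""
    return prevalence * category_share

PHTV_PREVALENCE = 0.0611  # 6.11 percent of collected videos flagged as PHTVs (reported)
CSEI_SHARE = 0.532        # 53.2 percent of PHTVs categorized as CSEI (reported)

# Derived quantity, assuming the two percentages compose multiplicatively.
csei_feed_share = share_of_feed(PHTV_PREVALENCE, CSEI_SHARE)

print(f"CSEI share of all collected videos: {csei_feed_share:.2%}")
print(f"Roughly 1 in {round(1 / csei_feed_share)} collected videos")
```

Under these assumptions the category share works out to a little over 3 percent of the simulated feeds, which is a derived figure and should be read with the same caveats as the two inputs.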

What carries the argument

PHTV-Scout, a behaviorally grounded measurement framework that integrates an offline adolescent survey with PHTV Hunter simulated accounts for feed collection, PHTV Arbiter for 94.29 percent accurate detection, and PHTV Analyzer for categorization and impact assessment.

If this is right

Youth Mode provides complete protection when active, so increasing its adoption would directly reduce exposure for most teens.
Exposure occurs through passive browsing and algorithmic amplification, implying that changes to recommendation systems affect all users regardless of identity.
Harmful videos rely on covert interactions such as grooming comments, so moderation of comments alongside videos is needed.
The 6.11 percent prevalence and dominance of child sexual exploitation imagery indicate that current platform safeguards leave substantial harmful content in teen feeds.
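
The first implication can be made concrete with a small back-of-the-envelope sketch. It assumes that teens without Youth Mode see the measured 6.11 percent PHTV rate, that Youth Mode blocks everything, and that adoption is unrelated to how much a teen browses; `average_exposure` is a hypothetical helper, not something defined in the paper.

```python
def average_exposure(prevalence_off: float, adoption: float, block_rate: float = 1.0) -> float:
    """Mean PHTV share of the feed across teens, mixing Youth Mode on and off."""
    prevalence_on = prevalence_off * (1.0 - block_rate)  # rate seen with Youth Mode on
    return adoption * prevalence_on + (1.0 - adoption) * prevalence_off

# 30 and 41 percent are the reported adoption bounds; 100 percent is a what-if.
for adoption in (0.30, 0.41, 1.00):
    rate = average_exposure(0.0611, adoption)
    print(f"adoption={adoption:.0%} -> mean PHTV share of feed={rate:.2%}")
```

The `block_rate` parameter is left adjustable so the same sketch can be rerun if the 100 percent blocking result turns out to be an upper estimate.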

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same simulation-plus-survey approach could be adapted to measure exposure on other short-video platforms with similar recommendation systems.
Making Youth Mode the default setting rather than an opt-in feature would address the low adoption rates without depending on individual teen choices.
The finding that regulation and algorithms drive exposure more than user identity suggests platform-level policy changes could have broader effects than content takedowns alone.

Load-bearing premise

The offline survey of 683 adolescents produces representative behavioral patterns that can be faithfully replicated by the PHTV Hunter simulated accounts to collect authentic recommendation feeds.

What would settle it

A side-by-side comparison of PHTV rates collected from the simulated accounts versus a set of actual teen accounts on the same platforms would show whether the simulated feeds accurately reflect real exposure.
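
One way such a comparison could be scored is a two-proportion z-test on the PHTV rates of the two account groups. The sketch below is a minimal version under strong assumptions: every video is treated as an independent draw, which feeds collected from a handful of accounts likely violate, and all counts are hypothetical placeholders rather than data from the paper.

```python
import math

def two_proportion_z_test(hits_a: int, n_a: int, hits_b: int, n_b: int) -> tuple[float, float]:
    """Return (z, two-sided p-value) for H0: both groups share one PHTV rate."""
    p_a, p_b = hits_a / n_a, hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)          # rate under the null hypothesis
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))        # two-sided normal tail area
    return z, p_value

# Hypothetical counts for illustration only: simulated accounts vs. real teen accounts.
z, p = two_proportion_z_test(hits_a=611, n_a=10_000, hits_b=540, n_b=10_000)
print(f"z={z:.2f}, p={p:.4f}")
```

A small p-value would suggest the simulated feeds differ from real teen feeds; a large one would not prove equivalence, so an equivalence test or a per-account analysis would be the more careful follow-up.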

Figures

Figures reproduced from arXiv: 2605.23598 by Jing Zhang, Shaoxuan Zhou, Xianghang Mi, Yafei Sun.

**Figure 1.** Overview of PHTV-Scout.

**Figure 2.** Sentiment analysis of PHTV comments (N=51,287). [PITH_FULL_IMAGE:figures/full_fig_p010_2.png]

**Figure 3.** Thematic analysis of PHTV comments (N=51,287). [PITH_FULL_IMAGE:figures/full_fig_p010_3.png]

**Figure 4.** Hourly upload pattern of PHTVs (2-hour bins). [PITH_FULL_IMAGE:figures/full_fig_p011_4.png]

**Figure 5.** Daily PHTV exposure rate for account F after pas… [PITH_FULL_IMAGE:figures/full_fig_p012_5.png]

**Figure 6.** Comparative analysis of PHTV exposure across static identity. [PITH_FULL_IMAGE:figures/full_fig_p013_6.png]

**Figure 7.** Distribution of primary short-video platform prefer… [PITH_FULL_IMAGE:figures/full_fig_p020_7.png]

**Figure 8.** Comparison of daily usage duration on school days [PITH_FULL_IMAGE:figures/full_fig_p021_8.png]

**Figure 9.** Exposure rates to nine categories of harmful content [PITH_FULL_IMAGE:figures/full_fig_p021_9.png]

**Figure 10.** 3D visualization of adolescent clusters based on… [PITH_FULL_IMAGE:figures/full_fig_p023_10.png]

**Figure 11.** 2D projections of the same clustering result, show… [PITH_FULL_IMAGE:figures/full_fig_p023_11.png]

**Figure 12.** Distribution of interactive behaviors across age… [PITH_FULL_IMAGE:figures/full_fig_p024_12.png]

**Figure 13.** Template structure of the training prompt for the binary classifier. [Note: This figure illustrates the layout of the prompt, …] [PITH_FULL_IMAGE:figures/full_fig_p030_13.png]

Original abstract

Short-video platforms like Douyin and Kwai have become central to adolescent digital life, but they also risk exposing teens to algorithmically amplified harmful content. Despite its societal importance, the scale, mechanisms, and real-world impact of this exposure remain poorly understood. Measuring it is challenging: recommendation feeds are personalized black boxes, harmful content employs sophisticated evasion tactics, and naive crawlers fail to replicate authentic teen behavior. To bridge this gap, we propose PHTV-Scout, the first large-scale, behaviorally grounded measurement framework for Potentially Harmful Teen Videos (PHTVs). We integrate an offline survey of 683 adolescents with a tri-module online pipeline: (1) PHTV Hunter simulates teen accounts to collect recommendation feeds; (2) PHTV Arbiter, a LoRA-finetuned multimodal classifier, detects PHTVs with 94.29% accuracy and 96.41% precision; and (3) PHTV Analyzer performs fine-grained categorization and impact assessment. Over six months, we analyzed 186,727 videos and 51,287 comments, uncovering a troubling 6.11% PHTV prevalence--dominated by Child Sexual Exploitation Imagery (53.2%)--and revealing that harmful content thrives through covert interactions (e.g., grooming comments, self-disclosure) and active evasion (semantic camouflage, noise injection). Crucially, while Youth Mode blocks 100% of PHTVs, its low adoption (30-41%) leaves most teens unprotected. We further show that exposure is driven not by user identity but by regulation, platform algorithms, and even passive browsing, exposing the fragility of adolescent information environments. Our findings call for a paradigm shift from reactive takedowns to proactive, human-centered safeguards.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives the first large-scale prevalence numbers on harmful teen videos for Douyin and Kwai, but its survey-derived simulation of teen behavior is unvalidated.


The main takeaway is that this work reports a 6.11% rate of potentially harmful videos in simulated teen feeds on two major Chinese short-video platforms, with child sexual exploitation imagery making up over half. It also finds Youth Mode stops all such content yet sees only 30-41% adoption, and argues exposure stems from platform rules and passive use rather than user traits. That is the concrete output worth noting first.

The tri-module pipeline is the clearest addition. It starts from an offline survey of 683 teens to configure simulated accounts, then applies a LoRA-tuned multimodal classifier (94% accuracy claimed) and does fine-grained analysis on 186k videos plus comments. This is more tied to observed teen patterns than plain crawlers, and the scale plus the breakdown of comment interactions and evasion tactics give usable detail.

The soft spot sits in the simulation step itself. The survey responses need to produce account settings and browsing that actually match real teen recommendation feeds, but the paper gives no check on that mapping. Self-reported behavior often diverges from platform signals on session length or search patterns, so the claims about what drives exposure and the 100% Youth Mode block could shift if the simulation is off. Classifier validation details and error bars are also thin in what is shown.

This paper is for researchers working on platform safety, child protection policy, and recommendation measurement. Readers who need grounded numbers on Chinese short-video harms or who study behavioral simulation methods will find parts to use. It deserves a serious referee because the topic and scale matter and the approach improves on prior crawls, even though the simulation validation needs direct attention.

Recommendation: send it to review and request added tests on how well the simulated accounts replicate real teen feeds.
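
To make the remark about thin error bars concrete, here is a minimal sketch of a Wilson score interval around the headline prevalence. It treats each video as an independent draw, which feeds gathered from a small set of accounts almost certainly violate, so the real uncertainty is likely wider. The PHTV count is derived from the rounded 6.11% figure rather than taken from the paper.

```python
import math

def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

n_videos = 186_727                    # corpus size, as reported
n_phtv = round(n_videos * 0.0611)     # derived from the rounded percentage, not reported
low, high = wilson_interval(n_phtv, n_videos)
print(f"naive 95% interval: {low:.2%} to {high:.2%}")
```

Because the sample is so large, the naive interval is narrow; a cluster-aware estimate (for example, bootstrapping over accounts or collection days) would be the more defensible error bar.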

Referee Report

2 major / 2 minor

Summary. The paper introduces PHTV-Scout, a behaviorally grounded measurement framework for Potentially Harmful Teen Videos (PHTVs) on Douyin and Kwai. It combines an offline survey of 683 adolescents to parameterize simulated teen accounts in the PHTV Hunter module, which collects recommendation feeds; a LoRA-finetuned multimodal classifier (PHTV Arbiter) achieving 94.29% accuracy and 96.41% precision; and PHTV Analyzer for categorization and impact assessment. Over six months the pipeline processes 186,727 videos and 51,287 comments, reporting 6.11% PHTV prevalence (53.2% Child Sexual Exploitation Imagery), 100% blocking by Youth Mode (adoption 30-41%), and that exposure is driven by regulation, platform algorithms, and passive browsing rather than user identity.

Significance. If the simulation faithfully reproduces real adolescent recommendation streams, the work supplies the first large-scale, longitudinal evidence on PHTV prevalence and evasion tactics on two dominant short-video platforms, together with a concrete demonstration that current Youth Mode is effective yet under-adopted. The emphasis on algorithmic and regulatory drivers rather than individual identity offers a useful reframing for platform-safety research and policy.

major comments (2)

[PHTV Hunter module] Section describing PHTV Hunter (tri-module pipeline): the central claim that exposure is independent of user identity and driven by regulation/algorithms/passive browsing rests on the unvalidated assumption that survey-derived parameters produce recommendation feeds statistically indistinguishable from those seen by real teens. No cross-validation against held-out real-user feeds, A/B comparison of session statistics, or sensitivity analysis on survey-to-parameter mapping is reported; any systematic mismatch would artifactually support the independence conclusion.
[Youth Mode results] Youth Mode evaluation paragraph: the statement that Youth Mode blocks 100% of PHTVs is presented without a described test set, number of PHTV instances evaluated, or confirmation that the simulated accounts exercised the full range of evasion tactics identified by PHTV Analyzer. Because this result is used to argue that low adoption (30-41%) is the sole remaining barrier, the missing validation details are load-bearing.
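
The second major comment turns on sample size: observing zero PHTVs under Youth Mode only bounds the true rate, and the bound depends on how many videos were checked. The sketch below computes the exact one-sided upper bound for a zero-event binomial sample (close to the "rule of three", 3/n, at 95% confidence). It assumes independent observations, and the sample sizes are hypothetical because the number of videos observed under Youth Mode is not given here.

```python
def zero_event_upper_bound(n_observed: int, confidence: float = 0.95) -> float:
    """Exact one-sided upper bound on a rate when 0 events occur in n independent trials."""
    if n_observed <= 0:
        raise ValueError("n_observed must be positive")
    alpha = 1.0 - confidence
    # Solve (1 - p)^n = alpha for p: the largest rate still compatible with zero events.
    return 1.0 - alpha ** (1.0 / n_observed)

for n in (100, 1_000, 10_000):  # hypothetical numbers of Youth Mode feed videos
    bound = zero_event_upper_bound(n)
    print(f"n={n}: true PHTV rate could still be up to {bound:.4%}")
```

Reporting the Youth Mode sample size alongside such a bound would turn "blocks 100%" into a statement with a stated margin.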

minor comments (2)

[Methods] The abstract and methods should explicitly state the train/validation/test split sizes and any post-hoc filtering applied to the 186,727-video corpus before prevalence calculation.
[PHTV Analyzer] Clarify whether the 51,287 comments were sampled uniformly or conditioned on PHTV detection; the current description leaves open the possibility that comment analysis over-represents harmful videos.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important areas for strengthening the methodological transparency of our work. We address each major comment below and indicate the revisions we will make.


Referee: [PHTV Hunter module] Section describing PHTV Hunter (tri-module pipeline): the central claim that exposure is independent of user identity and driven by regulation/algorithms/passive browsing rests on the unvalidated assumption that survey-derived parameters produce recommendation feeds statistically indistinguishable from those seen by real teens. No cross-validation against held-out real-user feeds, A/B comparison of session statistics, or sensitivity analysis on survey-to-parameter mapping is reported; any systematic mismatch would artifactually support the independence conclusion.

Authors: We acknowledge that the manuscript does not include explicit cross-validation of the simulated feeds against real-user data or a formal sensitivity analysis on the survey-to-parameter mapping. The parameterization draws directly from the 683-adolescent survey to reflect observed behaviors, but the absence of these checks is a limitation. In revision we will add a sensitivity analysis that varies key survey-derived parameters (e.g., session length, topic preferences, and interaction rates) and report how the PHTV prevalence and independence conclusions change. We will also expand the limitations section to discuss the implications of any potential mismatch between simulated and real recommendation streams for the claim that exposure is driven by regulation and algorithms rather than user identity. revision: partial
Referee: [Youth Mode results] Youth Mode evaluation paragraph: the statement that Youth Mode blocks 100% of PHTVs is presented without a described test set, number of PHTV instances evaluated, or confirmation that the simulated accounts exercised the full range of evasion tactics identified by PHTV Analyzer. Because this result is used to argue that low adoption (30-41%) is the sole remaining barrier, the missing validation details are load-bearing.

Authors: We agree that the Youth Mode evaluation paragraph lacks necessary methodological details. The 100% blocking result was obtained by enabling Youth Mode on the same set of simulated accounts used throughout the study and re-collecting feeds; the PHTVs tested were the 11,412 instances identified by PHTV Arbiter across the six-month collection. In the revised manuscript we will explicitly state the test-set size, confirm that the simulated accounts incorporated all evasion tactics catalogued by PHTV Analyzer (semantic camouflage, noise injection, etc.), and report the exact number of PHTV instances re-evaluated under Youth Mode. These additions will make the claim fully reproducible and will clarify that the low adoption rate remains the primary barrier. revision: yes
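
The instance count quoted in this response can be sanity-checked against the headline numbers. The sketch below computes which integer counts are compatible with a percentage rounded to two decimals, assuming the 6.11% prevalence is taken over all 186,727 collected videos; `implied_count_range` is a hypothetical helper and ignores exact rounding ties.

```python
import math

def implied_count_range(total: int, pct: float, decimals: int = 2) -> tuple[int, int]:
    """Integer counts k for which 100 * k / total rounds to pct at the given precision."""
    half_step = 0.5 * 10 ** (-decimals)
    lowest = math.ceil(total * (pct - half_step) / 100)
    highest = math.floor(total * (pct + half_step) / 100)
    return lowest, highest

low, high = implied_count_range(total=186_727, pct=6.11)
print(f"counts consistent with 6.11%: {low} to {high}")
print(f"11,412 falls inside the range: {low <= 11_412 <= high}")
```

Under that assumption the quoted 11,412 instances are arithmetically consistent with the reported prevalence, although this says nothing about whether the same instances were actually re-tested under Youth Mode.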

Circularity Check

0 steps flagged

No significant circularity; survey-grounded simulation feeds independent platform measurements


The derivation chain begins with an offline survey of 683 adolescents whose self-reported behaviors parameterize PHTV Hunter account simulations; these simulations then collect real recommendation feeds from Douyin and Kwai. PHTV Arbiter (LoRA-finetuned classifier) labels the collected videos, and PHTV Analyzer extracts prevalence, categories, and Youth Mode results. None of these steps reduce by construction to the survey inputs: the 6.11% prevalence, 100% Youth Mode block rate, and driver attributions are outputs of platform interactions, not tautological re-expressions of survey statistics. No equations, fitted parameters renamed as predictions, self-citation load-bearing premises, or imported uniqueness theorems appear. The framework is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities beyond the framework name itself. The 6.11% prevalence figure is presented as measured output rather than a fitted constant.



Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · 3 internal anchors

[1]

Twitter spam account detection based on clustering and classification methods.Journal of Supercomputing, 76(7), 2020

Kayode Sakariyah Adewole, Tao Han, Wanqing Wu, Houbing Song, and Arun Kumar Sangaiah. Twitter spam account detection based on clustering and classification methods.Journal of Supercomputing, 76(7), 2020

work page 2020
[2]

Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D

Rishabh Agarwal, Avi Singh, Lei M. Zhang, Bernd Bohnet, Luis Rosias, Stephanie Chan, Biao Zhang, Ankesh Anand, Zaheer Abbas, Azade Nova, John D. Co- Reyes, Eric Chu, Feryal Behbahani, Aleksandra Faust, and Hugo Larochelle. Many-shot in-context learning,

work page
[3]

URL: https://arxiv.org/abs/2404.11018, arXiv:2404.11018

work page arXiv
[4]

Offensive video detection: Dataset and baseline re- sults

Cleber Alcântara, Viviane Moreira, and Diego Feijo. Offensive video detection: Dataset and baseline re- sults. In Nicoletta Calzolari, Frédéric Béchet, Philippe Blache, Khalid Choukri, Christopher Cieri, Thierry De- clerck, Sara Goggi, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, and Stelios Piperidis, editors,...

work page 2020
[5]

Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond

Jinze Bai, Shuai Bai, Shusheng Yang, Shijie Wang, Sinan Tan, Peng Wang, Junyang Lin, Chang Zhou, and Jingren Zhou. Qwen-vl: A versatile vision-language model for understanding, localization, text reading, and beyond.arXiv preprint arXiv:2308.12966, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[6]

Your fairness may vary: Pretrained language model fairness in toxic text classification

Ioana Baldini, Dennis Wei, Karthikeyan Natesan Rama- murthy, Moninder Singh, and Mikhail Yurochkin. Your fairness may vary: Pretrained language model fairness in toxic text classification. InFindings of the associ- ation for computational linguistics: ACL 2022, pages 2245–2262, 2022

work page 2022
[7]

Dynamics of algorithmic content amplification on tiktok, 2025

Fabian Baumann, Nipun Arora, Iyad Rahwan, and Ag- nieszka Czaplicka. Dynamics of algorithmic content amplification on tiktok, 2025. URL: https://arxiv. org/abs/2503.20231,arXiv:2503.20231

work page arXiv 2025
[8]

Gormley, and Graham Neubig

Amanda Bertsch, Maor Ivgi, Emily Xiao, Uri Alon, Jonathan Berant, Matthew R. Gormley, and Graham Neubig. In-context learning with long-context models: An in-depth exploration, 2025. URL: https://arxiv. org/abs/2405.00200,arXiv:2405.00200

work page arXiv 2025
[9]

An empiri- cal investigation of personalization factors on tiktok

Maximilian Boeker and Aleksandra Urman. An empiri- cal investigation of personalization factors on tiktok. In Proceedings of the ACM Web Conference 2022, WWW ’22, page 2298–2309. ACM, April 2022. URL: http: //dx.doi.org/10.1145/3485447.3512102, doi:10. 1145/3485447.3512102

work page doi:10.1145/3485447.3512102 2022
[10]

Detection and visualization of misleading content on twitter.International Journal of Multimedia Information Retrieval, 7(1):71–86, 2018

Christina Boididou, Symeon Papadopoulos, Markos Zampoglou, Lazaros Apostolidis, Olga Papadopoulou, and Yiannis Kompatsiaris. Detection and visualization of misleading content on twitter.International Journal of Multimedia Information Retrieval, 7(1):71–86, 2018

work page 2018
[11]

Language models are few-shot learn- ers.Advances in neural information processing systems, 33:1877–1901, 2020

Tom Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, et al. Language models are few-shot learn- ers.Advances in neural information processing systems, 33:1877–1901, 2020

work page 1901
[12]

Cyberbullying detection using recursive neural network through offline repository

Nidhi Chandra, Sunil Kumar Khatri, and Subhranil Som. Cyberbullying detection using recursive neural network through offline repository. In2018 7th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions)(ICRITO), pages 748–754. IEEE, 2018

work page 2018
[13]

China Central Television (CCTV). [24 hours]addressing the chaos in short videos involving minors, which includes inducing them to engage in dangerous activities offline and even directly committing acts of harm against them. https://tv.cctv.cn/2025/ 09/17/VIDESfLEzwZq0TCKjyofRnQB250917.shtml,

work page 2025
[14]

PaddleOCR 3.0 Technical Report

Cheng Cui, Ting Sun, Manhui Lin, Tingquan Gao, Yubo Zhang, Jiaxuan Liu, Xueqing Wang, Zelun Zhang, Changda Zhou, Hongen Liu, Yue Zhang, Wenyu Lv, Kui Huang, Yichao Zhang, Jing Zhang, Jun Zhang, Yi Liu, Dianhai Yu, and Yanjun Ma. Paddleocr 3.0 technical report, 2025. URL: https://arxiv.org/abs/2507. 05595,arXiv:2507.05595

work page internal anchor Pith review Pith/arXiv arXiv 2025
[15]

Hatemm: A multi-modal dataset for hate video classification

Mithun Das, Rohit Raj, Punyajoy Saha, Binny Mathew, Manish Gupta, and Animesh Mukherjee. Hatemm: A multi-modal dataset for hate video classification. In Proceedings of the International AAAI Conference on Web and Social Media, volume 17, pages 1014–1023, 2023

work page 2023
[16]

Teens, so- cial media and ai chatbots 2025, 2025

Michelle Faverio and Olivia Sidoti. Teens, so- cial media and ai chatbots 2025, 2025. On- line; accessed 2026-01-15. URL: https: //www.pewresearch.org/internet/2025/12/09/ teens-social-media-and-ai-chatbots-2025/

work page 2025
[17]

Towards online spam filtering in social networks

Hongyu Gao, Yan Chen, Kathy Lee, Diana Palsetia, and Alok N Choudhary. Towards online spam filtering in social networks. InNDSS, volume 12, pages 1–16, 2012

work page 2012
[18]

Convo- lutional neural networks for toxic comment classifica- tion

Spiros V Georgakopoulos, Sotiris K Tasoulis, Aris- tidis G Vrahatis, and Vassilis P Plagianakos. Convo- lutional neural networks for toxic comment classifica- tion. InProceedings of the 10th hellenic conference on artificial intelligence, pages 1–6, 2018. 18

work page 2018
[19]

Amer- icans’ social media use 2025, 2025

Jeffrey Gottfried and Eugenie Park. Amer- icans’ social media use 2025, 2025. On- line; accessed 2026-01-15. URL: https: //www.pewresearch.org/internet/2025/11/ 20/americans-social-media-use-2025/

work page 2025
[20]

Spam detection using knn and decision tree mechanism in social network

Saumya Goyal, RK Chauhan, and Shabnam Parveen. Spam detection using knn and decision tree mechanism in social network. In2016 Fourth International Con- ference on Parallel, Distributed and Grid Computing (PDGC), pages 522–526. IEEE, 2016

work page 2016
[21]

Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen- Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

work page 2022
[22]

Chen, and An- drew Y

Yixing Jiang, Jeremy Irvin, Ji Hun Wang, Muham- mad Ahmed Chaudhry, Jonathan H. Chen, and An- drew Y . Ng. Many-shot in-context learning in mul- timodal foundation models, 2024. URL: https:// arxiv.org/abs/2405.09798,arXiv:2405.09798

work page arXiv 2024
[23]

Harmful youtube video detection: A taxon- omy of online harm and mllms as alternative annotators,

Claire Wonjeong Jo, Miki Wesołowska, and Magdalena Wojcieszak. Harmful youtube video detection: A taxon- omy of online harm and mllms as alternative annotators,

work page
[24]

URL: https://arxiv.org/abs/2411.05854, arXiv:2411.05854

work page arXiv
[25]

Evaluation of text classification techniques for in- appropriate web content blocking

Igor Kotenko, Andrey Chechulin, and Dmitry Komashin- sky. Evaluation of text classification techniques for in- appropriate web content blocking. In2015 IEEE 8th In- ternational Conference on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Ap- plications (IDAACS), volume 1, pages 412–417. IEEE, 2015

work page 2015
[26]

Towards explainable harmful meme detection through multimodal debate be- tween large language models

Hongzhan Lin, Ziyang Luo, Wei Gao, Jing Ma, Bo Wang, and Ruichao Yang. Towards explainable harmful meme detection through multimodal debate be- tween large language models. InProceedings of the ACM Web Conference 2024, pages 2359–2370, 2024

work page 2024
[27]

Toxic- chat: Unveiling hidden challenges of toxicity detec- tion in real-world user-ai conversation.arXiv preprint arXiv:2310.17389, 2023

Zi Lin, Zihan Wang, Yongqi Tong, Yangkun Wang, Yuxin Guo, Yujia Wang, and Jingbo Shang. Toxic- chat: Unveiling hidden challenges of toxicity detec- tion in real-world user-ai conversation.arXiv preprint arXiv:2310.17389, 2023

work page arXiv 2023
[28]

Self-harm and its association with internet addiction and internet exposure to suicidal thought in adolescents.Journal of the Formosan Medical Association, 116(3):153–160, 2017

Hui-Ching Liu, Shen-Ing Liu, Jin-Jin Tjung, Fang-Ju Sun, Hui-Chun Huang, and Chun-Kai Fang. Self-harm and its association with internet addiction and internet exposure to suicidal thought in adolescents.Journal of the Formosan Medical Association, 116(3):153–160, 2017

work page 2017
[29]

Fake News Detection on Social Media using Geometric Deep Learning

Federico Monti, Fabrizio Frasca, Davide Eynard, Da- mon Mannion, and Michael M Bronstein. Fake news detection on social media using geometric deep learning. arXiv preprint arXiv:1902.06673, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1902
[30]

A bert-based transfer learning approach for hate speech detection in online social media

Marzieh Mozafari, Reza Farahbakhsh, and Noel Crespi. A bert-based transfer learning approach for hate speech detection in online social media. InInternational confer- ence on complex networks and their applications, pages 928–940. Springer, 2019

work page 2019
[31]

Mtikguard sys- tem: A transformer-based multimodal system for child- safe content moderation on tiktok.arXiv preprint arXiv:2511.17955, 2025

Dat Thanh Nguyen, Nguyen Hung Lam, Anh Hoang- Thi Nguyen, and Trong-Hop Do. Mtikguard sys- tem: A transformer-based multimodal system for child- safe content moderation on tiktok.arXiv preprint arXiv:2511.17955, 2025

work page arXiv 2025
[32]

Disturbed youtube for kids: Characterizing and detecting disturbing content on youtube.CoRR, abs/1901.07046, 2019

Kostantinos Papadamou, Antonis Papasavva, Savvas Zannettou, Jeremy Blackburn, Nicolas Kourtellis, Il- ias Leontiadis, Gianluca Stringhini, and Michael Siri- vianos. Disturbed youtube for kids: Characterizing and detecting disturbing content on youtube.CoRR, abs/1901.07046, 2019. URL: http://arxiv.org/ abs/1901.07046,arXiv:1901.07046

work page arXiv 1901
[33]

A comprehensive framework for multi-modal hate speech detection in social media using deep learning.Scientific Reports, 15(1):13020, 2025

R Prabhu and V Seethalakshmi. A comprehensive framework for multi-modal hate speech detection in social media using deep learning.Scientific Reports, 15(1):13020, 2025

work page 2025
[34]

Momenta: A multimodal framework for detecting harmful memes and their targets

Shraman Pramanick, Shivam Sharma, Dimitar Dim- itrov, Md Shad Akhtar, Preslav Nakov, and Tanmoy Chakraborty. Momenta: A multimodal framework for detecting harmful memes and their targets. InFind- ings of the association for computational linguistics: EMNLP 2021, pages 4439–4455, 2021

work page 2021
[35]

Stress and adoles- cence: vulnerability and opportunity during a sensitive window of development.Current opinion in psychology, 44:286–292, 2022

Lucinda M Sisk and Dylan G Gee. Stress and adoles- cence: vulnerability and opportunity during a sensitive window of development.Current opinion in psychology, 44:286–292, 2022

work page 2022
[36]

Bringing the kid back into youtube kids: detecting inappropriate content on video streaming platforms

Rashid Tahir, Faizan Ahmed, Hammas Saeed, Shiza Ali, Fareed Zaffar, and Christo Wilson. Bringing the kid back into youtube kids: detecting inappropriate content on video streaming platforms. InProceed- ings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, ASONAM ’19, page 464–469, New York, NY , USA,

work page 2019
[37]

doi: 10.1145/3341161.3342913

Association for Computing Machinery. doi: 10.1145/3341161.3342913

work page doi:10.1145/3341161.3342913
[38]

Qwen3-max: Just scale it, September 2025

Qwen Team. Qwen3-max: Just scale it, September 2025

work page 2025
[39]

Supporting human raters with the detection of harmful content using large language models

Kurt Thomas, Patrick Gage Kelley, David Tao, Sarah Meiklejohn, Owen Vallis, Shunwen Tan, Blaž Brataniˇc, Felipe Tiengo Ferreira, Vijay Kumar Eranti, and Elie 19 Bursztein. Supporting human raters with the detection of harmful content using large language models. In2025 IEEE Symposium on Security and Privacy (SP), pages 2772–2789. IEEE, 2025

work page 2025
[40]

Don’t follow me: Spam detection in twitter

Alex Hai Wang. Don’t follow me: Spam detection in twitter. In2010 international conference on security and cryptography (SECRYPT), pages 1–10. IEEE, 2010

work page 2010
[41]

Detecting and understanding the promotion of illicit goods and services on twitter

Hongyu Wang, Ying Li, Ronghong Huang, and Xiang- hang Mi. Detecting and understanding the promotion of illicit goods and services on twitter. InProceedings of the ACM on Web Conference 2025, pages 3389–3404, 2025

work page 2025
[42]

Yunwen Wang. Humor and camera view on mo- bile short-form video apps influence user experience and technology-adoption intent, an example of tiktok (douyin).Computers in human behavior, 110:106373, 2020

[43]

Kai-Tuo Xu, Feng-Long Xie, Xu Tang, and Yao Hu. Fireredasr: Open-source industrial-grade mandarin speech recognition models from encoder-decoder to llm integration. arXiv preprint arXiv:2501.14350, 2025

[44]

Greyson K Young. How much is too much: the difficulties of social media content moderation. Information & Communications Technology Law, 31(1):1–16, 2022

[45]

Kan Yuan, Di Tang, Xiaojing Liao, XiaoFeng Wang, Xuan Feng, Yi Chen, Menghan Sun, Haoran Lu, and Kehuan Zhang. Stealthy porn: Understanding real-world adversarial images for illicit online promotion. In 2019 IEEE Symposium on Security and Privacy (SP), pages 952–966. IEEE, 2019.

A Detailed User Behavior Study Analysis

This appendix presents a comprehensive...

Which short-video platform do you use most often? (Single choice) □ Douyin □ Kwai □ Bilibili □ Rednote □ Other: □ None

On school days, approximately how much time do you spend watching short videos? (Single choice) □ Almost never □ Less than 30 minutes □ 30 minutes – 1 hour □ 1–2 hours □ More than 2 hours

During holidays, approximately how much time do you spend watching short videos? (Single choice) □ Almost never □ Less than 30 minutes □ 30 minutes – 1 hour □ 1–2 hours □ More than 2 hours

Do you perform any of the following actions on short-video platforms? (Multiple choices) □ Like videos □ Follow accounts □ Comment □ Share with friends □ Watch most videos to completion (without swiping away) □ Never interact

Have you currently enabled “Youth Mode”? □ Yes □ No

If you have not enabled Youth Mode, what are the main reasons? (Multiple choices) □ Content is too dull/uninteresting □ Too many feature restrictions (e.g., cannot comment or go live) □ Don’t know where to enable it □ Don’t think it’s necessary □ Parents didn’t require it □ Other:

Have you noticed that “after watching a certain video, similar content keeps being recommended”? □ Never noticed □ Occasionally □ Often □ Always

While browsing short videos, have you ever encountered content that made you feel uncomfortable or inappropriate? □ Never □ Rarely □ Sometimes □ Often □ Almost every time

Which of the following types of content have you seen? (Multiple choices) □ Soft pornography (e.g., revealing clothing, suggestive gestures) □ Topics related to body modification, self-harm, depression, or suicide □ Dangerous stunts or reckless driving □ Cyberbullying, physical violence, or bloody/gory scenes □ Smoking, drinking, or betel nut chewing □ Sc...

When you encounter such content, how do you usually respond? (Multiple choices) □ Swipe away immediately □ Report the video □ Take a screenshot and share with friends □ Feel scared or anxious □ Don’t care □ Want to know more (keep watching)

How much do you think this type of content affects you? (1 = No effect at all, 5 = Strong effect) Emotional state: □ 1 □ 2 □ 3 □ 4 □ 5 Value judgments: □ 1 □ 2 □ 3 □ 4 □ 5 Tendency to imitate behaviors: □ 1 □ 2 □ 3 □ 4 □ 5

Do your parents restrict your use of short-video apps? □ Strictly restricted (e.g., time limits, supervision) □ General reminders □ Rarely intervene □ No restrictions at all

A.11 Mapping from Survey Language to PHTV Taxonomy

To ensure the questionnaire was age-appropriate, we used descriptive, relatable language rather than academic or clinical terms. T...

Initial Discovery: We deployed passive simulation accounts to browse recommendation feeds without any interaction. From the recommended videos, annotators identified initial PHTV instances and extracted recurring terms from video titles, ASR transcripts, OCR text, and hashtags

Keyword Snowball Expansion: Each extracted term was used as a search query. For all videos returned by a query, we crawled the authors’ full public video histories and identified additional PHTVs. New recurring terms from these videos were added to the keyword pool. This process was iterated until no new keywords emerged across two consecutive rounds (key...

Behavioral Simulation: Using the saturated keyword set and identified authors, we created dedicated simulation accounts that systematically followed, liked, and favorited content from these authors. This mimicked the behavioral pattern of an engaged teenage user who has shown interest in similar content. These accounts were used exclusively for ground-trut...

Recommendation Feed Harvesting: After establishing the simulated behavioral profile, we scraped the personalized recommendation feeds across five iterative collection rounds (∼2,000 videos per round, ∼10,000 total). The final dataset of 3,510 videos (1,755 harmful, 1,755 benign) was selected from this pool through rigorous manual annotation. Data coll...

Manual Annotation: Three trained graduate student annotators with expertise in child safety and digital media independently labeled all videos using the refined taxonomy (Section 3.3).

• Taxonomy Development: The nine-category taxonomy was grounded in China’s Regulations on the Protection of Minors on the Internet and refined through iterative review of c...
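The stopping rule in the Keyword Snowball Expansion step (iterate until no new keywords emerge across two consecutive rounds) can be illustrated with a minimal sketch. It is not the authors' implementation: `snowball_keywords`, `toy_expand`, `TOY_GRAPH`, and the `patience` and `max_rounds` parameters are hypothetical names, and `expand` stands in for the search, author-history crawl, and annotator term extraction, which in the study involved manual review.

```python
from typing import Callable, Dict, Iterable, Set


def snowball_keywords(
    seed_terms: Iterable[str],
    expand: Callable[[str], Set[str]],
    patience: int = 2,
    max_rounds: int = 50,
) -> Set[str]:
    """Grow a keyword pool until `patience` consecutive rounds add no new term."""
    pool: Set[str] = set(seed_terms)
    empty_rounds = 0
    rounds = 0
    while empty_rounds < patience and rounds < max_rounds:
        rounds += 1
        discovered: Set[str] = set()
        # Every term currently in the pool is used as a query in this round.
        for term in sorted(pool):
            discovered |= expand(term)
        new_terms = discovered - pool
        # Reset the counter whenever a round finds something new.
        empty_rounds = 0 if new_terms else empty_rounds + 1
        pool |= new_terms
    return pool


# Hypothetical stand-in for "search term -> recurring terms found in results".
TOY_GRAPH: Dict[str, Set[str]] = {"a": {"b"}, "b": {"c"}, "c": set()}


def toy_expand(term: str) -> Set[str]:
    """Return the terms that a query for `term` would surface in the toy graph."""
    return TOY_GRAPH.get(term, set())
```

Calling `snowball_keywords(["a"], toy_expand)` on the toy graph saturates at the three terms `a`, `b`, and `c`. The `max_rounds` cap is an added safeguard that the source does not mention.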

**Role Awareness**: Remember your task is to protect adolescents; remain vigilant against subtle or veiled expressions of harm

**Video Analysis**: Carefully examine the video frames for the presence of behavioral indicators listed above

**Text Analysis**: Analyze keyword combinations in the title and speech content (ASR) as well as rhetorical devices

**On-screen Text Analysis**: Check whether any text extracted via OCR contains slogans or annotations matching the defined categories

**Final Judgment**:
- Synthesize all available information and make a determination based on the category definitions above.
- Even if one type of data is missing, make a judgment based on the remaining information.
- If the video matches any risky behavior Output "Harmful"
- If the video is daily documentation, academic sharing, or positive entertainment...
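The prompt steps above ask the model to combine title, ASR, and OCR evidence and to still judge when one input is missing. A minimal sketch of how the text evidence might be assembled and the verdict read back is given here, under stated assumptions: `VideoEvidence`, `build_evidence_block`, `is_flagged_harmful`, and the `[not available]` placeholder are hypothetical and do not come from the paper. The source elides the label used for benign videos, so the sketch checks only for the exact string "Harmful" and treats any other output as not flagged.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoEvidence:
    """Text-side inputs for one video; any field may be missing."""
    title: Optional[str] = None
    asr_text: Optional[str] = None
    ocr_text: Optional[str] = None


def build_evidence_block(ev: VideoEvidence) -> str:
    """Render the available text modalities, marking missing ones explicitly."""
    fields = [
        ("Title", ev.title),
        ("Speech (ASR)", ev.asr_text),
        ("On-screen text (OCR)", ev.ocr_text),
    ]
    lines = []
    for label, value in fields:
        text = (value or "").strip()
        # A visible placeholder keeps the prompt layout stable when data is absent.
        lines.append(f"{label}: {text if text else '[not available]'}")
    return "\n".join(lines)


def is_flagged_harmful(model_output: str) -> bool:
    """Return True only when the model's answer is exactly "Harmful"."""
    cleaned = model_output.strip().strip("\"'.").lower()
    return cleaned == "harmful"
```

Marking a missing modality explicitly, rather than silently dropping it, is one way to keep the model's input format constant across videos; whether the authors did this is not stated in the source.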
