Understanding the (in)effectiveness of content moderation: A case study of Facebook in the context of the US Capitol riot

Preliminary (prima facie) finding of violating content: the enforcement process is initiated by flagging an item of content as potentially violating. · 2022 · arXiv 2301.02737

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Algorithmic Constitutionalism

cs.CY · 2026-05-16 · unverdicted · novelty 6.0

The authors introduce algorithmic constitutionalism, a three-pillar framework for AI governance comprising a layered architecture, algorithmic meta-reasoning, and deliberative correction. They apply it to Facebook's content moderation and discuss implications for the EU Digital Services Act.

The Enforcement and Feasibility of Hate Speech Moderation on Twitter

cs.CY · 2026-04-14 · conditional · novelty 6.0

80% of hateful tweets remain online after five months, with a removal rate no higher than that of non-hateful content, while human-AI moderation pipelines can feasibly cut user exposure below regulatory penalty costs.

An Evaluation of Chat Safety Moderations in Roblox

cs.CY · 2026-05-06 · unverdicted · novelty 5.0 · 2 refs

Roblox's automated chat moderation fails to catch numerous unsafe messages involving grooming, sexualization of minors, bullying, violence, self-harm, and sensitive information sharing, with users evading detection through various techniques.

citing papers explorer

Showing 3 of 3 citing papers.

Algorithmic Constitutionalism · cs.CY · 2026-05-16 · unverdicted · none · ref 8
The Enforcement and Feasibility of Hate Speech Moderation on Twitter · cs.CY · 2026-04-14 · conditional · none · ref 25
An Evaluation of Chat Safety Moderations in Roblox · cs.CY · 2026-05-06 · unverdicted · none · ref 57 · 2 links
