
arxiv: 1608.03448 · v1 · pith:WHFHP3J5new · submitted 2016-08-11 · 💻 cs.CL

Sex, drugs, and violence

classification 💻 cs.CL
keywords content · detecting · large · latent · allocation · annotations · approach · appropriateness

Automatically detecting inappropriate content can be a difficult NLP task, requiring understanding context and innuendo, not just identifying specific keywords. Due to the large quantity of online user-generated content, automatic detection is becoming increasingly necessary. We take a largely unsupervised approach using a large corpus of narratives from a community-based self-publishing website and a small segment of crowd-sourced annotations. We explore topic modelling using latent Dirichlet allocation (and a variation), and use these to regress appropriateness ratings, effectively automating rating for suitability. The results suggest that certain topics inferred may be useful in detecting latent inappropriateness -- yielding recall up to 96% and low regression errors.
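The pipeline the abstract describes (infer topics with latent Dirichlet allocation, then regress appropriateness ratings on the topic proportions) can be illustrated with a minimal sketch. Everything here is an assumption rather than the authors' method. The toy documents and ratings are invented. The function names `lda_gibbs` and `fit_linear` are hypothetical. Collapsed Gibbs sampling and a plain least-squares fit by gradient descent stand in for whatever inference and regression the paper actually uses. The LDA variation mentioned in the abstract is not covered.

```python
import random


def lda_gibbs(docs, n_topics, n_iters=200, alpha=0.1, beta=0.01, seed=0):
    """Plain LDA via collapsed Gibbs sampling; returns per-document topic proportions."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    idx = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    ndk = [[0] * n_topics for _ in docs]      # topic counts per document
    nkw = [[0] * V for _ in range(n_topics)]  # word counts per topic
    nk = [0] * n_topics                       # total tokens per topic
    z = []                                    # topic assignment of every token
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)       # random initial assignment
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][idx[w]] += 1
            nk[k] += 1
        z.append(zd)
    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k, wi = z[d][i], idx[w]
                # Remove the token, then resample its topic from the conditional.
                ndk[d][k] -= 1
                nkw[k][wi] -= 1
                nk[k] -= 1
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][wi] + beta) / (nk[t] + V * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][wi] += 1
                nk[k] += 1
    # Smoothed topic proportions; each row sums to 1.
    return [
        [(ndk[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in range(n_topics)]
        for d, doc in enumerate(docs)
    ]


def fit_linear(X, y, lr=0.1, epochs=2000):
    """Least-squares linear regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


# Hypothetical toy data: tokenized "narratives" and made-up ratings in [0, 1].
docs = [
    "knife blood fight gun blood".split(),
    "gun fight war blood knife".split(),
    "garden tea picnic sunshine tea".split(),
    "picnic garden sunshine flowers tea".split(),
]
ratings = [0.9, 0.8, 0.1, 0.0]

theta = lda_gibbs(docs, n_topics=2)   # topic proportions used as features
w, b = fit_linear(theta, ratings)     # regress ratings on topic proportions
preds = [sum(wj * xj for wj, xj in zip(w, row)) + b for row in theta]
```

In this sketch the topic proportions in `theta` are the only features the regressor sees, which mirrors the idea that inferred topics, rather than specific keywords, carry the signal for appropriateness.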
