arxiv: 1310.6772 · v1 · pith:WUUYBFIFnew · submitted 2013-10-24 · 💻 cs.CL · cs.CR · cs.CY

Sockpuppet Detection in Wikipedia: A Corpus of Real-World Deceptive Writing for Linking Identities

classification 💻 cs.CL cs.CR cs.CY
keywords sockpuppet, corpus, cases, crawling, dataset, deceptive, process, real-world

This paper describes the corpus of sockpuppet cases we gathered from Wikipedia. A sockpuppet is an online user account created under a fake identity in order to conceal abusive behavior and/or subvert the editing regulation process. We used a semi-automated method to crawl and curate a dataset of real sockpuppet investigation cases. To the best of our knowledge, this is the first available corpus of real-world deceptive writing. We describe the crawling process and report some preliminary results that can serve as a baseline for benchmarking research. The dataset will be released under a Creative Commons license from our project website: http://docsig.cis.uab.edu.
