Plagiarism Detection in arXiv

Daria Sorokina; Johannes Gehrke; Paul Ginsparg; Simeon Warner

arXiv: cs/0702012 · v1 · submitted 2007-02-01 · cs.DB · cs.DL · cs.IR

Abstract

We describe a large-scale application of methods for finding plagiarism in research document collections. The methods are applied to a collection of 284,834 documents collected by arXiv.org over a 14-year period, covering several research disciplines. The methodology efficiently detects a variety of problematic author behaviors, and heuristics are developed to reduce the number of false positives. The methods are also efficient enough to be implemented as a real-time submission screen for a collection many times larger.
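The abstract does not spell out the detection algorithm. As an illustration only, here is a minimal Python sketch of one common way to screen a new submission against an archive: word n-gram "shingling" combined with an inverted index. It is not claimed to be the method used in the paper. The shingle length `n=7`, the overlap threshold `0.2`, the BLAKE2b hashing, the overlap score, and the function names `shingles`, `build_index` and `screen` are all assumptions chosen for this sketch.

```python
import hashlib
import re
from collections import Counter, defaultdict

# Illustrative sketch only: n, threshold, and the scoring rule are assumptions,
# not parameters taken from the paper.

def shingles(text: str, n: int = 7) -> set[int]:
    """Hash every run of n consecutive words (a "shingle") to a 64-bit integer."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    hashes = set()
    for i in range(len(words) - n + 1):
        chunk = " ".join(words[i:i + n]).encode("utf-8")
        digest = hashlib.blake2b(chunk, digest_size=8).digest()
        hashes.add(int.from_bytes(digest, "big"))
    return hashes

def build_index(corpus: dict[str, str], n: int = 7):
    """Build an inverted index (shingle -> archived doc ids) plus per-doc shingle counts."""
    index: dict[int, list[str]] = defaultdict(list)
    sizes: dict[str, int] = {}
    for doc_id, text in corpus.items():
        sig = shingles(text, n)
        sizes[doc_id] = len(sig)
        for h in sig:
            index[h].append(doc_id)
    return index, sizes

def screen(text: str, index, sizes, n: int = 7, threshold: float = 0.2):
    """Return (doc_id, score) pairs for archived documents that overlap the submission.

    score = shared shingles / shingles in the smaller of the two documents,
    so a short passage copied wholesale into a long paper still scores high.
    """
    sig = shingles(text, n)
    if not sig:
        return []  # fewer than n words: nothing to compare
    shared = Counter()
    # Only postings for the submission's own shingles are visited, so the work
    # scales with the postings hit rather than with the number of archived documents.
    for h in sig:
        for doc_id in index.get(h, ()):
            shared[doc_id] += 1
    hits = [
        (doc_id, count / min(len(sig), sizes[doc_id]))
        for doc_id, count in shared.items()
    ]
    return sorted(
        (hit for hit in hits if hit[1] >= threshold),
        key=lambda hit: hit[1],
        reverse=True,
    )

if __name__ == "__main__":
    archive = {
        "a": "we describe a large scale application of methods for finding "
             "plagiarism in research document collections",
        "b": "the quick brown fox jumps over the lazy dog while the cat sleeps quietly",
    }
    index, sizes = build_index(archive)
    submission = ("in this note we describe a large scale application of methods "
                  "for finding plagiarism in papers")
    print(screen(submission, index, sizes))
```

In the usage example, the submission shares a 12-word run with document "a", so it is flagged, while "b" is not. The inverted index is what would make a real-time submission screen plausible: each new paper is checked only against documents that share at least one shingle with it. A deployed system would still need heuristics like the false-positive filters the abstract mentions, which this sketch does not attempt.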