arxiv: 1608.01965 · v1 · pith:SBKNUCWQnew · submitted 2016-07-29 · 💻 cs.CL

Text authorship identified using the dynamics of word co-occurrence networks

classification 💻 cs.CL
keywords texts, were, authorship, networks, authors, co-occurrence, dynamics, learning

The identification of authorship in disputed documents still requires human expertise, which is now infeasible for many tasks owing to the large volumes of text and the number of authors in practical applications. In this study, we introduce a methodology based on the dynamics of word co-occurrence networks representing written texts to classify a corpus of 80 texts by 8 authors. The texts were divided into sections with an equal number of linguistic tokens, from which time series were created for 12 topological metrics. The series were proven to be stationary (p-value > 0.05), which permits the use of distribution moments as learning attributes. With an optimized supervised learning procedure using a Radial Basis Function Network, 68 out of 80 texts were correctly classified, i.e., a remarkable 85% author matching success rate. Therefore, fluctuations in purely dynamic network metrics were found to characterize authorship, thus opening the way for the description of texts in terms of small evolving networks. Moreover, the approach introduced allows for the comparison of texts with diverse characteristics in a simple, fast fashion.
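The feature-extraction steps named in the abstract (equal-size sections, one co-occurrence network per section, a time series per metric, distribution moments as attributes) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not list the 12 metrics, so two common ones (average degree and average clustering) stand in for them; linking only adjacent words is an assumed network construction; and every function name here is hypothetical. The stationarity test and the Radial Basis Function Network classifier are left out.

```python
from collections import defaultdict
from statistics import mean, pvariance


def split_into_sections(tokens, section_size):
    """Cut tokens into consecutive sections of equal length (remainder dropped)."""
    n_sections = len(tokens) // section_size
    return [tokens[i * section_size:(i + 1) * section_size]
            for i in range(n_sections)]


def cooccurrence_network(section):
    """Undirected network linking adjacent, distinct words (assumed construction)."""
    adj = defaultdict(set)
    for a, b in zip(section, section[1:]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return adj


def average_degree(adj):
    """Mean number of neighbours per node."""
    return mean(len(nbrs) for nbrs in adj.values()) if adj else 0.0


def average_clustering(adj):
    """Mean local clustering coefficient; nodes with fewer than 2 neighbours count as 0."""
    coeffs = []
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            coeffs.append(0.0)
            continue
        nb = list(nbrs)
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nb[j] in adj[nb[i]])
        coeffs.append(2.0 * links / (k * (k - 1)))
    return mean(coeffs) if coeffs else 0.0


# Illustrative stand-ins for the paper's 12 topological metrics.
METRICS = {
    "average_degree": average_degree,
    "average_clustering": average_clustering,
}


def moments(series):
    """First three moments of a series: mean, variance, skewness."""
    mu = mean(series)
    var = pvariance(series, mu)
    if var == 0:
        return (mu, var, 0.0)
    skew = mean((x - mu) ** 3 for x in series) / var ** 1.5
    return (mu, var, skew)


def text_features(tokens, section_size):
    """Map each metric name to the moments of its per-section time series."""
    networks = [cooccurrence_network(s)
                for s in split_into_sections(tokens, section_size)]
    return {name: moments([metric(net) for net in networks])
            for name, metric in METRICS.items()}
```

Calling `text_features(tokens, section_size)` on a tokenized text yields one fixed-length attribute vector per text, regardless of text length, which is what makes texts of different sizes comparable before they are passed to a supervised classifier.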
