A quantitative analysis of concepts and semantic structure in written language: Long range correlations in dynamics of texts
Understanding texts requires memory: the reader has to keep in mind enough words to create meaning. This calls for a relation between the memory of the reader and the structure of the text. To investigate this interaction, we first identify a connectivity matrix defined by co-occurrence of words in the text. A vector space of words characterizing the text is spanned by the principal directions of this matrix. It is useful to think of these weighted combinations of words as representing "concepts". As the reader follows the text, the set of words in her window of attention follows a dynamical motion among these concepts. We observe long-range power-law correlations in this trajectory. By explicitly constructing surrogate hierarchical texts, we demonstrate that the power law originates from structural organization of texts into subunits such as chapters and paragraphs.
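The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the co-occurrence span, the window size, the use of power iteration to approximate only the leading principal direction, and all function names are assumptions made for illustration. A toy text of this size cannot show a power law; the sketch only makes the steps concrete.

```python
def cooccurrence_matrix(tokens, vocab, span=2):
    """Symmetric counts of distinct words appearing within `span` positions
    of each other (an assumed definition of the connectivity matrix)."""
    index = {w: i for i, w in enumerate(vocab)}
    n = len(vocab)
    m = [[0.0] * n for _ in range(n)]
    for i, w in enumerate(tokens):
        for j in range(i + 1, min(i + span + 1, len(tokens))):
            v = tokens[j]
            if w in index and v in index and v != w:
                m[index[w]][index[v]] += 1.0
                m[index[v]][index[w]] += 1.0
    return m


def leading_direction(m, iterations=500):
    """Approximate the principal eigenvector by power iteration.
    The matrix is shifted by its largest row sum so that every eigenvalue
    is non-negative and the iteration cannot oscillate."""
    n = len(m)
    shift = max(sum(row) for row in m)
    v = [1.0 / n ** 0.5] * n
    for _ in range(iterations):
        w = [sum(m[i][j] * v[j] for j in range(n)) + shift * v[i]
             for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def trajectory(tokens, vocab, direction, window=4):
    """Project the word counts of each sliding attention window
    onto one "concept" direction."""
    weight = dict(zip(vocab, direction))
    return [sum(weight.get(w, 0.0) for w in tokens[s:s + window])
            for s in range(len(tokens) - window + 1)]


def autocorrelation(x, lag):
    """Normalized autocorrelation of the series x at the given lag."""
    mean = sum(x) / len(x)
    var = sum((a - mean) ** 2 for a in x)
    if var == 0.0:
        return 0.0
    cov = sum((x[t] - mean) * (x[t + lag] - mean)
              for t in range(len(x) - lag))
    return cov / var


# Toy example (hypothetical text, far too short for real statistics).
tokens = "the cat sat on the mat the dog sat on the rug the cat saw the dog".split()
vocab = sorted(set(tokens))
matrix = cooccurrence_matrix(tokens, vocab)
direction = leading_direction(matrix)
path = trajectory(tokens, vocab, direction)
correlations = [autocorrelation(path, lag) for lag in range(5)]
```

On a real corpus one would presumably keep several principal directions, compute the autocorrelation over a wide range of lags, and inspect it on log-log axes to look for power-law decay.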