arxiv: 1210.5268 · v4 · pith:AGX7DU6Anew · submitted 2012-10-18 · 💻 cs.CL · cs.SI · physics.soc-ph

Diffusion of Lexical Change in Social Media

classification 💻 cs.CL · cs.SI · physics.soc-ph
keywords changes · linguistic · analysis · change · communication · computer-mediated · demographic · diffusion

Computer-mediated communication is driving fundamental changes in the nature of written language. We investigate these changes by statistical analysis of a dataset comprising 107 million Twitter messages (authored by 2.7 million unique user accounts). Using a latent vector autoregressive model to aggregate across thousands of words, we identify high-level patterns in diffusion of linguistic change over the United States. Our model is robust to unpredictable changes in Twitter's sampling rate, and provides a probabilistic characterization of the relationship of macro-scale linguistic influence to a set of demographic and geographic predictors. The results of this analysis offer support for prior arguments that focus on geographical proximity and population size. However, demographic similarity -- especially with regard to race -- plays an even more central role, as cities with similar racial demographics are far more likely to share linguistic influence. Rather than moving towards a single unified "netspeak" dialect, language evolution in computer-mediated communication reproduces existing fault lines in spoken American English.
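
To give a rough sense of what "vector autoregressive" means in the abstract, here is a minimal sketch. It is not the paper's model: the paper describes a latent model that aggregates across thousands of words and accounts for changes in Twitter's sampling rate, whereas this toy fits a plain, fully observed first-order VAR by ordinary least squares on synthetic data. The three "regions", the coefficient matrix `A_true`, and the helper names `simulate_var1` and `fit_var1` are all assumptions made up for illustration.

```python
import numpy as np


def simulate_var1(A, n_steps, noise_scale, rng):
    """Simulate x[t] = A @ x[t-1] + noise for a toy set of regions."""
    n_regions = A.shape[0]
    x = np.zeros((n_steps, n_regions))
    x[0] = rng.normal(size=n_regions)
    for t in range(1, n_steps):
        x[t] = A @ x[t - 1] + noise_scale * rng.normal(size=n_regions)
    return x


def fit_var1(x):
    """Least-squares estimate of A in x[t] = A @ x[t-1] + noise."""
    past, future = x[:-1], x[1:]
    # Solve past @ A.T ~= future in the least-squares sense.
    A_transposed, *_ = np.linalg.lstsq(past, future, rcond=None)
    return A_transposed.T


# Hypothetical setup: region 0 influences region 1; region 2 is isolated.
rng = np.random.default_rng(0)
A_true = np.array([
    [0.5, 0.0, 0.0],
    [0.4, 0.5, 0.0],
    [0.0, 0.0, 0.5],
])

x = simulate_var1(A_true, n_steps=5000, noise_scale=1.0, rng=rng)
A_hat = fit_var1(x)

# A_hat[i, j] estimates how strongly region j's past value
# predicts region i's current value.
print(np.round(A_hat, 2))
```

In this toy setting, an off-diagonal entry `A_hat[i, j]` that is clearly nonzero plays the role of "linguistic influence" from region `j` to region `i`. Relating such entries to demographic and geographic predictors, as the abstract describes, would be a separate modeling step that this sketch does not attempt.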
