A multi-terabyte relational database for geo-tagged social network data

Dániel Kondor; Gábor Vattay; István Csabai; János Szüle; József Stéger; László Dobos; Tamás Bodnár; Tamás Hanyecz; Tamás Sebők; Zsófia Kallus

arxiv: 1311.0841 · v2 · pith:4E4I5YL3new · submitted 2013-11-04 · 💻 cs.DB

keywords: data, database, network, social, geo-tagged, loading, multi-terabyte, relational

Despite their relatively low sampling factor, the freely available, randomly sampled status streams of Twitter are very useful sources of geographically embedded social network data. To statistically analyze the information Twitter provides via these streams, we have collected a year's worth of data and built a multi-terabyte relational database from it. The database is designed for fast data loading and to support a wide range of studies focusing on the statistics and geographic features of social networks, as well as on the linguistic analysis of tweets. In this paper we present the method of data collection, the database design, the data loading procedure and special treatment of geo-tagged and multi-lingual data. We also provide some SQL recipes for computing network statistics.
