
arXiv: 1501.02527 · v1 · pith:AQR6WOY4new · submitted 2015-01-12 · cs.CL · cs.AI · cs.IR

Autodetection and Classification of Hidden Cultural City Districts from Yelp Reviews

classification: cs.CL · cs.AI · cs.IR
keywords: city, restaurants, reviews, similarity, topic, classify, clustering, clusters

Topic models discover underlying themes in an otherwise unstructured collection of documents. In this study, we apply the Latent Dirichlet Allocation (LDA) topic model to a dataset of Yelp reviews to classify restaurants based on their reviews. We further hypothesize that, within a city, restaurants can be grouped into "clusters" based on both location and similarity. We use several clustering methods, including K-means clustering and a probabilistic mixture model, to uncover and classify districts within a city, both well-known and hidden (e.g., cultural areas like Chinatown, or hearsay such as "the best street for Italian restaurants"). We use these models to display and label the resulting clusters on a map. We also introduce a topic similarity heatmap that shows how similar each part of a city is to a new restaurant.
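
The clustering idea in the abstract can be illustrated with a minimal sketch. It assumes each restaurant already has a topic mixture (for example, from an LDA model fit on its reviews) and that "similarity" is measured over those mixtures; the abstract does not specify either detail. The toy coordinates, the topic vectors, the `location_weight` parameter, and the helper names `features` and `kmeans` are all hypothetical, and plain K-means stands in here for whichever variant the authors used.

```python
import math
import random

def features(restaurant, location_weight=1.0):
    """Concatenate (weighted) coordinates with the topic mixture.

    `location_weight` is a hypothetical knob for trading off geographic
    closeness against topic similarity; it is not taken from the paper.
    """
    lat, lon, topics = restaurant
    return [location_weight * lat, location_weight * lon, *topics]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so we have converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Toy data (made up): (latitude, longitude, topic mixture over 2 topics).
restaurants = [
    (0.0, 0.0, [0.90, 0.10]),
    (0.1, 0.0, [0.80, 0.20]),
    (0.0, 0.1, [0.85, 0.15]),
    (1.0, 1.0, [0.10, 0.90]),
    (1.1, 1.0, [0.20, 0.80]),
    (1.0, 1.1, [0.15, 0.85]),
]

points = [features(r) for r in restaurants]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On this toy input the first three restaurants and the last three form two separate clusters, which is the kind of grouping a "district" corresponds to. In practice the coordinates and topic proportions live on different scales, so how they are weighted against each other is a design choice that strongly affects the clusters found.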
