HDBSCAN: Density based Clustering over Location Based Services

arxiv: 1602.03730 · v2 · pith:FCMBHUQ4 · submitted 2016-02-11 · 💻 cs.DB

Md Farhadur Rahman, Weimo Liu, Saad Bin Suhaim, Saravanan Thirumuruganathan, Nan Zhang, Gautam Das

classification 💻 cs.DB

keywords query, database, popular, services, assignment, cluster, clustering, density

Location Based Services (LBS) have become extremely popular and are used by millions of users. Popular LBS run the entire gamut from mapping services (such as Google Maps) to restaurants (such as Yelp) and real estate (such as Redfin). The public query interfaces of LBS can be abstractly modeled as a kNN interface over a database of two-dimensional points: given an arbitrary query point, the system returns the k points in the database that are nearest to the query point. Often, k is set to a small value such as 20 or 50. In this paper, we consider the novel problem of enabling density-based clustering over an LBS that exposes only a limited kNN query interface. Because of the query rate limits imposed by LBS, even retrieving every tuple once is infeasible. Hence, we seek to construct a cluster assignment function f(.) by issuing a small number of kNN queries, such that for any given tuple t in the database, whether or not it has been accessed, f(.) outputs the cluster assignment of t with high accuracy. We conduct a comprehensive set of experiments over benchmark datasets and popular real-world LBS such as Yahoo! Flickr, Zillow, Redfin and Google Maps.
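
The kNN interface model described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the class name `KnnInterface`, the `query_budget` parameter, and the toy point set are all hypothetical, chosen only to show what a client of such an interface can and cannot see. A brute-force sort stands in for whatever index a real LBS uses, and a simple counter stands in for its rate limit.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


class KnnInterface:
    """Toy stand-in for an LBS public query interface (hypothetical)."""

    def __init__(self, points: List[Point], k: int, query_budget: int) -> None:
        self._points = list(points)       # hidden database of 2D points
        self.k = k                        # number of results per query
        self.query_budget = query_budget  # assumed model of a rate limit
        self.queries_issued = 0

    def query(self, q: Point) -> List[Point]:
        """Return the k database points nearest to q (Euclidean distance)."""
        if self.queries_issued >= self.query_budget:
            raise RuntimeError("query budget exhausted")
        self.queries_issued += 1
        # Brute-force ranking; a real service would use a spatial index.
        return sorted(self._points, key=lambda p: math.dist(p, q))[: self.k]


# Two small groups of points, far apart from each other.
db = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.2), (5.0, 5.0), (5.1, 5.0)]
lbs = KnnInterface(db, k=2, query_budget=3)

nearest = lbs.query((0.0, 0.05))
print(nearest)             # the two points closest to the query location
print(lbs.queries_issued)  # each call consumes one unit of the budget
```

The client only ever observes the k returned points per query, so any clustering has to be inferred from a limited number of such local views rather than from a full scan of the database.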
