Scalable Bayesian Nonparametric Clustering and Classification

Maurice Diesendruck; Peter Müller; Sinead Williamson; Yang Ni; Yitan Zhu; Yuan Ji

arxiv: 1806.02670 · v1 · pith:EPCRSHUMnew · submitted 2018-06-07 · stat.CO · stat.ME

Yang Ni, Peter Müller, Maurice Diesendruck, Sinead Williamson, Yitan Zhu, Yuan Ji

keywords: inference, bayesian, classification, large, nonparametric, approach, carlo, clustering

We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating datasets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and favorable classification performance relative to other widely used competing classifiers.
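The abstract does not spell out the individual steps, so the following is only an illustrative sketch of the general idea, not the authors' algorithm. It assumes a shard-then-merge pattern: the data are split into shards, each shard is clustered independently (the "embarrassingly parallel" part), and the resulting local clusters are then clustered again with the same sampler. The sampler here is a toy collapsed Gibbs sampler for a one-dimensional Dirichlet process mixture of Gaussians, swapped in for the product partition model with covariates. The names `log_pred`, `gibbs_cluster`, and `multi_step`, the hyperparameter values, and the use of cluster means as second-step units are all assumptions made for this example.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor

# Assumed toy hyperparameters (not from the paper): likelihood variance,
# prior variance of cluster means, and the DP concentration parameter.
SIGMA2, TAU2, ALPHA = 1.0, 100.0, 1.0


def log_pred(x, n, s):
    """Log predictive density of x for a cluster holding n points with sum s."""
    post_var = 1.0 / (1.0 / TAU2 + n / SIGMA2)
    post_mean = post_var * s / SIGMA2
    var = post_var + SIGMA2
    return -0.5 * math.log(2 * math.pi * var) - 0.5 * (x - post_mean) ** 2 / var


def gibbs_cluster(xs, sweeps=30, seed=0):
    """Collapsed Gibbs sampler for a 1-D DP mixture; returns the final labels."""
    rng = random.Random(seed)
    z = [0] * len(xs)                    # start with one big cluster
    stats = {0: [len(xs), sum(xs)]}      # label -> [count, sum]
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(xs):
            st = stats[z[i]]             # remove point i from its cluster
            st[0] -= 1
            st[1] -= x
            if st[0] == 0:
                del stats[z[i]]
            labels = list(stats)
            logw = [math.log(stats[k][0]) + log_pred(x, *stats[k]) for k in labels]
            logw.append(math.log(ALPHA) + log_pred(x, 0, 0.0))  # open a new cluster
            top = max(logw)
            weights = [math.exp(w - top) for w in logw]
            j = rng.choices(range(len(weights)), weights=weights)[0]
            if j == len(labels):
                z[i] = next_label
                stats[next_label] = [1, x]
                next_label += 1
            else:
                z[i] = labels[j]
                stats[z[i]][0] += 1
                stats[z[i]][1] += x
    return z


def multi_step(xs, n_shards=4, sweeps=30, seed=0):
    """Step 1: cluster shards independently. Step 2: cluster the local clusters."""
    shards = [xs[s::n_shards] for s in range(n_shards)]
    # Threads only illustrate the independence of the shard jobs; a process
    # pool or a cluster scheduler would be needed for real CPU parallelism.
    with ThreadPoolExecutor() as pool:
        seeds = range(seed, seed + n_shards)
        local = list(pool.map(gibbs_cluster, shards, [sweeps] * n_shards, seeds))
    groups = []                          # each group: original indices of one local cluster
    for s, z in enumerate(local):
        by_label = {}
        for pos, lab in enumerate(z):
            by_label.setdefault(lab, []).append(s + pos * n_shards)
        groups.extend(by_label.values())
    # Assumption: each local cluster is summarized by its mean and treated as
    # a single unit for the same sampler in the second step.
    means = [sum(xs[i] for i in g) / len(g) for g in groups]
    top = gibbs_cluster(means, sweeps=sweeps, seed=seed)
    labels = [None] * len(xs)
    for g, lab in zip(groups, top):
        for i in g:
            labels[i] = lab
    return labels


if __name__ == "__main__":
    rng = random.Random(1)
    data = [rng.gauss(-10, 1) for _ in range(100)] + [rng.gauss(10, 1) for _ in range(100)]
    labels = multi_step(data)
    print(len(set(labels)), "clusters found")
```

The sketch keeps only the last Gibbs state as the clustering, which is a simplification; a full analysis would summarize many posterior draws. Reusing `gibbs_cluster` unchanged in both steps mirrors the abstract's point that every step can run the same sampler.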

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dynamic time series clustering via volatility change-points
stat.ME 2019-06 unverdicted novelty 4.0

A Bayesian method clusters time series by similarity in the timing of their most recent volatility change-points via a metric on posterior distributions, demonstrated on S&P 500 returns.