arxiv: 1806.02670 · v1 · pith:EPCRSHUMnew · submitted 2018-06-07 · 📊 stat.CO · stat.ME

Scalable Bayesian Nonparametric Clustering and Classification

keywords: inference, bayesian, classification, large, nonparametric, approach, carlo, clustering

We develop a scalable multi-step Monte Carlo algorithm for inference under a large class of nonparametric Bayesian models for clustering and classification. Each step is "embarrassingly parallel" and can be implemented using the same Markov chain Monte Carlo sampler. The simplicity and generality of our approach make inference for a wide range of Bayesian nonparametric mixture models applicable to large datasets. Specifically, we apply the approach to inference under a product partition model with regression on covariates. We show results for inference with two motivating datasets: a large set of electronic health records (EHR) and a bank telemarketing dataset. We find interesting clusters and favorable classification performance relative to other widely used competing classifiers.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Dynamic time series clustering via volatility change-points

    stat.ME 2019-06 unverdicted novelty 4.0

    A Bayesian method clusters time series by similarity in the timing of their most recent volatility change-points via a metric on posterior distributions, demonstrated on S&P 500 returns.