pith. sign in

arxiv: 1311.4780 · v2 · pith:XPBB3DZYnew · submitted 2013-11-19 · 📊 stat.ML · cs.DC· cs.LG· stat.CO

Asymptotically Exact, Embarrassingly Parallel MCMC

classification 📊 stat.ML cs.DCcs.LGstat.CO
keywords datamachineparallelsamplesalgorithmcommunicationmcmcasymptotically
0
0 comments X
read the original abstract

Communication costs, resulting from synchronization requirements during learning, can greatly slow down many parallel machine learning algorithms. In this paper, we present a parallel Markov chain Monte Carlo (MCMC) algorithm in which subsets of data are processed independently, with very little communication. First, we arbitrarily partition data onto multiple machines. Then, on each machine, any classical MCMC method (e.g., Gibbs sampling) may be used to draw samples from a posterior distribution given the data subset. Finally, the samples from each machine are combined to form samples from the full posterior. This embarrassingly parallel algorithm allows each machine to act independently on a subset of the data (without communication) until the final combination stage. We prove that our algorithm generates asymptotically exact samples and empirically demonstrate its ability to parallelize burn-in and sampling in several models.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. A Scalable Parametric Item Calibration Engine (SPICE) for Explanatory IRT with Sparse Data

    stat.ME 2026-05 unverdicted novelty 5.0

    SPICE is a scalable Bayesian MCMC engine for explanatory IRT calibration on sparsely linked persons and items in large assessment banks.

  2. Analyzing and Storing Network Intrusion Detection Data using Bayesian Coresets: A Preliminary Study in Offline and Streaming Settings

    cs.LG 2019-06 unverdicted novelty 3.0

    Bayesian coresets applied to network intrusion data reduce sample counts while supporting accurate posterior inference in both offline and streaming regimes.