
arXiv: 1804.00681 · v1 · pith:MEGBY5MH · submitted 2018-04-02 · 📊 stat.ML · cs.LG

Stochastic EM for Shuffled Linear Regression

classification 📊 stat.ML cs.LG
keywords: shuffled, stochastic, datasets, algorithm, approach, error, experiments, framework
Abstract

We consider the problem of inference in a linear regression model in which the relative ordering of the input features and output labels is not known. Such datasets naturally arise from experiments in which the samples are shuffled or permuted during the protocol. In this work, we propose a framework that treats the unknown permutation as a latent variable. We maximize the likelihood of observations using a stochastic expectation-maximization (EM) approach. We compare this to the dominant approach in the literature, which corresponds to hard EM in our framework. We show on synthetic data that the stochastic EM algorithm we develop has several advantages, including lower parameter error, less sensitivity to the choice of initialization, and significantly better performance on datasets that are only partially shuffled. We conclude by performing two experiments on real datasets that have been partially shuffled, in which we show that the stochastic EM algorithm can recover the weights with modest error.
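To make the contrast between hard EM and stochastic EM concrete, here is a minimal sketch. It is not the authors' code and assumes a Gaussian noise model, y_perm(i) = x_i^T w + noise, with a fixed, known variance `sigma2`. Hard EM commits to the single most likely permutation, which for squared error with a scalar output pairs predictions and labels by sorting. The stochastic EM variant instead draws permutations with a Metropolis-Hastings sampler that proposes random pairwise swaps. This sampler is one possible E-step, chosen here for illustration. The function names `hard_em`, `sample_permutation`, and `stochastic_em` and all defaults are hypothetical. Starting the chain from the observed ordering is intended to suit partially shuffled data, where most rows are already correctly paired.

```python
import numpy as np


def hard_em(X, y, w0, iters=50):
    """Hard EM: alternate the single best permutation and a least-squares fit."""
    w = np.asarray(w0, dtype=float).copy()
    for _ in range(iters):
        pred = X @ w
        # Under squared error, the best one-to-one pairing matches the k-th
        # smallest prediction with the k-th smallest label.
        y_matched = np.empty_like(y, dtype=float)
        y_matched[np.argsort(pred)] = np.sort(y)
        w = np.linalg.lstsq(X, y_matched, rcond=None)[0]
    return w


def sample_permutation(X, y, w, sigma2, perm, n_steps, rng):
    """Metropolis-Hastings over permutations using random pairwise swaps.

    perm[i] is the index of the label currently assigned to row i.
    Assumption: Gaussian noise with fixed, known variance sigma2.
    """
    pred = X @ w
    perm = perm.copy()
    n = len(y)
    for _ in range(n_steps):
        # Propose swapping the labels of rows i and j (i == j proposes no change).
        i, j = rng.integers(0, n, size=2)
        a, b = perm[i], perm[j]
        old = (y[a] - pred[i]) ** 2 + (y[b] - pred[j]) ** 2
        new = (y[b] - pred[i]) ** 2 + (y[a] - pred[j]) ** 2
        # Symmetric proposal: accept with probability min(1, exp(-(new - old) / (2 sigma2))).
        if np.log(rng.random()) < (old - new) / (2.0 * sigma2):
            perm[i], perm[j] = b, a
    return perm


def stochastic_em(X, y, w0, sigma2=0.1, iters=20, n_samples=10, n_steps=200, seed=0):
    """Stochastic EM: Monte Carlo E-step over permutations, least-squares M-step."""
    rng = np.random.default_rng(seed)
    w = np.asarray(w0, dtype=float).copy()
    perm = np.arange(len(y))  # start from the observed (possibly partially shuffled) order
    for _ in range(iters):
        label_sum = np.zeros(len(y))
        for _ in range(n_samples):
            # Continue the same chain so each sample warm-starts the next.
            perm = sample_permutation(X, y, w, sigma2, perm, n_steps, rng)
            label_sum += y[perm]
        # The average of ||y[perm_s] - X w||^2 over samples is minimized by
        # regressing on the mean permuted label vector.
        w = np.linalg.lstsq(X, label_sum / n_samples, rcond=None)[0]
    return w
```

In this sketch `sigma2` acts like a temperature. A large value lets the sampler keep many plausible orderings, while as `sigma2` approaches 0 it accepts only swaps that do not increase the residual, so it behaves much like a greedy hard assignment.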



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. FlashSinkhorn: IO-Aware Entropic Optimal Transport on GPU

    cs.LG 2026-02 conditional novelty 7.0

    FlashSinkhorn delivers up to 32x forward and 161x end-to-end speedups for entropic OT on A100 GPUs via IO-aware Triton kernels that fuse log-domain updates and streaming transport application.
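The summary above refers to log-domain updates for entropic OT. The following is only a rough illustration: a plain CPU NumPy sketch of the standard log-domain Sinkhorn iteration. It is not FlashSinkhorn's fused Triton kernels or its API. The names `sinkhorn_log` and `logsumexp` and the parameters `eps` and `iters` are illustrative choices. The dual potentials `f` and `g` are updated with log-sum-exp so that `exp(-C / eps)` never has to be formed directly.

```python
import numpy as np


def logsumexp(x, axis):
    """Numerically stable log(sum(exp(x))) along one axis."""
    m = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(x - m), axis=axis))


def sinkhorn_log(C, a, b, eps, iters=200):
    """Log-domain Sinkhorn for entropic OT; returns the transport plan P."""
    f = np.zeros(C.shape[0])  # dual potential on the source points
    g = np.zeros(C.shape[1])  # dual potential on the target points
    log_a, log_b = np.log(a), np.log(b)
    for _ in range(iters):
        # Each half-step makes one marginal of P exact; working in log space
        # avoids underflow of exp(-C / eps) when eps is small.
        f = eps * log_a - eps * logsumexp((g[None, :] - C) / eps, axis=1)
        g = eps * log_b - eps * logsumexp((f[:, None] - C) / eps, axis=0)
    # P_ij = exp((f_i + g_j - C_ij) / eps)
    return np.exp((f[:, None] + g[None, :] - C) / eps)
```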