arXiv: 1803.08882 · v1 · pith:TD6PV5AQnew · submitted 2018-03-23 · stat.ML · cs.LG · stat.ME

Trace your sources in large-scale data: one ring to find them all

classification stat.ML, cs.LG, stat.ME
keywords data, sources, algorithms, cross-validation, decompose, extract, large-scale, probabilistic

An important preprocessing step in most data analysis pipelines aims to extract a small set of sources that explain most of the data. Currently used algorithms for blind source separation (BSS), however, often fail to extract the desired sources and need extensive cross-validation. In contrast, their rarely used probabilistic counterparts need little cross-validation and are more accurate and reliable, but no simple and scalable implementations are available. Here we present a novel probabilistic BSS framework (DECOMPOSE) that can be flexibly adjusted to the data, is extensible and easy to use, adapts to individual sources, and handles large-scale data through algorithmic efficiency. DECOMPOSE encompasses and generalises many traditional BSS algorithms such as PCA, ICA and NMF, and we demonstrate substantial improvements in accuracy and robustness on artificial and real data.
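
The abstract does not state the underlying model, so the following is only a sketch of the standard matrix-factorisation view of BSS under which PCA, ICA and NMF are usually compared. The symbols X, U, V, K, M, N and σ are assumptions introduced here for illustration and are not taken from the paper.

```latex
% Data matrix approximated by K sources (assumed notation, not the paper's)
X \approx U V^{\top}, \qquad
X \in \mathbb{R}^{M \times N},\;
U \in \mathbb{R}^{M \times K},\;
V \in \mathbb{R}^{N \times K},\;
K \ll \min(M, N)

% Classical methods differ in the constraints placed on the factors:
%   PCA: minimise \| X - U V^{\top} \|_F^2 with orthonormal columns of U
%   NMF: minimise the same loss subject to U \ge 0,\; V \ge 0
%   ICA: choose the factorisation whose sources are statistically independent and non-Gaussian

% A generic probabilistic counterpart (Gaussian noise, priors on the factors):
p(X \mid U, V) = \prod_{m=1}^{M} \prod_{n=1}^{N}
  \mathcal{N}\!\left( x_{mn} \mid u_m^{\top} v_n,\; \sigma^2 \right),
\qquad
p(U, V \mid X) \propto p(X \mid U, V)\, p(U)\, p(V)
```

In this generic formulation, the choice of the priors p(U) and p(V) plays the role that hard constraints play in the classical methods, which is one common way a probabilistic framework can cover several BSS algorithms at once. Whether DECOMPOSE uses exactly this likelihood is not stated in the abstract.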

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. TGCM: Topic-Guided Generative Disentanglement of Interleaved APT Technique Sequences

    cs.CR 2026-06 unverdicted novelty 5.0

    TGCM applies consistency models with ATT&CK-derived topic priors to solve unknown-K interleaved sequence demixing for concurrent APT campaigns.