Unsupervised Domain Shift Detection with Interpretable Subspace Attribution

arXiv: 2605.15920 · v1 · pith:VB6XASAC · submitted 2026-05-15 · stat.ML · cs.LG

Sebastian Springer, Alessandro Laio

Pith reviewed 2026-05-19 19:22 UTC · model grok-4.3

keywords: domain shift detection, unsupervised learning, density anomalies, subspace attribution, interpretability, ECG analysis, distributional differences, cohort bias

The pith

Domain shifts appear as localized density anomalies that can be attributed to small sets of features without using labels.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a method to find subtle differences between the probability distributions of two datasets in an unsupervised way. It searches high-dimensional feature spaces for localized density anomalies and, when one is found, identifies the subspace in which the anomaly is strongest so that the shift can be linked to a handful of concrete features. A further protocol extracts balanced subsets of samples from the two unlabeled datasets that show no detectable remaining distributional difference. The approach is tested on synthetic 20-dimensional data with known shifts and on real electrocardiogram recordings that differ by recording device, where it flags device-related biases and points to the ECG measurements most responsible. This matters for spotting hidden cohort effects in data before any modeling step begins.
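As a rough illustration of what a localized density anomaly looks like in practice, the sketch below scores each point of one dataset by how many of its nearest neighbors in the pooled sample come from the same dataset, and converts that count into a binomial ("coin-flip") tail probability. It is a minimal stand-in written for this page, loosely inspired by the title of reference [11], and is not the authors' algorithm. The function names, the neighborhood size `k=20`, and the toy 5-dimensional data are all assumptions.

```python
import math
import random

def binom_upper_tail(hits, k, p):
    """P[Binomial(k, p) >= hits]."""
    return sum(math.comb(k, i) * p**i * (1 - p)**(k - i) for i in range(hits, k + 1))

def anomaly_scores(X, Y, k=20):
    """Score each point of Y by -log10 of the coin-flip p-value of its Y-neighbor count."""
    pooled = X + Y
    labels = [0] * len(X) + [1] * len(Y)
    # Null hypothesis: each neighbor is a Y point with the pooled Y fraction (self excluded).
    p_null = (len(Y) - 1) / (len(pooled) - 1)
    scores = []
    for j, y in enumerate(Y):
        self_idx = len(X) + j
        dists = sorted(
            (math.dist(y, pt), lab)
            for idx, (pt, lab) in enumerate(zip(pooled, labels))
            if idx != self_idx
        )
        hits = sum(lab for _, lab in dists[:k])  # how many of the k neighbors are Y points
        p_value = binom_upper_tail(hits, k, p_null)
        scores.append(-math.log10(max(p_value, 1e-300)))
    return scores

# Toy data (hypothetical): Y equals X in distribution except for a small, tight extra cluster.
random.seed(0)
dim = 5
X = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(300)]
background = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(270)]
center = [3.0, 3.0, 0.0, 0.0, 0.0]
cluster = [[c + random.gauss(0, 0.3) for c in center] for _ in range(30)]
Y = background + cluster

scores = anomaly_scores(X, Y, k=20)
bg_mean = sum(scores[:270]) / 270
cl_mean = sum(scores[270:]) / 30
print(f"mean score, background: {bg_mean:.2f}; injected cluster: {cl_mean:.2f}")
```

Points in the injected cluster are surrounded almost exclusively by other Y points, so their scores should sit well above those of the background, which is the kind of signal a density-anomaly detector looks for.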

Core claim

The framework recovers both broad and localized shifts together with their supporting feature subspaces on controlled 20-dimensional benchmarks and, when applied to ECG recordings differing in measurement-device composition, detects device-induced shifts and identifies associated ECG features.

What carries the argument

An algorithm that detects localized density anomalies in high-dimensional feature spaces and then isolates the subspace in which the anomaly reaches its maximum strength.
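One simple way to picture the subspace-isolation step is greedy backward elimination: start from all features and keep dropping the feature whose removal hurts a two-sample statistic least, stopping once every removal would cost more than a tolerance. The sketch below uses a Schilling-type nearest-neighbor statistic (compare reference [9]) as the score. Both the statistic and the pruning rule are assumptions chosen for illustration, as are the names `same_label_fraction`, `prune_features`, and the tolerance `tol`; the paper's own search procedure may differ.

```python
import heapq
import math
import random

def same_label_fraction(X, Y, features, k=10):
    """Share of k-nearest-neighbor links that join two points of the same sample,
    with distances computed on `features` only (about 0.5 when there is no shift)."""
    pooled = [[row[f] for f in features] for row in X + Y]
    labels = [0] * len(X) + [1] * len(Y)
    same = 0
    for i, p in enumerate(pooled):
        others = ((math.dist(p, q), labels[j]) for j, q in enumerate(pooled) if j != i)
        nearest = heapq.nsmallest(k, others)
        same += sum(1 for _, lab in nearest if lab == labels[i])
    return same / (k * len(pooled))

def prune_features(X, Y, k=10, tol=0.02):
    """Greedy backward elimination: drop features while the statistic loses at most `tol`."""
    selected = list(range(len(X[0])))
    score = same_label_fraction(X, Y, selected, k)
    while len(selected) > 1:
        trials = [
            (same_label_fraction(X, Y, [g for g in selected if g != f], k), f)
            for f in selected
        ]
        best_score, drop = max(trials)  # removal that leaves the highest statistic
        if best_score < score - tol:
            break  # every remaining feature carries part of the shift
        selected.remove(drop)
        score = best_score
    return selected, score

# Toy data (hypothetical): only features 0 and 1 are shifted between X and Y.
random.seed(1)
dim, n = 6, 150
X = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
Y = [[random.gauss(2.0 if f < 2 else 0.0, 1) for f in range(dim)] for _ in range(n)]

selected, score = prune_features(X, Y)
print("retained features:", selected, "statistic:", round(score, 3))
```

Removing an unshifted feature tends to leave the statistic unchanged or slightly higher, while removing a shifted one lowers it noticeably, so the retained set should concentrate on the features that carry the shift.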

If this is right

Both broad and localized shifts are recovered, together with their supporting feature subspaces, on controlled 20-dimensional benchmarks.
Device-induced shifts in ECG recordings are detected and tied to specific associated ECG features.
Representative subsets enriched for the imbalanced device components can be extracted from the unlabeled cohorts.
The resulting attribution makes the source of the distributional difference interpretable in terms of a small number of original features.
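The subset extraction in the third point can be pictured as an iterative trimming loop: flag points that sit in regions where their own dataset is locally over-represented, remove them from both sides, and repeat until nothing is flagged. The sketch below is a simplified stand-in under stated assumptions and is not the paper's Tail-Based Bidirectional Equalization: it uses a fixed score threshold instead of a tail quantile, and the names `flag_overdense`, `equalize`, `threshold`, and `max_rounds` are hypothetical.

```python
import math
import random

def upper_tail(hits, k, p):
    """P[Binomial(k, p) >= hits]."""
    return sum(math.comb(k, i) * p**i * (1 - p)**(k - i) for i in range(hits, k + 1))

def flag_overdense(A, B, k, threshold):
    """Indices of points in A whose k nearest pooled neighbors contain suspiciously many A points."""
    pooled = [(pt, 1) for pt in A] + [(pt, 0) for pt in B]
    p_null = (len(A) - 1) / (len(pooled) - 1)
    flagged = []
    for i, a in enumerate(A):
        dists = sorted(
            (math.dist(a, pt), lab) for j, (pt, lab) in enumerate(pooled) if j != i
        )
        hits = sum(lab for _, lab in dists[:k])
        score = -math.log10(max(upper_tail(hits, k, p_null), 1e-300))
        if score > threshold:
            flagged.append(i)
    return flagged

def equalize(X, Y, k=20, threshold=3.0, max_rounds=5):
    """Trim both samples until no point exceeds the score threshold (or rounds run out)."""
    X, Y = list(X), list(Y)
    for _ in range(max_rounds):
        bad_x = set(flag_overdense(X, Y, k, threshold))
        bad_y = set(flag_overdense(Y, X, k, threshold))
        if not bad_x and not bad_y:
            break
        X = [p for i, p in enumerate(X) if i not in bad_x]
        Y = [p for i, p in enumerate(Y) if i not in bad_y]
    return X, Y

# Toy data (hypothetical): Y carries an extra cluster that X lacks.
random.seed(0)
dim = 5
X = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(300)]
background = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(270)]
cluster = [[c + random.gauss(0, 0.3) for c in (3.0, 3.0, 0.0, 0.0, 0.0)] for _ in range(30)]
Y = background + cluster

X_eq, Y_eq = equalize(X, Y)
cluster_left = sum(1 for p in Y_eq if p in cluster)
print(f"kept {len(X_eq)} of {len(X)} X points and {len(Y_eq)} of {len(Y)} Y points; "
      f"{cluster_left} cluster points remain")
```

The points that get removed are themselves informative: in this toy setting they are the injected cluster, which mirrors the idea of extracting subsets enriched in the imbalanced component.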

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The compensation protocol offers a route to create matched cohorts for downstream tasks when only unlabeled data from differing sources is available.
If the initial feature representation is too coarse, shifts that require nonlinear views may remain undetected, pointing to possible extensions that include learned embeddings.
The same subspace-search logic could be applied to other high-dimensional domains such as images or time-series beyond ECG to surface acquisition or collection biases.

Load-bearing premise

Domain shifts manifest as detectable localized density anomalies in the chosen feature representation.

What would settle it

A pair of datasets whose known shift is diffuse across all features or only visible after a nonlinear transformation outside the searched subspaces would produce no detection or incorrect attribution.
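A short worked calculation shows why a diffuse shift is the hard case. It assumes, purely for illustration, unit-variance independent Gaussian features and a mean shift of total size Δ spread evenly over all d features; the symbols Δ, d, m, and S are introduced here and do not come from the paper.

```latex
% Assumption: X ~ N(0, I_d) and Y ~ N(\mu, I_d), with the shift spread evenly over all d features.
\[
\mu = \frac{\Delta}{\sqrt{d}}\,(1,\dots,1)^{\top}, \qquad \lVert \mu \rVert_2 = \Delta .
\]
% Restricted to any subset S of m features, the separation between the two means is
\[
\lVert \mu_S \rVert_2 = \Delta \sqrt{\frac{m}{d}} .
\]
% Example: d = 782, m = 5, \Delta = 3 gives \lVert \mu_S \rVert_2 \approx 0.24,
% whereas a shift of the same total size confined to those 5 features gives \lVert \mu_S \rVert_2 = 3.
```

Under these assumptions, any small subspace of a 782-feature representation sees only a small fraction of a diffuse shift, so a search over low-dimensional subspaces has little signal to work with.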

Figures

Figures reproduced from arXiv: 2605.15920 by Alessandro Laio, Sebastian Springer.

**Figure 1.** Robustness of the identified shift set across global domain-shift amplitudes and random seeds. (A) Inclusion frequency of each feature in the identified shift set, aggregated across random seeds, as a function of the global domain-shift amplitude σ. (B) Pruned-to-total ratio as a function of σ. The solid line denotes the mean across seeds, and the shaded region denotes the 25th–75th percentile range across…

**Figure 2.** Robustness of localized domain-shift recovery across injected shift cardinalities and random seeds. (A) Inclusion frequency of each feature in the identified shift set, aggregated across random seeds, as a function of the localized domain-shift cardinality. (B) Pruned-to-injected domain-shift ratio as a function of the cardinality of the injected localized shift. The solid line denotes the mean across seed…

**Figure 3.** Device-induced domain shifts in healthy ECG cohorts. Each row corresponds to one controlled comparison between two age- and sex-matched healthy ECG cohorts, X and Y, constructed from three measurement-device-specific subsets. Cases A–C are primary device-composition contrasts, each with one device specific to X, one device approximately shared between cohorts, and one device specific to Y. Case D is a p…

**Figure 4.** Confusion matrices for NORM-vs-PATH binary classification before and after EagleEye equalization. Rows correspond to the five illustrative pathology tasks (ISC_, 1AVB, NST_, IVCD, ISCAL); columns correspond to six conditions: three classifiers (LR-L2, LR-EN, HGBT) evaluated before equalization (left block) and after equalization (right block). Each 2×2 matrix reports prediction counts with predicted class…

**Figure 5.** ROC curves for NORM-vs-PATH binary classification before and after EagleEye equalization. Each panel corresponds to one of the five illustrative pathology tasks. Solid curves show performance before equalization; dashed curves show performance after equalization. Colors distinguish LR-L2, LR-EN, and HGBT, and AUCROC values are reported inside each panel. The classifier shown in row E of …

Original abstract

We developed a tool for detecting domain shifts, namely subtle differences in the probability distributions of datasets. We identify these shifts using an algorithm designed to detect localised density anomalies in high-dimensional feature spaces. If an anomaly is present, we then identify the feature subspace in which the anomaly is most pronounced. This allows us to trace the domain shift to a small set of features, making the shift interpretable. Moreover, we provide a protocol for compensating domain shifts by extracting, from two unlabelled datasets, subsets of samples with no detectable residual distributional difference. We validate the framework on controlled 20-dimensional benchmarks with known ground truth, recovering both broad and localized shifts together with their supporting feature subspaces. We then apply it to healthy electrocardiogram (ECG) recordings represented by 782 features. In age- and sex-matched cohort comparisons differing in measurement-device composition, the method detects device-induced shifts, extracts representative subsets enriched in the imbalanced device components, and identifies ECG features associated with the acquisition contrast. These results suggest that density-shift detection and subspace attribution provide a practical framework for uncovering hidden cohort biases before downstream modelling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes an unsupervised framework for detecting domain shifts by identifying localized density anomalies in high-dimensional feature spaces and attributing them to interpretable feature subspaces. It further provides a protocol for extracting representative sample subsets from two unlabeled datasets with no detectable residual distributional differences. Validation occurs on controlled 20-dimensional benchmarks with known ground truth for both broad and localized shifts, followed by application to ECG recordings (782 features) to detect device-induced shifts and associated ECG features.

Significance. If the quantitative evaluation and robustness concerns are addressed, the framework could serve as a practical tool for uncovering hidden cohort biases in unlabeled data prior to modeling, with particular value in healthcare applications such as ECG analysis. The interpretability gained through subspace attribution and the compensation protocol represent clear strengths for improving downstream model reliability.

major comments (2)

Abstract: the abstract reports successful recovery on controlled benchmarks and sensible behavior on ECG data, yet provides no quantitative performance numbers, error bars, or description of how the anomaly threshold or subspace search are chosen; without these details the central claim cannot be fully evaluated.
Method and validation sections: the framework assumes domain shifts manifest as detectable localized density anomalies in the linear feature representation. The 20D benchmarks satisfy this by construction, but for the ECG case (782 features) a diffuse shift or one visible only after nonlinear transformation would evade both detection and attribution, and no sensitivity analysis or bounds on this risk are provided.

minor comments (1)

Abstract: consider briefly outlining the specific anomaly detection technique (e.g., density estimation method) used in the framework.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive review and recognition of the framework's potential value in healthcare applications. We address each major comment below with planned revisions to strengthen the manuscript.

Referee: Abstract: the abstract reports successful recovery on controlled benchmarks and sensible behavior on ECG data, yet provides no quantitative performance numbers, error bars, or description of how the anomaly threshold or subspace search are chosen; without these details the central claim cannot be fully evaluated.

Authors: We agree that incorporating quantitative details would improve the abstract. In the revision we will add key performance metrics from the 20-dimensional benchmarks (e.g., detection and attribution accuracy with standard deviations across repeated trials) and a concise description of threshold selection via permutation testing for statistical significance together with the subspace enumeration procedure that maximizes the localized density anomaly score. These elements are already detailed in the Methods section and will be summarized in the abstract. revision: yes
Referee: Method and validation sections: the framework assumes domain shifts manifest as detectable localized density anomalies in the linear feature representation. The 20D benchmarks satisfy this by construction, but for the ECG case (782 features) a diffuse shift or one visible only after nonlinear transformation would evade both detection and attribution, and no sensitivity analysis or bounds on this risk are provided.

Authors: We acknowledge the assumption that shifts appear as localized density anomalies within the supplied (linear) feature space. The ECG results demonstrate detection of known device-induced shifts, providing empirical support for the method in practice. To address the concern we will add a sensitivity analysis subsection that reports detection performance on synthetic data under controlled diffuse shifts and after nonlinear transformations, quantifying how detection rates vary with shift characteristics. Theoretical bounds on the probability of missing nonlinear or diffuse shifts are difficult to derive within the current scope and will be noted as a limitation for future work. revision: partial
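The permutation-based threshold selection mentioned in the first response can be sketched generically. The example below calibrates a nearest-neighbor two-sample statistic by shuffling the dataset labels, which is a standard permutation test and not necessarily the procedure in the paper's Methods section. The statistic, the neighborhood size `k=5`, the number of permutations, and all function names are assumptions.

```python
import heapq
import math
import random

def knn_indices(points, k):
    """For every point, the indices of its k nearest neighbors (itself excluded)."""
    result = []
    for i, p in enumerate(points):
        others = ((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        result.append([j for _, j in heapq.nsmallest(k, others)])
    return result

def same_label_fraction(neighbors, labels):
    """Share of neighbor links that connect two points carrying the same label."""
    links = sum(labels[i] == labels[j] for i, nbrs in enumerate(neighbors) for j in nbrs)
    return links / sum(len(nbrs) for nbrs in neighbors)

def permutation_pvalue(X, Y, k=5, n_perm=200, rng=None):
    """Observed statistic and its permutation p-value under label exchangeability."""
    if rng is None:
        rng = random.Random(0)
    neighbors = knn_indices(X + Y, k)  # geometry does not depend on labels: compute once
    labels = [0] * len(X) + [1] * len(Y)
    observed = same_label_fraction(neighbors, labels)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        if same_label_fraction(neighbors, labels) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (1 + exceed) / (1 + n_perm)

# Toy data (hypothetical): one comparison without shift, one with a shift in two features.
rng = random.Random(42)
dim, n = 5, 100
X = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
Y_same = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n)]
Y_shift = [[rng.gauss(1.5 if f < 2 else 0.0, 1) for f in range(dim)] for _ in range(n)]

stat_same, p_same = permutation_pvalue(X, Y_same, rng=rng)
stat_shift, p_shift = permutation_pvalue(X, Y_shift, rng=rng)
print(f"no shift: statistic={stat_same:.3f}, p={p_same:.3f}")
print(f"shifted : statistic={stat_shift:.3f}, p={p_shift:.3f}")
```

Because the neighbor graph is built once and only the labels are shuffled, each permutation is cheap, which makes it practical to derive a significance threshold from the data rather than fixing one by hand.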

Circularity Check

0 steps flagged

No significant circularity; algorithmic procedure validated on independent benchmarks

The paper presents an algorithmic framework for unsupervised domain shift detection via localized density anomaly identification followed by subspace attribution. Validation occurs on controlled 20-dimensional benchmarks constructed with explicit ground-truth shifts and on real ECG recordings with device-induced contrasts. No load-bearing steps reduce by definition or self-citation to the method's own outputs; the procedure is described as a sequence of detection and attribution operations whose success is measured against external ground truth rather than internal consistency. This is the most common honest finding for method papers that do not rely on fitted parameters renamed as predictions or uniqueness theorems imported from prior self-work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not enumerate free parameters or axioms; the method implicitly relies on choices for density estimation, anomaly threshold, and subspace search criterion that are not specified here.

pith-pipeline@v0.9.0 · 5717 in / 1158 out tokens · 27938 ms · 2026-05-19T19:22:20.108723+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 16 canonical work pages · 1 internal anchor

[1] Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D. Lawrence, editors. Dataset Shift in Machine Learning. Neural Information Processing Series. MIT Press, Cambridge, MA,

[2] Masashi Sugiyama and Motoaki Kawanabe. Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation. MIT Press, Cambridge, MA, 2012. ISBN 9780262017091. doi: 10.7551/mitpress/9780262017091.001.0001

[3] Stephan Rabanser, Stephan Günnemann, and Zachary C. Lipton. Failing loudly: An empirical study of methods for detecting dataset shift. In Advances in Neural Information Processing Systems, volume 32, pages 1394–1406, 2019. URL https://papers.neurips.cc/paper_files/paper/2019/hash/846c260d715e5b854ffad5f70a516c88-Abstract.html

[4] Arthur Gretton, Alexander J. Smola, Jiayuan Huang, Marcel Schmittfull, Karsten M. Borgwardt, and Bernhard Schölkopf. Covariate shift by kernel mean matching. In Joaquin Quiñonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D. Lawrence, editors, Dataset Shift in Machine Learning. MIT Press, Cambridge, MA, 2009. doi: 10.7551/mitpress/9780262170055.003.0008

[5] Masashi Sugiyama, Taiji Suzuki, Shinichi Nakajima, Hisashi Kashima, Paul von Bünau, and Motoaki Kawanabe. Direct importance estimation for covariate shift adaptation. Annals of the Institute of Statistical Mathematics, 60(4):699–746, 2008. doi: 10.1007/s10463-008-0197-x

[6] David Lopez-Paz and Maxime Oquab. Revisiting classifier two-sample tests. In International Conference on Learning Representations, 2017. URL https://openreview.net/forum?id=SJkXfE5xx

[7] Arthur Gretton, Karsten M. Borgwardt, Malte J. Rasch, Bernhard Schölkopf, and Alexander Smola. A kernel two-sample test. Journal of Machine Learning Research, 13:723–773, 2012. URL https://www.jmlr.org/papers/v13/gretton12a.html

[8] Gábor J. Székely and Maria L. Rizzo. Energy statistics: A class of statistics based on distances. Journal of Statistical Planning and Inference, 143(8):1249–1272, 2013. doi: 10.1016/j.jspi.2013.03.018

[9] Mark F. Schilling. Multivariate two-sample tests based on nearest neighbors. Journal of the American Statistical Association, 81(395):799–806, 1986. doi: 10.1080/01621459.1986.10478337

[10] Jerome H. Friedman and Lawrence C. Rafsky. Multivariate generalizations of the Wald-Wolfowitz and Smirnov two-sample tests. The Annals of Statistics, 7(4):697–717, 1979. doi: 10.1214/aos/1176344722

[11] Sebastian Springer, Andre Scaffidi, Maximilian Autenrieth, Gabriella Contardo, Alessandro Laio, Roberto Trotta, and Heikki Haario. Detecting localized density anomalies in multivariate data via coin-flip statistics,

[12] URL https://arxiv.org/abs/2503.23927

[13] Patrick Wagner, Nils Strodthoff, Ralf-Dieter Bousseljot, Dieter Kreiseler, Fatima I. Lunze, Wojciech Samek, and Tobias Schaeffter. PTB-XL, a large publicly available electrocardiography dataset. Scientific Data, 7(1):154, 2020. doi: 10.1038/s41597-020-0495-6. URL https://doi.org/10.1038/s41597-020-0495-6

[14] Nils Strodthoff, Temesgen Mehari, Claudia Nagel, Philip J. Aston, Ashish Sundar, Claus Graff, Jørgen K. Kanters, Wilhelm Haverkamp, Olaf Dössel, Axel Loewe, Markus Bär, and Tobias Schaeffter. PTB-XL+, a comprehensive electrocardiographic feature dataset. Scientific Data, 10(1):279, 2023. doi: 10.1038/s41597-023-02153-8. URL https://doi.org/10.1038/s41597-0...

[15] Kalervo Järvelin and Jaana Kekäläinen. Cumulated gain-based evaluation of IR techniques. ACM Transactions on Information Systems, 20(4):422–446, October 2002. doi: 10.1145/582415.582418

[16] Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008. ISBN 978-0-521-86571-5.

A Supplementary Information. A.1 Supplement: Tail-Based Bidirectional Equalization. Algorithm 1: Tail-Based Bidirectional Equalization. Require: samples X, Y, neighborhood depth K_M, tail quantile level...
