arxiv: 2407.06892 · v2 · pith:NKGBIJ7Hnew · submitted 2024-07-09 · 📊 stat.ME

When Knockoffs fail: diagnosing and fixing non-exchangeability of Knockoffs

classification 📊 stat.ME
keywords data · assumption · control · inference · knockoff · knockoffs · statistical · diagnostic

Knockoffs are a popular statistical framework that addresses the challenging problem of conditional variable selection in high-dimensional settings with statistical control. Such statistical control is essential for the reliability of inference. However, knockoff guarantees rely on an exchangeability assumption that is difficult to test in practice, and there is little discussion in the literature on how to deal with unfulfilled hypotheses. This assumption is related to the ability to generate data similar to the observed data. To maintain reliable inference, we introduce a diagnostic tool based on Classifier Two-Sample Tests. Using simulations and real data, we show that violations of this assumption occur in common settings for classical knockoff generators, especially when the data have a strong dependence structure. As a consequence, knockoff-based inference suffers from a massive inflation of false positives. We show that the diagnostic tool correctly detects such behavior. We show that an alternative knockoff construction, which builds a predictor of each variable from all the others, solves the issue. We also propose a computationally efficient variant of this algorithm and show empirically that this approach restores error control on simulated data and semi-simulated experiments based on neuroimaging data.
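
The diagnostic idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a simple Gaussian quadratic-discriminant classifier, a fixed set of swapped columns, and a deliberately naive knockoff generator that redraws an independent copy of the data. The names `swap_columns`, `gaussian_log_density` and `c2st_accuracy` are hypothetical. The classifier tries to tell the concatenation [X, X~] apart from a version in which some columns of X and X~ have been exchanged. Held-out accuracy close to 0.5 gives no evidence against exchangeability, while accuracy clearly above 0.5 signals a violation.

```python
import numpy as np


def swap_columns(x, x_tilde, swap_idx):
    """Return [X, X~] with the columns in swap_idx exchanged between X and X~."""
    x_s, xt_s = x.copy(), x_tilde.copy()
    x_s[:, swap_idx] = x_tilde[:, swap_idx]
    xt_s[:, swap_idx] = x[:, swap_idx]
    return np.hstack([x_s, xt_s])


def gaussian_log_density(z, mean, cov):
    """Log-density of N(mean, cov) at each row of z, up to a shared constant."""
    diff = z - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = np.einsum("ij,ij->i", diff @ np.linalg.inv(cov), diff)
    return -0.5 * (logdet + maha)


def c2st_accuracy(x, x_tilde, swap_idx, rng):
    """Held-out accuracy of a classifier separating [X, X~] from its swapped version."""
    n, d = x.shape
    perm = rng.permutation(n)
    a, b = perm[: n // 2], perm[n // 2:]

    # Disjoint rows for the two classes, so the two samples are independent.
    z0 = np.hstack([x[a], x_tilde[a]])                  # class 0: original order
    z1 = swap_columns(x[b], x_tilde[b], swap_idx)       # class 1: swapped columns

    # Split each class into a training half and a test half.
    z0_tr, z0_te = z0[: len(z0) // 2], z0[len(z0) // 2:]
    z1_tr, z1_te = z1[: len(z1) // 2], z1[len(z1) // 2:]

    # Assumed classifier: one Gaussian per class (quadratic discriminant).
    ridge = 1e-6 * np.eye(2 * d)
    params = [(z.mean(axis=0), np.cov(z, rowvar=False) + ridge)
              for z in (z0_tr, z1_tr)]

    def predict(z):
        scores = np.stack([gaussian_log_density(z, m, c) for m, c in params], axis=1)
        return scores.argmax(axis=1)

    correct = (predict(z0_te) == 0).sum() + (predict(z1_te) == 1).sum()
    return correct / (len(z0_te) + len(z1_te))


rng = np.random.default_rng(0)
n, d, rho = 4000, 4, 0.8
swap_idx = [0, 1]  # hypothetical choice of columns to exchange

# Independent features: an independent redraw is a valid (exchangeable) knockoff.
x_ind = rng.standard_normal((n, d))
xt_ind = rng.standard_normal((n, d))

# Equicorrelated features: the same naive redraw is no longer exchangeable.
cov = np.full((d, d), rho) + (1 - rho) * np.eye(d)
x_dep = rng.multivariate_normal(np.zeros(d), cov, size=n)
xt_dep = rng.multivariate_normal(np.zeros(d), cov, size=n)

acc_ind = c2st_accuracy(x_ind, xt_ind, swap_idx, rng)
acc_dep = c2st_accuracy(x_dep, xt_dep, swap_idx, rng)
print(f"independent features: accuracy = {acc_ind:.3f}")
print(f"correlated features:  accuracy = {acc_dep:.3f}")
```

In this toy setting the naive generator passes the check when features are independent and fails it when they are strongly correlated, which mirrors the abstract's point that violations appear when the data have a strong dependence structure. A real diagnostic would typically use a more flexible classifier and convert the accuracy into a p-value.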

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Model Selection for SLOPE Models: A Bayesian Perspective

    stat.ME 2026-06 unverdicted novelty 7.0

    Introduces Bayesian spike-and-slab embeddings for group SLOPE models and a two-step orthogonal transformation to achieve FDR control and higher power than cross-validation in general settings.

  2. Multiple testing

    stat.ME 2026-06 unverdicted

    An expository introduction to multiple hypothesis testing procedures and error control criteria with R package references.