Weak Signals and Heavy Tails: Learning Theory meets Extreme Value Analysis

Anne Sabourin; Stephan Clémençon

arXiv: 2504.06984 · v3 · submitted 2025-04-09 · math.ST · stat.TH

keywords: extreme, learning, theory, multivariate, results, algorithms, data, generalization

The masses of data now available have opened up the prospect of discovering weak signals using machine-learning algorithms, with a view to predictive or interpretation tasks. As this survey of recent results attempts to show, bringing multivariate extreme value theory and statistical learning theory together in a common, nonparametric and nonasymptotic framework makes it possible to design and analyze new methods for exploiting the scarce information located in distribution tails for these purposes. This article reviews recently proved theoretical tools for establishing guarantees for supervised or unsupervised algorithms that learn from a fraction of extreme data. These are mainly exponential maximal deviation inequalities tailored to low-probability regions and concentration results for stochastic processes that empirically describe the behavior of multivariate extreme observations, their dependence structure in particular. Under appropriate assumptions of regular variation, several illustrative applications in multivariate settings are then examined: classification, regression, anomaly detection, and model selection via cross-validation. For these, generalization results inspired by the classical bounds in statistical learning theory are established. In the same spirit, it is also shown how to adapt the popular high-dimensional Lasso technique to the context of extreme values for the covariates, with generalization guarantees.
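
As a rough illustration of what "learning from a fraction of extreme data" can mean in practice, the sketch below keeps only the k observations with the largest Euclidean norm and splits each into a radius and an angular component on the unit sphere. This is a common preprocessing step in multivariate extreme value analysis, not necessarily the exact procedure of the paper; the function name `extreme_angles`, the choice of norm, the simulated Pareto sample, and the value of k are all assumptions made for the example.

```python
import math
import random


def extreme_angles(points, k):
    """Return (angle, radius) pairs for the k points with the largest norm.

    Hypothetical helper: the Euclidean norm and the top-k rule are
    illustrative choices, not taken from the paper.
    """
    norms = [math.sqrt(sum(x * x for x in p)) for p in points]
    # Indices of the k largest norms, largest first.
    order = sorted(range(len(points)), key=lambda i: norms[i], reverse=True)[:k]
    # Angular component: the point rescaled to unit norm.
    return [([x / norms[i] for x in points[i]], norms[i]) for i in order]


# Simulated heavy-tailed sample (assumption): i.i.d. Pareto coordinates.
random.seed(0)
sample = [[random.paretovariate(2.0) for _ in range(3)] for _ in range(10_000)]

# Keep the 1% most extreme observations.
extremes = extreme_angles(sample, k=100)
```

A downstream classifier or anomaly detector would then be trained on the angular components only, which is one way the scarce tail information can be isolated from the bulk of the data.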

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

TailedTS: Benchmark Dataset for Heavy-Tailed Time Series Prediction and Periodicity Quantification
cs.LG 2026-05 unverdicted novelty 7.0

TailedTS supplies 24.69 billion Wikipedia page-view records as a public benchmark for heavy-tailed time series forecasting and periodicity analysis, revealing weaker periodic structure in high-traffic pages.

Extrapolation in Statistical Learning with Extreme Value Theory
stat.ML 2026-05 unverdicted novelty 2.0

A survey of recent methods that apply extreme value theory to enable extrapolation in statistical learning and machine learning.