arxiv: 2507.05701 · v1 · submitted 2025-07-08 · 📊 stat.ME

Area-based epigraph and hypograph indices for functional outlier detection

Pith reviewed 2026-05-19 06:25 UTC · model grok-4.3

classification 📊 stat.ME
keywords functional data analysis · outlier detection · epigraph index · hypograph index · area-based indices · functional outliers · multivariate detection

The pith

New area-based epigraph and hypograph indices detect functional outliers by measuring deviation areas.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates the Area-Based Epigraph Index and Area-Based Hypograph Index to measure the actual area separating curves instead of the fraction of the domain where one exceeds the other. This change makes the indices responsive to differences in scale or level as well as to differences in shape. These indices are then calculated for each curve, its first derivative, and its second derivative. The resulting vectors are passed to standard multivariate outlier detection routines in the EHyOut procedure. Simulations indicate that this approach performs reliably across contamination scenarios and often surpasses prior methods, with demonstrations on weather and population data showing its practical value.
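The contrast between "fraction of the domain" and "area" can be made concrete with a small sketch. The helper names (`trapezoid`, `fraction_above`, `area_above`) and the test curves are assumptions made for illustration; they are not the paper's ABEI/ABHI definitions, whose exact normalisation is not reproduced here.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal-rule integral of sampled values over the grid."""
    return sum(
        0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )

def fraction_above(curve, other, grid):
    """Fraction of the domain on which `other` lies above `curve`."""
    indicator = [1.0 if o > c else 0.0 for c, o in zip(curve, other)]
    return trapezoid(indicator, grid) / (grid[-1] - grid[0])

def area_above(curve, other, grid):
    """Area of the region where `other` exceeds `curve`."""
    excess = [max(o - c, 0.0) for c, o in zip(curve, other)]
    return trapezoid(excess, grid)

# Hypothetical data: one reference curve and two vertically shifted copies.
grid = [k / 100 for k in range(101)]
reference = [math.sin(2 * math.pi * t) for t in grid]
small_shift = [v + 0.5 for v in reference]
large_shift = [v + 5.0 for v in reference]

frac_small = fraction_above(reference, small_shift, grid)
frac_large = fraction_above(reference, large_shift, grid)
area_small = area_above(reference, small_shift, grid)
area_large = area_above(reference, large_shift, grid)
print(round(frac_small, 3), round(frac_large, 3))
print(round(area_small, 3), round(area_large, 3))
```

Both shifted curves lie above the reference on the whole domain, so the fraction is 1 in both cases, while the area grows from about 0.5 to about 5. That is the sense in which an area-based quantity can respond to level differences that a fraction-based one cannot see.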

Core claim

The authors establish that the Area-Based Epigraph Index (ABEI) and Area-Based Hypograph Index (ABHI) quantify the area between functions and thereby detect both magnitude and shape outliers. Incorporating these indices computed on the original curves and on their first and second derivatives allows recasting functional outlier detection as a multivariate problem to which conventional techniques can be applied.
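In symbols, the idea can be sketched as follows. The first pair of formulas gives MEI and MHI as they are commonly written in the epigraph/hypograph literature, for a sample x_1, ..., x_n on an interval I with Lebesgue measure λ. The second pair is a hypothetical area-based analogue in illustrative notation; it is an assumption about the general form, not the paper's stated ABEI/ABHI.

```latex
% Fraction-of-domain indices (common form in the literature)
\mathrm{MEI}_n(x) = 1 - \frac{1}{n}\sum_{i=1}^{n}
  \frac{\lambda\{t \in I : x_i(t) \ge x(t)\}}{\lambda(I)},
\qquad
\mathrm{MHI}_n(x) = \frac{1}{n}\sum_{i=1}^{n}
  \frac{\lambda\{t \in I : x_i(t) \le x(t)\}}{\lambda(I)}

% Hypothetical area-based analogues (illustrative only)
A_n^{+}(x) = \frac{1}{n}\sum_{i=1}^{n}\int_I \bigl(x_i(t) - x(t)\bigr)_{+}\,dt,
\qquad
A_n^{-}(x) = \frac{1}{n}\sum_{i=1}^{n}\int_I \bigl(x(t) - x_i(t)\bigr)_{+}\,dt,
\qquad (u)_{+} = \max(u, 0)
```

Under this reading, a curve that lies above every sample curve has MHI equal to 1 however far above it sits, whereas A_n^{-} keeps growing with the size of the gap.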

What carries the argument

The argument rests on the Area-Based Epigraph Index (ABEI) and Area-Based Hypograph Index (ABHI), which compute the integrated area between a given curve and the other curves in the sample.
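A sample-level version of this computation might look like the sketch below. The function `area_indices`, the averaging over the other n - 1 curves, and the toy sample are all assumptions made for illustration; the paper may normalise or combine the areas differently.

```python
import math

def trapezoid(values, grid):
    """Trapezoidal-rule integral of sampled values over the grid."""
    return sum(
        0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
        for k in range(len(grid) - 1)
    )

def area_indices(curves, grid):
    """For each curve, return the mean area of the other curves lying
    above it (epigraph-type) and below it (hypograph-type)."""
    n = len(curves)
    others_above, others_below = [], []
    for i, x in enumerate(curves):
        above = below = 0.0
        for j, y in enumerate(curves):
            if i == j:
                continue
            diff = [yv - xv for xv, yv in zip(x, y)]
            above += trapezoid([max(d, 0.0) for d in diff], grid)
            below += trapezoid([max(-d, 0.0) for d in diff], grid)
        others_above.append(above / (n - 1))
        others_below.append(below / (n - 1))
    return others_above, others_below

# Hypothetical sample: five similar curves plus one magnitude outlier (index 5).
grid = [k / 100 for k in range(101)]
curves = [[math.sin(2 * math.pi * t) + 0.1 * (i - 2) for t in grid] for i in range(5)]
curves.append([math.sin(2 * math.pi * t) + 3.0 for t in grid])

others_above, others_below = area_indices(curves, grid)
for i, (a, b) in enumerate(zip(others_above, others_below)):
    print(i, round(a, 3), round(b, 3))
```

In this toy sample the shifted curve has no area above it and by far the largest area below it, so it stands apart from the rest on the second index.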

If this is right

  • Outlier detection in functional data now accounts for magnitude deviations in addition to shape anomalies.
  • In the reported simulations, EHyOut remains stable across varied contamination settings and often outperforms benchmark methods.
  • Applications to Spanish weather data and United Nations population data demonstrate the method's ability to identify meaningful outliers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • One could explore whether using only first derivatives or adding third derivatives alters detection power in specific applications.
  • The framework might integrate with other functional data tools such as functional principal components for preprocessing.
  • Performance could vary with the choice of multivariate outlier method, suggesting comparisons across several detectors.

Load-bearing premise

Quantifying the area between curves in ABEI and ABHI simultaneously captures magnitude and shape deviations, and combining these with derivative information allows multivariate outlier detection to identify functional outliers reliably.

What would settle it

Simulation results in which curves deviate substantially in magnitude, yet EHyOut flags them no more effectively than shape-based alternatives.

Original abstract

Detecting outliers in Functional Data Analysis is challenging because curves can stray from the majority in many different ways. The Modified Epigraph Index (MEI) and Modified Hypograph Index (MHI) rank functions by the fraction of the domain on which one curve lies above or below another. While effective for spotting shape anomalies, their construction limits their ability to flag magnitude outliers. This paper introduces two new metrics, the Area-Based Epigraph Index (ABEI) and Area-Based Hypograph Index (ABHI) that quantify the area between curves, enabling simultaneous sensitivity to both magnitude and shape deviations. Building on these indices, we present EHyOut, a robust procedure that recasts functional outlier detection as a multivariate problem: for every curve, and for its first and second derivatives, we compute ABEI and ABHI and then apply multivariate outlier-detection techniques to the resulting feature vectors. Extensive simulations show that EHyOut remains stable across a wide range of contamination settings and often outperforms established benchmark methods. Moreover, applications to Spanish weather data and United Nations world population data further illustrate the practical utility and meaningfulness of this methodology.
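The pipeline described in the abstract can be sketched end to end under several stated assumptions: derivatives are taken by plain finite differences (no smoothing), the two area quantities are averaged over the other curves, and the multivariate step is replaced by a simple coordinate-wise median/MAD rule. That detector is a stand-in chosen to keep the example dependency-free; it is not claimed to be what EHyOut uses. All function names are hypothetical.

```python
import math
import statistics

def trapezoid(values, grid):
    """Trapezoidal-rule integral of sampled values over the grid."""
    return sum(0.5 * (values[k] + values[k + 1]) * (grid[k + 1] - grid[k])
               for k in range(len(grid) - 1))

def derivative(values, grid):
    """Finite differences: central inside the grid, one-sided at the ends."""
    last = len(grid) - 1
    out = []
    for k in range(len(grid)):
        lo, hi = max(k - 1, 0), min(k + 1, last)
        out.append((values[hi] - values[lo]) / (grid[hi] - grid[lo]))
    return out

def area_pair(curves, grid):
    """Per curve: (mean area of others above it, mean area of others below it)."""
    n = len(curves)
    pairs = []
    for i, x in enumerate(curves):
        above = below = 0.0
        for j, y in enumerate(curves):
            if i != j:
                above += trapezoid([max(b - a, 0.0) for a, b in zip(x, y)], grid)
                below += trapezoid([max(a - b, 0.0) for a, b in zip(x, y)], grid)
        pairs.append((above / (n - 1), below / (n - 1)))
    return pairs

def features(curves, grid):
    """Six features per curve: the area pair on the curve and on its
    first and second finite-difference derivatives."""
    d1 = [derivative(c, grid) for c in curves]
    d2 = [derivative(c, grid) for c in d1]
    blocks = [area_pair(level, grid) for level in (curves, d1, d2)]
    return [[v for block in blocks for v in block[i]] for i in range(len(curves))]

def flag_outliers(rows, cutoff=3.5):
    """Stand-in detector: flag a row if any feature has a large median/MAD score."""
    flagged = set()
    for column in zip(*rows):
        med = statistics.median(column)
        scale = 1.4826 * statistics.median([abs(v - med) for v in column])
        if scale == 0:
            continue
        flagged.update(i for i, v in enumerate(column) if abs(v - med) / scale > cutoff)
    return sorted(flagged)

# Hypothetical sample: 11 regular curves, a magnitude outlier (index 11)
# and a shape outlier (index 12).
grid = [k / 100 for k in range(101)]
curves = [[(1 + 0.05 * (i - 5)) * math.sin(2 * math.pi * t) + 0.1 * (i - 5) for t in grid]
          for i in range(11)]
curves.append([math.sin(2 * math.pi * t) + 3.0 for t in grid])
curves.append([math.sin(2 * math.pi * t) + 0.3 * math.sin(20 * math.pi * t) for t in grid])

rows = features(curves, grid)
flagged = flag_outliers(rows)
print(flagged)
```

In this toy setting the magnitude outlier stands out on the curve-level features and the shape outlier on the derivative-level features, so both contaminated indices appear among the flagged rows. A robust Mahalanobis-type detector could be substituted for `flag_outliers` without changing the feature step.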

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces the Area-Based Epigraph Index (ABEI) and the Area-Based Hypograph Index (ABHI) to address limitations of the Modified Epigraph Index (MEI) and Modified Hypograph Index (MHI) in functional outlier detection. The new indices quantify the area between curves rather than the fraction of the domain on which one curve lies above or below another, which makes them sensitive to both magnitude and shape deviations. The EHyOut procedure computes ABEI and ABHI on each curve and on its first and second derivatives, yielding six-dimensional feature vectors that are then passed to standard multivariate outlier detectors. Simulations across contamination settings are reported to show stability and frequent outperformance of benchmarks, with applications to Spanish weather data and UN world population data illustrating practical use.

Significance. If the results hold, the contribution lies in a direct, area-based extension of epigraph/hypograph ideas that simultaneously targets magnitude and shape outliers without requiring new parametric assumptions. The simulation design and real-data examples supply concrete evidence of utility; the method's recasting of the problem as a fixed-dimensional multivariate task is a clear practical strength.

major comments (1)
  1. [Section describing EHyOut construction and derivative feature extraction] The central claim that the six-dimensional feature vector (ABEI/ABHI on the curve plus first and second derivatives) reliably flags both magnitude and shape outliers depends on the second-derivative indices. In discretely observed data these derivatives are obtained only after smoothing, yet the manuscript reports no sensitivity analysis to bandwidth choice or to additive noise levels. If the smoothing parameter is misspecified, the area indices on the second derivative can be dominated by estimation artifacts rather than genuine curvature deviations, undermining the claim that the procedure simultaneously captures magnitude and shape outliers.
minor comments (1)
  1. [Abstract] The abstract states that 'extensive simulations show stability across a wide range of contamination settings' but does not enumerate the specific contamination fractions, sample sizes, or performance metrics (e.g., true-positive rate, false-positive rate) used to support that statement.
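The concern raised in the major comment can be illustrated with a short worked calculation. It assumes an equally spaced grid with spacing h, independent additive noise with variance σ², and an unsmoothed second difference; none of these settings is taken from the paper, which smooths before differentiating.

```latex
y_k = x(t_k) + \varepsilon_k, \qquad
\operatorname{Var}(\varepsilon_k) = \sigma^2, \qquad
D^2 y_k = \frac{y_{k+1} - 2\,y_k + y_{k-1}}{h^2}

\operatorname{Var}\bigl(D^2 y_k\bigr)
  = \frac{(1 + 4 + 1)\,\sigma^2}{h^4}
  = \frac{6\,\sigma^2}{h^4}

% Example: \sigma = 0.01,\ h = 0.01
\operatorname{sd}\bigl(D^2 y_k\bigr)
  = \frac{\sqrt{6}\,\sigma}{h^2}
  \approx \frac{2.449 \times 0.01}{10^{-4}} \approx 245
```

For comparison, the second derivative of sin(2πt) never exceeds 4π² ≈ 39.5 in absolute value, so in this setting the noise term alone would be several times larger than the signal. This is why the choice of smoothing matters for any index computed on second derivatives.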

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive summary and for identifying a key robustness issue in the EHyOut construction. We address the single major comment below and agree that additional analysis is warranted to support the claims about derivative-based features.

Point-by-point responses
  1. Referee: The central claim that the six-dimensional feature vector (ABEI/ABHI on the curve plus first and second derivatives) reliably flags both magnitude and shape outliers depends on the second-derivative indices. In discretely observed data these derivatives are obtained only after smoothing, yet the manuscript reports no sensitivity analysis to bandwidth choice or to additive noise levels. If the smoothing parameter is misspecified, the area indices on the second derivative can be dominated by estimation artifacts rather than genuine curvature deviations, undermining the claim that the procedure simultaneously captures magnitude and shape outliers.

    Authors: We agree that the reliability of the second-derivative indices in the six-dimensional feature vector hinges on the quality of the smoothing step, and that an explicit sensitivity study to bandwidth and noise level was omitted from the original manuscript. Our simulations used standard smoothing procedures and produced stable results across contamination settings, but this does not substitute for a targeted robustness check. In the revision we will add a new subsection (and corresponding figures) that varies the smoothing bandwidth over a grid of values and adds controlled noise levels to the observed curves. Preliminary checks already indicate that outlier-detection performance remains largely unchanged for bandwidths within a factor of two of the default choice and for moderate noise; these results will be reported to substantiate that the procedure is not driven by estimation artifacts. revision: yes
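A sensitivity study of the kind promised here could start from a sketch like the one below. It uses a centered moving average as a stand-in smoother, with the window length playing the role of the bandwidth, and measures the mean absolute error of the estimated second derivative against a known truth. The smoother, the grid of windows, the noise level, and the seed are all assumptions for illustration, not the authors' settings.

```python
import math
import random

def moving_average(values, window):
    """Centered moving average; only points with a full window are kept."""
    half = window // 2
    return [sum(values[k - half:k + half + 1]) / window
            for k in range(half, len(values) - half)]

def second_difference(values, step):
    """Second finite difference on an equally spaced grid (interior points only)."""
    return [(values[k + 1] - 2 * values[k] + values[k - 1]) / step ** 2
            for k in range(1, len(values) - 1)]

rng = random.Random(0)  # fixed seed so the run is reproducible
step = 0.01
grid = [k * step for k in range(101)]
clean = [math.sin(2 * math.pi * t) for t in grid]
noisy = [v + rng.gauss(0.0, 0.01) for v in clean]

errors = {}
for window in (1, 5, 11, 21):  # window 1 means no smoothing
    half = window // 2
    estimate = second_difference(moving_average(noisy, window), step)
    # Grid points that survive both the smoothing and the differencing.
    kept = grid[half + 1:len(grid) - half - 1]
    truth = [-(2 * math.pi) ** 2 * math.sin(2 * math.pi * t) for t in kept]
    errors[window] = sum(abs(e - s) for e, s in zip(estimate, truth)) / len(truth)

for window, err in errors.items():
    print(window, round(err, 2))
```

Without smoothing, the error is dominated by amplified noise; widening the window reduces that noise at the cost of some bias. A full study would repeat the sweep over noise levels and track detection performance rather than derivative error.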

Circularity Check

0 steps flagged

No circularity: new indices defined directly from areas; outlier procedure uses off-the-shelf methods

Full rationale

The paper's central construction defines ABEI and ABHI explicitly as area-based extensions of the existing MEI/MHI indices, then computes these six scalars (on the curve plus first and second derivatives) and feeds them into standard multivariate outlier detectors. No step reduces a claimed result to a fitted parameter or self-citation by construction; the performance claims rest on external simulations and real-data applications rather than tautological re-use of the same quantities. The derivation chain is therefore self-contained against the benchmarks it invokes.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the direct definition of area-based indices and the effectiveness of standard multivariate outlier detection applied to derived features; the abstract mentions no explicit free parameters, new physical entities, or non-standard axioms beyond typical functional data smoothness assumptions needed to compute derivatives.

axioms (1)
  • domain assumption: Curves are sufficiently smooth to admit first and second derivatives that can be meaningfully computed and compared.
    Rationale: the EHyOut procedure explicitly uses first and second derivatives of the observed functions.

pith-pipeline@v0.9.0 · 5738 in / 1379 out tokens · 70131 ms · 2026-05-19T06:25:11.314705+00:00 · methodology
