Conditional regularized halfspace depth for sparse functional data and its applications
Pith reviewed 2026-05-21 03:22 UTC · model grok-4.3
The pith
Conditional regularized halfspace depth ranks sparse functional data directly from observations without reconstructing trajectories.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CRHD is defined as the infimum of conditional halfspace probabilities of the underlying trajectory given the observed sparse measurements, thereby enabling depth evaluation directly at sparse observations without requiring trajectory reconstruction. The paper establishes several basic theoretical properties that confirm its behavior as a depth measure and shows that the approach remains applicable even when observations are extremely sparse.
What carries the argument
The conditional regularized halfspace depth, defined as the infimum of conditional halfspace probabilities for the trajectory given the sparse measurements.
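For orientation, the construction can be written schematically. The first display is the classical (Tukey) halfspace depth of a point x in R^d. The second is only a sketch of the conditional, regularized analogue, based on the verbal description above. The halfspace H_u with normal direction u, the RKHS norm bound \lambda, and the observation notation (t_j, y_j) are assumptions made for illustration; the paper's exact halfspace condition and form of regularization are not reproduced in this summary.

```latex
% Classical halfspace depth of x with respect to the law P of X in R^d
\[
\mathrm{HD}(x; P) = \inf_{u \in \mathbb{R}^d,\ \|u\| = 1}
  P\bigl(\langle u, X \rangle \ge \langle u, x \rangle\bigr)
\]

% Schematic conditional regularized version (notation assumed, not the paper's)
\[
\mathrm{CRHD}_\lambda(y_1, \dots, y_k) = \inf_{u :\ \|u\|_{\mathcal{H}} \le \lambda}
  P\bigl(X \in H_u \,\big|\, X(t_1) = y_1, \dots, X(t_k) = y_k\bigr)
\]
```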
If this is right
- CRHD applies to functional data observed at only a handful of irregular times.
- It produces rankings that can be used directly in rank-based statistical procedures.
- The method avoids bias that can arise from preliminary curve reconstruction steps.
- It is illustrated on an infant growth dataset with irregular measurement times.
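As an illustration of how depth values can feed a rank-based procedure, the following is a minimal sketch of a Wilcoxon rank-sum test applied to depth values from two groups. It is not the paper's test, which is not described in this summary. The function names `average_ranks` and `depth_rank_sum` are hypothetical, the depth values are assumed to have been computed against a common reference distribution, and the normal approximation omits the tie correction in the variance.

```python
import math

def average_ranks(x):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def depth_rank_sum(depths_a, depths_b):
    """Rank-sum statistic of group b among pooled depth values.

    Returns (W, z, two-sided p) using a normal approximation
    without tie correction (an assumption of this sketch).
    """
    pooled = list(depths_a) + list(depths_b)
    r = average_ranks(pooled)
    n_a, n_b = len(depths_a), len(depths_b)
    w = sum(r[n_a:])                          # sum of ranks in group b
    mean = n_b * (n_a + n_b + 1) / 2          # null mean of W
    var = n_a * n_b * (n_a + n_b + 1) / 12    # null variance of W
    z = (w - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return w, z, p
```

A call such as `depth_rank_sum(depths_group_1, depths_group_2)` returns the statistic, its standardized value, and an approximate two-sided p-value. A small p-value suggests that one group sits systematically deeper in the reference distribution than the other.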
Where Pith is reading between the lines
- The same conditional-probability construction could be examined for other forms of incomplete high-dimensional data where full reconstruction is unreliable.
- It may offer a route to robust outlier detection in longitudinal studies that collect only a few observations per subject.
- Performance under varying patterns of missingness, such as clustered observation times, remains a natural next check.
Load-bearing premise
Conditional halfspace probabilities for the infinite-dimensional trajectory can be defined, regularized, and estimated from sparse measurements without introducing substantial bias or instability.
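As one illustration of the kind of structure that makes such a conditional probability computable, consider a Gaussian process model with known mean and covariance. This is an assumption introduced here for illustration, not a model attributed to the paper. Write \mu for the mean function, C for the covariance kernel, and y = (y_1, \dots, y_k) for noise-free observations at times t_1, \dots, t_k. Standard Gaussian conditioning then gives the law of any projection \langle X, u \rangle in closed form, assuming \Sigma is invertible and s_u > 0.

```latex
\[
\Sigma_{ij} = C(t_i, t_j), \qquad
(c_u)_i = \int C(t_i, s)\, u(s)\, ds, \qquad
\mu_t = \bigl(\mu(t_1), \dots, \mu(t_k)\bigr)^\top
\]
\[
\langle X, u \rangle \,\big|\, \{X(t_j) = y_j\}_{j=1}^{k} \sim N\bigl(m_u, s_u^2\bigr),
\quad
m_u = \langle \mu, u \rangle + c_u^\top \Sigma^{-1} (y - \mu_t),
\quad
s_u^2 = \langle u, C u \rangle - c_u^\top \Sigma^{-1} c_u
\]
\[
P\bigl(\langle X, u \rangle \ge a \,\big|\, \{X(t_j) = y_j\}_{j=1}^{k}\bigr)
  = 1 - \Phi\!\left(\frac{a - m_u}{s_u}\right)
\]
```

Under this model no trajectory is reconstructed, but the result depends entirely on the assumed \mu and C, which is exactly the sensitivity the premise asks the reader to accept.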
What would settle it
Apply CRHD to simulated data generated from known full trajectories, subsample each trajectory to sparse points, compute the induced ordering, and compare it with the ordering obtained from full-data depth; systematic reversal of expected ranks would falsify the claim.
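The check described above can be sketched as a small simulation harness. Everything here is illustrative: the curve generator, the integrated pointwise depth used as the full-data reference, and in particular `sparse_depth_placeholder`, which is a naive windowed stand-in and not CRHD. The intent is only to show where a real sparse depth would be plugged in and how the two orderings can be compared with a Spearman rank correlation.

```python
import math
import random

def simulate_curve(grid, rng):
    """One smooth random curve: level shift plus two Fourier components."""
    a, b, c = rng.gauss(0, 1.0), rng.gauss(0, 0.5), rng.gauss(0, 0.25)
    return [a + b * math.sin(2 * math.pi * t) + c * math.cos(2 * math.pi * t)
            for t in grid]

def univariate_depth(y, sample):
    """Halfspace (Tukey) depth of y within a one-dimensional sample."""
    below = sum(v <= y for v in sample) / len(sample)
    above = sum(v >= y for v in sample) / len(sample)
    return min(below, above)

def full_data_depth(curves):
    """Reference ordering: pointwise depth averaged over the dense grid."""
    m = len(curves[0])
    columns = [[c[j] for c in curves] for j in range(m)]
    return [sum(univariate_depth(c[j], columns[j]) for j in range(m)) / m
            for c in curves]

def sparse_depth_placeholder(sparse, h=0.1):
    """Hypothetical stand-in, NOT CRHD: pool all observations within h of
    each observed time and average the depths of a curve's own values."""
    pooled = [(t, y) for obs in sparse for (t, y) in obs]
    depths = []
    for obs in sparse:
        d = [univariate_depth(y, [v for (s, v) in pooled if abs(s - t) <= h])
             for (t, y) in obs]
        depths.append(sum(d) / len(d))
    return depths

def average_ranks(x):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    r = [0.0] * len(x)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and x[order[j + 1]] == x[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r

def spearman(x, y):
    """Spearman correlation: Pearson correlation of average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(vx * vy)

def run_check(n=100, m=50, k=4, seed=0):
    """Simulate n curves on m grid points, keep k random points per curve,
    and return the rank agreement between full-data and sparse orderings."""
    rng = random.Random(seed)
    grid = [j / (m - 1) for j in range(m)]
    curves = [simulate_curve(grid, rng) for _ in range(n)]
    sparse = []
    for c in curves:
        idx = sorted(rng.sample(range(m), k))
        sparse.append([(grid[j], c[j]) for j in idx])
    return spearman(full_data_depth(curves), sparse_depth_placeholder(sparse))

if __name__ == "__main__":
    print(run_check())
```

A correlation near 1 would indicate that the sparse ordering tracks the full-data ordering; values near zero or negative, repeated across seeds and sparsity levels, would correspond to the systematic rank reversal described above.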
Original abstract
Many functional datasets are observed sparsely and irregularly. Ordering such data is challenging because only limited information is available from each observation, while the underlying trajectories remain infinite-dimensional. This paper develops a novel depth notion for sparse functional data, called the conditional regularized halfspace depth (CRHD). CRHD is defined as the infimum of conditional halfspace probabilities of the underlying trajectory given the observed sparse measurements, thereby enabling depth evaluation directly at sparse observations without requiring trajectory reconstruction. We study several basic theoretical properties of CRHD that clarify its behavior as a depth measure. The proposed depth is applicable even to extremely sparsely observed functional data, overcoming key limitations of existing sparse functional depths that often rely on reconstructed curves. In addition, CRHD induces meaningful rankings for complex functional data. Its numerical performance is demonstrated through rank-based tests, and its practical utility is illustrated using an infant growth dataset.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes the conditional regularized halfspace depth (CRHD) for sparsely and irregularly observed functional data. CRHD is defined as the infimum over directions u of the conditional probability that the underlying infinite-dimensional trajectory satisfies a halfspace condition given the finite sparse point observations. The authors state that this construction permits direct depth evaluation on the observed data points without explicit trajectory reconstruction. They examine basic theoretical properties of CRHD as a depth measure, claim applicability to extremely sparse regimes, demonstrate that it produces meaningful rankings, and illustrate performance via rank-based tests together with an application to an infant growth dataset.
Significance. If the central construction can be shown to be well-defined and estimable without implicitly reintroducing reconstruction bias, the work would address a genuine methodological gap in functional data analysis. Existing sparse functional depths frequently rely on preliminary smoothing or basis expansion, which can distort ordering; a depth that operates directly on the conditional law would be a useful addition. The reported theoretical properties and the real-data illustration on growth curves provide a foundation for further development, though the magnitude of the advance hinges on whether the regularization step delivers the claimed separation from reconstruction-based approaches.
major comments (3)
- [Section 2] Definition of CRHD (Section 2): the claim that the depth is evaluated 'directly at sparse observations without requiring trajectory reconstruction' is not yet supported by an explicit argument showing that the conditional probability P(X in halfspace | observations at t1..tk) remains identifiable and stable under the chosen regularization without effectively constraining the trajectory to a finite-dimensional subspace. In nonparametric settings the conditional law of an infinite-dimensional process given finitely many point evaluations is typically non-identifiable; any practical estimator must therefore impose structure (basis truncation, kernel smoothing, or parametric covariance) whose effect on the resulting depth values needs to be quantified.
- [Section 3] Theoretical properties (Section 3): the manuscript states that several basic properties of CRHD are established, yet the provided text does not contain the derivations or the precise regularity conditions under which the infimum over directions is attained and the depth is monotone or convex. Without these details it is difficult to assess whether the properties survive the regularization step or whether they reduce to known properties of unconditional halfspace depth after the conditional law is approximated.
- [Section 4] Estimation procedure (Section 4): the numerical implementation must rely on an estimator of the conditional halfspace probability. The manuscript should clarify whether this estimator is constructed from a finite sample of sparsely observed curves and whether consistency or rates are proved; if the estimator implicitly reconstructs trajectories via the same regularization used in the definition, the advantage over existing reconstruction-based depths requires explicit comparison on the same sparse regimes.
minor comments (2)
- [Abstract] The abstract and introduction would benefit from a concise statement of the precise form of regularization (e.g., basis dimension, penalty parameter, or kernel bandwidth) employed in the definition and estimation of CRHD.
- [Section 2] Notation for the sparse observation times and the conditioning sigma-field should be introduced once and used consistently throughout the theoretical sections.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on our manuscript. We address each major comment point by point below, indicating revisions where the manuscript will be strengthened to provide additional clarity and rigor.
- Referee: [Section 2] Definition of CRHD (Section 2): the claim that the depth is evaluated 'directly at sparse observations without requiring trajectory reconstruction' is not yet supported by an explicit argument showing that the conditional probability P(X in halfspace | observations at t1..tk) remains identifiable and stable under the chosen regularization without effectively constraining the trajectory to a finite-dimensional subspace. In nonparametric settings the conditional law of an infinite-dimensional process given finitely many point evaluations is typically non-identifiable; any practical estimator must therefore impose structure (basis truncation, kernel smoothing, or parametric covariance) whose effect on the resulting depth values needs to be quantified.
  Authors: We appreciate the referee's emphasis on identifiability. The regularization in the definition of CRHD is introduced precisely to ensure the conditional probability is well-defined and depends only on the sparse observations and the regularization parameter, without performing explicit trajectory reconstruction or projecting onto a finite basis. We will revise Section 2 to include a detailed argument establishing identifiability under the chosen regularization, along with an analysis of how the regularization parameter affects depth values and preserves the infinite-dimensional character of the underlying process.
  Revision: yes
- Referee: [Section 3] Theoretical properties (Section 3): the manuscript states that several basic properties of CRHD are established, yet the provided text does not contain the derivations or the precise regularity conditions under which the infimum over directions is attained and the depth is monotone or convex. Without these details it is difficult to assess whether the properties survive the regularization step or whether they reduce to known properties of unconditional halfspace depth after the conditional law is approximated.
  Authors: We agree that the derivations and regularity conditions should be presented more explicitly. In the revised manuscript we will supply complete proofs for the properties of CRHD, including conditions ensuring the infimum is attained and establishing monotonicity and convexity. These proofs will explicitly address the effect of regularization and demonstrate that the properties do not reduce to those of the unconditional halfspace depth.
  Revision: yes
- Referee: [Section 4] Estimation procedure (Section 4): the numerical implementation must rely on an estimator of the conditional halfspace probability. The manuscript should clarify whether this estimator is constructed from a finite sample of sparsely observed curves and whether consistency or rates are proved; if the estimator implicitly reconstructs trajectories via the same regularization used in the definition, the advantage over existing reconstruction-based depths requires explicit comparison on the same sparse regimes.
  Authors: The estimator is constructed directly from finite samples of sparsely observed curves via empirical conditional probabilities under the regularization. We will clarify this construction in the revised Section 4, add consistency results with convergence rates, and include explicit numerical comparisons against reconstruction-based depths in the same sparse observation regimes to quantify the practical advantage.
  Revision: yes
Circularity Check
No significant circularity: the CRHD definition is a direct construction from conditional probabilities and does not reduce to fitted inputs or self-citations.
full rationale
The paper defines CRHD explicitly as the infimum of conditional halfspace probabilities of the underlying trajectory given sparse measurements. This is presented as a novel depth notion enabling direct evaluation without reconstruction. No equations or sections in the provided abstract or description show the definition reducing by construction to its own inputs, fitted parameters renamed as predictions, or load-bearing self-citations. The regularization is invoked to make the conditional law tractable, but the derivation chain remains independent of the target result itself. This is the common honest non-finding for papers introducing new depth measures via explicit probabilistic definitions.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel (relation to the cited Recognition theorem: unclear)
  Paper passage: CRHD is defined as the infimum of conditional halfspace probabilities of the underlying trajectory given the observed sparse measurements
- IndisputableMonolith/Foundation/AlexanderDuality.lean, theorem alexander_duality_circle_linking (relation to the cited Recognition theorem: unclear)
  Paper passage: regularized halfspace depth (RHD) ... restricting the projection direction set ... to directions with RKHS norms less than lambda
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.