Statistical description and dimension reduction of continuous time categorical trajectories with multivariate functional principal components
Pith reviewed 2026-05-23 03:18 UTC · model grok-4.3
The pith
Categorical trajectories convert to binary indicators for multivariate functional principal components analysis.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By associating each state of a categorical trajectory with a binary random indicator function, the statistical description problem becomes a multivariate functional principal components analysis. The sample paths are piecewise constant with finitely many jumps, yet under continuity in probability the means and the multivariate covariance functions are continuous and admit interpretations in terms of departure from independence of the joint probabilities. The binary trajectories can be viewed as random elements in the Hilbert space of square integrable functions, and consistent estimators of the mean trajectories and covariance functions exist under weak regularity assumptions.
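The indicator encoding described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `indicators_on_grid`, the representation of a trajectory as jump times plus entered states, and the evaluation grid are all assumptions made for the example, and it covers only the case of mutually exclusive states (the paper also allows states to co-occur).

```python
import numpy as np

def indicators_on_grid(jump_times, states, n_states, grid):
    """Evaluate the 0-1 indicator functions of one trajectory on a time grid.

    jump_times : increasing times at which the trajectory enters a new state,
                 with jump_times[0] the start of observation (hypothetical format).
    states     : state label (0..n_states-1) entered at each jump time.
    Returns an array of shape (n_states, len(grid)).
    """
    jump_times = np.asarray(jump_times, dtype=float)
    states = np.asarray(states, dtype=int)
    grid = np.asarray(grid, dtype=float)
    # Right-continuous convention: the state at time t is the one entered
    # at the last jump time <= t.
    idx = np.searchsorted(jump_times, grid, side="right") - 1
    current = states[idx]
    # Row j holds the indicator of "the trajectory is in state j at time t".
    return (current[None, :] == np.arange(n_states)[:, None]).astype(float)

# One trajectory: state 0 on [0, 2.5), state 2 on [2.5, 7), state 1 from 7 on.
grid = np.linspace(0.0, 10.0, 101)
X = indicators_on_grid([0.0, 2.5, 7.0], [0, 2, 1], n_states=3, grid=grid)
```

Because the states are exclusive here, the indicator rows sum to one at every grid point; with co-occurring states each row would be built independently and that constraint would not hold.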
What carries the argument
Multivariate functional principal components analysis applied to the vector of binary indicator functions, one per categorical state.
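One way to picture this machinery is a grid-based approximation in which the indicator curves of all states are stacked and the covariance operator is replaced by a matrix. The sketch below is an assumption-laden illustration, not the estimator of the paper or of any particular package: the function name `discretized_mfpca`, the uniform grid, the Riemann-sum inner product, and the toy data (two states, one uniformly distributed switching time per path) are all hypothetical.

```python
import numpy as np

def discretized_mfpca(X, grid, n_components=2):
    """X: array (n_paths, n_states, n_grid) of 0-1 indicators on a uniform grid."""
    n, p, m = X.shape
    dt = grid[1] - grid[0]                 # uniform spacing (assumption)
    mean = X.mean(axis=0)                  # estimated marginal probabilities
    Z = (X - mean).reshape(n, p * m)       # stack the p centered curves per path
    Zw = Z * np.sqrt(dt)                   # Riemann-sum weights for the L2 inner product
    C = Zw.T @ Zw / n                      # discretized covariance operator
    evals, evecs = np.linalg.eigh(C)
    order = np.argsort(evals)[::-1][:n_components]
    evals = evals[order]
    # Rescale so each multivariate eigenfunction has unit L2 norm.
    eigfuns = (evecs[:, order] / np.sqrt(dt)).T.reshape(n_components, p, m)
    # Scores approximate the inner products <X_i - mean, eigenfunction_k>.
    scores = Zw @ evecs[:, order]
    return mean, evals, eigfuns, scores

# Toy data: each path starts in state 0 and switches once to state 1.
rng = np.random.default_rng(0)
grid = np.linspace(0.0, 1.0, 50)
switch = rng.uniform(0.2, 0.8, size=200)
in_state_1 = (grid[None, :] >= switch[:, None]).astype(float)
X = np.stack([1.0 - in_state_1, in_state_1], axis=1)   # shape (200, 2, 50)
mean, evals, eigfuns, scores = discretized_mfpca(X, grid, n_components=2)
```

Since the piecewise-constant paths can be observed exhaustively, the integrals could in principle be computed exactly from the jump times; the grid is used here only to keep the example short.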
If this is right
- Dimension reduction to a small number of principal components becomes available, starting from an indicator encoding that itself loses no information from the original categorical sequences.
- Multiple states observed at the same instant are represented without loss through the joint covariance structure.
- The principal components supply direct visual and numerical summaries of typical variation across trajectories.
- The empirical means and covariances can be computed from any finite collection of observed paths and are consistent as the number of paths grows, under the stated regularity conditions.
Where Pith is reading between the lines
- The same indicator construction may apply to other piecewise-constant processes with finitely many jumps, such as certain point processes or regime-switching models.
- One could test whether the leading principal components recover known transition patterns in longitudinal categorical data.
- Rates of convergence for the estimators could be derived to guide sample-size requirements in applied settings.
Load-bearing premise
The 0-1 indicator trajectories must be continuous in probability so that their means and covariances remain continuous functions even though the observed paths jump.
What would settle it
A collection of categorical trajectories satisfying continuity in probability for which the estimated covariance function fails to converge to any continuous limit as the number of observed paths grows.
Original abstract
Getting tools that allow simple representations and comparisons of a set of categorical trajectories is of major interest for statisticians. Without loosing any information, we associate to each state a binary random indicator function, taking values in $\{0,1\}$, and turn the problem of statistical description of the categorical trajectories into a multivariate functional principal components analysis. This viewpoint encompasses experimental frameworks where two or more states can be observed simultaneously. The sample paths being piecewise constant, with a finite number of jumps, this a rare case in functional data analysis in which the trajectories are not supposed to be continuous and can be observed exhaustively. Under the weak hypothesis assuming only continuity in probability of the $0-1$ trajectories, the means and the (multivariate) covariance function are continuous and have interpretations in terms of departure from independence of the joint probabilities. Considering a functional data point of view, we show that the binary trajectories, which are right-continuous functions with left-hand limits, can be seen as random elements in the Hilbert space of square integrable functions. The multivariate functional principal components are simple to interpret and we show that we can get consistent estimators of the mean trajectories and the covariance functions under weak regularity assumptions. The ability of the approach to represent categorical trajectories in a small dimension space is illustrated on a data set of sensory perceptions, considering different gustometer-controlled stimuli experiments.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes to represent continuous-time categorical trajectories by associating each state with a binary 0-1 indicator function, thereby recasting the problem as one of multivariate functional principal component analysis. Under the assumption that the indicator processes are continuous in probability, the authors claim that the mean functions equal the marginal probability trajectories and the cross-covariance functions equal the joint probabilities minus the product of the marginals (hence continuous and interpretable as departures from independence). They further assert that consistent estimators of the means and covariance functions exist under weak regularity assumptions, that the cadlag sample paths remain square-integrable elements of the relevant Hilbert space, and that the resulting MFPCA provides a low-dimensional representation, which is illustrated on sensory-perception data from gustometer experiments.
Significance. If the consistency claims hold, the work supplies a direct route for applying standard functional-data tools to categorical trajectories while preserving the possibility of simultaneous states and without requiring continuous sample paths. The probabilistic interpretation of the covariance operator as a measure of departure from independence is a clear interpretive strength, and the reliance on only continuity in probability plus square-integrability is notably weak. The empirical example on real sensory data demonstrates practical utility.
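The identities behind this interpretation follow directly from the fact that each indicator takes values in {0,1}. The notation below is introduced here for illustration and is not taken from the paper: $X_j(t)$ denotes the indicator that state $j$ is occupied at time $t$, $p_j$ its mean function, and $\gamma_{jk}$ the cross-covariance function between states $j$ and $k$.

```latex
\begin{align*}
p_j(t) &= \mathbb{E}\left[X_j(t)\right] = \mathbb{P}\left(X_j(t) = 1\right), \\
\gamma_{jk}(s,t) &= \operatorname{Cov}\left(X_j(s), X_k(t)\right)
  = \mathbb{E}\left[X_j(s)\,X_k(t)\right] - p_j(s)\,p_k(t) \\
 &= \mathbb{P}\left(X_j(s) = 1,\; X_k(t) = 1\right) - p_j(s)\,p_k(t).
\end{align*}
```

In particular, $\gamma_{jk}(s,t) = 0$ exactly when the events $\{X_j(s) = 1\}$ and $\{X_k(t) = 1\}$ are independent, which is the sense in which the covariance measures departure from independence.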
major comments (2)
- [Abstract] Abstract and the section stating the consistency result: the manuscript asserts existence of consistent estimators for the mean trajectories and the multivariate covariance function but supplies neither the explicit form of the estimators, a derivation of consistency, error bounds, nor any simulation evidence. Because this consistency statement is the central theoretical claim, the absence of supporting detail is load-bearing.
- [Theoretical framework] Section introducing the Hilbert-space embedding: while the cadlag paths are correctly noted to be square-integrable, the manuscript does not discuss whether the continuity-in-probability assumption is automatically satisfied for mutually exclusive categorical states or how it might be verified from data; this assumption is invoked to guarantee that the mean and covariance objects remain continuous and lie in the Hilbert space.
minor comments (3)
- [Abstract] Abstract: 'Without loosing any information' should read 'Without losing any information'.
- [Abstract] Abstract: 'this a rare case' should read 'this is a rare case'.
- The manuscript would benefit from a short simulation study (even a small one) that checks finite-sample behavior of the proposed estimators under the stated continuity-in-probability regime.
Simulated Author's Rebuttal
We thank the referee for their positive evaluation of the work's significance and for the constructive major comments. We address each point below, indicating planned revisions where appropriate.
Point-by-point responses
- Referee: [Abstract] Abstract and the section stating the consistency result: the manuscript asserts existence of consistent estimators for the mean trajectories and the multivariate covariance function but supplies neither the explicit form of the estimators, a derivation of consistency, error bounds, nor any simulation evidence. Because this consistency statement is the central theoretical claim, the absence of supporting detail is load-bearing.
  Authors: The estimators are the standard empirical mean functions and empirical cross-covariance functions of the observed multivariate indicator trajectories. Consistency in probability follows from the square-integrability of the cadlag paths together with continuity in probability, via standard uniform integrability arguments for functional data. We agree that the manuscript would benefit from greater explicitness on this central claim. In revision we will state the estimators explicitly, sketch the consistency argument, and add a small simulation study (in supplementary material) demonstrating finite-sample performance under the stated assumptions.
  Revision: yes
- Referee: [Theoretical framework] Section introducing the Hilbert-space embedding: while the cadlag paths are correctly noted to be square-integrable, the manuscript does not discuss whether the continuity-in-probability assumption is automatically satisfied for mutually exclusive categorical states or how it might be verified from data; this assumption is invoked to guarantee that the mean and covariance objects remain continuous and lie in the Hilbert space.
  Authors: Continuity in probability is not an automatic consequence of mutual exclusivity; it follows from the cadlag property together with the finite number of jumps per trajectory (which implies that the probability of a discontinuity at any fixed t is zero). When states may co-occur, the same argument applies componentwise to each indicator process. For data-based verification we suggest inspecting the empirical marginal probability curves for visible jumps or estimating the probability of state changes in small time windows around observed times. We will insert a short clarifying paragraph on these points in the revised theoretical framework section.
  Revision: yes
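The empirical mean and cross-covariance estimators mentioned in the first response can be sketched on a time grid as below. This is a hedged illustration rather than the authors' implementation: the function name `empirical_mean_and_cross_cov`, the grid evaluation, the 1/n normalization, and the toy data-generating model (two exclusive states, one switching time uniform on [0.2, 0.8]) are assumptions chosen so that the true marginal probability is known in closed form.

```python
import numpy as np

def empirical_mean_and_cross_cov(X):
    """X: (n_paths, n_states, n_grid) array of 0-1 indicator values.

    Returns mean of shape (n_states, n_grid) and cov of shape
    (n_states, n_states, n_grid, n_grid), where cov[j, k, a, b]
    estimates Cov(X_j(t_a), X_k(t_b)).
    """
    n = X.shape[0]
    mean = X.mean(axis=0)
    Z = X - mean
    cov = np.einsum("ija,ikb->jkab", Z, Z) / n
    return mean, cov

# Toy model: every path starts in state 0 and switches once to state 1.
rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50)
switch = rng.uniform(0.2, 0.8, size=2000)
in_state_1 = (grid[None, :] >= switch[:, None]).astype(float)
X = np.stack([1.0 - in_state_1, in_state_1], axis=1)

mean, cov = empirical_mean_and_cross_cov(X)
# True probability of being in state 1 at time t under the toy model.
true_p1 = np.clip((grid - 0.2) / 0.6, 0.0, 1.0)
max_err = np.abs(mean[1] - true_p1).max()
```

Under this toy model the switching time has a continuous distribution, so the indicator processes are continuous in probability and the true mean curve is continuous even though every path jumps; rerunning with larger sample sizes gives an informal look at how the empirical curve approaches it.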
Circularity Check
No significant circularity; derivation self-contained in standard FDA
full rationale
The paper maps categorical trajectories to 0-1 indicator processes and applies multivariate FPCA. The central results (continuity of means/covariances under continuity-in-probability, consistency of estimators) follow directly from the stated weak hypothesis and standard Hilbert-space arguments for cadlag square-integrable paths; they do not reduce to fitted parameters or self-citations. No load-bearing self-citation chains, no ansatz smuggled via prior work, and no renaming of known results as new derivations. The approach rests on external, independently verifiable FDA theory rather than internal redefinition.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The 0-1 indicator trajectories are continuous in probability.
- domain assumption The trajectories are right-continuous with left limits and piecewise constant with finitely many jumps.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "Under the weak hypothesis assuming only continuity in probability of the 0-1 trajectories, the means and the (multivariate) covariance function are continuous..."
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "the binary trajectories... can be seen as random elements in the Hilbert space of square integrable functions"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.