Data-driven modeling of multivariate stochastic trajectories -- Application to water waves
Pith reviewed 2026-05-16 22:16 UTC · model grok-4.3
The pith
Functional principal components combined with vine copulas and conditional tail models generate joint stochastic trajectories for water wave variables.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Reducing each wave trajectory segment to functional principal components, then modeling the joint distribution of those components with a non-parametric vine copula for typical values and the Heffernan-Tawn conditional framework for the tail, produces a stochastic model for free-surface slope, normal velocity, and vertical acceleration that incorporates an acceleration-based wave-breaking filter and requires only three hyperparameters for calibration.
What carries the argument
Functional Principal Component Analysis for trajectory reduction, paired with vine copulas for bulk dependence and the Heffernan-Tawn conditional modeling framework for extremes.
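As a rough illustration of the trajectory-reduction step, the sketch below extracts the leading functional principal component from curves sampled on a common uniform grid. It is not the paper's implementation: the helper name `fpca_leading_component`, the toy sine-mode data, and the use of power iteration (chosen only to stay within the Python standard library) are assumptions made for illustration.

```python
import math
import random


def fpca_leading_component(curves, n_iter=200, seed=0):
    """Leading principal component of curves sampled on a common uniform grid.

    Returns (mean_curve, component, scores). Hypothetical helper for
    illustration; a full FPCA would retain several components.
    """
    n, m = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(m)]
    centered = [[c[j] - mean[j] for j in range(m)] for c in curves]

    # Power iteration on the sample covariance, applied as (1/n) X^T (X v)
    # so that the m-by-m covariance matrix is never formed explicitly.
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(m)]
    for _ in range(n_iter):
        proj = [sum(x[j] * v[j] for j in range(m)) for x in centered]
        w = [sum(proj[i] * centered[i][j] for i in range(n)) / n for j in range(m)]
        norm = math.sqrt(sum(wj * wj for wj in w))
        v = [wj / norm for wj in w]

    # Scores are the projections of each centered curve on the component.
    scores = [sum(x[j] * v[j] for j in range(m)) for x in centered]
    return mean, v, scores


# Toy data (assumed): a dominant half-sine mode plus a weak full-sine mode.
grid = [j / 20 for j in range(21)]
curves = [
    [(i - 4.5) * math.sin(math.pi * t) + 0.1 * (-1) ** i * math.sin(2 * math.pi * t)
     for t in grid]
    for i in range(10)
]
mean_curve, component, scores = fpca_leading_component(curves)
```

On a uniform grid the quadrature weights are constant, so this reduces to ordinary PCA of the discretized curves. How the three kinematic variables are combined into one feature vector is not described in this review; concatenating their discretized curves would be one possible choice.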
If this is right
- The calibrated model reproduces distributions of response variables derived from the three kinematic quantities.
- Synthetic trajectories can be generated that respect both typical and extreme joint statistics.
- The acceleration variable enforces exclusion of breaking-wave samples during generation.
- Stepwise tuning of the three hyperparameters allows practical adjustment without exhaustive search.
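One way to make the first bullet quantitative is a two-sample Kolmogorov-Smirnov distance between a response variable computed from synthetic trajectories and the same variable computed from the database. The helper below is a minimal standard-library sketch; the name `ks_distance` and the toy samples are illustrative, and this review does not state which comparison metric, if any, the paper uses.

```python
def ks_distance(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov distance: sup |F_a(x) - F_b(x)|."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    distance = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both empirical CDFs past every observation equal to x.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        distance = max(distance, abs(i / len(a) - j / len(b)))
    return distance


# Toy values standing in for a response variable (not data from the paper).
observed = [0.8, 1.1, 1.3, 1.9, 2.4]
synthetic = [0.9, 1.0, 1.4, 2.0, 2.2]
toy_distance = ks_distance(observed, synthetic)
```

A distance near 0 indicates close agreement of the two empirical distributions; because the statistic weights the bulk more than the extremes, it would need to be paired with a tail-focused check.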
Where Pith is reading between the lines
- The same reduction-plus-copula pipeline could be tested on other multivariate fluid trajectories such as those arising in breaking-wave turbulence.
- Hybrid use with deterministic wave solvers might enable efficient sampling of rare load events for offshore structure design.
- Application to laboratory wave-tank measurements would check whether numerical artifacts in the training data affect the extracted components.
Load-bearing premise
The functional principal components extracted from observed trajectories capture enough variability for accurate joint tail modeling, and the vine copula plus Heffernan-Tawn combination faithfully reproduces the dependence structure without bias introduced by the feature reduction.
What would settle it
Generate many synthetic trajectories and compare their joint tail distributions of slope, velocity, and acceleration against an independent hold-out portion of the DeRisk database; a significant mismatch in the extremes would falsify the modeling claim.
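A hold-out comparison of this kind could start from joint exceedance probabilities. The sketch below assumes each trajectory has already been summarized by one scalar per variable (for example a per-trajectory maximum, which is an assumption of this example), takes thresholds from the hold-out quantiles, and reports how often all variables exceed them together in each sample. All function names and the toy data are hypothetical.

```python
import math


def empirical_quantile(values, p):
    """Order-statistic quantile: smallest value with a fraction >= p at or below it."""
    ordered = sorted(values)
    index = min(len(ordered) - 1, max(0, math.ceil(p * len(ordered)) - 1))
    return ordered[index]


def joint_exceedance(sample, thresholds):
    """Fraction of rows in which every variable exceeds its threshold."""
    hits = sum(1 for row in sample if all(x > t for x, t in zip(row, thresholds)))
    return hits / len(sample)


def compare_joint_tails(hold_out, synthetic, p=0.9):
    """Joint exceedance probabilities above the hold-out p-quantiles."""
    n_vars = len(hold_out[0])
    thresholds = [empirical_quantile([row[k] for row in hold_out], p)
                  for k in range(n_vars)]
    return joint_exceedance(hold_out, thresholds), joint_exceedance(synthetic, thresholds)


# Toy rows of (slope, velocity, acceleration) summaries, not DeRisk data:
# the hold-out sample has perfectly aligned extremes, the synthetic one does not.
hold_out = [(i, i, i) for i in range(100)]
synthetic = [(i, 99 - i, i) for i in range(100)]
p_hold_out, p_synthetic = compare_joint_tails(hold_out, synthetic, p=0.9)
```

In the toy case the synthetic sample never exceeds all three thresholds at once, which is the kind of mismatch the test is meant to expose; in practice the two probabilities would be compared with sampling uncertainty, for instance via bootstrap intervals.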
Original abstract
A data-driven methodology is proposed to model the distribution of multivariate stochastic trajectories from an observed sample. As a first step, each trajectory in the sample is reduced to a vector of features by means of Functional Principal Component Analysis. Next, the joint distribution of features is modeled using (i) a non-parametric vine copula approach for the bulk of the distribution, and (ii) the conditional modeling framework of Heffernan and Tawn (2004) for the multivariate tail. The method is applied to the modeling of water waves. The dataset used is the DeRisk database, which consists of numerical simulations of water waves. The analysis is restricted to the portion of the wave period between the free-surface zero-upcrossing and the wave crest. The kinematic variables considered are the free-surface slope, the normal component of the fluid velocity at the free surface, and the vertical Lagrangian acceleration of the fluid at the free surface. The stochastic trajectories of these three variables are modeled jointly. The vertical Lagrangian acceleration of the fluid is employed to enforce a wave-breaking filter in the stochastic model. The number of hyperparameters in the stochastic framework is reduced to three, and a stepwise calibration strategy is proposed for their adjustment. The capabilities of the model are illustrated by predicting the distributions of selected response variables and by generating synthetic trajectories.
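For readers unfamiliar with the tail component, the Heffernan-Tawn approach describes a variable Y, given that a conditioning variable X is large, through the form Y = a X + X^b Z, where Z is a residual whose distribution is estimated empirically. The sketch below simulates from that form under the assumption of standard Laplace margins, where the excess above a positive threshold is exponential with unit rate. The parameter values, the residual sample, and the function name are placeholders and are not taken from the paper.

```python
import random


def simulate_heffernan_tawn(alpha, beta, residuals, threshold, n_samples, seed=0):
    """Draw (x, y) pairs with x above `threshold` from the conditional form
    y = alpha * x + x**beta * z, assuming standard Laplace margins.

    `residuals` is an empirical sample of z; all inputs are placeholders.
    """
    if threshold <= 0:
        raise ValueError("threshold must be positive on the Laplace scale")
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_samples):
        # Above a positive threshold, a standard Laplace excess is Exp(1).
        x = threshold + rng.expovariate(1.0)
        # Resample a residual from its empirical distribution.
        z = rng.choice(residuals)
        pairs.append((x, alpha * x + x ** beta * z))
    return pairs


# Placeholder parameters and residuals, not values fitted in the paper.
pairs = simulate_heffernan_tawn(alpha=0.6, beta=0.3, residuals=[-0.5, 0.0, 0.4],
                                threshold=2.0, n_samples=1000)
```

In an application such as the one described in the abstract, X and Y would be features on a transformed marginal scale, and the simulated pairs would be mapped back to the original scale before trajectories are reconstructed.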
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a data-driven method for modeling multivariate stochastic trajectories: each observed trajectory is reduced to a feature vector via Functional Principal Component Analysis (FPCA), the joint distribution of features is modeled with a non-parametric vine copula for the bulk and the Heffernan-Tawn conditional framework for the multivariate tails, and the approach is applied to joint modeling of free-surface slope, normal fluid velocity, and vertical Lagrangian acceleration over the zero-upcrossing-to-crest portion of waves in the DeRisk database. A wave-breaking filter is enforced via the acceleration variable, the framework is reduced to three hyperparameters with a proposed stepwise calibration, and capabilities are illustrated via predicted response distributions and generated synthetic trajectories.
Significance. If the FPCA reduction preserves the necessary joint tail dependence and the vine-copula/Heffernan-Tawn construction plus breaking filter produces unbiased distributions, the method would provide a practical, low-hyperparameter route to generating realistic multivariate wave kinematics for engineering risk assessment. The explicit reduction to three calibrated hyperparameters and the use of established tail-modeling tools constitute clear strengths; however, significance hinges on whether the data-driven pipeline demonstrably outperforms simpler alternatives on tail statistics.
major comments (3)
- [§3] §3 (FPCA step): because FPCA is a variance-maximizing linear projection, it is not guaranteed to retain the low-probability joint tail structure among the three kinematic variables that is required for unbiased Heffernan-Tawn conditional modeling; the manuscript must show (e.g., via tail-dependence coefficients or quantile-quantile plots of extremes before and after reduction) that the retained principal components preserve the dependence needed for the claimed synthetic-trajectory accuracy.
- [§5] §5 (validation): the claimed predictive capability is illustrated only qualitatively; quantitative metrics (Kolmogorov-Smirnov distances, coverage of empirical tails, or comparison against a baseline such as direct multivariate kernel density estimation or a Gaussian copula) are absent, so it is impossible to judge whether the three-hyperparameter model reproduces the DeRisk response distributions within acceptable error.
- [§4.2] §4.2 (breaking filter): the precise mechanism by which the vertical Lagrangian acceleration threshold is imposed inside the stochastic generator is not specified (e.g., rejection sampling, conditional truncation, or post-processing), leaving open the possibility that the filter distorts the joint distribution of the other two variables in a way that is not accounted for in the three-hyperparameter calibration.
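The tail-dependence check requested in the §3 comment could be sketched with an empirical upper-tail dependence measure, chi(u) = P(V > u | U > u), computed on rank-transformed data. The code below is a minimal standard-library version; the function names and the toy inputs are illustrative, and the estimator used by the authors (if any) is not specified in this review.

```python
def pseudo_observations(values):
    """Rank-transform a sample to the open unit interval: rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    u = [0.0] * n
    for rank, i in enumerate(order, start=1):
        u[i] = rank / (n + 1)
    return u


def chi_upper(x, y, u=0.95):
    """Empirical chi(u): share of points with x-rank above u whose y-rank is also above u."""
    ux, uy = pseudo_observations(x), pseudo_observations(y)
    n_x = sum(1 for a in ux if a > u)
    if n_x == 0:
        raise ValueError("no observations above the chosen level u")
    n_joint = sum(1 for a, b in zip(ux, uy) if a > u and b > u)
    return n_joint / n_x


# Toy inputs: perfectly aligned extremes versus perfectly opposed extremes.
x = list(range(100))
chi_aligned = chi_upper(x, x, u=0.9)
chi_opposed = chi_upper(x, [-v for v in x], u=0.9)
```

Evaluating such a measure on the original trajectories and again on trajectories reconstructed from the retained components, over a range of levels u, would show whether the reduction weakens joint extremes.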
minor comments (2)
- [§2] The notation for the three hyperparameters (and the stepwise calibration procedure) should be introduced with explicit symbols and a clear algorithmic outline in §2 to improve reproducibility.
- [§5] Figure captions in §5 should state the exact subset of the DeRisk database (number of waves, period range, breaking criterion) used for both calibration and validation.
Simulated Author's Rebuttal
We thank the referee for the constructive comments that highlight important aspects of our methodology. We address each major comment below and will incorporate the suggested clarifications and additions in the revised manuscript.
- Referee: [§3] §3 (FPCA step): because FPCA is a variance-maximizing linear projection, it is not guaranteed to retain the low-probability joint tail structure among the three kinematic variables that is required for unbiased Heffernan-Tawn conditional modeling; the manuscript must show (e.g., via tail-dependence coefficients or quantile-quantile plots of extremes before and after reduction) that the retained principal components preserve the dependence needed for the claimed synthetic-trajectory accuracy.
  Authors: We agree that explicit verification of tail preservation is necessary. Although FPCA prioritizes variance, the leading components in our wave-kinematics application capture the dominant functional modes that include extreme behavior. In the revision we will add upper-tail dependence coefficients and quantile-quantile plots of the extreme quantiles (e.g., above the 95th percentile) comparing the original feature vectors to those reconstructed from the retained principal components, thereby confirming that the joint tail structure required for the Heffernan-Tawn step is adequately retained.
  Revision: yes
- Referee: [§5] §5 (validation): the claimed predictive capability is illustrated only qualitatively; quantitative metrics (Kolmogorov-Smirnov distances, coverage of empirical tails, or comparison against a baseline such as direct multivariate kernel density estimation or a Gaussian copula) are absent, so it is impossible to judge whether the three-hyperparameter model reproduces the DeRisk response distributions within acceptable error.
  Authors: We accept that quantitative metrics are required for a rigorous assessment. The revised manuscript will report Kolmogorov-Smirnov distances for the marginal and selected joint response distributions, empirical tail coverage probabilities at the 95% and 99% levels, and direct comparisons against a Gaussian copula baseline as well as a multivariate kernel density estimator (where feasible given dimensionality). These additions will allow readers to evaluate the three-hyperparameter model’s accuracy on the DeRisk data.
  Revision: yes
- Referee: [§4.2] §4.2 (breaking filter): the precise mechanism by which the vertical Lagrangian acceleration threshold is imposed inside the stochastic generator is not specified (e.g., rejection sampling, conditional truncation, or post-processing), leaving open the possibility that the filter distorts the joint distribution of the other two variables in a way that is not accounted for in the three-hyperparameter calibration.
  Authors: The breaking constraint is enforced by rejection sampling: feature vectors are first sampled from the fitted vine-copula/Heffernan-Tawn model, trajectories are reconstructed, and any realization whose vertical Lagrangian acceleration exceeds the breaking threshold is discarded. This post-generation filter leaves the three calibrated hyperparameters unchanged. We will describe the procedure explicitly in the revised Section 4.2 and add a short discussion of any resulting effect on the joint distributions of slope and normal velocity.
  Revision: yes
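The rejection-sampling filter described in the last response can be sketched as follows. This is a toy illustration under stated assumptions: the generator `draw_trajectory` stands in for the fitted vine-copula/Heffernan-Tawn model, each realization is reduced to a pair of peak values, and `ACCEL_LIMIT` is a hypothetical threshold in arbitrary units rather than a physical breaking criterion.

```python
import random


def sample_with_breaking_filter(draw_trajectory, exceeds_breaking_limit,
                                n_samples, max_draws=1_000_000):
    """Keep drawing until n_samples non-breaking realizations are accepted."""
    accepted, draws = [], 0
    while len(accepted) < n_samples:
        if draws >= max_draws:
            raise RuntimeError("acceptance rate too low for the requested sample size")
        draws += 1
        candidate = draw_trajectory()
        # Discard any realization flagged as breaking; keep the rest unchanged.
        if not exceeds_breaking_limit(candidate):
            accepted.append(candidate)
    return accepted, draws


# Toy stand-in for the fitted generator: correlated (slope, acceleration) peaks.
rng = random.Random(1)
ACCEL_LIMIT = 1.0  # hypothetical threshold in arbitrary units


def draw_trajectory():
    shared = rng.gauss(0.0, 1.0)
    slope = shared + 0.5 * rng.gauss(0.0, 1.0)
    accel = shared + 0.5 * rng.gauss(0.0, 1.0)
    return slope, accel


accepted, draws = sample_with_breaking_filter(
    draw_trajectory, lambda traj: traj[1] > ACCEL_LIMIT, n_samples=500)
acceptance_rate = len(accepted) / draws
```

Because slope and acceleration are correlated in this toy generator, discarding high-acceleration draws also thins the upper tail of the slope. That is the kind of side effect on the other variables that the referee asks the authors to discuss.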
Circularity Check
No circularity: data-driven FPCA + vine-copula + Heffernan-Tawn pipeline is independent of its target predictions
The paper reduces observed trajectories from the external DeRisk database to feature vectors via FPCA, then fits a non-parametric vine copula to the bulk and a Heffernan-Tawn conditional model to the tails, with an acceleration-based breaking filter and stepwise calibration of three hyperparameters. No equation or step equates the generated synthetic trajectories or predicted response distributions to the fitted inputs by construction. The central claims rest on the external data and standard statistical constructions rather than on self-definition, load-bearing self-citation, or renaming of known results. The derivation chain is therefore self-contained.
Axiom & Free-Parameter Ledger
free parameters (1)
- three hyperparameters
axioms (2)
- domain assumption: Functional principal components extracted from the trajectories capture the essential joint variability of the three kinematic variables.
- domain assumption: The vine copula and Heffernan-Tawn conditional model together represent the true joint distribution and tails of the feature vectors.
Reference graph
Works this paper leans on
- [1] A. Naess, Prediction of extremes related to the second-order, sum-frequency response of a TLP, in: ISOPE International Ocean and Polar Engineering Conference, ISOPE, 1992, pp. ISOPE–I.
- [2] O. Faltinsen, Sea loads on ships and offshore structures, Vol. 1, Cambridge University Press, 1993
- [3] O. M. Faltinsen, J. Newman, T. Vinje, Nonlinear wave loads on a slender vertical cylinder, Journal of Fluid Mechanics 289 (1995) 179–198
- [4] E. E. Bachynski, T. Moan, Ringing loads on tension leg platform wind turbines, Ocean Engineering 84 (2014) 237–248
- [5] P. Renaud, F. Hulin, A. Tassin, J.-F. Filipot, N. Jacques, Analysis of slamming loads induced by breaking waves on vertical cylinders using fully nonlinear wave kinematics and semi-analytical load model, Coastal Engineering (2025) 104898
- [6] A. K. Jha, S. R. Winterstein, Stochastic fatigue damage accumulation due to nonlinear ship loads, J. Offshore Mech. Arct. Eng. 122 (4) (2000) 253–259
- [7] Z. Li, J. W. Ringsberg, G. Storhaug, Time-domain fatigue assessment of ship side-shell structures, International Journal of Fatigue 55 (2013) 276–290
- [8] G. Storhaug, The measured contribution of whipping and springing on the fatigue and extreme loading of container vessels, International Journal of Naval Architecture and Ocean Engineering 6 (4) (2014) 1096–1110. doi:10.2478/IJNAOE-2013-0233
- [9] R. Hascoët, N. Jacques, On the risk of fatigue failure of structural elements exposed to bottom wave slamming–impulse response regime, Applied Ocean Research 154 (2025) 104411
- [10] B. Kinsman, Wind waves: their generation and propagation on the ocean surface, Courier Corporation, 1984
- [11] M. J. Tucker, E. G. Pitt, Waves in ocean engineering, Vol. 5, Elsevier Ocean Engineering Series, 2001
- [12] M. K. Ochi, Ocean waves: the stochastic approach, Vol. 6, Cambridge University Press, 2005
- [13] L. H. Holthuijsen, Waves in Oceanic and Coastal Waters, Cambridge University Press, 2007. doi:10.1017/CBO9780511618536
- [14] M. S. Longuet-Higgins, On the statistical distribution of the heights of sea waves, Journal of Marine Research 11 (3)
- [15] G. Lindgren, I. Rychlik, Wave characteristic distributions for Gaussian waves–wave-length, amplitude and steepness, Ocean Engineering 9 (5) (1982) 411–432
- [16]
- [17] R. Hascoët, N. Raillard, N. Jacques, Effect of forward speed on the level-crossing distribution of kinematic variables in multidirectional ocean waves, Ocean Engineering 235 (2021) 109345. doi:10.1016/j.oceaneng.2021.109345
- [18] M. S. Longuet-Higgins, Modified Gaussian distributions for slightly nonlinear variables, Radio Sci. D 68 (1964) 1049–1062
- [19] A. Naess, Statistical analysis of second-order response of marine structures, Journal of Ship Research 29 (04) (1985) 270–284
- [20] R. Langley, A statistical analysis of non-linear random waves, Ocean Engineering 14 (5) (1987) 389–407. doi:10.1016/0029-8018(87)90052-7
- [21] R. Hascoët, Level-crossing distributions of kinematic variables in multidirectional second-order ocean waves, Ocean Engineering 265 (2022) 112585. doi:10.1016/j.oceaneng.2022.112585
- [22] F. Pierella, O. Lindberg, H. Bredmose, H. B. Bingham, R. W. Read, A. P. Engsig-Karup, The DeRisk database: Extreme design waves for offshore wind turbines, Marine Structures 80 (2021) 103046. doi:10.1016/j.marstruc.2021.103046
- [23] J. E. Heffernan, J. A. Tawn, A conditional approach for multivariate extreme values (with discussion), Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66 (3) (2004) 497–546
- [24] A. Engsig-Karup, H. Bingham, O. Lindberg, An efficient flexible-order model for 3D nonlinear water waves, Journal of Computational Physics 228 (6) (2009) 2100–2118. doi:10.1016/j.jcp.2008.11.028
- [25] H. Akima, A new method of interpolation and smooth curve fitting based on local procedures, Journal of the ACM (JACM) 17 (4) (1970) 589–602
- [26] H. Akima, A method of bivariate interpolation and smooth surface fitting based on local procedures, Communications of the ACM 17 (1) (1974) 18–20
- [27] The MathWorks, Inc., makima: modified Akima piecewise cubic Hermite interpolation, https://blogs.mathworks.com/cleve/2019/04/29/makima-piecewise-cubic-interpolation/, introduced in MATLAB R2019b (2019)
- [28] J. O. Ramsay, B. W. Silverman, Functional Data Analysis, Springer New York, NY, 2006. doi:10.1007/b98888
- [29]
- [30] J. R. Hosking, L-moments: analysis and estimation of distributions using linear combinations of order statistics, Journal of the Royal Statistical Society Series B: Statistical Methodology 52 (1) (1990) 105–124
- [31] J. R. M. Hosking, J. R. Wallis, Regional Frequency Analysis: An Approach Based on L-Moments, Cambridge University Press, 1997
- [32] G. Talbot, Extreme values statistical analysis library, version 1.3.1, MATLAB Central File Exchange
- [33]
- [34] C. Loader, Local regression and likelihood, Springer New York, NY, 1999. doi:10.1007/b98858
- [35] G. Geenens, A. Charpentier, D. Paindaveine, Probit transformation for nonparametric kernel estimation of the copula density, Bernoulli 23 (3) (2017) 1848–1873
- [36] T. Nagler, C. Schellhase, C. Czado, Nonparametric estimation of simplified vine copula models: comparison of methods, De Gruyter Open, Dependence Modeling 5 (1) (2017) 99–120. doi:10.1515/demo-2017-0007
- [37] C. Keef, I. Papastathopoulos, J. A. Tawn, Estimation of the conditional distribution of a multivariate variable given that one of its components is large: Additional constraints for the Heffernan and Tawn model, Journal of Multivariate Analysis 115 (2013) 396–404. doi:10.1016/j.jmva.2012.10.012
- [38] C. H. Wu, H. Nepf, Breaking criteria and energy losses for three-dimensional wave breaking, Journal of Geophysical Research: Oceans 107 (C10) (2002) 41–1
- [39] A. Babanin, Breaking and dissipation of ocean surface waves, Cambridge University Press, 2011
- [40] J. Caers, J. Beirlant, P. Vynckier, Bootstrap confidence intervals for tail indices, Computational Statistics & Data Analysis 26 (3) (1998) 259–277
- [41] M. I. Gomes, O. Oliveira, The bootstrap methodology in statistics of extremes–choice of the optimal sample fraction, Extremes 4 (4) (2001) 331–358
- [42] B. Wang, S. N. Mishra, M. S. Mulekar, N. Mishra, K. Huang, Comparison of bootstrap and generalized bootstrap methods for estimating high quantiles, Journal of Statistical Planning and Inference 140 (10) (2010) 2926–2935
- [43] E. Gilleland, Bootstrap methods for statistical inference. Part II: Extreme-value analysis, Journal of Atmospheric and Oceanic Technology 37 (11) (2020) 2135–2144
- [44] E. Mackay, P. Jonathan, Estimation of Environmental Contours Using a Block Resampling Method, Vol. 2A: Structures, Safety, and Reliability of International Conference on Offshore Mechanics and Arctic Engineering, 2020. doi:10.1115/OMAE2020-18308
- [45]
- [46] J. Legrand, P. Ailliot, P. Naveau, N. Raillard, Joint stochastic simulation of extreme coastal and offshore significant wave heights, The Annals of Applied Statistics 17 (4) (2023) 3363–3383
- [47] T. Bedford, R. M. Cooke, Probability density decomposition for conditionally dependent random variables modeled by vines, Annals of Mathematics and Artificial Intelligence 32 (1) (2001) 245–268
- [48] T. Bedford, R. M. Cooke, Vines–a new graphical model for dependent random variables, The Annals of Statistics 30 (4) (2002) 1031–1068
- [49] K. Aas, C. Czado, A. Frigessi, H. Bakken, Pair-copula constructions of multiple dependence, Insurance: Mathematics and Economics 44 (2) (2009) 182–198
- [50] D. Kurowicka, H. Joe, Dependence modeling: vine copula handbook, World Scientific, 2010
- [51] T. Nagler, Kernel methods for vine copula estimation, Master’s thesis, Technische Universität München (2014)
- [52]