arxiv: 2512.11948 · v2 · pith:PADXWFXXnew · submitted 2025-12-12 · ⚛️ physics.flu-dyn · physics.data-an

Data-driven modeling of multivariate stochastic trajectories -- Application to water waves

Pith reviewed 2026-05-16 22:16 UTC · model grok-4.3

classification ⚛️ physics.flu-dyn physics.data-an
keywords stochastic trajectories · functional principal component analysis · vine copulas · water waves · multivariate modeling · wave breaking · Heffernan-Tawn · data-driven modeling

The pith

Functional principal components combined with vine copulas and conditional tail models generate joint stochastic trajectories for water wave variables.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a data-driven method to model the joint distribution of multiple kinematic variables along individual wave trajectories. Observed trajectories are first reduced to a small set of functional principal components, after which a vine copula captures dependence in the bulk of the distribution while the Heffernan-Tawn framework handles the multivariate extremes. Vertical Lagrangian acceleration is used as a filter to exclude breaking waves from generated samples. The approach limits the model to three hyperparameters that are adjusted through stepwise calibration. This construction allows the production of synthetic trajectories whose statistics match those extracted from large numerical wave databases.
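
As a rough illustration of the reduction step, the sketch below performs a discretized FPCA with NumPy: trajectories sampled on a common uniform grid are centered, and the leading right singular vectors play the role of principal component functions. This is not the paper's implementation. The function names `fpca` and `reconstruct`, the toy data, and the uniform-grid assumption (no quadrature weights, a single variable) are all hypothetical.

```python
import numpy as np

def fpca(trajectories, n_components):
    """Discretized FPCA for an array of shape (n_samples, n_grid)."""
    mean = trajectories.mean(axis=0)
    centered = trajectories - mean
    # Rows of vt are the discretized principal component functions.
    _, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:n_components]
    scores = centered @ components.T  # one feature vector per trajectory
    explained = (s[:n_components] ** 2).sum() / (s ** 2).sum()
    return mean, components, scores, explained

def reconstruct(mean, components, scores):
    """Map feature vectors back to trajectories on the grid."""
    return mean + scores @ components

# Toy data (assumption): two random amplitudes on two fixed shapes, plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 50)
amps = rng.normal(size=(200, 2))
toy = amps[:, :1] * np.sin(np.pi * t) + 0.3 * amps[:, 1:] * np.sin(2 * np.pi * t)
toy += 0.01 * rng.normal(size=toy.shape)

mean, components, scores, explained = fpca(toy, n_components=2)
approx = reconstruct(mean, components, scores)
```

Here `scores` is the feature matrix that a dependence model would be fitted to, and `explained` is the fraction of variance retained, which says nothing by itself about how well joint extremes are preserved.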

Core claim

Reducing each wave trajectory segment to functional principal components, then modeling the joint distribution of those components with a non-parametric vine copula for typical values and the Heffernan-Tawn conditional framework for the tail, produces a stochastic model for free-surface slope, normal velocity, and vertical acceleration that incorporates an acceleration-based wave-breaking filter and requires only three hyperparameters for calibration.

What carries the argument

Functional Principal Component Analysis for trajectory reduction, paired with vine copulas for bulk dependence and the Heffernan-Tawn conditional modeling framework for extremes.
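
Copula models, vine copulas included, describe dependence on uniform margins, so a fit usually starts by mapping each feature to pseudo-observations through its ranks. The sketch below shows only that generic preprocessing step. The function name `pseudo_observations` and the toy feature matrix are hypothetical, ties are assumed absent, and the paper's own marginal treatment is not described on this page.

```python
import numpy as np

def pseudo_observations(features):
    """Rank-transform each column of (n_samples, n_features) to (0, 1)."""
    n = features.shape[0]
    # Double argsort yields 0-based ranks per column (valid when there are no ties).
    ranks = features.argsort(axis=0).argsort(axis=0) + 1
    return ranks / (n + 1.0)

# Toy feature matrix (assumption): two correlated columns and one skewed column.
rng = np.random.default_rng(1)
z = rng.normal(size=(500, 3))
features = np.column_stack([z[:, 0], 0.8 * z[:, 0] + 0.6 * z[:, 1], np.exp(z[:, 2])])

u = pseudo_observations(features)
rank_corr = np.corrcoef(u[:, 0], u[:, 1])[0, 1]  # Spearman-type correlation
```

Dividing by n + 1 rather than n keeps every value strictly inside (0, 1), which matters when the uniform scores are later pushed through a quantile function.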

If this is right

  • The calibrated model reproduces distributions of response variables derived from the three kinematic quantities.
  • Synthetic trajectories can be generated that respect both typical and extreme joint statistics.
  • The acceleration variable enforces exclusion of breaking-wave samples during generation.
  • Stepwise tuning of the three hyperparameters allows practical adjustment without exhaustive search.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same reduction-plus-copula pipeline could be tested on other multivariate fluid trajectories such as those arising in breaking-wave turbulence.
  • Hybrid use with deterministic wave solvers might enable efficient sampling of rare load events for offshore structure design.
  • Application to laboratory wave-tank measurements would check whether numerical artifacts in the training data affect the extracted components.

Load-bearing premise

The functional principal components extracted from observed trajectories capture enough variability for accurate joint tail modeling, and the vine copula plus Heffernan-Tawn combination faithfully reproduces the dependence structure without bias introduced by the feature reduction.

What would settle it

Generating many synthetic trajectories and comparing their joint tail distributions of slope, velocity, and acceleration against an independent hold-out portion of the DeRisk database; significant mismatch in the extremes would falsify the modeling claim.
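
A minimal sketch of such a hold-out check for a single response variable is given below, using the two-sample Kolmogorov-Smirnov statistic from SciPy and a ratio of high quantiles. The helper name `compare_to_holdout`, the 0.99 quantile level, and the Gumbel and normal toy samples are assumptions made for illustration. They are not data or settings from the paper, and a univariate check does not by itself test joint tails.

```python
import numpy as np
from scipy.stats import ks_2samp

def compare_to_holdout(synthetic, holdout, tail_prob=0.99):
    """Summarize agreement between model output and held-out data."""
    ks = ks_2samp(synthetic, holdout)
    q_syn = np.quantile(synthetic, tail_prob)
    q_hold = np.quantile(holdout, tail_prob)
    return {
        "ks_statistic": ks.statistic,
        "ks_pvalue": ks.pvalue,
        "tail_quantile_ratio": q_syn / q_hold,
    }

rng = np.random.default_rng(2)
holdout = rng.gumbel(size=5000)      # stands in for held-out data
good_model = rng.gumbel(size=5000)   # same distribution as the hold-out
# Normal sample with the Gumbel mean and standard deviation but a lighter upper tail.
bad_model = rng.normal(loc=0.5772, scale=1.2825, size=5000)

report_good = compare_to_holdout(good_model, holdout)
report_bad = compare_to_holdout(bad_model, holdout)
```

The moment-matched normal sample shows why a tail-specific summary is reported alongside the Kolmogorov-Smirnov distance: agreement in mean and variance does not imply agreement in the extremes.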

Figures

Figures reproduced from arXiv: 2512.11948 by Romain Hascoët.

Figure 1: Parameter space of sea states covered by the DeRisk simulations. Gray dots indicate individual simulation…
Figure 2: Extraction of water-entry trajectories. The “clouds” of thin grey lines show the full dataset of stochastic…
Figure 3: Samples of stochastic trajectories for u_n and s. The first two rows show samples for sea state #1, while the last row shows samples for sea state #3. The trajectory sample extracted from the DeRisk dataset is labeled as "true sample". The remaining samples (nine for sea state #1 and four for sea state #3) are independent synthetic samples generated by the stochastic model. Each synthetic sample contains the…
Figure 4: Marginal distributions of the response variables,…
Figure 5: Bootstrap histograms of the (1 − 1/n)-quantiles of the response variables. The first and second rows show results for sea state #1 (with the wave-breaking filter enabled) and sea state #3, respectively. The histograms were generated from 400 bootstrap samples. […] original dataset are indicated by vertical dotted lines. Across the bootstrap distributions shown in…
Original abstract

A data-driven methodology is proposed to model the distribution of multivariate stochastic trajectories from an observed sample. As a first step, each trajectory in the sample is reduced to a vector of features by means of Functional Principal Component Analysis. Next, the joint distribution of features is modeled using (i) a non-parametric vine copula approach for the bulk of the distribution, and (ii) the conditional modeling framework of Heffernan and Tawn (2004) for the multivariate tail. The method is applied to the modeling of water waves. The dataset used is the DeRisk database, which consists of numerical simulations of water waves. The analysis is restricted to the portion of the wave period between the free-surface zero-upcrossing and the wave crest. The kinematic variables considered are the free-surface slope, the normal component of the fluid velocity at the free surface, and the vertical Lagrangian acceleration of the fluid at the free surface. The stochastic trajectories of these three variables are modeled jointly. The vertical Lagrangian acceleration of the fluid is employed to enforce a wave-breaking filter in the stochastic model. The number of hyperparameters in the stochastic framework is reduced to three, and a stepwise calibration strategy is proposed for their adjustment. The capabilities of the model are illustrated by predicting the distributions of selected response variables and by generating synthetic trajectories.
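
For orientation, the conditional extremes model of Heffernan and Tawn is often written as below, in the Laplace-margin form used by Keef et al. (2013). This is the standard textbook statement, not a formula taken from the paper. The symbols Y, a, b, Z and u are notation introduced here, and whether the paper works on Laplace or Gumbel margins is not stated on this page.

```latex
% Y = (Y_1, ..., Y_d): feature vector after transformation to Laplace margins.
% Y_{-i}: Y without its i-th component; u: a high threshold.
\[
  Y_{-i} \mid \{ Y_i = y \} \;=\; a_{\mid i}\, y \;+\; y^{\,b_{\mid i}}\, Z_{\mid i},
  \qquad y > u,
\]
\[
  a_{\mid i} \in [-1, 1]^{d-1}, \qquad
  b_{\mid i} \in (-\infty, 1)^{d-1},
\]
% Products and powers act componentwise. The residual vector Z_{\mid i}
% is assumed independent of Y_i given Y_i > u, and its distribution is
% usually estimated empirically from the fitted residuals.
```

Simulation from the tail then amounts to drawing a large value of the conditioning variable, drawing a residual vector, and applying the relation above before mapping back to the original margins.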

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes a data-driven method for modeling multivariate stochastic trajectories: each observed trajectory is reduced to a feature vector via Functional Principal Component Analysis (FPCA), the joint distribution of features is modeled with a non-parametric vine copula for the bulk and the Heffernan-Tawn conditional framework for the multivariate tails, and the approach is applied to joint modeling of free-surface slope, normal fluid velocity, and vertical Lagrangian acceleration over the zero-upcrossing-to-crest portion of waves in the DeRisk database. A wave-breaking filter is enforced via the acceleration variable, the framework is reduced to three hyperparameters with a proposed stepwise calibration, and capabilities are illustrated via predicted response distributions and generated synthetic trajectories.

Significance. If the FPCA reduction preserves the necessary joint tail dependence and the vine-copula/Heffernan-Tawn construction plus breaking filter produces unbiased distributions, the method would provide a practical, low-hyperparameter route to generating realistic multivariate wave kinematics for engineering risk assessment. The explicit reduction to three calibrated hyperparameters and the use of established tail-modeling tools constitute clear strengths; however, significance hinges on whether the data-driven pipeline demonstrably outperforms simpler alternatives on tail statistics.

major comments (3)
  1. [§3] §3 (FPCA step): because FPCA is a variance-maximizing linear projection, it is not guaranteed to retain the low-probability joint tail structure among the three kinematic variables that is required for unbiased Heffernan-Tawn conditional modeling; the manuscript must show (e.g., via tail-dependence coefficients or quantile-quantile plots of extremes before and after reduction) that the retained principal components preserve the dependence needed for the claimed synthetic-trajectory accuracy.
  2. [§5] §5 (validation): the claimed predictive capability is illustrated only qualitatively; quantitative metrics (Kolmogorov-Smirnov distances, coverage of empirical tails, or comparison against a baseline such as direct multivariate kernel density estimation or a Gaussian copula) are absent, so it is impossible to judge whether the three-hyperparameter model reproduces the DeRisk response distributions within acceptable error.
  3. [§4.2] §4.2 (breaking filter): the precise mechanism by which the vertical Lagrangian acceleration threshold is imposed inside the stochastic generator is not specified (e.g., rejection sampling, conditional truncation, or post-processing), leaving open the possibility that the filter distorts the joint distribution of the other two variables in a way that is not accounted for in the three-hyperparameter calibration.
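
The tail-dependence diagnostic requested in major comment 1 could be sketched as follows: an empirical estimate of chi(u) = P(V > u | U > u) computed on rank-transformed samples, to be evaluated once on quantities derived from the original trajectories and once on the same quantities after FPCA reconstruction. The function names, the 0.95 level, and the toy inputs are hypothetical. The paper does not report this computation.

```python
import numpy as np

def to_uniform(x):
    """Rank-transform a 1-D sample to (0, 1), assuming no ties."""
    ranks = np.argsort(np.argsort(x)) + 1
    return ranks / (len(x) + 1.0)

def chi_upper(x, y, u=0.95):
    """Empirical P(V > u | U > u) for the rank-transformed pair (U, V)."""
    ux, uy = to_uniform(x), to_uniform(y)
    exceed_x = ux > u
    return float(np.mean(uy[exceed_x] > u))

# Toy checks (assumption): a perfectly dependent pair and an independent pair.
rng = np.random.default_rng(4)
x = rng.normal(size=2000)
noise = rng.normal(size=2000)

chi_same = chi_upper(x, x)       # perfect dependence
chi_indep = chi_upper(x, noise)  # close to 1 - u under independence
```

A marked drop in `chi_upper` between the original and the reconstructed quantities at high levels of u would indicate that the retained components lose joint tail structure.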
minor comments (2)
  1. [§2] The notation for the three hyperparameters (and the stepwise calibration procedure) should be introduced with explicit symbols and a clear algorithmic outline in §2 to improve reproducibility.
  2. [§5] Figure captions in §5 should state the exact subset of the DeRisk database (number of waves, period range, breaking criterion) used for both calibration and validation.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments that highlight important aspects of our methodology. We address each major comment below and will incorporate the suggested clarifications and additions in the revised manuscript.

Point-by-point responses
  1. Referee: [§3] §3 (FPCA step): because FPCA is a variance-maximizing linear projection, it is not guaranteed to retain the low-probability joint tail structure among the three kinematic variables that is required for unbiased Heffernan-Tawn conditional modeling; the manuscript must show (e.g., via tail-dependence coefficients or quantile-quantile plots of extremes before and after reduction) that the retained principal components preserve the dependence needed for the claimed synthetic-trajectory accuracy.

    Authors: We agree that explicit verification of tail preservation is necessary. Although FPCA prioritizes variance, the leading components in our wave-kinematics application capture the dominant functional modes that include extreme behavior. In the revision we will add upper-tail dependence coefficients and quantile-quantile plots of the extreme quantiles (e.g., above the 95th percentile) comparing the original feature vectors to those reconstructed from the retained principal components, thereby confirming that the joint tail structure required for the Heffernan-Tawn step is adequately retained. revision: yes

  2. Referee: [§5] §5 (validation): the claimed predictive capability is illustrated only qualitatively; quantitative metrics (Kolmogorov-Smirnov distances, coverage of empirical tails, or comparison against a baseline such as direct multivariate kernel density estimation or a Gaussian copula) are absent, so it is impossible to judge whether the three-hyperparameter model reproduces the DeRisk response distributions within acceptable error.

    Authors: We accept that quantitative metrics are required for a rigorous assessment. The revised manuscript will report Kolmogorov-Smirnov distances for the marginal and selected joint response distributions, empirical tail coverage probabilities at the 95 % and 99 % levels, and direct comparisons against a Gaussian copula baseline as well as a multivariate kernel density estimator (where feasible given dimensionality). These additions will allow readers to evaluate the three-hyperparameter model’s accuracy on the DeRisk data. revision: yes

  3. Referee: [§4.2] §4.2 (breaking filter): the precise mechanism by which the vertical Lagrangian acceleration threshold is imposed inside the stochastic generator is not specified (e.g., rejection sampling, conditional truncation, or post-processing), leaving open the possibility that the filter distorts the joint distribution of the other two variables in a way that is not accounted for in the three-hyperparameter calibration.

    Authors: The breaking constraint is enforced by rejection sampling: feature vectors are first sampled from the fitted vine-copula/Heffernan-Tawn model, trajectories are reconstructed, and any realization whose vertical Lagrangian acceleration exceeds the breaking threshold is discarded. This post-generation filter leaves the three calibrated hyperparameters unchanged. We will describe the procedure explicitly in the revised Section 4.2 and add a short discussion of any resulting effect on the joint distributions of slope and normal velocity. revision: yes
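
The rejection step described in response 3 can be sketched as below. The sampler, the reconstruction map, and the breaking criterion are passed in as callables because none of them is specified on this page. All names, the toy model, and the sign convention of the toy criterion (acceleration falling below a negative limit) are assumptions made for illustration.

```python
import numpy as np

def sample_with_filter(sample_features, reconstruct_accel, is_breaking,
                       n_target, rng, batch=1000, max_batches=100):
    """Draw feature vectors and keep those whose trajectory is non-breaking."""
    kept, n_kept, n_drawn = [], 0, 0
    for _ in range(max_batches):
        feats = sample_features(batch, rng)
        accel = reconstruct_accel(feats)  # shape (batch, n_grid)
        keep = ~is_breaking(accel)        # boolean mask per trajectory
        kept.append(feats[keep])
        n_kept += int(keep.sum())
        n_drawn += batch
        if n_kept >= n_target:
            break
    # May return fewer than n_target rows if max_batches is exhausted.
    accepted = np.concatenate(kept)[:n_target]
    return accepted, n_kept / n_drawn

# Toy stand-ins (assumptions) for the fitted model and the reconstruction.
t = np.linspace(0.0, 1.0, 40)

def toy_sampler(n, rng):
    return rng.normal(size=(n, 2))

def toy_accel(feats):
    return feats[:, :1] * np.sin(np.pi * t) + 0.2 * feats[:, 1:] * np.sin(2 * np.pi * t)

def toy_is_breaking(accel, limit=-1.0):
    return accel.min(axis=1) < limit

rng = np.random.default_rng(3)
accepted, rate = sample_with_filter(toy_sampler, toy_accel, toy_is_breaking, 2000, rng)
```

The returned acceptance rate is worth reporting: the accepted samples follow the fitted distribution conditioned on non-breaking, so the distributions of the other variables shift by an amount that grows as the rate falls.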

Circularity Check

0 steps flagged

No circularity: data-driven FPCA + vine-copula + Heffernan-Tawn pipeline is independent of its target predictions

Full rationale

The paper reduces observed trajectories from the external DeRisk database to feature vectors via FPCA, then fits a non-parametric vine copula to the bulk and a Heffernan-Tawn conditional model to the tails, with an acceleration-based breaking filter and stepwise calibration of three hyperparameters. No equation or step equates the generated synthetic trajectories or predicted response distributions to the fitted inputs by construction. The central claims rest on the external data and standard statistical constructions rather than on self-definition, load-bearing self-citation, or renaming of known results. The derivation chain is therefore self-contained.

Axiom & Free-Parameter Ledger

1 free-parameter entry (three hyperparameters) · 2 axioms · 0 invented entities

The method rests on standard statistical assumptions for functional PCA and copula modeling plus the applicability of the Heffernan-Tawn tail framework to wave data; three hyperparameters are introduced and calibrated from the sample.

free parameters (1)
  • three hyperparameters
    The stochastic framework is reduced to three adjustable values whose calibration is performed stepwise on the DeRisk data.
axioms (2)
  • domain assumption Functional principal components extracted from the trajectories capture the essential joint variability of the three kinematic variables.
    Invoked when reducing each trajectory to a feature vector before copula modeling.
  • domain assumption The vine copula and Heffernan-Tawn conditional model together represent the true joint distribution and tails of the feature vectors.
    Central modeling assumption stated in the methodology description.

pith-pipeline@v0.9.0 · 5528 in / 1488 out tokens · 36054 ms · 2026-05-16T22:16:37.075855+00:00 · methodology

Reference graph

Works this paper leans on

52 extracted references · 52 canonical work pages

  1. [1]

    A. Naess, Prediction of extremes related to the second-order, sum-frequency response of a TLP, in: ISOPE International Ocean and Polar Engineering Conference, ISOPE, 1992, pp. ISOPE–I.

  2. [2]

    O. Faltinsen, Sea loads on ships and offshore structures, Vol. 1, Cambridge university press, 1993

  3. [3]

    O. M. Faltinsen, J. Newman, T. Vinje, Nonlinear wave loads on a slender vertical cylinder, Journal of Fluid Mechanics 289 (1995) 179–198

  4. [4]

    E. E. Bachynski, T. Moan, Ringing loads on tension leg platform wind turbines, Ocean engineering 84 (2014) 237–248

  5. [5]

    P. Renaud, F. Hulin, A. Tassin, J.-F. Filipot, N. Jacques, Analysis of slamming loads induced by breaking waves on vertical cylinders using fully nonlinear wave kinematics and semi-analytical load model, Coastal Engineering (2025) 104898

  6. [6]

    A. K. Jha, S. R. Winterstein, Stochastic fatigue damage accumulation due to nonlinear ship loads, J. Offshore Mech. Arct. Eng. 122 (4) (2000) 253–259

  7. [7]

    Z. Li, J. W. Ringsberg, G. Storhaug, Time-domain fatigue assessment of ship side-shell structures, International journal of fatigue 55 (2013) 276–290

  8. [8]

    G. Storhaug, The measured contribution of whipping and springing on the fatigue and extreme loading of container vessels, International Journal of Naval Architecture and Ocean Engineering 6 (4) (2014) 1096–1110. doi:10.2478/IJNAOE-2013-0233

  9. [9]

    R. Hascoët, N. Jacques, On the risk of fatigue failure of structural elements exposed to bottom wave slamming–impulse response regime, Applied Ocean Research 154 (2025) 104411

  10. [10]

    B. Kinsman, Wind waves: their generation and propagation on the ocean surface, Courier Corporation, 1984

  11. [11]

    M. J. Tucker, E. G. Pitt, Waves in ocean engineering, Vol. 5, Elsevier Ocean Engineering Series, 2001

  12. [12]

    M. K. Ochi, Ocean waves: the stochastic approach, Vol. 6, Cambridge University Press, 2005

  13. [13]

    L. H. Holthuijsen, Waves in Oceanic and Coastal Waters, Cambridge University Press, 2007. doi:10.1017/CBO9780511618536

  14. [14]

    M. S. Longuet-Higgins, On the statistical distribution of the heights of sea waves, Journal of Marine Research 11 (3)

  15. [15]

    G. Lindgren, I. Rychlik, Wave characteristic distributions for Gaussian waves–wave-length, amplitude and steepness, Ocean Engineering 9 (5) (1982) 411–432

  16. [16]

    J.-M. Azaïs, J. R. León, J. Ortega, Geometrical characteristics of Gaussian sea waves, Journal of applied probability 42 (2) (2005) 407–425

  17. [17]

    R. Hascoët, N. Raillard, N. Jacques, Effect of forward speed on the level-crossing distribution of kinematic variables in multidirectional ocean waves, Ocean Engineering 235 (2021) 109345. doi:10.1016/j.oceaneng.2021.109345

  18. [18]

    M. S. Longuet-Higgins, Modified Gaussian distributions for slightly nonlinear variables, Radio Sci. D 68 (1964) 1049–1062.

  19. [19]

    A. Naess, Statistical analysis of second-order response of marine structures, Journal of Ship Research 29 (04) (1985) 270–284

  20. [20]

    R. Langley, A statistical analysis of non-linear random waves, Ocean Engineering 14 (5) (1987) 389–407. doi:10.1016/0029-8018(87)90052-7

  21. [21]

    R. Hascoët, Level-crossing distributions of kinematic variables in multidirectional second-order ocean waves, Ocean Engineering 265 (2022) 112585. doi:10.1016/j.oceaneng.2022.112585

  22. [22]

    F. Pierella, O. Lindberg, H. Bredmose, H. B. Bingham, R. W. Read, A. P. Engsig-Karup, The DeRisk database: Extreme design waves for offshore wind turbines, Marine Structures 80 (2021) 103046. doi:10.1016/j.marstruc.2021.103046

  23. [23]

    J. E. Heffernan, J. A. Tawn, A conditional approach for multivariate extreme values (with discussion), Journal of the Royal Statistical Society: Series B (Statistical Methodology) 66 (3) (2004) 497–546

  24. [24]

    A. Engsig-Karup, H. Bingham, O. Lindberg, An efficient flexible-order model for 3D nonlinear water waves, Journal of Computational Physics 228 (6) (2009) 2100–2118. doi:10.1016/j.jcp.2008.11.028

  25. [25]

    H. Akima, A new method of interpolation and smooth curve fitting based on local procedures, Journal of the ACM (JACM) 17 (4) (1970) 589–602

  26. [26]

    H. Akima, A method of bivariate interpolation and smooth surface fitting based on local procedures, Communications of the ACM 17 (1) (1974) 18–20

  27. [27]

    The MathWorks, Inc., makima: modified Akima piecewise cubic hermite interpolation, https://blogs.mathworks.com/cleve/2019/04/29/makima-piecewise-cubic-interpolation/, introduced in MATLAB R2019b (2019)

  28. [28]

    J. O. Ramsay, B. W. Silverman, Functional Data Analysis, Springer New York, NY, 2006. doi:10.1007/b98888

  29. [29]

    J. Ramsay, G. Hooker, S. Graves, fda: Functional Data Analysis, R package version 6.1.8

  30. [30]

    J. R. Hosking, L-moments: analysis and estimation of distributions using linear combinations of order statistics, Journal of the Royal Statistical Society Series B: Statistical Methodology 52 (1) (1990) 105–124

  31. [31]

    J. R. M. Hosking, J. R. Wallis, Regional Frequency Analysis: An Approach Based on L- Moments, Cambridge University Press, 1997

  32. [32]

    G. Talbot, Extreme values statistical analysis library, version 1.3.1, MATLAB Central File Exchange

  33. [33]

    T. Nagler, T. Vatter, rvinecopulib: High performance algorithms for vine copula modeling, R package version 0.6.3.1.1

  34. [34]

    C. Loader, Local regression and likelihood, Springer New York, NY, 1999. doi:10.1007/b98858

  35. [35]

    G. Geenens, A. Charpentier, D. Paindaveine, Probit transformation for nonparametric kernel estimation of the copula density, Bernoulli 23 (3) (2017) 1848–1873

  36. [36]

    T. Nagler, C. Schellhase, C. Czado, Nonparametric estimation of simplified vine copula models: comparison of methods, De Gruyter Open, Dependence Modeling 5 (1) (2017) 99–120. doi:10.1515/demo-2017-0007

  37. [37]

    C. Keef, I. Papastathopoulos, J. A. Tawn, Estimation of the conditional distribution of a multivariate variable given that one of its components is large: Additional constraints for the Heffernan and Tawn model, Journal of Multivariate Analysis 115 (2013) 396–404. doi:10.1016/j.jmva.2012.10.012

  38. [38]

    C. H. Wu, H. Nepf, Breaking criteria and energy losses for three-dimensional wave breaking, Journal of Geophysical Research: Oceans 107 (C10) (2002) 41–1

  39. [39]

    A. Babanin, Breaking and dissipation of ocean surface waves, Cambridge University Press, 2011

  40. [40]

    J. Caers, J. Beirlant, P. Vynckier, Bootstrap confidence intervals for tail indices, Computational statistics & data analysis 26 (3) (1998) 259–277

  41. [41]

    M. I. Gomes, O. Oliveira, The bootstrap methodology in statistics of extremes–choice of the optimal sample fraction, Extremes 4 (4) (2001) 331–358

  42. [42]

    B. Wang, S. N. Mishra, M. S. Mulekar, N. Mishra, K. Huang, Comparison of bootstrap and generalized bootstrap methods for estimating high quantiles, Journal of statistical planning and inference 140 (10) (2010) 2926–2935

  43. [43]

    E. Gilleland, Bootstrap methods for statistical inference. Part II: Extreme-value analysis, Journal of Atmospheric and Oceanic Technology 37 (11) (2020) 2135–2144

  44. [44]

    E. Mackay, P. Jonathan, Estimation of Environmental Contours Using a Block Resampling Method, Volume 2A: Structures, Safety, and Reliability of International Conference on Offshore Mechanics and Arctic Engineering, 2020. doi:10.1115/OMAE2020-18308

  45. [45]

    P. Naveau, R. Huser, P. Ribereau, A. Hannart, Modeling jointly low, moderate, and heavy rainfall intensities without a threshold selection, Water Resources Research 52 (4) (2016) 2753–2769

  46. [46]

    J. Legrand, P. Ailliot, P. Naveau, N. Raillard, Joint stochastic simulation of extreme coastal and offshore significant wave heights, The Annals of Applied Statistics 17 (4) (2023) 3363–3383

  47. [47]

    T. Bedford, R. M. Cooke, Probability density decomposition for conditionally dependent random variables modeled by vines, Annals of Mathematics and Artificial intelligence 32 (1) (2001) 245–268

  48. [48]

    T. Bedford, R. M. Cooke, Vines–a new graphical model for dependent random variables, The Annals of statistics 30 (4) (2002) 1031–1068

  49. [49]

    K. Aas, C. Czado, A. Frigessi, H. Bakken, Pair-copula constructions of multiple dependence, Insurance: Mathematics and economics 44 (2) (2009) 182–198.

  50. [50]

    D. Kurowicka, H. Joe, Dependence modeling: vine copula handbook, World Scientific, 2010

  51. [51]

    T. Nagler, Kernel methods for vine copula estimation, Master’s thesis, Technische Universität München (2014)

  52. [52]

    T. Nagler, C. Czado, Evading the curse of dimensionality in nonparametric density estimation with simplified vine copulas, Journal of Multivariate Analysis 151 (2016) 69–89. doi:10.1016/j.jmva.2016.07.003