
arXiv: 1906.09665 · v2 · pith:IVQHEYEVnew · submitted 2019-06-23 · stat.ML · cs.LG · eess.SP

Compositionally-Warped Gaussian Processes

Pith reviewed 2026-05-25 17:24 UTC · model grok-4.3

classification: stat.ML · cs.LG · eess.SP
keywords: Gaussian processes, warped Gaussian processes, non-Gaussian marginals, compositional warping, invertible transformations, probabilistic modeling, machine learning

The pith

Compositions of elementary invertible functions let Gaussian processes model non-Gaussian data with fully analytical inverses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper constructs a class of warpings for Gaussian processes by composing multiple elementary functions, each with an explicit inverse. This produces the compositionally-warped Gaussian process, a generative model whose non-Gaussian marginals arise from the depth of the composition while all required inverses remain closed-form. A reader would care because the construction removes the numerical inversion step that slows standard warped GPs at prediction time. Experiments on synthetic and real data show the resulting model is robust across warping choices and yields more accurate point predictions together with shorter run times.

Core claim

The paper establishes that a warping formed by composing elementary invertible functions yields a non-Gaussian generative model over functions whose inverse is known exactly, thereby preserving the computational advantages of the original Gaussian process while expanding the family of marginal distributions that can be represented.
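
As a hedged illustration of why an explicit composition keeps the model tractable, the standard change-of-variables likelihood of a warped GP can be written out layer by layer. The notation below (f_j for the elementary functions, d for the depth, u for intermediate values, m and K for the GP mean vector and covariance matrix) is assumed here for illustration and is not taken from the paper. Each f_j is assumed increasing and differentiable.

```latex
% Warped GP: z_i = f(y_i) with z ~ N(m, K), f applied elementwise
\log p(\mathbf{y})
  = \log \mathcal{N}\!\big(f(\mathbf{y});\, \mathbf{m}, K\big)
  + \sum_{i=1}^{n} \log f'(y_i)

% Composition f = f_d \circ \cdots \circ f_1, with
% u_i^{(0)} = y_i and u_i^{(j)} = f_j\big(u_i^{(j-1)}\big)
\log f'(y_i) = \sum_{j=1}^{d} \log f_j'\big(u_i^{(j-1)}\big)

% The inverse applies the elementary inverses in reverse order
f^{-1} = f_1^{-1} \circ f_2^{-1} \circ \cdots \circ f_d^{-1}
```

The Jacobian term splits into one sum per layer by the chain rule, and the inverse needs only the d elementary inverses, so neither step calls for a numerical solver.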

What carries the argument

The compositionally-warped Gaussian process (CWGP), a non-Gaussian model whose warping is a finite composition of elementary functions chosen so that the overall inverse remains explicit and therefore analytical.
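
A minimal sketch of the idea follows, assuming textbook forms of three elementary warpings (affine, Box-Cox, sinh-arcsinh). The function names, the parameter values, and the exact parameterisations are illustrative assumptions and may differ from those used in the paper. Each layer is stored as a (forward, inverse) pair, so the inverse of the whole composition is obtained by applying the stored inverses in reverse order.

```python
import numpy as np

# Each elementary warping is a (forward, inverse) pair of closed-form functions.
# These are textbook parameterisations, chosen for illustration only.

def affine(a, b):
    # f(y) = a + b*y, with b != 0
    return (lambda y: a + b * y, lambda z: (z - a) / b)

def box_cox(lam):
    # f(y) = (y**lam - 1) / lam, for y > 0 and lam != 0
    return (lambda y: (np.power(y, lam) - 1.0) / lam,
            lambda z: np.power(lam * z + 1.0, 1.0 / lam))

def sinh_arcsinh(a, b):
    # f(y) = sinh(b * arcsinh(y) - a), with b > 0
    return (lambda y: np.sinh(b * np.arcsinh(y) - a),
            lambda z: np.sinh((np.arcsinh(z) + a) / b))

def compose(layers):
    """Return forward and inverse maps of f_d o ... o f_1 for a list of layers."""
    def forward(y):
        for f, _ in layers:
            y = f(y)
        return y

    def inverse(z):
        # Undo the layers in reverse order, using only closed-form inverses.
        for _, f_inv in reversed(layers):
            z = f_inv(z)
        return z

    return forward, inverse

# Hypothetical three-layer warping; parameter values are arbitrary.
layers = [box_cox(0.5), affine(-1.0, 2.0), sinh_arcsinh(0.3, 1.5)]
warp, unwarp = compose(layers)

y = np.linspace(0.1, 5.0, 50)   # positive inputs, as Box-Cox requires
z = warp(y)
y_back = unwarp(z)
max_err = float(np.max(np.abs(y_back - y)))
print(max_err)  # round-trip error of the analytical inverse
```

Adding a layer only appends one more pair to the list, which is what lets the depth of the composition grow without introducing a numerical inversion step.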

If this is right

  • Point predictions become more accurate than those of a standard warped GP on the tested data.
  • Computation times are shorter than those of a standard warped GP, since prediction needs no numerical inversion of the warping.
  • The model remains effective across a range of different elementary warping functions.
  • The inverse warping used at prediction stays in closed form however many elementary functions are composed.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same compositional pattern could be applied to other latent-variable models that currently rely on numerical inverses.
  • One could test whether adding more layers of elementary functions eventually saturates performance gains before reaching the cost of deep GPs.
  • The approach implies that many practical non-Gaussian patterns can be captured without arbitrary, non-analytic warpings.

Load-bearing premise

Compositions of a modest number of elementary invertible functions are expressive enough to match the non-Gaussian marginals that appear in the intended applications.

What would settle it

A real-world dataset on which every composition of elementary functions produces visibly worse point predictions or longer training times than a numerically inverted warped GP using the same base kernel.

Figures

Figures reproduced from arXiv: 1906.09665 by Felipe Tobar, Gonzalo Rios.

Figure 1: Single-layer feedforward neural network.
Figure 2: General structure of warped Gaussian processes where a …
Figure 3: Proposed Box-Cox and SinhArcsinh elementary transformations.
Figure 4: Approximation of a WGP warping (sum of three hyperbolic …
Figure 6: Representation of error measures in Table 2 normalised wrt …
Figure 7: Training (left, NLL) and evaluation (right, NLPD) perfor…
Figure 9: GP (top) and CWGP (bottom) posterior distributions with only 40 observations for the time series, together with their means, error bars and sample trajectories …
Figure 10: NLPD histograms (65 runs) for all models considered and the Abalone, Ailerons and Creep datasets. The white points are the …
Original abstract

The Gaussian process (GP) is a nonparametric prior distribution over functions indexed by time, space, or other high-dimensional index set. The GP is a flexible model yet its limitation is given by its very nature: it can only model Gaussian marginal distributions. To model non-Gaussian data, a GP can be warped by a nonlinear transformation (or warping) as performed by warped GPs (WGPs) and more computationally-demanding alternatives such as Bayesian WGPs and deep GPs. However, the WGP requires a numerical approximation of the inverse warping for prediction, which increases the computational complexity in practice. To sidestep this issue, we construct a novel class of warpings consisting of compositions of multiple elementary functions, for which the inverse is known explicitly. We then propose the compositionally-warped GP (CWGP), a non-Gaussian generative model whose expressiveness follows from its deep compositional architecture, and its computational efficiency is guaranteed by the analytical inverse warping. Experimental validation using synthetic and real-world datasets confirms that the proposed CWGP is robust to the choice of warpings and provides more accurate point predictions, better trained models and shorter computation times than WGP.
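
To make the cost of the numerical inverse concrete, here is a hedged sketch of the step the abstract says the WGP needs at prediction time. It assumes a sum-of-tanh warping of the kind commonly used for WGPs, with hypothetical parameter values that are not fitted to any data, and it uses plain bisection as a stand-in for whatever root finder an actual implementation would use.

```python
import numpy as np

# Hypothetical parameters of a sum-of-tanh warping; not fitted to any data.
A = np.array([0.5, 1.0, 0.3])
B = np.array([1.0, 0.5, 2.0])
C = np.array([0.0, -1.0, 1.0])

def tanh_warp(y):
    # f(y) = y + sum_i a_i * tanh(b_i * (y + c_i)); increasing when a_i, b_i >= 0
    y = np.asarray(y, dtype=float)
    return y + np.sum(A * np.tanh(B * (y[..., None] + C)), axis=-1)

def invert_bisection(z, lo=-50.0, hi=50.0, tol=1e-10, max_iter=200):
    """Solve tanh_warp(y) = z for a scalar z by bisection on [lo, hi]."""
    n_iter = 0
    while hi - lo > tol and n_iter < max_iter:
        mid = 0.5 * (lo + hi)
        if tanh_warp(mid) < z:
            lo = mid   # root lies in the upper half
        else:
            hi = mid   # root lies in the lower half
        n_iter += 1
    return 0.5 * (lo + hi), n_iter

y_true = 1.7
z = tanh_warp(y_true)
y_num, n_iter = invert_bisection(z)
print(y_num, n_iter)  # recovered input and number of warping evaluations
```

Every inverted value costs several evaluations of the warping, and that loop runs once per point to be mapped back to data space. A warping with a closed-form inverse replaces the loop with a single function call.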

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces compositionally-warped Gaussian processes (CWGP), a non-Gaussian GP model whose warping functions are formed by composing elementary invertible functions. This guarantees an explicit inverse, avoiding the numerical inversion required by standard warped GPs (WGPs). The central empirical claim is that CWGP is robust to the specific choice of elementary warpings and yields more accurate point predictions, better-trained models, and shorter runtimes than WGP on both synthetic and real-world data.

Significance. If the reported gains hold under the full experimental protocol, the work supplies a practical, analytically tractable route to non-Gaussian marginals that sits between the simplicity of a single warping and the cost of deep or Bayesian warped GPs. The compositional construction with closed-form inverses is a concrete engineering advantage that could be adopted in time-series or spatial applications where repeated inversion is the bottleneck.

minor comments (3)
  1. Abstract: the phrase 'better trained models' is undefined; the experimental section should state the precise metric (e.g., negative log predictive density on held-out data, marginal likelihood value, or convergence speed of the optimizer).
  2. The manuscript should include an explicit table or appendix listing every elementary function employed, its closed-form inverse, and the range of composition depths tested, so that the robustness claim can be reproduced.
  3. Figure captions and axis labels in the experimental results should report the exact number of Monte-Carlo samples or quadrature points used when any numerical integration remains, even if the inverse itself is analytic.
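
The distinction drawn in comment 3 can be sketched as follows: even with an analytic inverse warping, the predictive mean in data space is an integral that is usually approximated, whereas the predictive median is not. The example assumes a logarithmic warping (inverse `np.exp`) purely because the log-normal mean is known exactly and can serve as a check. The function names and the choice of 20 quadrature points are illustrative assumptions.

```python
import numpy as np

def predictive_median(inv_warp, mu):
    # Quantiles commute with an increasing map, so no integration is needed.
    return inv_warp(mu)

def predictive_mean(inv_warp, mu, sigma, n_points=20):
    """Approximate E[inv_warp(z)] for z ~ N(mu, sigma^2) by Gauss-Hermite quadrature."""
    nodes, weights = np.polynomial.hermite.hermgauss(n_points)
    # Change of variables z = mu + sqrt(2) * sigma * x maps the Gaussian
    # expectation onto the weight function exp(-x^2) used by the rule.
    z = mu + np.sqrt(2.0) * sigma * nodes
    return float(np.sum(weights * inv_warp(z)) / np.sqrt(np.pi))

# Latent Gaussian predictive at one test input (hypothetical values).
mu, sigma = 0.3, 0.5

mean_quad = predictive_mean(np.exp, mu, sigma)
mean_exact = float(np.exp(mu + 0.5 * sigma**2))   # log-normal mean
median = float(predictive_median(np.exp, mu))
print(mean_quad, mean_exact, median)
```

Reporting `n_points` (or the Monte Carlo sample count, if sampling is used instead) is the detail the comment asks for.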

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive summary of the manuscript and recommendation for minor revision. We are encouraged by the recognition of the analytical and computational advantages of the compositional warping construction. No specific major comments were provided in the report, so we have no point-by-point rebuttals to offer at this stage.

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper introduces CWGP by directly defining a novel warping class as compositions of elementary functions with explicit inverses; this is a modeling construction, not a derivation that reduces to fitted parameters or prior results by construction. No self-citation load-bearing steps, uniqueness theorems imported from authors, or ansatzes smuggled via citation appear in the abstract or summary. The performance claims rest on experimental validation rather than internal redefinition, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The approach rests on the standard GP prior and the new compositional warping; no free parameters or invented entities beyond the model class itself are introduced in the abstract.

axioms (2)
  • [domain assumption] A Gaussian process defines a prior over functions with Gaussian marginals.
    Core modeling assumption stated in the opening of the abstract.
  • [domain assumption] The warping function must be invertible to recover data-space predictions.
    Implicit in any warping approach for non-Gaussian marginals.
invented entities (1)
  • Compositionally-warped GP (no independent evidence)
    Purpose: non-Gaussian generative model with analytical inverse warping.
    New model class defined by the compositional construction.

pith-pipeline@v0.9.0 · 5726 in / 1184 out tokens · 34411 ms · 2026-05-25T17:24:22.576700+00:00 · methodology


