Deep Gaussian Processes for Functional Maps

Da Long; Daniel S. Johnson; Keyan Chen; Matthew Lowery; Shandian Zhe; Varun Shankar; Yang Bai; Zhitong Xu

arxiv: 2510.22068 · v2 · submitted 2025-10-24 · 💻 cs.LG · stat.ML

Matthew Lowery, Zhitong Xu, Da Long, Keyan Chen, Daniel S. Johnson, Yang Bai, Varun Shankar, Shandian Zhe

Pith reviewed 2026-05-18 04:00 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords Deep Gaussian Processes · Functional Data Analysis · Function-on-Function Regression · Kernel Integral Transforms · Variational Inference · Uncertainty Quantification · Functional Maps

The pith

Deep Gaussian processes learn functional maps by stacking kernel integral transforms and nonlinear activations directly in function space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses function-on-function regression by defining deep Gaussian processes that apply sequences of linear and nonlinear transformations directly to functions rather than to vectorized points. It uses kernel integral transforms for the linear layers and draws nonlinear activations from Gaussian processes, with GP conditional means to propagate the mappings. This matters for applications such as spatiotemporal forecasting and climate modeling, where functional data are often noisy, sparse, or irregularly sampled and existing methods struggle to capture complex nonlinearities and to provide trustworthy uncertainty. A central simplification shows that when evaluation locations are fixed, the discrete approximations of the kernel transforms reduce to direct functional integral transforms, allowing flexible design choices. Variational inference with inducing points and whitening then makes the model scalable, and experiments on benchmark datasets report gains in accuracy and uncertainty calibration.

Core claim

The authors establish that mappings between functional spaces can be realized as deep compositions of Gaussian process transformations performed directly in function space: linear steps use kernel integral transforms, nonlinear steps use activations sampled from Gaussian processes, and the whole stack is trained with variational inference; under fixed evaluation locations the discrete kernel approximations reduce exactly to the continuous functional integrals, enabling practical and flexible implementations.

What carries the argument

The Deep Gaussian Process for Functional Maps (DGPFM) that composes GP-based linear kernel integral transforms with nonlinear activations drawn from Gaussian processes, all operating directly in function space.
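To make the composition concrete, here is a minimal NumPy sketch of one such layer under simplifying assumptions. A fixed matrix `W` stands in for the linear integral step on a fixed grid, and the nonlinear activation is the conditional mean of a GP with inducing locations `z` and inducing values `u`. The RBF kernel, the names `gp_activation` and `layer`, and the tanh-shaped inducing values are all illustrative choices, not the paper's implementation.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel matrix between 1D point sets a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_activation(a, z, u, ls=1.0, jitter=1e-6):
    """GP conditional mean at inputs a, given values u at inducing locations z."""
    Kzz = rbf(z, z, ls) + jitter * np.eye(len(z))
    Kaz = rbf(a, z, ls)
    return Kaz @ np.linalg.solve(Kzz, u)

def layer(h_vals, W, z, u):
    """Linear step (fixed-grid matrix W), then a pointwise GP activation."""
    return gp_activation(W @ h_vals, z, u)

# Hypothetical inputs: a random linear map and tanh-like inducing values.
rng = np.random.default_rng(0)
n = 50
W = rng.normal(size=(n, n)) / n
h = np.sin(2 * np.pi * np.linspace(0.0, 1.0, n))
z = np.linspace(-3.0, 3.0, 7)
u = np.tanh(z)
out = layer(h, W, z, u)
```

Stacking several calls to `layer` gives a deep composition. In a probabilistic treatment the inducing values `u` would carry a variational distribution rather than fixed numbers.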

Load-bearing premise

Under fixed evaluation locations, discrete approximations of kernel integral transforms reduce to direct functional integral transforms.
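One way to read this premise is sketched below: a kernel integral transform (T h)(x) = ∫ κ(x, x') h(x') dx' is approximated by quadrature on a fixed grid. Because the grid never changes, the quadrature-weighted kernel matrix is a constant matrix and the transform becomes a single matrix-vector product. The RBF kernel, the trapezoidal rule, the grid, and all function names are assumptions made for illustration only.

```python
import numpy as np

def rbf_kernel(x, y, lengthscale=0.2):
    """Squared-exponential kernel matrix between 1D point sets x and y."""
    diff = x[:, None] - y[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)

def trapezoid_weights(grid):
    """Trapezoidal quadrature weights for a sorted 1D grid."""
    w = np.zeros_like(grid)
    dx = np.diff(grid)
    w[:-1] += 0.5 * dx
    w[1:] += 0.5 * dx
    return w

def kernel_integral_transform(h_vals, grid, lengthscale=0.2):
    """Approximate (T h)(x_i) = int k(x_i, x') h(x') dx' at every grid point."""
    K = rbf_kernel(grid, grid, lengthscale)
    W = K * trapezoid_weights(grid)[None, :]  # constant matrix for a fixed grid
    return W @ h_vals

grid = np.linspace(0.0, 1.0, 201)
h = np.sin(2 * np.pi * grid)
Th = kernel_integral_transform(h, grid)
```

Since `W` depends only on the grid and the kernel, it can be computed once and reused for every input function sampled at those locations.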

What would settle it

A direct comparison on the paper's real-world and synthetic benchmarks in which DGPFM shows no improvement over standard functional regression baselines in either predictive accuracy or uncertainty calibration.
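Such a comparison needs concrete scores for accuracy and calibration. The sketch below shows three common choices: relative L2 error, Gaussian negative log-likelihood, and empirical coverage of a k-standard-deviation band. These are generic metrics chosen for illustration; this page does not say which metrics the paper reports, and the function names are hypothetical.

```python
import numpy as np

def relative_l2(pred, truth):
    """Relative L2 error between predicted and true function values."""
    return np.linalg.norm(pred - truth) / np.linalg.norm(truth)

def gaussian_nll(mean, std, truth):
    """Average negative log-likelihood under independent Gaussian predictions."""
    var = std ** 2
    return np.mean(0.5 * np.log(2 * np.pi * var) + 0.5 * (truth - mean) ** 2 / var)

def coverage(mean, std, truth, k=1.0):
    """Fraction of true values that fall inside mean +/- k * std."""
    return np.mean(np.abs(truth - mean) <= k * std)

# Toy values chosen by hand, not results from the paper.
truth = np.array([0.5, -0.5, 2.0, 0.9])
mean = np.zeros(4)
std = np.ones(4)
cov = coverage(mean, std, truth)
nll = gaussian_nll(mean, std, truth)
```

For a well-calibrated Gaussian predictive distribution, the one-standard-deviation coverage should sit near 68% on a large test set; values far above or below suggest over- or under-estimated uncertainty.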

Figures

Figures reproduced from arXiv: 2510.22068 by Da Long, Daniel S. Johnson, Keyan Chen, Matthew Lowery, Shandian Zhe, Varun Shankar, Yang Bai, Zhitong Xu.

**Figure 1.** Prediction examples of DGPFM on the Beijing-Air dataset. The shaded regions indicate one predictive standard deviation. The top row shows the prediction of DGPFM-FT and the bottom row DGPFM-QR. We further investigated the probabilistic predictions of our method. Specifically, we randomly selected four test examples respectively from two real-world applications: Beijing-Air and Quasar, as well as two test examp…

**Figure 2.** Prediction examples of DGPFM on the Quasar dataset. The shaded regions indicate one predictive standard deviation. The top row shows the prediction of DGPFM-FT and the bottom row DGPFM-QR.

**Figure 3.** Prediction examples of DGPFM-QR on 2D Darcy; σ denotes the predictive standard deviation (STD). The last two columns show the point-wise predictive STD normalized by the ground truth.

**Figure 4.** Prediction examples of DGPFM-FT on 1D Burgers.

**Figure 5.** Singular values of learned weight function values (matrices) versus randomly initialized matrices for running DGPFM-QR on 2D Darcy. G Visualization of Learned Weight Functions. To better understand the representations learned by the weight functions, we applied DGPFM-QR to the 2D Darcy problem and analyzed the weight matrices in the integration (linear) layers. Specifically, we performed Singular Value …

Original abstract

Learning mappings between functional spaces, also known as function-on-function regression, is a fundamental problem in functional data analysis with broad applications, including spatiotemporal forecasting, curve prediction, and climate modeling. Existing approaches often struggle to capture complex nonlinear relationships and/or provide reliable uncertainty quantification when data are noisy, sparse, or irregularly sampled. To address these challenges, we propose Deep Gaussian Processes for Functional Maps (DGPFM). Our method constructs a sequence of GP-based linear and nonlinear transformations directly in function space, leveraging kernel integral transforms, GP conditional means, and nonlinear activations sampled from Gaussian processes. A key insight enables a simplified and flexible implementation: under fixed evaluation locations, discrete approximations of kernel integral transforms reduce to direct functional integral transforms, allowing seamless integration of diverse transform designs. To support scalable probabilistic inference, we adopt inducing points and whitening transformations within a variational learning framework. Empirical results on both real-world and synthetic benchmark datasets demonstrate the advantages of DGPFM in terms of predictive accuracy and uncertainty calibration.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

DGPFM stacks GP layers in function space via kernel integrals with a fixed-grid simplification, but the irregular-sampling motivation clashes with that premise.

The main takeaway is that the authors have defined a deep GP architecture that chains linear kernel-integral maps and nonlinear GP-sampled activations directly on functions, then uses inducing points and whitening for variational inference. The claimed practical win is that, on fixed evaluation locations, the discrete approximation collapses to the continuous functional transform, which they say lets them mix different designs easily. That construction is new relative to the functional-data and deep-GP literature they cite, and it is a coherent way to keep the model operating in function space rather than vectorizing everything upfront. The abstract also flags empirical gains on both synthetic and real benchmarks for accuracy and calibration, which is the concrete evidence they provide.

The soft spot is the one flagged in the stress test. The motivation repeatedly mentions sparse and irregularly sampled curves, yet the implementation hinge is the fixed-location reduction. If that reduction only holds after interpolation onto a common grid, then either an unstated preprocessing step is doing some of the work or the uncertainty calibration claims become harder to trust in the exact regimes where existing methods are said to fail. The abstract gives no tables, error bars, or derivation details, so it is difficult to judge how large the practical advantage actually is once that assumption is relaxed.

This paper is aimed at people working on function-on-function regression in climate, spatiotemporal forecasting, or curve prediction who already know deep GPs and want a probabilistic nonlinear extension. A reader who cares about keeping models in function space rather than discretizing early would find the architecture worth examining. It is coherent enough on its own terms to deserve a serious referee, even if the experiments and the irregular-data handling need closer checking.

Referee Report

1 major / 2 minor

Summary. The manuscript introduces Deep Gaussian Processes for Functional Maps (DGPFM) for function-on-function regression. It constructs sequences of GP-based linear and nonlinear transformations directly in function space via kernel integral transforms, GP conditional means, and nonlinear activations sampled from GPs. A key implementation insight is that, under fixed evaluation locations, discrete approximations of these transforms reduce to direct functional integral transforms. Scalable inference uses inducing points and whitening within a variational framework. Empirical results on real-world and synthetic benchmarks are claimed to show advantages in predictive accuracy and uncertainty calibration over existing methods.

Significance. If the function-space construction is rigorously shown to preserve the claimed advantages without hidden interpolation biases, and if the empirical gains prove robust, the work would advance functional data analysis by extending deep GPs to handle complex nonlinear functional mappings with reliable uncertainty quantification. The practical reduction insight for implementation is a potential strength for flexible transform designs.

major comments (1)

Abstract (key insight paragraph): The central implementation claim states that 'under fixed evaluation locations, discrete approximations of kernel integral transforms reduce to direct functional integral transforms.' This premise is invoked to justify flexible designs and seamless integration. However, the motivating settings explicitly include 'noisy, sparse, or irregularly sampled' data. The manuscript does not clarify how the fixed-location reduction is reconciled with irregular sampling (e.g., via unstated preprocessing or interpolation), which risks biasing the uncertainty calibration that is central to the claimed superiority over standard deep GPs and functional PCA baselines.

minor comments (2)

Abstract: No quantitative tables, specific error metrics, dataset names, or baseline comparisons are provided to support the empirical claims of superior accuracy and calibration; including at least one summary table or key result would strengthen the presentation.
Notation and terminology: The distinction between 'kernel integral transforms' and 'direct functional integral transforms' could be clarified with a brief equation or diagram in the methods section to avoid ambiguity for readers unfamiliar with the discretization step.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their constructive feedback on our manuscript. We address the single major comment below and agree that additional clarification is warranted regarding the handling of irregular sampling.

Referee: Abstract (key insight paragraph): The central implementation claim states that 'under fixed evaluation locations, discrete approximations of kernel integral transforms reduce to direct functional integral transforms.' This premise is invoked to justify flexible designs and seamless integration. However, the motivating settings explicitly include 'noisy, sparse, or irregularly sampled' data. The manuscript does not clarify how the fixed-location reduction is reconciled with irregular sampling (e.g., via unstated preprocessing or interpolation), which risks biasing the uncertainty calibration that is central to the claimed superiority over standard deep GPs and functional PCA baselines.

Authors: We thank the referee for identifying this lack of clarity. The manuscript assumes that functional observations are first mapped to a common fixed grid of evaluation points via standard preprocessing (e.g., linear interpolation or GP-based smoothing), after which the discrete kernel integral transforms are applied directly on that grid. This is a conventional step in functional data analysis when dealing with irregular sampling and is described in the experimental setup, but we agree it is insufficiently highlighted in the abstract and early method sections. We will revise the abstract to explicitly note the preprocessing step and expand the methods discussion to address its effect on uncertainty calibration. The variational inference framework models observation noise explicitly, which mitigates some interpolation-induced bias, but we will add a short robustness check in the experiments to quantify any residual impact.

revision: yes
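The preprocessing step named in this simulated response can be sketched with plain linear interpolation. The helper `to_common_grid` and the toy samples are hypothetical and only show the idea of mapping irregular observations onto shared evaluation locations; the page does not confirm that the paper uses this exact procedure.

```python
import numpy as np

def to_common_grid(x_obs, y_obs, grid):
    """Linearly interpolate irregular samples (x_obs, y_obs) onto a fixed grid.

    np.interp expects increasing sample locations, so the samples are sorted
    first. Grid points outside the observed range take the nearest endpoint value.
    """
    order = np.argsort(x_obs)
    return np.interp(grid, x_obs[order], y_obs[order])

# Irregular, unsorted samples of the toy function y = 2x + 1.
x_obs = np.array([0.75, 0.0, 0.4, 1.0, 0.13])
y_obs = 2.0 * x_obs + 1.0
grid = np.linspace(0.0, 1.0, 11)
y_grid = to_common_grid(x_obs, y_obs, grid)
```

Interpolation of this kind yields point values only. Any uncertainty about the values between observations is discarded unless it is carried forward separately, which is the calibration concern the referee raises.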

Circularity Check

0 steps flagged

No significant circularity; central construction uses standard GP tools with independent empirical validation

The paper's derivation defines DGPFM as a composition of GP-based linear and nonlinear maps in function space via kernel integral transforms, GP conditional means, and sampled activations. The key insight on discrete-to-functional reduction is explicitly stated as an assumption under fixed evaluation locations to enable implementation, not derived from or equivalent to the target performance claims. Scalable inference adopts standard inducing points and whitening in a variational framework. Results are demonstrated on external real-world and synthetic benchmarks, providing falsifiable evidence outside any fitted inputs. No self-citations are invoked as load-bearing uniqueness theorems, no parameters are fitted to a subset then relabeled as predictions, and no equations reduce the claimed accuracy or calibration advantages to the inputs by construction.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard Gaussian-process modeling assumptions plus a small number of implementation choices whose values are not fixed by theory.

free parameters (2)

number and locations of inducing points
Chosen to approximate the posterior for scalable variational inference.
kernel hyperparameters
Optimized inside the variational objective.
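For readers unfamiliar with whitening, the sketch below shows the standard reparameterization used with inducing points: the inducing values are written as u = L v, where K_zz = L Lᵀ is a Cholesky factorization, so v has a standard normal prior. The RBF kernel, the inducing locations, and the name `whitened_predictive_mean` are illustrative assumptions rather than details taken from the paper.

```python
import numpy as np

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel matrix between 1D point sets a and b."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def whitened_predictive_mean(x, z, m_white, ls=1.0, jitter=1e-6):
    """Predictive mean K_xz K_zz^{-1} u with u = L m_white and K_zz = L L^T.

    Substituting u = L m_white gives K_xz L^{-T} m_white, so only one
    triangular system in L is needed.
    """
    Kzz = rbf(z, z, ls) + jitter * np.eye(len(z))
    L = np.linalg.cholesky(Kzz)
    Kxz = rbf(x, z, ls)
    # A = L^{-1} K_zx; a general solver is used here to keep dependencies
    # minimal, though a dedicated triangular solver would be cheaper.
    A = np.linalg.solve(L, Kxz.T)
    return A.T @ m_white

# Hypothetical inducing locations and whitened variational mean.
z = np.linspace(-3.0, 3.0, 7)
x = np.linspace(-2.0, 2.0, 25)
m_white = np.random.default_rng(0).normal(size=len(z))
mean = whitened_predictive_mean(x, z, m_white)
```

With this parameterization the prior over `v` is N(0, I) regardless of the kernel hyperparameters, which typically makes the variational objective easier to optimize.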

axioms (2)

domain assumption: Functional data can be faithfully represented and transformed via kernel integral operators
Invoked to justify operating directly in function space.
domain assumption: Variational inference with inducing points yields a sufficiently accurate posterior approximation
Required for the claimed scalable probabilistic inference.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation to the paper passage: unclear

Paper passage: "sequence of GP-based linear and nonlinear transformations directly in function space, leveraging kernel integral transforms, GP conditional means"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

