Active Learning for Manifold Gaussian Process Regression

Chun Liu; Lulu Kang; Yiwei Wang; Yuanxing Cheng

arxiv: 2506.20928 · v1 · submitted 2025-06-26 · 📊 stat.ML · cs.LG

Pith reviewed 2026-05-19 08:24 UTC · model grok-4.3

classification 📊 stat.ML cs.LG

keywords: active learning, manifold learning, Gaussian process regression, dimensionality reduction, neural networks, prediction error, high-dimensional data, discontinuous functions

The pith

Jointly optimizing a neural network for dimensionality reduction and a Gaussian process in latent space under an active learning criterion that minimizes global prediction error improves accuracy over random sampling.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces an active learning framework for manifold Gaussian process regression that combines strategic data selection with manifold learning to handle high-dimensional inputs. It jointly trains a neural network to map data to a lower-dimensional latent space while fitting the Gaussian process regressor there, with new points chosen to reduce overall prediction error. A sympathetic reader would care because many scientific and engineering problems involve expensive data collection in high dimensions with complex structures such as discontinuities, where random sampling wastes resources. Experiments on synthetic data show the approach outperforms random sequential learning while keeping computation manageable.

Core claim

The central claim is that simultaneously learning a data manifold through a neural network and performing Gaussian process regression in the resulting latent space, with point selection driven by an active learning rule that targets global error reduction, produces lower prediction errors than random sequential sampling when modeling complex discontinuous functions in high dimensions.
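One way to write the claim down, as a sketch rather than the paper's own notation: the symbols $h_\theta$ (the neural network), $g$ (the latent-space GP), $k_\phi$ (its kernel), $\mathcal{C}$ (the candidate pool) and $\mathcal{X}$ (the input domain) are assumptions introduced here. Marginal-likelihood training is the usual choice for manifold GPs, and integrated posterior variance is one common reading of "global prediction error"; the summary above does not confirm that the paper uses exactly these forms.

```latex
\begin{aligned}
y &= g\bigl(h_\theta(x)\bigr) + \varepsilon,
  \qquad g \sim \mathcal{GP}\bigl(0,\, k_\phi\bigr),
  \quad \varepsilon \sim \mathcal{N}\bigl(0, \sigma_\varepsilon^2\bigr), \\
(\hat\theta, \hat\phi) &= \arg\max_{\theta,\,\phi}\;
  \log p\bigl(\mathbf{y} \mid h_\theta(X),\, \phi\bigr), \\
x_{n+1} &= \arg\min_{x \in \mathcal{C}}
  \int_{\mathcal{X}} \sigma_{n+1}^2\bigl(x' \mid x\bigr)\, \mathrm{d}x' .
\end{aligned}
```

Here $\sigma_{n+1}^2(x' \mid x)$ denotes the posterior variance at $x'$ after hypothetically adding $x$ to the $n$ existing design points, evaluated through the current latent map.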

What carries the argument

Joint optimization of a neural network for dimensionality reduction and a Gaussian process regressor in the latent space, directed by an active learning criterion that minimizes global prediction error.
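The selection step can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: `latent_map` is a hypothetical one-layer stand-in for the trained network, the kernel hyperparameters are fixed rather than jointly optimized, and the criterion (mean posterior variance over a reference set) is an assumed proxy for global prediction error.

```python
import numpy as np

def rbf_kernel(A, B, lengthscale=1.0):
    # Squared-exponential kernel between the rows of A and the rows of B.
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-0.5 * np.maximum(d2, 0.0) / lengthscale**2)

def latent_map(X, W):
    # Hypothetical stand-in for the neural network: a single tanh layer.
    return np.tanh(X @ W)

def posterior_variance(Z_train, Z_ref, noise=1e-4):
    # GP posterior variance at Z_ref given latent training inputs (unit prior variance).
    K = rbf_kernel(Z_train, Z_train) + noise * np.eye(len(Z_train))
    Ks = rbf_kernel(Z_train, Z_ref)
    L = np.linalg.cholesky(K)
    V = np.linalg.solve(L, Ks)
    return np.clip(1.0 - np.sum(V**2, axis=0), 0.0, None)

def select_next(X_train, X_cand, X_ref, W):
    # Score each candidate by the mean posterior variance over X_ref that would
    # remain if the candidate were added, then return the index of the smallest score.
    Z_train, Z_cand, Z_ref = (latent_map(X, W) for X in (X_train, X_cand, X_ref))
    scores = np.array([
        posterior_variance(np.vstack([Z_train, z[None, :]]), Z_ref).mean()
        for z in Z_cand
    ])
    return int(np.argmin(scores)), scores

rng = np.random.default_rng(0)
W = 0.3 * rng.normal(size=(6, 2))            # 6 input dims -> 2 latent dims
X_train = rng.uniform(-1, 1, size=(5, 6))    # current design
X_cand = rng.uniform(-1, 1, size=(8, 6))     # candidate pool
X_ref = rng.uniform(-1, 1, size=(50, 6))     # points approximating the integral
best, scores = select_next(X_train, X_cand, X_ref, W)
```

With fixed hyperparameters the posterior variance does not depend on the responses, so candidates can be scored before any new observation is collected. In a full loop the network weights and kernel parameters would be refit after each acquisition.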

If this is right

The framework achieves superior predictive performance compared with random sequential learning on synthetic data.
It efficiently models complex discontinuous functions while preserving computational tractability.
The method offers practical value for scientific and engineering applications where data acquisition is costly.
It supports extensions toward improved scalability as noted in the paper's future work directions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same joint optimization strategy could be applied to real experimental datasets from physics or engineering domains that exhibit similar high-dimensional structure.
Replacing the current active learning criterion with one that explicitly incorporates predictive uncertainty might further improve sample efficiency on discontinuous surfaces.
Testing the approach on progressively larger input dimensions would reveal whether the joint training remains stable or requires additional regularization.

Load-bearing premise

The joint optimization of the neural network and Gaussian process under the active learning criterion will stably recover the structure of complex functions without introducing optimization instabilities or prohibitive computational costs.

What would settle it

Compare the mean squared prediction error of the proposed method against random sequential sampling on a high-dimensional synthetic test function known to contain discontinuities. Failure to show lower error would falsify the performance claim.
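A minimal sketch of the bookkeeping such a comparison needs, assuming repeated runs over random seeds. The test function `step_function` is a hypothetical discontinuous target invented for illustration, not one of the paper's benchmarks, and no results are implied.

```python
import numpy as np

def step_function(X):
    # Hypothetical discontinuous target: a smooth trend plus a jump across x_0 = 0.
    return np.sin(X.sum(axis=1)) + 2.0 * (X[:, 0] > 0.0)

def rmse(y_true, y_pred):
    # Root mean squared prediction error on a held-out test set.
    return float(np.sqrt(np.mean((np.asarray(y_true) - np.asarray(y_pred)) ** 2)))

def summarize(rmse_runs):
    # rmse_runs has shape (n_seeds, n_iterations); returns the per-iteration
    # mean and standard error so that learning curves can carry error bars.
    runs = np.asarray(rmse_runs, dtype=float)
    mean = runs.mean(axis=0)
    sem = runs.std(axis=0, ddof=1) / np.sqrt(runs.shape[0])
    return mean, sem
```

Each strategy (active and random) would fill its own `rmse_runs` array using the same seeds, initial designs and test set, so that any gap between the two mean curves can be read against the standard errors.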

Figures

Figures reproduced from arXiv: 2506.20928 by Chun Liu, Lulu Kang, Yiwei Wang, Yuanxing Cheng.

**Figure 2.** (a) The heat map is the true function value with the black dots representing the initial training … (caption truncated at source)

**Figure 3.** The three heat maps represent three latent dimensions with respect to the original input … (caption truncated at source)

**Figure 4.** (a) The heat map represents the true function value on the 3-dim unit sphere and the dots are the … (caption truncated at source)

**Figure 5.** (a) The heat map represents the true function value … (caption truncated at source)

**Figure 6.** Comparison of test RMSE over iterations: our method ( … (caption truncated at source)

Original abstract

This paper introduces an active learning framework for manifold Gaussian Process (GP) regression, combining manifold learning with strategic data selection to improve accuracy in high-dimensional spaces. Our method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the latent space, supervised by an active learning criterion that minimizes global prediction error. Experiments on synthetic data demonstrate superior performance over randomly sequential learning. The framework efficiently handles complex, discontinuous functions while preserving computational tractability, offering practical value for scientific and engineering applications. Future work will focus on scalability and uncertainty-aware manifold learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces an active learning framework for manifold Gaussian Process regression that jointly optimizes a neural network for dimensionality reduction and a GP regressor in the learned latent space. The active learning criterion is designed to minimize global prediction error, and experiments on synthetic data are claimed to show superior performance over random sequential learning while efficiently handling complex discontinuous functions.

Significance. If the joint optimization proves stable and the performance gains are reproducible with proper controls, the framework could provide a useful tool for high-dimensional regression problems with manifold structure in scientific applications. The integration of manifold learning, GPs, and active selection under a global error criterion is a potentially valuable combination, though its impact depends on addressing the non-stationary dynamics.

major comments (2)

[Abstract] Abstract: the claim of superior performance over randomly sequential learning is asserted without any quantitative results, error bars, baseline implementation details, or description of how the active learning criterion is computed and applied, which is load-bearing for the central experimental claim.
[Method] Method description: the joint optimization of NN manifold parameters, GP hyperparameters, and active point selection forms a non-stationary feedback loop driven by global prediction error in the evolving latent space; no convergence analysis, latent-space stability monitoring across iterations, or ablation comparing joint vs. alternating optimization is provided, undermining reliability for discontinuous targets.

minor comments (2)

[Abstract] Clarify the exact form of the active learning acquisition function and how it is evaluated in the latent space.
[Abstract] The term 'randomly sequential learning' should be standardized to 'random sequential learning' for consistency.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to strengthen the presentation of results and methodological details.

Referee: [Abstract] Abstract: the claim of superior performance over randomly sequential learning is asserted without any quantitative results, error bars, baseline implementation details, or description of how the active learning criterion is computed and applied, which is load-bearing for the central experimental claim.

Authors: We agree that the abstract would benefit from greater specificity to support the central claim. In the revision we will incorporate quantitative metrics (e.g., average MSE reduction across runs), reference error bars obtained from repeated trials with different random seeds, and briefly outline the active-learning acquisition function and its application in the latent space. Full implementation details remain in the experimental section, but the abstract will now point to them explicitly. revision: yes

Referee: [Method] Method description: the joint optimization of NN manifold parameters, GP hyperparameters, and active point selection forms a non-stationary feedback loop driven by global prediction error in the evolving latent space; no convergence analysis, latent-space stability monitoring across iterations, or ablation comparing joint vs. alternating optimization is provided, undermining reliability for discontinuous targets.

Authors: We acknowledge the importance of demonstrating stability in the joint optimization. We will add empirical monitoring of latent-space stability (e.g., tracking embedding drift via Frobenius norm of successive latent representations) and include an ablation comparing joint versus alternating optimization schedules. A formal convergence proof is not feasible for this non-stationary setting, but the added empirical diagnostics and ablation will clarify practical reliability on the discontinuous synthetic targets examined. revision: yes
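The drift diagnostic mentioned in this response could look roughly like the sketch below. The function name `embedding_drift`, the idea of embedding one fixed set of reference inputs after each iteration, and the optional rotation alignment are assumptions added for illustration; the rebuttal only names the Frobenius norm of successive latent representations.

```python
import numpy as np

def embedding_drift(Z_prev, Z_curr, align=False, relative=True):
    # Z_prev, Z_curr: (n_points, latent_dim) embeddings of the SAME reference
    # inputs, produced by the network at two successive iterations.
    Z_prev = np.asarray(Z_prev, dtype=float)
    Z_curr = np.asarray(Z_curr, dtype=float)
    if Z_prev.shape != Z_curr.shape:
        raise ValueError("embeddings must have identical shapes")
    if align:
        # Orthogonal Procrustes: orthogonal R minimizing ||Z_curr @ R - Z_prev||_F.
        U, _, Vt = np.linalg.svd(Z_curr.T @ Z_prev)
        Z_curr = Z_curr @ (U @ Vt)
    drift = np.linalg.norm(Z_curr - Z_prev, ord="fro")
    if relative:
        scale = np.linalg.norm(Z_prev, ord="fro")
        if scale > 0:
            drift = drift / scale
    return float(drift)
```

The `align` option reflects one design consideration: an isotropic stationary kernel is unchanged by rotations of the latent space, so raw Frobenius drift can be large even when the GP's predictions are identical. Reporting both the raw and the aligned value separates harmless rotations from genuine changes in the learned geometry.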

Circularity Check

0 steps flagged

No significant circularity; method is a novel joint optimization proposal

The paper introduces a new active learning framework that jointly optimizes a neural network for dimensionality reduction and a GP regressor in latent space under an active learning criterion minimizing global prediction error. No derivation chain, equation, or claim reduces by construction to its own fitted outputs or self-citations. The abstract and description present the approach as an original combination of existing techniques rather than a self-referential definition or renamed fit. The central claim of superior performance on synthetic data for discontinuous functions rests on empirical demonstration, not on any load-bearing self-citation or ansatz smuggled from prior author work. This is a self-contained proposal with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available, so no concrete free parameters, axioms, or invented entities can be extracted. The central claim rests on the unstated assumption that the joint optimization converges to a useful latent representation and that the active learning criterion is well-defined and computable.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Our method jointly optimizes a neural network for dimensionality reduction and a Gaussian process regressor in the latent space, supervised by an active learning criterion that minimizes global prediction error.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 1 internal anchor

[1]

When is “nearest neighbor

“When is “nearest neighbor” meaningful?”. In Database Theory—ICDT’99: 7th International Conference Jerusalem, Israel, January 10–12, 1999 Proceedings 7, 217–235. Springer. Binois, M., J. Huang, R. B. Gramacy, and M. L. and

work page 1999
[2]

Replication or Exploration? Sequential Design for Stochastic Simulation Experiments

“Replication or Exploration? Sequential Design for Stochastic Simulation Experiments”. Technometrics 61(1):7–23 https://doi.org/10.1080/00401706.2018.1469433. Borovitskiy, V., A. Terenin, P. Mostowsky, and M. Deisenroth

work page doi:10.1080/00401706 2018
[3]

Manifold Gaussian processes for regression

“Manifold Gaussian processes for regression”. In 2016 International joint conference on neural networks (IJCNN), 3338–3345. IEEE https://doi.org/10.1109/IJCNN.2016.7727626. Chen, J., L. Kang, and G. L. and

work page doi:10.1109/ijcnn.2016.7727626 2016
[4]

Gaussian Process Assisted Active Learning of Physical Laws

“Gaussian Process Assisted Active Learning of Physical Laws”. Technometrics 63(3):329–342 https://doi.org/10.1080/00401706.2020.1817790. Cheng, Kang, Wang, Liu Cohn, D

work page doi:10.1080/00401706.2020.1817790 2020
[5]

Deep Gaussian Processes

IEEE: MIT Press. Damianou, A., and N. D. Lawrence. 2013, 29 Apr–01 May. “Deep Gaussian Processes”. In Proceedings of the Sixteenth International Conference on Artificial Intelligence and Statistics, edited by C. M. Carvalho and P. Ravikumar, Volume 31 of Proceedings of Machine Learning Research, 207–215. Scottsdale, Arizona, USA: PMLR. Fichera, B., S. B...

work page 2013
[6]

Bayesian optimization

“Bayesian optimization”. In Recent advances in optimization and modeling of contemporary problems, 255–278. Informs https://doi.org/10.1287/educ.2018.0188. Heo, J., and C.-L. Sung

work page doi:10.1287/educ.2018.0188 2018
[7]

Active learning for a recursive non-additive emulator for multi-fidelity computer experiments

“Active learning for a recursive non-additive emulator for multi-fidelity computer experiments”. Technometrics 67(1):58–72 https://doi.org/10.1080/00401706.2024.2376173. Houlsby, Neil and Huszár, Ferenc and Ghahramani, Zoubin and Lengyel, Máté

work page doi:10.1080/00401706.2024.2376173 2024
[8]

Bayesian Active Learning for Classification and Preference Learning

“Bayesian active learning for classification and preference learning” https://doi.org/10.48550/arXiv.1112.5745. Hung, Y

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1112.5745
[9]

Penalized blind kriging in computer experiments

“Penalized blind kriging in computer experiments”. Statistica Sinica 21(3):1171 https://doi.org/10.5705/ss.2009.226. Joseph, V. R., Y. Hung, and A. Sudjianto

work page doi:10.5705/ss.2009.226 2009
[10]

Blind kriging: A new method for developing metamodels

“Blind kriging: A new method for developing metamodels”. Journal of mechanical design 130(3):031102 https://doi.org/10.1115/1.2829873. Joseph, V. R., and L. Kang

work page doi:10.1115/1.2829873
[11]

Regression-based inverse distance weighting with applications to computer experiments

“Regression-based inverse distance weighting with applications to computer experiments”. Technometrics 53(3):254–265 https://doi.org/10.1198/TECH.2011.09154. Kang, L., Y. Cheng, Y. Wang, and C. Liu

work page doi:10.1198/tech.2011.09154 2011
[12]

Energetic Variational Gaussian Process Regression for Computer Experiments

“Energetic Variational Gaussian Process Regression for Computer Experiments”. arXiv preprint arXiv:2401.00395 https://doi.org/10.48550/arXiv.2401.00395. Kang, L., Y. Cheng, Y. Wang, and C. Liu

work page doi:10.48550/arxiv.2401.00395
[13]

Energetic Variational Gaussian Process Regression

“Energetic Variational Gaussian Process Regression”. In 2024 Winter Simulation Conference (WSC), 3542–3553. INFROMS https://doi.org/10.1109/WSC63780.2024.10838889. Kapoor, A., K. Grauman, R. Urtasun, and T. Darrell

work page doi:10.1109/wsc63780 2024
[14]

Active Learning with Gaussian Processes for Object Categorization

“Active Learning with Gaussian Processes for Object Categorization”. In 2007 IEEE 11th International Conference on Computer Vision, 1–8 https://doi.org/10.1109/ICCV.2007.4408844. Kim, H., D. Sanz-Alonso, and R. Yang

work page doi:10.1109/iccv 2007
[15]

Optimization on Manifolds via Graph Gaussian Processes

“Optimization on Manifolds via Graph Gaussian Processes”. SIAM Journal on Mathematics of Data Science 6(1):1–25 https://doi.org/10.1137/22M1529907. Krause, A., A. Singh, and C. Guestrin

work page doi:10.1137/22m1529907
[16]

Variational Implicit Processes

Ma, C., Y. Li, and J. M. Hernandez-Lobato. 2019, 09–15 Jun. “Variational Implicit Processes”. In Proceedings of the 36th International Conference on Machine Learning, edited by K. Chaudhuri and R. Salakhutdinov, Volume 97 of Proceedings of Machine Learning Research, 4222–4233: PMLR. MacKay, D. J. C. 1992,

work page 2019
[17]

Information-Based Objective Functions for Active Data Selection

“Information-Based Objective Functions for Active Data Selection”. Neural Computation 4(4):590–604 https://doi.org/10.1162/neco.1992.4.4.590. Mallasto, A., and A. Feragen. 2018, June. “Wrapped Gaussian Process Regression on Riemannian Manifolds”. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR). Martinez-Cantin, R...

work page doi:10.1162/neco.1992.4.4.590 1992
[18]

Active Learning for Deep Gaussian Process Surrogates

“Active Learning for Deep Gaussian Process Surrogates”. Technometrics 65(1):4–18 https://doi.org/10.1080/00401706.2021.2008505. Seo, S., M. Wallat, T. Graepel, and K. Obermayer. “Gaussian Process Regression: Active Data Selection and Test Point Rejection”. In Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN

work page doi:10.1080/00401706.2021.2008505 2021
[20]

Srinivas, N., A

Neural Computing: New Challenges and Perspectives for the New Millennium, Volume 3, 241–246 vol.3 https://doi.org/10.1109/IJCNN.2000.861310. Srinivas, N., A. Krause, S. Kakade, and M. Seeger

work page doi:10.1109/ijcnn.2000.861310 2000
[21]

A global geometric framework for nonlinear dimensionality reduction

“A global geometric framework for nonlinear dimensionality reduction”. Science 290(5500):2319–2323 https://doi.org/10.1126/science.290.5500.2319. AUTHOR BIOGRAPHIES YUANXING CHENG is a Ph.D student in the Department of Applied Mathematics at the Illinois Institute of Technology in Chicago, IL. Advised by Dr. Lulu Kang and Dr. Chun Liu, he has worked on t...

work page doi:10.1126/science.290.5500.2319
