arXiv: 1910.03120 · v2 · submitted 2019-10-07 · stat.ME

Gaussian Process Assisted Active Learning of Physical Laws

Pith reviewed 2026-05-24 14:15 UTC · model grok-4.3

classification stat.ME
keywords active learning · Gaussian process · differential equations · D-optimality · physical law discovery · experimental design · variable selection · surrogate model

The pith

An adaptive criterion blending D-optimality and maximin space-filling, using Gaussian process surrogates for solutions and derivatives, recovers differential equations more accurately from fewer noisy observations than either criterion used alone.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents an active learning strategy for identifying the form of unknown differential equations from noisy data when each new measurement is costly. It builds an adaptive selection rule that mixes a D-optimality objective, which normally requires the unknown solution and its derivatives, with a maximin space-filling objective that spreads points evenly. Gaussian process models fitted to the data already collected act as surrogates, supplying the needed solution values and derivative estimates for the optimality calculation. Once new points are chosen and measured, variable-selection regression identifies which terms belong in the governing equation. Multiple case studies show the combined rule yields higher accuracy at smaller sample sizes than pure D-optimality or pure space-filling.
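
The selection rule described above can be sketched in a few lines. This is a minimal illustration, not the paper's criterion: the weighted sum of rescaled scores, the `weight` parameter, the ridge term, and the names `d_opt_gain`, `maximin_distance`, and `pick_next` are all assumptions made for the example, and the stand-in surrogate `u_hat(x) = x` replaces a fitted Gaussian process.

```python
import numpy as np

def d_opt_gain(F_current, f_new, ridge=1e-8):
    """Log-determinant gain from adding one feature row to the design."""
    p = F_current.shape[1]
    M = F_current.T @ F_current + ridge * np.eye(p)  # ridge keeps M invertible
    # Matrix determinant lemma: det(M + f f^T) = det(M) * (1 + f^T M^{-1} f)
    return np.log1p(f_new @ np.linalg.solve(M, f_new))

def maximin_distance(X_current, x_new):
    """Distance from a candidate to its nearest existing design point."""
    return np.min(np.linalg.norm(X_current - x_new, axis=1))

def pick_next(X_current, F_current, X_cand, F_cand, weight=0.5):
    """Return the index of the candidate with the best blended score.

    The blend (weighted sum of scores rescaled to [0, 1]) is an assumption
    of this sketch, not the combination used in the paper.
    """
    gains = np.array([d_opt_gain(F_current, f) for f in F_cand])
    dists = np.array([maximin_distance(X_current, x) for x in X_cand])
    scale = lambda v: (v - v.min()) / (v.max() - v.min() + 1e-12)
    score = weight * scale(gains) + (1.0 - weight) * scale(dists)
    return int(np.argmax(score))

# Toy usage: 1-D inputs, library terms [u, u^2], stand-in surrogate u_hat(x) = x
u_hat = lambda X: X[:, 0]
features = lambda X: np.column_stack([u_hat(X), u_hat(X) ** 2])
X_cur = np.array([[0.0], [1.0]])
X_cand = np.array([[0.1], [0.5], [0.9]])
best = pick_next(X_cur, features(X_cur), X_cand, features(X_cand), weight=0.5)
```

Setting `weight=0.0` reduces the rule to pure space-filling and `weight=1.0` to pure D-optimality, which makes it easy to compare the three strategies on the same candidate set.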

Core claim

The proposed adaptive design criterion, which merges D-optimality based on Gaussian process estimates of solutions and derivatives with the maximin space-filling criterion, allows accurate estimation of unknown differential equations with reduced experimental data size compared to using either criterion separately, as shown in multiple case studies.

What carries the argument

Adaptive design criterion that merges D-optimality (with GP-estimated solutions and derivatives) and maximin space-filling to choose the next measurement locations.
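
For orientation, the two ingredients can be written in their standard textbook forms. The notation here is an assumption of this note rather than the paper's: $\xi = \{x_1, \dots, x_n\}$ is the design, $f_j$ are the candidate terms of the equation, and $\hat u$ is the Gaussian process surrogate standing in for the unknown solution. How the paper weights the two criteria against each other is not reproduced here.

```latex
% D-optimality: maximize the determinant of the information matrix,
% with library terms evaluated at the GP surrogate and its derivatives
\xi_D^\ast = \arg\max_{\xi} \det\!\big( F(\xi)^\top F(\xi) \big),
\qquad F(\xi)_{ij} = f_j\big(\hat u(x_i), \hat u'(x_i), \dots\big)

% Maximin space-filling: maximize the smallest pairwise distance
\xi_M^\ast = \arg\max_{\xi} \min_{i \neq j} \lVert x_i - x_j \rVert
```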

If this is right

  • Fewer experimental measurements suffice to reach a target accuracy in the recovered differential equation.
  • The method applies directly to any system whose behavior is described by ordinary differential equations.
  • Variable selection after the active sampling step reliably identifies the sparse structure of the law.
  • Derivative estimates come from the Gaussian process rather than noisy finite differences on raw measurements.
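
The last point can be illustrated with a small sketch of how a derivative estimate falls out of a fitted Gaussian process. It assumes a zero-mean GP with a squared-exponential kernel and fixed hyperparameters; the kernel, the length-scale `ell`, the `noise` variance, and the function names are choices made for this example and are not taken from the paper.

```python
import numpy as np

def rbf(a, b, ell):
    """Squared-exponential kernel matrix between 1-D inputs a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ell) ** 2)

def gp_mean_and_derivative(t_train, y_train, t_query, ell=1.0, noise=1e-4):
    """Posterior mean of a zero-mean GP and the derivative of that mean."""
    K = rbf(t_train, t_train, ell) + noise * np.eye(len(t_train))
    alpha = np.linalg.solve(K, y_train)
    k_star = rbf(t_query, t_train, ell)
    d = t_query[:, None] - t_train[None, :]
    # Derivative of the kernel with respect to the query location
    dk_star = -(d / ell**2) * k_star
    return k_star @ alpha, dk_star @ alpha

# Toy usage: noisy samples of sin(t); the true derivative is cos(t)
rng = np.random.default_rng(0)
t_train = np.linspace(0.0, 2.0 * np.pi, 25)
y_train = np.sin(t_train) + rng.normal(0.0, 0.01, size=t_train.size)
t_query = np.linspace(1.0, 5.0, 9)
u_hat, du_hat = gp_mean_and_derivative(t_train, y_train, t_query)
```

Because the derivative is taken of the smooth posterior mean rather than of the raw measurements, noise is not amplified the way it is by finite differences, although the estimate does depend on the kernel and length-scale chosen.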

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same surrogate-assisted selection could be tested on partial differential equations by replacing the GP with a suitable spatial-temporal surrogate.
  • Replacing the Gaussian process with other flexible regressors might further reduce the number of points required in high-dimensional input spaces.
  • The approach could be embedded in closed-loop laboratory control where each new measurement is decided on the fly from the current surrogate.

Load-bearing premise

Gaussian process regression models fitted to the available data serve as sufficiently accurate surrogates for the unknown solution functions and their derivatives when those surrogates are inserted into the D-optimality criterion.

What would settle it

Apply the method to a known equation such as the logistic growth model, collect data sequentially with the hybrid rule, and check whether the correct sparse equation is recovered at a sample size at which both pure D-optimality and pure maximin designs still return incorrect terms.
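
The recovery check at the end of such a test could look like the sketch below. It covers only the final step, and several things are assumptions of the example: the data sit on a fixed uniform grid rather than an adaptive design, the derivatives are taken from the known equation plus noise rather than from a GP, and sequential thresholded least squares stands in for the paper's variable-selection regression. The parameter values (r = K = 1, u(0) = 0.1) and the `threshold` are likewise illustrative.

```python
import numpy as np

def stlsq(Theta, dudt, threshold=0.2, n_iter=10):
    """Sequential thresholded least squares: refit, then zero small terms."""
    coef = np.linalg.lstsq(Theta, dudt, rcond=None)[0]
    for _ in range(n_iter):
        small = np.abs(coef) < threshold
        coef[small] = 0.0
        keep = ~small
        if not keep.any():
            break
        coef[keep] = np.linalg.lstsq(Theta[:, keep], dudt, rcond=None)[0]
    return coef

# Logistic growth du/dt = r*u*(1 - u/K) with r = K = 1 and u(0) = 0.1
t = np.linspace(0.0, 6.0, 40)
u = 1.0 / (1.0 + 9.0 * np.exp(-t))            # closed-form solution
rng = np.random.default_rng(0)
dudt = u * (1.0 - u) + rng.normal(0.0, 1e-3, size=t.size)

# Candidate library: 1, u, u^2, u^3; the true law uses only u and u^2
Theta = np.column_stack([np.ones_like(u), u, u**2, u**3])
coef = stlsq(Theta, dudt)
support = np.flatnonzero(coef)                # indices of the retained terms
```

A design would count as successful at a given sample size if `support` contains exactly the indices of `u` and `u^2` and the fitted coefficients are close to r and -r/K.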

Figures

Figures reproduced from arXiv: 1910.03120 by Guang Lin, Jiuhai Chen, Lulu Kang.

Figure 1: Four snapshots of the solution of the estimated ODE system. Red line represents solution …
Figure 2: Air pollution: View of CO monitor placement for two different residential houses. The …
Figure 3: Air pollution: four snapshots of sequentially added batch of design points for …
Figure 4: Air pollution: four snapshots of sequentially added batch of design points for …
Figure 5: Air pollution: the comparison between the forward stepwise regression with BIC and …
Original abstract

In many areas of science and engineering, discovering the governing differential equations from the noisy experimental data is an essential challenge. It is also a critical step in understanding the physical phenomena and prediction of the future behaviors of the systems. However, in many cases, it is expensive or time-consuming to collect experimental data. This article provides an active learning approach to estimate the unknown differential equations accurately with reduced experimental data size. We propose an adaptive design criterion combining the D-optimality and the maximin space-filling criterion. In contrast to active learning for other regression models, the D-optimality here requires the unknown solution of the differential equations and derivatives of the solution. We estimate the Gaussian process (GP) regression models from the available experimental data and use them as the surrogates of these unknown solution functions. The derivatives of the estimated GP models are derived and used to substitute the derivatives of the solution. Variable selection-based regression methods are used to learn the differential equations from the experimental data. Through multiple case studies, we demonstrate the proposed approach outperforms the D-optimality and the maximin space-filling design alone in terms of model accuracy and data economy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces an active learning strategy for recovering unknown differential equations from limited noisy experimental data. The approach adaptively selects new data points by combining a D-optimality criterion, where Gaussian process (GP) regression models fitted to current data serve as surrogates for the unknown solution and its derivatives, with a maximin space-filling design. Variable selection regression is then applied to identify the governing equation terms. The authors report that this hybrid method outperforms standalone D-optimality and maximin designs in model accuracy and data efficiency across multiple case studies.

Significance. If the results hold, this work could advance data-efficient discovery of physical laws by intelligently guiding experiments. The use of GP surrogates to enable D-optimality for DEs is a reasonable technical step, and the combination with space-filling provides a practical safeguard. However, the significance is tempered by the dependence on surrogate quality in early stages, which requires careful validation.

major comments (2)
  1. [Abstract/Method description] The central claim of superior performance rests on the assumption that GP surrogates provide sufficiently accurate estimates of the solution u and its derivatives for use in the D-optimality criterion. No analysis is provided on how errors in these surrogates (particularly derivatives, which are sensitive to kernel choice) propagate to the selected design points, especially when initial data is sparse. This is load-bearing because if the information matrix is dominated by surrogate artifacts, the reported gains may not generalize.
  2. [Case studies] The abstract asserts outperformance on case studies in terms of model accuracy and data economy, but provides no quantitative details such as specific error metrics, number of data points used, recovery success rates, or statistical comparisons with error bars. Without these, the strength of the empirical evidence cannot be assessed.
minor comments (1)
  1. The abstract could benefit from a brief mention of the specific differential equations used in the case studies to give readers a sense of the scope.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which help clarify the presentation and strengthen the validation of our active learning framework. We respond to each major comment below.

point-by-point responses
  1. Referee: [Abstract/Method description] The central claim of superior performance rests on the assumption that GP surrogates provide sufficiently accurate estimates of the solution u and its derivatives for use in the D-optimality criterion. No analysis is provided on how errors in these surrogates (particularly derivatives, which are sensitive to kernel choice) propagate to the selected design points, especially when initial data is sparse. This is load-bearing because if the information matrix is dominated by surrogate artifacts, the reported gains may not generalize.

    Authors: We agree that a dedicated analysis of surrogate error propagation, particularly for derivatives under sparse initial data and varying kernels, is important for establishing robustness. The manuscript demonstrates performance through multiple case studies with different initial designs, but does not include a formal sensitivity study on this point. We will add a new subsection with additional experiments quantifying the effect of kernel choice and initial sparsity on the D-optimality criterion and selected points.
    revision: yes

  2. Referee: [Case studies] The abstract asserts outperformance on case studies in terms of model accuracy and data economy, but provides no quantitative details such as specific error metrics, number of data points used, recovery success rates, or statistical comparisons with error bars. Without these, the strength of the empirical evidence cannot be assessed.

    Authors: The full manuscript reports quantitative results (error metrics, data sizes, and comparisons) in the case studies section. To address the concern directly, we will revise the abstract to incorporate specific quantitative highlights from those results, including representative error reductions and recovery rates with variability measures.
    revision: yes

Circularity Check

0 steps flagged

No significant circularity; method is standard surrogate-assisted active learning

full rationale

The paper describes fitting GPs to data as surrogates for the unknown solution u and derivatives, then inserting those into a combined D-optimality + maximin criterion to select new points, followed by sparse regression for the DE. This is a conventional adaptive design loop with no self-definitional steps, no fitted parameters renamed as predictions, and no load-bearing self-citations that reduce the central claim to its own inputs. The GP surrogate step is an external modeling choice whose accuracy is an empirical assumption, not a definitional tautology. The derivation chain remains independent of the target DE coefficients.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Review performed from abstract only; the ledger therefore records only the assumptions explicitly required by the described workflow.

axioms (1)
  • domain assumption: Gaussian process regression models can serve as accurate surrogates for unknown solution functions and their derivatives when inserted into a D-optimality criterion.
    The design criterion explicitly requires these unknown quantities; the paper substitutes GP estimates for them.


