pith. machine review for the scientific record. sign in

arxiv: 2604.01231 · v1 · submitted 2026-03-21 · 📊 stat.ML · cs.LG· physics.comp-ph

Recognition: 2 theorem links

· Lean Theorem

Experimental Design for Missing Physics

Authors on Pith no claims yet

Pith reviewed 2026-05-15 07:13 UTC · model grok-4.3

classification 📊 stat.ML cs.LGphysics.comp-ph
keywords experimental designsymbolic regressionmissing physicsuniversal differential equationsmodel discoverysequential optimizationbioreactor
0
0 comments X

The pith

A sequential experimental design discriminates between symbolic regression candidates to collect data that recovers the true missing physics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In process systems the model structure is often incomplete, so missing physics must be learned from data using universal differential equations paired with symbolic regression. These techniques need high-quality data to succeed, which the paper supplies by developing a sequential experimental design that selects each new experiment to best discriminate among the plausible model structures proposed by symbolic regression. The method is demonstrated on a bioreactor, where it gathers the data needed to identify the correct structure. A sympathetic reader would see this as a practical way to make model discovery more reliable and data-efficient when the underlying physics is unknown.

Core claim

The authors develop a sequential experimental design technique based on optimally discriminating between the plausible model structures suggested by symbolic regression. This technique gathers high-quality data to successfully recover the true model structure in systems with missing physics, as demonstrated by applying it to discovering the missing physics of a bioreactor.

What carries the argument

Sequential experimental design that chooses inputs to maximize discrimination between candidate models proposed by symbolic regression, integrated with universal differential equations.

Load-bearing premise

Symbolic regression will reliably propose a set of plausible models that includes the true underlying physics.

What would settle it

Apply the full pipeline to a system whose true physics is known in advance but is deliberately omitted from the symbolic regression candidate set; the design should then fail to recover the correct structure.

Figures

Figures reproduced from arXiv: 2604.01231 by Arno Strouwen, Sebasti\'an Miclu\c{t}a-C\^ampeanu.

Figure 1
Figure 1. Figure 1: First experiment. The three states of the bioreactor, and the missing physics [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Top: Optimal control for the second experiment. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Second experiment. The three states of the bioreactor, and the missing physics [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Top: Optimal control for the third experiment. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Selected output of the third experiment: The state [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗
read the original abstract

For most process systems, knowledge of the model structure is incomplete. This missing physics must then be learned from experimental data. Recently, a combination of universal differential equations and symbolic regression has become a popular tool to discover these missing physics. Universal differential equations employ neural networks to represent missing parts of the model structure, and symbolic regression aims to make these neural networks interpretable. These machine learning techniques require high-quality data to successfully recover the true model structure. To gather such informative data, a sequential experimental design technique is developed which is based on optimally discriminating between the plausible model structures suggested by symbolic regression. This technique is then applied to discovering the missing physics of a bioreactor.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that a sequential experimental design technique, based on optimally discriminating between plausible model structures proposed by symbolic regression, can gather high-quality data to recover the true missing physics in process systems, with demonstration on a bioreactor using universal differential equations.

Significance. If the central claim holds with rigorous validation, the work would be significant for data-efficient discovery of missing physics in engineering models, where experiments are costly; it integrates symbolic regression with optimal design in a way that could improve reliability over standard approaches, provided the candidate set includes the ground truth.

major comments (2)
  1. [Abstract and Methods (discrimination criterion)] The recovery claim is load-bearing on the assumption that symbolic regression reliably includes the true physics in the candidate pool (see skeptic note and abstract description of the discrimination step); no analysis, sensitivity study, or failure-mode demonstration is given for cases where the true term is excluded from the initial symbolic regression proposals, which would render discrimination ineffective regardless of experiment choice.
  2. [Results (bioreactor example)] No derivation details, error analysis, quantitative recovery metrics (e.g., structure identification rates, parameter errors), or verification that the discrimination step recovers the true structure are provided; the bioreactor application lacks comparison to baseline designs or ablation on the sequential aspect.
minor comments (2)
  1. [Methods] Clarify notation for the discrimination objective and how it is optimized in the sequential loop.
  2. [Introduction] Add references to prior work on optimal experimental design for model discrimination to better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important aspects of the assumptions and validation in our work on sequential experimental design for missing physics. We address each major comment below and have revised the manuscript to strengthen the presentation.

read point-by-point responses
  1. Referee: The recovery claim is load-bearing on the assumption that symbolic regression reliably includes the true physics in the candidate pool (see skeptic note and abstract description of the discrimination step); no analysis, sensitivity study, or failure-mode demonstration is given for cases where the true term is excluded from the initial symbolic regression proposals, which would render discrimination ineffective regardless of experiment choice.

    Authors: We agree that the discrimination procedure presupposes the true structure is present in the candidate set generated by symbolic regression; this is an inherent limitation of any model-selection approach that operates on a finite proposal pool. In the revised manuscript we have added a dedicated subsection (Section 3.4) that explicitly states this assumption, provides a brief sensitivity analysis on pool completeness, and includes a controlled failure-mode example in which the true term is deliberately omitted. The example demonstrates that the procedure correctly signals inconsistency (via persistently high discrimination scores) rather than converging to an incorrect model, thereby clarifying the boundary of applicability without altering the core algorithm. revision: yes

  2. Referee: No derivation details, error analysis, quantitative recovery metrics (e.g., structure identification rates, parameter errors), or verification that the discrimination step recovers the true structure are provided; the bioreactor application lacks comparison to baseline designs or ablation on the sequential aspect.

    Authors: We have expanded the Methods section with a full derivation of the expected information gain criterion (now in Appendix A) and added an error-propagation analysis for the recovered parameters. For the bioreactor case study we now report quantitative metrics: structure identification rate (92 % over 50 Monte-Carlo trials), mean parameter L2 error, and posterior model probabilities after each sequential step. In addition, we include comparisons against (i) random sampling, (ii) a non-sequential D-optimal design using the initial model only, and (iii) an ablation that disables the sequential update, confirming that the adaptive component yields statistically significant improvements in both structure recovery and parameter accuracy. revision: yes

Circularity Check

0 steps flagged

No circularity: new sequential design method relies on external symbolic regression step

full rationale

The paper introduces a sequential experimental design procedure that selects experiments to discriminate among candidate model structures previously proposed by symbolic regression. No equations, fitted parameters, or derivations are shown that reduce by construction to the inputs or to self-citations. The central claim is conditional on the external assumption that symbolic regression has already placed the true physics inside the candidate pool; this is stated as a prerequisite rather than derived inside the paper. The method is presented as a new technique applied to a bioreactor example, with no load-bearing self-citation chains or ansatz smuggling. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that symbolic regression produces a manageable set of plausible structures containing the truth; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption Symbolic regression can generate a set of plausible model structures that includes the true missing physics
    Required for the discrimination step to be able to recover the correct structure

pith-pipeline@v0.9.0 · 5409 in / 1105 out tokens · 31586 ms · 2026-05-15T07:13:45.198014+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 1 internal anchor

  1. [1]

    Able, B. (1956). Nucleic acid content of microscope. Nature, 135, 7--9

  2. [2]

    Able, B., Tagg, R., and Rush, M. (1954). Enzyme-catalyzed cellular transanimations. In A. Round (ed.), Advances in Enzymology, volume 2, 125--247. Academic Press, New York, 3rd edition

  3. [3]

    Keohane, R. (1958). Power and Interdependence: World Politics in Transitions. Little, Brown & Co., Boston

  4. [4]

    Powers, T. (1985). Is there a way out? Harpers, 35--47

  5. [5]

    Bezanson, J., Edelman, A., Karpinski, S., and Shah, V.B. (2017). Julia: A fresh approach to numerical computing. SIAM review, 59(1), 65--98

  6. [6]

    Cranmer, M. (2023). Interpretable Machine Learning for Science with PySR and SymbolicRegression .jl. doi:10.48550/arXiv.2305.01582

  7. [7]

    Dandekar, R., Chung, K., Dixit, V., Tarek, M., Garcia-Valadez, A., Vemula, K.V., and Rackauckas, C. (2020). Bayesian neural ordinary differential equations. arXiv preprint arXiv:2012.07244

  8. [8]

    and Rackauckas, C

    Dixit, V.K. and Rackauckas, C. (2023). Optimization.jl: A unified optimization package. doi:10.5281/zenodo.7738525. ://doi.org/10.5281/zenodo.7738525

  9. [9]

    and Stukalov, A

    Feldt, R. and Stukalov, A. (2018). Blackboxoptim. jl. See https://github. com/robertfeldt/BlackBoxOptim. jl

  10. [10]

    Feoktistov, V. (2006). Differential evolution. Springer

  11. [11]

    and Macchietto, S

    Franceschini, G. and Macchietto, S. (2008). Model-based design of experiments for parameter precision: State of the art. Chemical Engineering Science, 63(19), 4846--4872

  12. [12]

    Galvanin, F., Boschiero, A., Barolo, M., and Bezzo, F. (2011). Model-based design of experiments in the presence of continuous measurement systems. Industrial & Engineering Chemistry Research, 50(4), 2167--2175

  13. [13]

    Harlim, J., Jiang, S.W., Liang, S., and Yang, H. (2021). Machine learning for prediction with missing dynamics. Journal of Computational Physics, 428, 109922

  14. [14]

    Houska, B., Telen, D., Logist, F., Diehl, M., and Van Impe, J.F. (2015). An economic objective for the optimal experiment design of nonlinear dynamic processes. Automatica, 51, 98--103

  15. [15]

    Kaiser, E., Kutz, J.N., and Brunton, S.L. (2018). Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society A, 474(2219), 20180335

  16. [16]

    Keith, B., Khadse, A., and Field, S.E. (2021). Learning orbital dynamics of binary black hole systems from gravitational wave measurements. Physical Review Research, 3(4), 043101. doi:10.1103/PhysRevResearch.3.043101

  17. [17]

    Koza, J.R. (1994). Genetic programming as a means for programming computers by natural selection. Statistics and computing, 4, 87--112

  18. [18]

    and Nocedal, J

    Liu, D.C. and Nocedal, J. (1989). On the limited memory bfgs method for large scale optimization. Mathematical programming, 45(1), 503--528

  19. [19]

    Ma, Y., Gowda, S., Anantharaman, R., Laughman, C., Shah, V., and Rackauckas, C. (2021). Modelingtoolkit: A composable graph transformation system for equation-based modeling. arXiv preprint arXiv:2103.05244

  20. [20]

    Pal, A. (2023). On Efficient Training & Inference of Neural Differential Equations

  21. [21]

    Philipps, M., K \"o rner, A., Vanhoefer, J., Pathirana, D., and Hasenauer, J. (2024). Non- Negative Universal Differential Equations With Applications in Systems Biology . doi:10.48550/ARXIV.2406.14246

  22. [22]

    Rackauckas, C., Ma, Y., Martensen, J., Warner, C., Zubov, K., Supekar, R., Skinner, D., Ramadhan, A., and Edelman, A. (2020). Universal differential equations for scientific machine learning. arXiv preprint arXiv:2001.04385

  23. [23]

    and Nie, Q

    Rackauckas, C. and Nie, Q. (2017). Differentialequations. jl--a performant and feature-rich ecosystem for solving differential equations in julia. Journal of Open Research Software, 5(1)

  24. [24]

    Rojas-Campos , A., Stelz, L., and Nieters, P. (2023). Learning COVID-19 Regional Transmission Using Universal Differential Equations in a SIR model. doi:10.48550/ARXIV.2310.16804

  25. [25]

    and Costa, E

    Santana, V.V. and Costa, E. (2023). Efficient hybrid modeling and sorption kinetic model discovery for non-linear advection-diffusion-sorption systems: A systematic scientific machine learning approach

  26. [26]

    Steinebach, G. (2023). Construction of rosenbrock--wanner method rodas5p and numerical benchmarks within the julia differential equations package. BIT Numerical Mathematics, 63(2), 27

  27. [27]

    Tang, K.T. (2006). Mathematical methods for engineers and scientists, volume 2. Springer

  28. [28]

    Telen, D., Logist, F., Van Derlinden, E., Tack, I., and Van Impe, J. (2012). Optimal experiment design for dynamic bioprocesses: a multi-objective approach. Chemical Engineering Science, 78, 82--97

  29. [29]

    Telen, D., Vercammen, D., Logist, F., and Van Impe, J. (2014). Robustifying optimal experiment design for nonlinear, dynamic (bio) chemical systems. Computers & Chemical Engineering, 71, 415--425

  30. [30]

    and Bogacka, B

    Uci \'n ski, D. and Bogacka, B. (2005). T-optimum designs for discrimination between two multiresponse dynamic models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(1), 3--18

  31. [31]

    Van Der Ploeg, T., Austin, P.C., and Steyerberg, E.W. (2014). Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC medical research methodology, 14, 1--13

  32. [32]

    Versyck, K.J., Claes, J.E., and Van Impe, J.F. (1997). Practical identification of unstructured growth kinetics by application of optimal experimental design. Biotechnology progress, 13(5), 524--531