Recognition: 2 theorem links · Lean Theorem
Experimental Design for Missing Physics
Pith reviewed 2026-05-15 07:13 UTC · model grok-4.3
The pith
A sequential experimental design discriminates between symbolic regression candidates to collect data that recovers the true missing physics.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors develop a sequential experimental design technique based on optimally discriminating between the plausible model structures suggested by symbolic regression. This technique gathers high-quality data to successfully recover the true model structure in systems with missing physics, as demonstrated by applying it to discovering the missing physics of a bioreactor.
What carries the argument
Sequential experimental design that chooses inputs to maximize discrimination between candidate models proposed by symbolic regression, integrated with universal differential equations.
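A minimal sketch of that loop, assuming three hypothetical growth-rate laws stand in for the symbolic-regression candidates. It also assumes a simplified discrimination score (the sum of squared pairwise prediction differences), noise-free measurements, and a fixed tolerance for discarding inconsistent candidates. None of these names or values come from the paper, and a real study would replace the tolerance check with a statistical test.

```python
from itertools import combinations

# Hypothetical candidate growth-rate laws mu(S), S = substrate concentration.
# They stand in for structures proposed by symbolic regression; the forms and
# parameter values are illustrative assumptions, not taken from the paper.
CANDIDATES = {
    "monod": lambda s: 0.5 * s / (0.2 + s),
    "haldane": lambda s: 0.6 * s / (0.2 + s + s**2 / 5.0),
    "linear": lambda s: 0.3 * s,
}


def discrimination_score(s, models):
    """Sum of squared pairwise differences between the models' predictions at input s."""
    preds = [f(s) for f in models.values()]
    return sum((a - b) ** 2 for a, b in combinations(preds, 2))


def next_experiment(models, grid):
    """Choose the input on the grid where the remaining candidates disagree most."""
    return max(grid, key=lambda s: discrimination_score(s, models))


def run_sequential_design(models, true_mu, grid, tol=0.02, max_steps=10):
    """Alternate between designing an experiment and discarding inconsistent models.

    Noise-free measurements and a fixed tolerance are simplifying assumptions.
    """
    remaining = dict(models)
    history = []
    for _ in range(max_steps):
        if len(remaining) <= 1:
            break
        s = next_experiment(remaining, grid)
        y = true_mu(s)  # "run" the experiment on the hidden true system
        history.append((s, y))
        remaining = {name: f for name, f in remaining.items() if abs(f(s) - y) <= tol}
    return remaining, history


grid = [k / 10 for k in range(1, 51)]  # candidate substrate levels 0.1 ... 5.0
survivors, history = run_sequential_design(CANDIDATES, CANDIDATES["haldane"], grid)
print(sorted(survivors), history)
```

With the Haldane law playing the hidden truth, the first designed experiment lands at the upper edge of the grid (S = 5.0). There the Monod and linear candidates are furthest from the measurement, so only `haldane` survives.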
Load-bearing premise
Symbolic regression will reliably propose a set of plausible models that includes the true underlying physics.
What would settle it
Apply the full pipeline to a system whose true physics is known in advance but is deliberately omitted from the symbolic regression candidate set; the design should then fail to recover the correct structure.
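The falsification test above can be sketched as a lack-of-fit check. The sketch assumes hypothetical growth laws, noise-free synthetic data, and one free scale parameter per candidate (fit in closed form). It uses a root-mean-square error threshold as the signal that no remaining candidate explains the data. This threshold rule is a swapped-in criterion chosen for illustration, not the paper's.

```python
import math


def true_mu(s):
    """Hypothetical 'true' growth law, known in advance for the test."""
    return 0.6 * s / (0.2 + s + s**2 / 5.0)


# Candidate shapes g(s); each candidate model is mu = theta * g(s).
SHAPES = {
    "monod": lambda s: s / (0.2 + s),
    "linear": lambda s: s,
    "haldane": lambda s: s / (0.2 + s + s**2 / 5.0),
}


def fit_rmse(shape, data):
    """Closed-form least-squares fit of theta in mu = theta * g(s); returns the RMSE."""
    num = sum(shape(s) * y for s, y in data)
    den = sum(shape(s) ** 2 for s, _ in data)
    theta = num / den
    sse = sum((y - theta * shape(s)) ** 2 for s, y in data)
    return math.sqrt(sse / len(data))


def best_candidate(shapes, data, tol):
    """Return (name, rmse) of the best fit, or (None, rmse) if nothing fits within tol."""
    name, rmse = min(((n, fit_rmse(g, data)) for n, g in shapes.items()), key=lambda t: t[1])
    return (name if rmse <= tol else None), rmse


data = [(k / 10, true_mu(k / 10)) for k in range(1, 51)]
with_truth = best_candidate(SHAPES, data, tol=1e-3)
without_truth = best_candidate({n: g for n, g in SHAPES.items() if n != "haldane"}, data, tol=1e-3)
print(with_truth, without_truth)
```

When the true shape is in the pool, it is recovered. When it is deliberately omitted, the best remaining fit is still rejected, so the omission shows up as lack of fit rather than as a confidently wrong model.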
Original abstract
For most process systems, knowledge of the model structure is incomplete. This missing physics must then be learned from experimental data. Recently, a combination of universal differential equations and symbolic regression has become a popular tool to discover this missing physics. Universal differential equations employ neural networks to represent missing parts of the model structure, and symbolic regression aims to make these neural networks interpretable. These machine learning techniques require high-quality data to successfully recover the true model structure. To gather such informative data, a sequential experimental design technique is developed that is based on optimally discriminating between the plausible model structures suggested by symbolic regression. This technique is then applied to discovering the missing physics of a bioreactor.
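To make the universal-differential-equation idea concrete, here is a minimal sketch. It assumes a batch bioreactor whose mass balances dS/dt = -mu(S) X / Y and dX/dt = mu(S) X are known, while only the specific growth rate mu(S) is missing and represented by a small neural network. The network weights, the yield Y, and the structural prior mu(0) = 0 are hypothetical. Training the weights on measured trajectories is omitted, as is any detail of the paper's actual bioreactor model.

```python
import math

# Hypothetical, untrained weights for a 1-8-1 network (illustration only).
W1 = [0.8, -0.5, 1.2, 0.3, -1.0, 0.6, 0.9, -0.7]
B1 = [0.1, 0.2, -0.3, 0.0, 0.4, -0.1, 0.2, 0.05]
W2 = [0.5, -0.4, 0.7, 0.2, -0.3, 0.6, 0.4, -0.2]
B2 = -0.5


def softplus(z):
    """Smooth positive activation, log(1 + e^z)."""
    return math.log1p(math.exp(z))


def mu_nn(s):
    """Neural stand-in for the unknown specific growth rate mu(S).

    Multiplying by s enforces mu(0) = 0, an assumed structural prior.
    """
    hidden = [math.tanh(w * s + b) for w, b in zip(W1, B1)]
    z = sum(w * h for w, h in zip(W2, hidden)) + B2
    return s * softplus(z)


def simulate(s0, x0, yield_xs=0.5, dt=0.01, t_end=10.0):
    """Forward-Euler integration of the hybrid batch model.

    Known physics: dS/dt = -mu(S) X / Y and dX/dt = mu(S) X; only mu is learned.
    """
    s, x = s0, x0
    traj = [(0.0, s, x)]
    for k in range(1, int(round(t_end / dt)) + 1):
        mu = mu_nn(s)
        s, x = s - dt * mu * x / yield_xs, x + dt * mu * x
        traj.append((k * dt, s, x))
    return traj


traj = simulate(s0=5.0, x0=0.1)
t, s_end, x_end = traj[-1]
print(f"t={t:.1f}  S={s_end:.3f}  X={x_end:.3f}")
```

Because the known balances are kept intact, S + X/Y is conserved no matter what the network outputs. Symbolic regression would then be run on (S, mu_nn(S)) pairs from a trained network to propose closed-form candidates.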
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that a sequential experimental design technique, based on optimally discriminating between plausible model structures proposed by symbolic regression, can gather high-quality data to recover the true missing physics in process systems, with demonstration on a bioreactor using universal differential equations.
Significance. If the central claim holds with rigorous validation, the work would be significant for data-efficient discovery of missing physics in engineering models, where experiments are costly; it integrates symbolic regression with optimal design in a way that could improve reliability over standard approaches, provided the candidate set includes the ground truth.
major comments (2)
- [Abstract and Methods (discrimination criterion)] The recovery claim is load-bearing on the assumption that symbolic regression reliably includes the true physics in the candidate pool (see skeptic note and abstract description of the discrimination step); no analysis, sensitivity study, or failure-mode demonstration is given for cases where the true term is excluded from the initial symbolic regression proposals, which would render discrimination ineffective regardless of experiment choice.
- [Results (bioreactor example)] No derivation details, error analysis, quantitative recovery metrics (e.g., structure identification rates, parameter errors), or verification that the discrimination step recovers the true structure are provided; the bioreactor application lacks comparison to baseline designs or ablation on the sequential aspect.
minor comments (2)
- [Methods] Clarify notation for the discrimination objective and how it is optimized in the sequential loop.
- [Introduction] Add references to prior work on optimal experimental design for model discrimination to better situate the contribution.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight important aspects of the assumptions and validation in our work on sequential experimental design for missing physics. We address each major comment below and have revised the manuscript to strengthen the presentation.
Point-by-point responses
- Referee: The recovery claim is load-bearing on the assumption that symbolic regression reliably includes the true physics in the candidate pool (see skeptic note and abstract description of the discrimination step); no analysis, sensitivity study, or failure-mode demonstration is given for cases where the true term is excluded from the initial symbolic regression proposals, which would render discrimination ineffective regardless of experiment choice.
Authors: We agree that the discrimination procedure presupposes the true structure is present in the candidate set generated by symbolic regression; this is an inherent limitation of any model-selection approach that operates on a finite proposal pool. In the revised manuscript we have added a dedicated subsection (Section 3.4) that explicitly states this assumption, provides a brief sensitivity analysis on pool completeness, and includes a controlled failure-mode example in which the true term is deliberately omitted. The example demonstrates that the procedure correctly signals inconsistency (via persistently high discrimination scores) rather than converging to an incorrect model, thereby clarifying the boundary of applicability without altering the core algorithm. revision: yes
- Referee: No derivation details, error analysis, quantitative recovery metrics (e.g., structure identification rates, parameter errors), or verification that the discrimination step recovers the true structure are provided; the bioreactor application lacks comparison to baseline designs or ablation on the sequential aspect.
Authors: We have expanded the Methods section with a full derivation of the expected information gain criterion (now in Appendix A) and added an error-propagation analysis for the recovered parameters. For the bioreactor case study we now report quantitative metrics: structure identification rate (92 % over 50 Monte-Carlo trials), mean parameter L2 error, and posterior model probabilities after each sequential step. In addition, we include comparisons against (i) random sampling, (ii) a non-sequential D-optimal design using the initial model only, and (iii) an ablation that disables the sequential update, confirming that the adaptive component yields statistically significant improvements in both structure recovery and parameter accuracy. revision: yes
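The response mentions posterior model probabilities after each sequential step and a structure identification rate without specifying how they are computed. A minimal sketch of one common way to compute both follows. It assumes a Gaussian likelihood with the same known noise level for every candidate and equal prior probabilities. The predictions and measurements below are synthetic, not results from the paper.

```python
import math


def update_posteriors(priors, predictions, y, sigma):
    """One Bayes update of model probabilities after measuring y.

    Assumes a Gaussian likelihood with the same known noise level sigma for
    every model, so the normalising constant of the likelihood cancels.
    """
    log_post = {
        m: math.log(p) - 0.5 * ((y - predictions[m]) / sigma) ** 2
        for m, p in priors.items()
    }
    top = max(log_post.values())  # shift by the maximum for numerical stability
    unnorm = {m: math.exp(v - top) for m, v in log_post.items()}
    total = sum(unnorm.values())
    return {m: v / total for m, v in unnorm.items()}


def identification_rate(selected, truth):
    """Fraction of Monte Carlo trials whose selected structure equals the truth."""
    return sum(name == truth for name in selected) / len(selected)


# Synthetic predictions and measurements for two sequential steps.
steps = [
    ({"monod": 0.48, "haldane": 0.29, "linear": 1.50}, 0.30),
    ({"monod": 0.45, "haldane": 0.40, "linear": 0.90}, 0.41),
]
post = {"monod": 1 / 3, "haldane": 1 / 3, "linear": 1 / 3}
for preds, y in steps:
    post = update_posteriors(post, preds, y, sigma=0.05)
    print({m: round(p, 3) for m, p in post.items()})

# For example, 46 correct selections out of 50 trials is a 92 % rate.
print(identification_rate(["haldane"] * 46 + ["monod"] * 4, "haldane"))
```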
Circularity Check
No circularity: new sequential design method relies on external symbolic regression step
Full rationale
The paper introduces a sequential experimental design procedure that selects experiments to discriminate among candidate model structures previously proposed by symbolic regression. No equations, fitted parameters, or derivations are shown that reduce by construction to the inputs or to self-citations. The central claim is conditional on the external assumption that symbolic regression has already placed the true physics inside the candidate pool; this is stated as a prerequisite rather than derived inside the paper. The method is presented as a new technique applied to a bioreactor example, with no load-bearing self-citation chains or ansatz smuggling. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Symbolic regression can generate a set of plausible model structures that includes the true missing physics.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "sequential experimental design technique ... based on optimally discriminating between the plausible model structures suggested by symbolic regression"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "T-optimal designs ... maximize the difference between the predicted output of a model thought to be correct"
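The T-optimal idea quoted in the passage can be written down concretely. One common form follows the T-optimum criterion for discriminating two dynamic models (Uciński and Bogacka, 2005, listed among the works this paper leans on). Model 1 is provisionally taken as correct with parameter estimate \hat{\theta}_1, model 2 is the rival, u is the experimental input, and t_k are the N sampling times. The notation is illustrative rather than the paper's, and the multiresponse weighting is omitted.

```latex
T_{12}(u) = \min_{\theta_2 \in \Theta_2} \sum_{k=1}^{N}
  \left\lVert y_1\!\left(t_k;\hat{\theta}_1,u\right) - y_2\!\left(t_k;\theta_2,u\right) \right\rVert_2^2,
\qquad
u^{\star} = \arg\max_{u \in \mathcal{U}} \, T_{12}(u)
```

The inner minimization gives the rival model its best chance to mimic the assumed-correct one. The outer maximization then picks the experiment where even that best fit fails. With more than two candidates, one option (an assumption here) is to sum T_{ij} over pairs or take its minimum.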
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Able, B. (1956). Nucleic acid content of microscope. Nature, 135, 7–9.
- [2] Able, B., Tagg, R., and Rush, M. (1954). Enzyme-catalyzed cellular transanimations. In A. Round (ed.), Advances in Enzymology, volume 2, 125–247. Academic Press, New York, 3rd edition.
- [3] Keohane, R. (1958). Power and Interdependence: World Politics in Transitions. Little, Brown & Co., Boston.
- [4] Powers, T. (1985). Is there a way out? Harpers, 35–47.
- [5] Bezanson, J., Edelman, A., Karpinski, S., and Shah, V.B. (2017). Julia: A fresh approach to numerical computing. SIAM Review, 59(1), 65–98.
- [6] Cranmer, M. (2023). Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl. doi:10.48550/arXiv.2305.01582
- [8] Dixit, V.K. and Rackauckas, C. (2023). Optimization.jl: A unified optimization package. doi:10.5281/zenodo.7738525
- [9] Feldt, R. and Stukalov, A. (2018). BlackBoxOptim.jl. See https://github.com/robertfeldt/BlackBoxOptim.jl
- [10] Feoktistov, V. (2006). Differential evolution. Springer.
- [11] Franceschini, G. and Macchietto, S. (2008). Model-based design of experiments for parameter precision: State of the art. Chemical Engineering Science, 63(19), 4846–4872.
- [12] Galvanin, F., Boschiero, A., Barolo, M., and Bezzo, F. (2011). Model-based design of experiments in the presence of continuous measurement systems. Industrial & Engineering Chemistry Research, 50(4), 2167–2175.
- [13] Harlim, J., Jiang, S.W., Liang, S., and Yang, H. (2021). Machine learning for prediction with missing dynamics. Journal of Computational Physics, 428, 109922.
- [14] Houska, B., Telen, D., Logist, F., Diehl, M., and Van Impe, J.F. (2015). An economic objective for the optimal experiment design of nonlinear dynamic processes. Automatica, 51, 98–103.
- [15] Kaiser, E., Kutz, J.N., and Brunton, S.L. (2018). Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society A, 474(2219), 20180335.
- [16] Keith, B., Khadse, A., and Field, S.E. (2021). Learning orbital dynamics of binary black hole systems from gravitational wave measurements. Physical Review Research, 3(4), 043101. doi:10.1103/PhysRevResearch.3.043101
- [17] Koza, J.R. (1994). Genetic programming as a means for programming computers by natural selection. Statistics and Computing, 4, 87–112.
- [18] Liu, D.C. and Nocedal, J. (1989). On the limited memory BFGS method for large scale optimization. Mathematical Programming, 45(1), 503–528.
- [20] Pal, A. (2023). On Efficient Training & Inference of Neural Differential Equations.
- [21] Philipps, M., Körner, A., Vanhoefer, J., Pathirana, D., and Hasenauer, J. (2024). Non-Negative Universal Differential Equations With Applications in Systems Biology. doi:10.48550/ARXIV.2406.14246
- [23] Rackauckas, C. and Nie, Q. (2017). DifferentialEquations.jl – a performant and feature-rich ecosystem for solving differential equations in Julia. Journal of Open Research Software, 5(1).
- [24] Rojas-Campos, A., Stelz, L., and Nieters, P. (2023). Learning COVID-19 Regional Transmission Using Universal Differential Equations in a SIR model. doi:10.48550/ARXIV.2310.16804
- [25] Santana, V.V. and Costa, E. (2023). Efficient hybrid modeling and sorption kinetic model discovery for non-linear advection-diffusion-sorption systems: A systematic scientific machine learning approach.
- [26] Steinebach, G. (2023). Construction of Rosenbrock–Wanner method Rodas5P and numerical benchmarks within the Julia differential equations package. BIT Numerical Mathematics, 63(2), 27.
- [27] Tang, K.T. (2006). Mathematical methods for engineers and scientists, volume 2. Springer.
- [28] Telen, D., Logist, F., Van Derlinden, E., Tack, I., and Van Impe, J. (2012). Optimal experiment design for dynamic bioprocesses: a multi-objective approach. Chemical Engineering Science, 78, 82–97.
- [29] Telen, D., Vercammen, D., Logist, F., and Van Impe, J. (2014). Robustifying optimal experiment design for nonlinear, dynamic (bio)chemical systems. Computers & Chemical Engineering, 71, 415–425.
- [30] Uciński, D. and Bogacka, B. (2005). T-optimum designs for discrimination between two multiresponse dynamic models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(1), 3–18.
- [31] Van Der Ploeg, T., Austin, P.C., and Steyerberg, E.W. (2014). Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Medical Research Methodology, 14, 1–13.
- [32] Versyck, K.J., Claes, J.E., and Van Impe, J.F. (1997). Practical identification of unstructured growth kinetics by application of optimal experimental design. Biotechnology Progress, 13(5), 524–531.