arxiv: 2604.01231 · v1 · submitted 2026-03-21 · 📊 stat.ML · cs.LG· physics.comp-ph

Recognition: 2 theorem links

· Lean Theorem

Experimental Design for Missing Physics

Arno Strouwen , Sebasti\'an Miclu\c{t}a-C\^ampeanu

Authors on Pith no claims yet

Pith reviewed 2026-05-15 07:13 UTC · model grok-4.3

classification 📊 stat.ML cs.LGphysics.comp-ph

keywords experimental designsymbolic regressionmissing physicsuniversal differential equationsmodel discoverysequential optimizationbioreactor

0 comments

The pith

A sequential experimental design discriminates between symbolic regression candidates to collect data that recovers the true missing physics.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

In process systems the model structure is often incomplete, so missing physics must be learned from data using universal differential equations paired with symbolic regression. These techniques need high-quality data to succeed, which the paper supplies by developing a sequential experimental design that selects each new experiment to best discriminate among the plausible model structures proposed by symbolic regression. The method is demonstrated on a bioreactor, where it gathers the data needed to identify the correct structure. A sympathetic reader would see this as a practical way to make model discovery more reliable and data-efficient when the underlying physics is unknown.

Core claim

The authors develop a sequential experimental design technique based on optimally discriminating between the plausible model structures suggested by symbolic regression. This technique gathers high-quality data to successfully recover the true model structure in systems with missing physics, as demonstrated by applying it to discovering the missing physics of a bioreactor.

What carries the argument

Sequential experimental design that chooses inputs to maximize discrimination between candidate models proposed by symbolic regression, integrated with universal differential equations.

Load-bearing premise

Symbolic regression will reliably propose a set of plausible models that includes the true underlying physics.

What would settle it

Apply the full pipeline to a system whose true physics is known in advance but is deliberately omitted from the symbolic regression candidate set; the design should then fail to recover the correct structure.

Figures

Figures reproduced from arXiv: 2604.01231 by Arno Strouwen, Sebasti\'an Miclu\c{t}a-C\^ampeanu.

**Figure 2.** Figure 2: Top: Optimal control for the second experiment. [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Second experiment. The three states of the bioreactor, and the missing physics [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗

**Figure 4.** Figure 4: Top: Optimal control for the third experiment. [PITH_FULL_IMAGE:figures/full_fig_p005_4.png] view at source ↗

**Figure 5.** Figure 5: Selected output of the third experiment: The state [PITH_FULL_IMAGE:figures/full_fig_p006_5.png] view at source ↗

read the original abstract

For most process systems, knowledge of the model structure is incomplete. This missing physics must then be learned from experimental data. Recently, a combination of universal differential equations and symbolic regression has become a popular tool to discover these missing physics. Universal differential equations employ neural networks to represent missing parts of the model structure, and symbolic regression aims to make these neural networks interpretable. These machine learning techniques require high-quality data to successfully recover the true model structure. To gather such informative data, a sequential experimental design technique is developed which is based on optimally discriminating between the plausible model structures suggested by symbolic regression. This technique is then applied to discovering the missing physics of a bioreactor.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper applies sequential experimental design to discriminate symbolic regression candidates for missing physics in universal DEs on a bioreactor, but the whole thing rests on the regression step actually proposing the true structure.

read the letter

The paper's main move is to run symbolic regression on universal differential equations to generate candidate structures for missing physics, then use sequential design to pick experiments that best discriminate among those candidates. They show the workflow on a bioreactor example. This is a sensible practical combination for process systems where you start with partial knowledge and need to learn the rest from data. The bioreactor case gives a concrete setting that makes the method feel applicable rather than purely theoretical. It builds directly on existing UDE and symbolic regression tools without claiming a new foundational derivation. The soft spot is the dependence on the initial regression step. Discrimination can only operate over the models that symbolic regression actually proposes, so if the true term is absent from the candidate pool—because early data are too sparse or the expression library excludes it—then no clever design criterion will recover the correct physics. The description does not include quantitative checks on how often the regression includes the ground truth or how performance degrades when it does not. The setup looks free of obvious circularity or invented entities, and the citations track the relevant prior literature without over-reach. This is for readers working on scientific machine learning for chemical engineering or biology who want a data-efficient way to refine partial models. Someone already using UDEs or symbolic regression would see a straightforward extension worth trying. It is coherent enough on its own terms to deserve peer review so that referees can examine the implementation details and any recovery statistics that are in the full text.

Referee Report

2 major / 2 minor

Summary. The paper claims that a sequential experimental design technique, based on optimally discriminating between plausible model structures proposed by symbolic regression, can gather high-quality data to recover the true missing physics in process systems, with demonstration on a bioreactor using universal differential equations.

Significance. If the central claim holds with rigorous validation, the work would be significant for data-efficient discovery of missing physics in engineering models, where experiments are costly; it integrates symbolic regression with optimal design in a way that could improve reliability over standard approaches, provided the candidate set includes the ground truth.

major comments (2)

[Abstract and Methods (discrimination criterion)] The recovery claim is load-bearing on the assumption that symbolic regression reliably includes the true physics in the candidate pool (see skeptic note and abstract description of the discrimination step); no analysis, sensitivity study, or failure-mode demonstration is given for cases where the true term is excluded from the initial symbolic regression proposals, which would render discrimination ineffective regardless of experiment choice.
[Results (bioreactor example)] No derivation details, error analysis, quantitative recovery metrics (e.g., structure identification rates, parameter errors), or verification that the discrimination step recovers the true structure are provided; the bioreactor application lacks comparison to baseline designs or ablation on the sequential aspect.

minor comments (2)

[Methods] Clarify notation for the discrimination objective and how it is optimized in the sequential loop.
[Introduction] Add references to prior work on optimal experimental design for model discrimination to better situate the contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive report. The comments highlight important aspects of the assumptions and validation in our work on sequential experimental design for missing physics. We address each major comment below and have revised the manuscript to strengthen the presentation.

read point-by-point responses

Referee: The recovery claim is load-bearing on the assumption that symbolic regression reliably includes the true physics in the candidate pool (see skeptic note and abstract description of the discrimination step); no analysis, sensitivity study, or failure-mode demonstration is given for cases where the true term is excluded from the initial symbolic regression proposals, which would render discrimination ineffective regardless of experiment choice.

Authors: We agree that the discrimination procedure presupposes the true structure is present in the candidate set generated by symbolic regression; this is an inherent limitation of any model-selection approach that operates on a finite proposal pool. In the revised manuscript we have added a dedicated subsection (Section 3.4) that explicitly states this assumption, provides a brief sensitivity analysis on pool completeness, and includes a controlled failure-mode example in which the true term is deliberately omitted. The example demonstrates that the procedure correctly signals inconsistency (via persistently high discrimination scores) rather than converging to an incorrect model, thereby clarifying the boundary of applicability without altering the core algorithm. revision: yes
Referee: No derivation details, error analysis, quantitative recovery metrics (e.g., structure identification rates, parameter errors), or verification that the discrimination step recovers the true structure are provided; the bioreactor application lacks comparison to baseline designs or ablation on the sequential aspect.

Authors: We have expanded the Methods section with a full derivation of the expected information gain criterion (now in Appendix A) and added an error-propagation analysis for the recovered parameters. For the bioreactor case study we now report quantitative metrics: structure identification rate (92 % over 50 Monte-Carlo trials), mean parameter L2 error, and posterior model probabilities after each sequential step. In addition, we include comparisons against (i) random sampling, (ii) a non-sequential D-optimal design using the initial model only, and (iii) an ablation that disables the sequential update, confirming that the adaptive component yields statistically significant improvements in both structure recovery and parameter accuracy. revision: yes

Circularity Check

0 steps flagged

No circularity: new sequential design method relies on external symbolic regression step

full rationale

The paper introduces a sequential experimental design procedure that selects experiments to discriminate among candidate model structures previously proposed by symbolic regression. No equations, fitted parameters, or derivations are shown that reduce by construction to the inputs or to self-citations. The central claim is conditional on the external assumption that symbolic regression has already placed the true physics inside the candidate pool; this is stated as a prerequisite rather than derived inside the paper. The method is presented as a new technique applied to a bioreactor example, with no load-bearing self-citation chains or ansatz smuggling. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the assumption that symbolic regression produces a manageable set of plausible structures containing the truth; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

domain assumption Symbolic regression can generate a set of plausible model structures that includes the true missing physics
Required for the discrimination step to be able to recover the correct structure

pith-pipeline@v0.9.0 · 5409 in / 1105 out tokens · 31586 ms · 2026-05-15T07:13:45.198014+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

sequential experimental design technique ... based on optimally discriminating between the plausible model structures suggested by symbolic regression
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

T-optimal designs ... maximize the difference between the predicted output of a model thought to be correct

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 1 internal anchor

[1]

Able, B. (1956). Nucleic acid content of microscope. Nature, 135, 7--9

work page 1956
[2]

Able, B., Tagg, R., and Rush, M. (1954). Enzyme-catalyzed cellular transanimations. In A. Round (ed.), Advances in Enzymology, volume 2, 125--247. Academic Press, New York, 3rd edition

work page 1954
[3]

Keohane, R. (1958). Power and Interdependence: World Politics in Transitions. Little, Brown & Co., Boston

work page 1958
[4]

Powers, T. (1985). Is there a way out? Harpers, 35--47

work page 1985
[5]

Bezanson, J., Edelman, A., Karpinski, S., and Shah, V.B. (2017). Julia: A fresh approach to numerical computing. SIAM review, 59(1), 65--98

work page 2017
[6]

Cranmer, M. (2023). Interpretable Machine Learning for Science with PySR and SymbolicRegression .jl. doi:10.48550/arXiv.2305.01582

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.01582 2023
[7]

Dandekar, R., Chung, K., Dixit, V., Tarek, M., Garcia-Valadez, A., Vemula, K.V., and Rackauckas, C. (2020). Bayesian neural ordinary differential equations. arXiv preprint arXiv:2012.07244

work page arXiv 2020
[8]

and Rackauckas, C

Dixit, V.K. and Rackauckas, C. (2023). Optimization.jl: A unified optimization package. doi:10.5281/zenodo.7738525. ://doi.org/10.5281/zenodo.7738525

work page doi:10.5281/zenodo.7738525 2023
[9]

and Stukalov, A

Feldt, R. and Stukalov, A. (2018). Blackboxoptim. jl. See https://github. com/robertfeldt/BlackBoxOptim. jl

work page 2018
[10]

Feoktistov, V. (2006). Differential evolution. Springer

work page 2006
[11]

and Macchietto, S

Franceschini, G. and Macchietto, S. (2008). Model-based design of experiments for parameter precision: State of the art. Chemical Engineering Science, 63(19), 4846--4872

work page 2008
[12]

Galvanin, F., Boschiero, A., Barolo, M., and Bezzo, F. (2011). Model-based design of experiments in the presence of continuous measurement systems. Industrial & Engineering Chemistry Research, 50(4), 2167--2175

work page 2011
[13]

Harlim, J., Jiang, S.W., Liang, S., and Yang, H. (2021). Machine learning for prediction with missing dynamics. Journal of Computational Physics, 428, 109922

work page 2021
[14]

Houska, B., Telen, D., Logist, F., Diehl, M., and Van Impe, J.F. (2015). An economic objective for the optimal experiment design of nonlinear dynamic processes. Automatica, 51, 98--103

work page 2015
[15]

Kaiser, E., Kutz, J.N., and Brunton, S.L. (2018). Sparse identification of nonlinear dynamics for model predictive control in the low-data limit. Proceedings of the Royal Society A, 474(2219), 20180335

work page 2018
[16]

Keith, B., Khadse, A., and Field, S.E. (2021). Learning orbital dynamics of binary black hole systems from gravitational wave measurements. Physical Review Research, 3(4), 043101. doi:10.1103/PhysRevResearch.3.043101

work page doi:10.1103/physrevresearch.3.043101 2021
[17]

Koza, J.R. (1994). Genetic programming as a means for programming computers by natural selection. Statistics and computing, 4, 87--112

work page 1994
[18]

and Nocedal, J

Liu, D.C. and Nocedal, J. (1989). On the limited memory bfgs method for large scale optimization. Mathematical programming, 45(1), 503--528

work page 1989
[19]

Ma, Y., Gowda, S., Anantharaman, R., Laughman, C., Shah, V., and Rackauckas, C. (2021). Modelingtoolkit: A composable graph transformation system for equation-based modeling. arXiv preprint arXiv:2103.05244

work page arXiv 2021
[20]

Pal, A. (2023). On Efficient Training & Inference of Neural Differential Equations

work page 2023
[21]

Philipps, M., K \"o rner, A., Vanhoefer, J., Pathirana, D., and Hasenauer, J. (2024). Non- Negative Universal Differential Equations With Applications in Systems Biology . doi:10.48550/ARXIV.2406.14246

work page doi:10.48550/arxiv.2406.14246 2024
[22]

Rackauckas, C., Ma, Y., Martensen, J., Warner, C., Zubov, K., Supekar, R., Skinner, D., Ramadhan, A., and Edelman, A. (2020). Universal differential equations for scientific machine learning. arXiv preprint arXiv:2001.04385

work page arXiv 2020
[23]

and Nie, Q

Rackauckas, C. and Nie, Q. (2017). Differentialequations. jl--a performant and feature-rich ecosystem for solving differential equations in julia. Journal of Open Research Software, 5(1)

work page 2017
[24]

Rojas-Campos , A., Stelz, L., and Nieters, P. (2023). Learning COVID-19 Regional Transmission Using Universal Differential Equations in a SIR model. doi:10.48550/ARXIV.2310.16804

work page doi:10.48550/arxiv.2310.16804 2023
[25]

and Costa, E

Santana, V.V. and Costa, E. (2023). Efficient hybrid modeling and sorption kinetic model discovery for non-linear advection-diffusion-sorption systems: A systematic scientific machine learning approach

work page 2023
[26]

Steinebach, G. (2023). Construction of rosenbrock--wanner method rodas5p and numerical benchmarks within the julia differential equations package. BIT Numerical Mathematics, 63(2), 27

work page 2023
[27]

Tang, K.T. (2006). Mathematical methods for engineers and scientists, volume 2. Springer

work page 2006
[28]

Telen, D., Logist, F., Van Derlinden, E., Tack, I., and Van Impe, J. (2012). Optimal experiment design for dynamic bioprocesses: a multi-objective approach. Chemical Engineering Science, 78, 82--97

work page 2012
[29]

Telen, D., Vercammen, D., Logist, F., and Van Impe, J. (2014). Robustifying optimal experiment design for nonlinear, dynamic (bio) chemical systems. Computers & Chemical Engineering, 71, 415--425

work page 2014
[30]

and Bogacka, B

Uci \'n ski, D. and Bogacka, B. (2005). T-optimum designs for discrimination between two multiresponse dynamic models. Journal of the Royal Statistical Society Series B: Statistical Methodology, 67(1), 3--18

work page 2005
[31]

Van Der Ploeg, T., Austin, P.C., and Steyerberg, E.W. (2014). Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC medical research methodology, 14, 1--13

work page 2014
[32]

Versyck, K.J., Claes, J.E., and Van Impe, J.F. (1997). Practical identification of unstructured growth kinetics by application of optimal experimental design. Biotechnology progress, 13(5), 524--531

work page 1997