
arxiv: 2606.23215 · v1 · pith:Y6CO56QAnew · submitted 2026-06-22 · ⚛️ physics.data-an · cs.AI

Where Is My Physics Wrong? Localized and Identifiable Discovery of Model Discrepancy

Pith reviewed 2026-06-26 06:04 UTC · model grok-4.3

classification ⚛️ physics.data-an cs.AI
keywords model discrepancy · sparse discovery · localization · hybrid physics models · symbolic regression · statistical testing · grey-box modeling · building energy models

The pith

LISDD localizes where a trusted physics model fails, recovers the symbolic form of the missing mechanism, and certifies the discovery with an exact finite-sample test.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents LISDD as a way to diagnose hybrid models by first locating a clean regime where the known physics holds exactly and estimating parameters there without contamination. It then identifies discrepant operating regimes via a residual-energy statistic, tests candidate symbolic corrections from a library using exhaustive holdout, and confirms each selected term with a sample-split F-test. An extension controls the false-discovery rate when multiple regions each have their own missing mechanism. This matters for applications such as building-energy models because global discrepancy fits can bias the trusted physics parameters and spread local errors into clean data, whereas the localized approach keeps parameter estimates clean while delivering statistically certified symbolic explanations.

Core claim

LISDD fits the known physics on an automatically detected clean regime, flags discrepant regions with a calibrated residual-energy statistic, selects the local missing term by exhaustive holdout over a candidate library, and confirms significance with a sample-split F-test. A false-discovery-rate extension handles multiple discrepant regions with different missing mechanisms. In controlled experiments, LISDD keeps physical-parameter bias at 0.002 versus 0.43 for global-discrepancy and black-box baselines, raises localization F1 from 0.44 to 0.80, recovers the correct symbolic form with probability one, attains exact detection, and controls the multi-region false-discovery rate while recovering every planted mechanism.
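The first two steps (clean-regime fit, then residual-energy flagging) can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration: the toy law `y = theta * x`, the four regimes, the planted quadratic term, the median-slope rule used to pick clean regimes, and the fixed cutoff of 2.0 on the normalized residual energy. The paper's own detector and calibrated threshold are not given in the material summarized here.

```python
import random
import statistics

THETA_TRUE = 2.0      # trusted physical parameter (toy value)
SIGMA = 0.1           # measurement-noise standard deviation
N_REGIMES = 4
N_PER_REGIME = 100

rng = random.Random(0)
data = []             # rows of (regime index, x, y)
n_total = N_REGIMES * N_PER_REGIME
for i in range(n_total):
    x = N_REGIMES * (i + 0.5) / n_total      # evenly spaced in (0, 4)
    regime = int(x)
    y = THETA_TRUE * x + rng.gauss(0.0, SIGMA)
    if regime == 3:
        y += 0.5 * x * x                     # planted missing mechanism
    data.append((regime, x, y))

def fit_slope(points):
    """Least-squares slope through the origin for y = theta * x."""
    sxy = sum(x * y for _, x, y in points)
    sxx = sum(x * x for _, x, _y in points)
    return sxy / sxx

def residuals(points, theta):
    return [y - theta * x for _, x, y in points]

def mean_sq(values):
    return sum(v * v for v in values) / len(values)

by_regime = {k: [p for p in data if p[0] == k] for k in range(N_REGIMES)}

# Step 1 (assumed detector): a regime counts as clean when its own slope
# agrees with the median slope across regimes.
slopes = {k: fit_slope(pts) for k, pts in by_regime.items()}
median_slope = statistics.median(slopes.values())
clean = [k for k in slopes if abs(slopes[k] - median_slope) < 0.2]

clean_points = [p for k in clean for p in by_regime[k]]
theta_clean = fit_slope(clean_points)    # fitted on clean data only
theta_global = fit_slope(data)           # fitted on everything, for contrast

# Step 2: residual energy per regime, normalized by the clean noise variance.
noise_var = mean_sq(residuals(clean_points, theta_clean))
energy = {k: mean_sq(residuals(by_regime[k], theta_clean)) / noise_var
          for k in by_regime}
THRESHOLD = 2.0                          # heuristic cutoff, not calibrated
flagged = [k for k in sorted(energy) if energy[k] > THRESHOLD]

print("clean regimes:", clean, "flagged regimes:", flagged)
print("theta (clean fit):", round(theta_clean, 3),
      "theta (global fit):", round(theta_global, 3))
```

In this toy setting the global fit absorbs part of the planted term into `theta`, while the clean-regime fit does not, which is the contamination effect the claim refers to.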

What carries the argument

The LISDD procedure of clean-regime parameter estimation followed by local residual flagging, library-based holdout selection, and sample-split F-test certification.
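The library-based holdout selection step can be sketched as follows. This is a simplified illustration, not the paper's implementation: the five-term library, the single-term restriction, the even/odd train-holdout split, and the synthetic residuals (a planted `0.5 * x**2` plus Gaussian noise) are all assumptions.

```python
import math
import random

rng = random.Random(1)

# Hypothetical residuals in one flagged regime: the clean-regime physics
# has already been subtracted, leaving the missing mechanism plus noise.
xs = [3.0 + (i + 0.5) / 200 for i in range(200)]
rs = [0.5 * x * x + rng.gauss(0.0, 0.1) for x in xs]

# Candidate library of single symbolic terms (illustrative choice).
LIBRARY = {
    "x": lambda x: x,
    "x^2": lambda x: x * x,
    "x^3": lambda x: x ** 3,
    "sin(x)": math.sin,
    "exp(x)": math.exp,
}

pairs = list(zip(xs, rs))
train = pairs[0::2]   # used to fit each candidate's coefficient
hold = pairs[1::2]    # used only to score the fitted candidate

def fit_coefficient(feature, points):
    """Least-squares coefficient c for the model r = c * feature(x)."""
    num = sum(feature(x) * r for x, r in points)
    den = sum(feature(x) ** 2 for x, _r in points)
    return num / den

def holdout_mse(feature, coef, points):
    return sum((r - coef * feature(x)) ** 2 for x, r in points) / len(points)

# Exhaustive pass: every candidate is fitted and scored on held-out data.
scores = {}
for name, feature in LIBRARY.items():
    coef = fit_coefficient(feature, train)
    scores[name] = (holdout_mse(feature, coef, hold), coef)

best = min(scores, key=lambda name: scores[name][0])
best_mse, best_coef = scores[best]
print("selected term:", best, "coefficient:", round(best_coef, 3))
```

Scoring on data that was not used for fitting is what lets the comparison across candidates penalize terms that only fit the training noise.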

If this is right

  • Physical-parameter bias stays at 0.002, versus 0.43 for global-discrepancy and black-box baselines, even when discrepancies exist in other regimes.
  • Localization F1 rises from 0.44 to 0.80 relative to global and black-box baselines.
  • The correct symbolic form of each missing mechanism is recovered with probability one.
  • Exact detection is achieved while the multi-region false-discovery rate remains controlled.
  • Every planted mechanism is recovered when several discrepant regions are present.
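
The multi-region false-discovery-rate control is tied in the Figure 4 caption to a Benjamini and Hochberg threshold. A minimal sketch of that standard step-up rule is below; the per-region p-values are invented for illustration (two discrepant regions, two clean ones) and are not taken from the paper.

```python
def benjamini_hochberg(p_values, q):
    """Indices of hypotheses rejected by the BH step-up rule at level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    # Find the largest rank k whose sorted p-value satisfies p_(k) <= q * k / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= q * rank / m:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])

# Hypothetical per-region p-values, one per candidate region.
p_regions = [1e-8, 0.40, 3e-6, 0.75]
rejected = benjamini_hochberg(p_regions, q=0.1)
print("regions declared discrepant:", rejected)
```

The rule is step-up: a p-value that misses its own threshold is still rejected if a larger p-value further down the sorted list meets its threshold.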

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The separation of clean-regime estimation from local search could be applied to other grey-box domains such as fluid or climate modeling where regime-dependent failures occur.
  • If clean regimes cannot be detected automatically, the method would require pairing with external regime-classification tools before the discrepancy stage.
  • The requirement for sample splitting and holdout implies a minimum data volume per regime that may limit use on very sparse observational datasets.
  • Certified local discoveries could feed directly into automated model-update pipelines that replace or augment the original physics law only in the affected regime.

Load-bearing premise

An automatically detectable clean regime exists in which the known physics model holds exactly, allowing unbiased estimation of physical parameters before discrepancy search begins.

What would settle it

Run a controlled simulation with one planted local discrepancy; if LISDD either fails to recover the planted symbolic form with high probability or produces physical-parameter bias above 0.01, the central performance claims are falsified.

Figures

Figures reproduced from arXiv: 2606.23215 by Yifan Wang.

Figure 1: LISDD localizes a model fault that a global correction smears out. (a) Residual root-mean-square of the known-physics …

Figure 2: E1. (a) Physical-parameter bias on a logarithmic scale; LISDD is more than two orders of magnitude smaller than the …

Figure 3: E4. Out-of-sample forecast RMSE. LISDD (high …

Figure 4: E5. (a) Per-region evidence as $-\log_{10} p_k$ for four candidate regions; the two planted discrepant regions clear the Benjamini-Hochberg threshold while the two clean regions do not. (b) False-discovery rate, detection power, and both-forms recovery as a function of the target level $q$; the false-discovery rate stays at zero while power and form recovery stay at one.

Abstract

Hybrid models combine trusted physics with data-driven correction, but a physical model is rarely wrong everywhere or in the same way. The key diagnostic question is local: where does the model fail, what missing mechanism explains the failure, and is the evidence statistically real? Existing sparse-discovery and discrepancy-learning methods usually fit one global correction, which can spread a local error into clean regimes, bias trusted physical parameters, and provide no calibrated significance for selected terms. We introduce LISDD, Localized, Identifiable Sparse Discovery of Discrepancy, a framework that localizes model error to an operating regime, identifies a sparse symbolic form for the missing mechanism, and certifies the discovery with an exact finite-sample test. LISDD fits the known physics on an automatically detected clean regime, flags discrepant regions with a calibrated residual-energy statistic, selects the local missing term by exhaustive holdout over a candidate library, and confirms significance with a sample-split $F$-test. A false-discovery-rate extension handles multiple discrepant regions with different missing mechanisms. In controlled experiments, LISDD keeps physical-parameter bias at 0.002 versus 0.43 for global-discrepancy and black-box baselines, raises localization $F_1$ from 0.44 to 0.80, recovers the correct symbolic form with probability one, attains exact detection, and controls the multi-region false-discovery rate while recovering every planted mechanism. The result is a calibrated diagnostic tool for grey-box building-energy models when a fixed physical law silently breaks in one operating regime.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript introduces LISDD, a framework for localizing model discrepancies in hybrid physics-data models. It automatically detects clean regimes to obtain unbiased estimates of trusted physical parameters, flags discrepant operating regimes via a residual-energy statistic, selects a sparse symbolic correction term by exhaustive holdout over a candidate library, and certifies significance with a sample-split F-test. An FDR extension handles multiple regions with distinct missing mechanisms. In controlled experiments with planted discrepancies, the method reports physical-parameter bias of 0.002 (vs. 0.43 for baselines), localization F1 of 0.80 (vs. 0.44), probability-one recovery of the correct symbolic form, exact detection, and control of multi-region false-discovery rate while recovering all planted mechanisms. The target application is grey-box building-energy models.

Significance. If the central claims hold, LISDD supplies a statistically calibrated, localized diagnostic that avoids spreading local errors into clean regimes and biasing trusted parameters, which is a practical advance for hybrid modeling where physics is reliable only in subsets of the operating space. The use of finite-sample exact tests and explicit FDR control for multiple discoveries is a methodological strength.

major comments (2)
  1. [Abstract] The reported bias reduction (0.002 vs. 0.43) and exact detection claims rest on the existence of an automatically detectable clean regime large enough for unbiased physical-parameter estimation. The manuscript provides no analysis or experiments showing that this detection step remains reliable when the missing mechanism is weak, present at low amplitude everywhere, or when operating regimes are not sharply separable; failure of this step would invalidate the subsequent performance numbers.
  2. [Abstract] The term-selection step performs exhaustive holdout over a candidate library before applying the sample-split F-test. It is unclear whether the F-test is adjusted for the preceding combinatorial search or whether the reported probability-one recovery accounts for the effective multiple-testing burden induced by library size; this directly affects the claim of calibrated, identifiable discovery.
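
To make the object of the second comment concrete, here is a sketch of a sample-split F-test for a single selected term. It is an illustration under stated assumptions, not the paper's code: noise is taken to be Gaussian, the term `x**2` is assumed to have been chosen on the selection half, and the data are synthetic. The p-value uses the identity that an F(1, d) variable is the square of a Student-t variable with d degrees of freedom, and the tail is integrated numerically with the standard library; in practice a library routine such as SciPy's `scipy.stats.f.sf` would normally be used instead.

```python
import math
import random

def t_tail(t, nu, n_steps=20000):
    """P(T_nu > t) for t >= 0, by midpoint integration of the t density."""
    log_norm = (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
                - 0.5 * math.log(nu * math.pi))
    total = 0.0
    for k in range(n_steps):
        v = (k + 0.5) / n_steps        # midpoint in (0, 1)
        u = t + v / (1.0 - v)          # maps (0, 1) onto (t, infinity)
        log_pdf = log_norm - 0.5 * (nu + 1) * math.log1p(u * u / nu)
        total += math.exp(log_pdf) / (1.0 - v) ** 2
    return total / n_steps

def f_pvalue_one_term(f_stat, df_resid):
    """Upper-tail p-value of F(1, df_resid), using F = T**2."""
    return min(1.0, 2.0 * t_tail(math.sqrt(f_stat), df_resid))

def split_f_test(feature, xs, rs):
    """Test H0: r = noise against H1: r = c * feature(x) + noise."""
    n = len(xs)
    sfr = sum(feature(x) * r for x, r in zip(xs, rs))
    sff = sum(feature(x) ** 2 for x in xs)
    rss0 = sum(r * r for r in rs)            # residual sum of squares under H0
    rss1 = rss0 - sfr * sfr / sff            # after fitting the single term
    f_stat = (rss0 - rss1) / (rss1 / (n - 1))
    return f_stat, f_pvalue_one_term(f_stat, n - 1)

def square(x):
    return x * x

rng = random.Random(2)
xs = [3.0 + (i + 0.5) / 200 for i in range(200)]
planted = [0.5 * x * x + rng.gauss(0.0, 0.1) for x in xs]   # discrepant regime
clean = [rng.gauss(0.0, 0.1) for _ in xs]                   # no missing term

# Sample split: even indices are reserved for term selection (not shown),
# odd indices are used only for the test.
test_x = xs[1::2]
f_planted, p_planted = split_f_test(square, test_x, planted[1::2])
f_clean, p_clean = split_f_test(square, test_x, clean[1::2])
print("planted regime p-value:", p_planted)
print("clean regime p-value:", p_clean)
```

Because the test half plays no role in choosing the term, the F-statistic is computed for a hypothesis fixed before those data are examined. Whether that is enough to cover the search over the library is the question the comment raises.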

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting two important points about the scope of our claims. We address each comment below and commit to revisions that strengthen the statistical justification and empirical characterization of LISDD.

  1. Referee: [Abstract] The reported bias reduction (0.002 vs. 0.43) and exact detection claims rest on the existence of an automatically detectable clean regime large enough for unbiased physical-parameter estimation. The manuscript provides no analysis or experiments showing that this detection step remains reliable when the missing mechanism is weak, present at low amplitude everywhere, or when operating regimes are not sharply separable; failure of this step would invalidate the subsequent performance numbers.

    Authors: We agree that the reported performance numbers presuppose successful identification of a sufficiently large clean regime. The controlled experiments in the manuscript use planted discrepancies of moderate strength with clearly separable regimes; they do not systematically vary discrepancy amplitude or regime overlap. We will add a dedicated subsection (and corresponding figures) that (i) sweeps the amplitude of the missing term from 0.1× to 2× the nominal scale, (ii) introduces controlled regime overlap via smoothed transition functions, and (iii) reports the empirical probability that the residual-energy detector recovers a clean regime large enough for unbiased parameter estimation. These results will be presented alongside the existing bias and F1 metrics so that readers can see the operating envelope of the method. revision: yes

  2. Referee: [Abstract] The term-selection step performs exhaustive holdout over a candidate library before applying the sample-split F-test. It is unclear whether the F-test is adjusted for the preceding combinatorial search or whether the reported probability-one recovery accounts for the effective multiple-testing burden induced by library size; this directly affects the claim of calibrated, identifiable discovery.

    Authors: The sample-split F-test is exact conditional on the term that was selected by the holdout procedure; it does not incorporate an explicit correction for the size of the candidate library. The probability-one recovery reported in the experiments is therefore an empirical observation under the specific library sizes and signal strengths tested, not a guarantee that holds after accounting for the combinatorial search. We will revise the methods section to state this conditioning explicitly, add a short theoretical remark on the induced multiple-testing burden, and include an ablation that varies library cardinality while tracking the empirical false-positive rate of the subsequent F-test. If the ablation reveals inflation, we will also report results with a simple Bonferroni adjustment applied post-selection. revision: partial
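
One reading of the "simple Bonferroni adjustment applied post-selection" is to multiply the F-test p-value by the number of candidates searched. The sketch below assumes that reading; the function name, the library size, and the p-value are hypothetical.

```python
def bonferroni_adjust(p_value, library_size):
    """Bonferroni-adjusted p-value for a search over library_size candidates."""
    if library_size < 1:
        raise ValueError("library_size must be at least 1")
    return min(1.0, p_value * library_size)

# Hypothetical example: a raw p-value of 0.01 after searching 5 candidates.
alpha = 0.05
p_raw = 0.01
p_adj = bonferroni_adjust(p_raw, library_size=5)
significant = p_adj < alpha
print("adjusted p-value:", p_adj, "significant at alpha:", significant)
```

The adjustment is conservative: it bounds the chance of any false positive across the whole library, at the cost of power when the library is large.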

Circularity Check

0 steps flagged

No significant circularity; method uses external statistical tests


The paper describes LISDD as fitting known physics on a detected clean regime, then applying standard residual-energy statistics and sample-split F-tests for discrepancy detection and significance. These are independent external procedures, not reductions of outputs to fitted inputs by construction. No self-citations, ansatzes, or uniqueness theorems are invoked in the provided text to support the central claims. The controlled-experiment results follow from the planted setup but do not make the derivation chain tautological. The method remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no information on free parameters, axioms, or invented entities.


