PE-MHL: Physics-Encoded Modular Hybrid Layers for Scalable Learning of Complex Systems

Ismail Hassaballa; Mircea Lazar

arXiv: 2606.04290 · v1 · pith:NEHDD3EX · submitted 2026-06-02 · 💻 cs.LG · math.OC


Pith reviewed 2026-06-28 10:17 UTC · model grok-4.3

classification 💻 cs.LG math.OC

keywords: hybrid modeling, physics-informed learning, modular networks, incremental learning, nonlinear system identification, control applications, NARX models, aerial robotics


The pith

Incremental addition of least-squares initialized sub-models to a physics baseline guarantees monotonically non-increasing training error and convergence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes building hybrid models by beginning with a physics-based baseline and then adding data-driven sub-models one at a time. Each new sub-model is initialized through least-squares fitting so that the training error of the combined model never rises and converges as sub-models are added. The construction is tested on a nonlinear NARX identification problem and on data from the Quanser Aero 2 platform, where the resulting models achieve higher accuracy and better generalization than monolithic networks of equal total size. The approach also produces more stable training trajectories and keeps more of the original data structure intact.
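The incremental construction described above can be sketched under strong simplifying assumptions: each sub-model here is a single fixed bump function scaled by one weight, the physics baseline and the toy data are invented, and `physics_baseline`, `make_bump` and `fit_incrementally` are hypothetical names, not the paper's code. Because each weight is the least-squares fit to the current residual and everything fitted earlier stays frozen, the recorded training MSE cannot rise.

```python
import math


def mse(residual):
    """Mean squared training error of a residual vector."""
    return sum(r * r for r in residual) / len(residual)


def physics_baseline(x):
    # Hypothetical first-principles model that captures only the linear part.
    return 0.8 * x


def make_bump(center, width=0.3):
    # Hypothetical fixed basis function; each sub-model is theta * bump(x).
    return lambda x: math.exp(-((x - center) / width) ** 2)


def fit_incrementally(xs, ys, centers):
    """Add sub-models one at a time; everything fitted earlier stays frozen.

    The new weight is the least-squares fit of the feature to the current
    residual, theta = <phi, r> / <phi, phi>, so the training MSE cannot rise.
    """
    residual = [y - physics_baseline(x) for x, y in zip(xs, ys)]
    errors = [mse(residual)]  # error of the physics baseline alone
    weights = []
    for center in centers:
        phi = make_bump(center)
        feats = [phi(x) for x in xs]
        denom = sum(f * f for f in feats)
        theta = sum(f * r for f, r in zip(feats, residual)) / denom if denom > 0 else 0.0
        residual = [r - theta * f for r, f in zip(residual, feats)]
        weights.append(theta)
        errors.append(mse(residual))
    return weights, errors


xs = [i / 50 - 1 for i in range(101)]                 # 101 points on [-1, 1]
ys = [0.8 * x + 0.3 * math.sin(3 * x) for x in xs]   # toy system: physics + unmodelled term
weights, errors = fit_incrementally(xs, ys, centers=[-0.6, -0.2, 0.2, 0.6])
print([round(e, 5) for e in errors])
```

The loop has the shape of forward stagewise fitting; the paper's sub-models are richer than one scaled feature, but the frozen-prior, fit-the-residual pattern is the part the guarantee relies on.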

Core claim

The central claim is that a physics-encoded modular hybrid layer can be formed by successively incorporating new sub-models, each initialized via least-squares, such that the training error is monotonically non-increasing with the number of sub-models and provably converges. This incremental modular construction yields higher accuracy and generalization than equivalently sized monolithic networks on both a nonlinear NARX benchmark and real hardware from the Quanser Aero 2 platform, while also delivering more stable training dynamics and improved preservation of underlying data structures.

What carries the argument

The Physics-Encoded Modular Hybrid Layer (PE-MHL), which incrementally augments a fixed physics baseline by adding new sub-models whose least-squares initialization preserves prior performance.

If this is right

Training error stays the same or falls with every added sub-model.
The training error converges as the number of sub-models grows.
Accuracy and generalization exceed those of monolithic networks having the same total parameter count.
Training dynamics remain more stable than in single-network training.
The learned components retain more of the structure present in the original data.
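The convergence item follows from the monotonicity item by a standard argument. A sketch, with $E_m$ (notation assumed here, not the paper's) denoting the training error after $m$ sub-models have been added:

```latex
0 \le E_{m+1} \le E_m \quad \text{for all } m \ge 0
\;\Longrightarrow\;
\lim_{m \to \infty} E_m = \inf_{m \ge 0} E_m =: E^\star \ \text{exists.}
```

This is convergence of the error values only; on its own it does not imply that $E^\star = 0$ or that the sub-model parameters converge.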

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The modular separation could allow selective updating of only the data-driven corrections when system conditions change.
The same incremental construction might be applied to other identification tasks where full retraining is costly.
Convergence guarantees could be used to decide in advance how many sub-models are needed to reach a target error.
The approach may extend naturally to online settings in which new sub-models are introduced as fresh measurements arrive.
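One hedged way to act on the "how many sub-models" question is an online stopping rule rather than a count fixed in advance: stop once a target error is met, or once the next addition improves the error by less than a tolerance. Under the monotonicity guarantee every gain is non-negative, so the tolerance test is meaningful. The function name, thresholds and error history below are hypothetical and illustrative, not results from the paper.

```python
def choose_num_submodels(errors, target, min_gain):
    """Pick how many sub-models to keep from a training-error history.

    errors[m] is the training error after m sub-models (errors[0] is the
    physics baseline alone). Stop at the first m that meets the target, or
    before the first addition that improves the error by less than min_gain.
    """
    for m in range(len(errors)):
        if errors[m] <= target:
            return m
        if m + 1 < len(errors) and errors[m] - errors[m + 1] < min_gain:
            return m
    return len(errors) - 1


history = [1.00, 0.40, 0.20, 0.15, 0.149, 0.148]  # illustrative numbers, not results
print(choose_num_submodels(history, target=0.1, min_gain=0.01))  # prints 3
```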

Load-bearing premise

That least-squares initialization of each added sub-model will not disturb the error level already achieved by the existing collection of models.

What would settle it

An observed increase in training error on the nonlinear NARX benchmark after a new sub-model is added using the prescribed least-squares initialization.
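The proposed test can be made concrete with toy numbers (all hypothetical): measure the training error of the frozen ensemble before and after one addition. With the least-squares weight the error cannot rise; with any other weight, here three times the least-squares value, it can. An observed increase is therefore only informative if the prescribed initialization was actually used.

```python
def sq_error(residual):
    """Sum of squared residuals (training error up to a constant factor)."""
    return sum(r * r for r in residual)


def add_submodel(residual, feature, theta):
    # Earlier sub-models are frozen, so only theta * feature changes the residual.
    return [r - theta * f for r, f in zip(residual, feature)]


def ls_weight(residual, feature):
    # Least-squares weight of a single feature against the current residual.
    return sum(f * r for f, r in zip(feature, residual)) / sum(f * f for f in feature)


residual = [0.5, -0.2, 0.3, 0.1]  # residual of the existing ensemble (toy numbers)
feature = [1.0, 0.0, 1.0, 0.5]    # output of the new sub-model's fixed feature (toy numbers)

before = sq_error(residual)
theta_ls = ls_weight(residual, feature)
after_ls = sq_error(add_submodel(residual, feature, theta_ls))
after_bad = sq_error(add_submodel(residual, feature, 3 * theta_ls))  # deliberately not least squares
print(before, after_ls, after_bad)
```

In closed form, using the weight $c\,\theta^\star$ changes the squared error by $-(2c - c^2)\,\langle \phi, r\rangle^2 / \langle \phi, \phi\rangle$, which is an increase for any $c < 0$ or $c > 2$ whenever $\langle \phi, r\rangle \neq 0$.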

Figures

Figures reproduced from arXiv: 2606.04290 by Ismail Hassaballa, Mircea Lazar.

**Figure 2.** Architecture of the hybrid system with a 3…

**Figure 3.** Training without penalty: the total output fits the data, …

**Figure 4.** Architecture of deep neural network

**Figure 5.** The Quanser Aero 2 system

2) System Excitation Data: All models are evaluated on two input signals of length $N = 2000$:
• Gaussian input: $u(k) \sim \mathcal{U}[-1, 1]$.
• Multisine input: $u(k) = \sum_{i=1}^{4} \sin(2\pi f_i k + \phi_i)$, where $f_i \in \{0.02, 0.05, 0.10, 0.20\}$ are normalized frequencies and $\phi_i$ are random phases, scaled to $[-1, 1]$.
Each dataset yields $N - 1$ usable training samples after prepending one zero-lag for ARX feature c…

**Figure 7.** Training with penalties: both branches learn distinct, …

**Figure 10.** Mean Squared Error after adding each sub-model on …

**Figure 11.** Training loss trajectories for the PE-MHL ensemble …
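The excitation signals described with Figure 5 can be sketched as below. Several details are assumptions, not taken from the paper: the random phases are drawn uniformly from $[0, 2\pi)$, the multisine is scaled to $[-1, 1]$ by peak normalization, a toy first-order plant is used only to produce an output signal, and the "$N - 1$ usable samples" remark is read as the effect of a one-step lag in the regressor. All function names are hypothetical.

```python
import math
import random

N = 2000
FREQS = [0.02, 0.05, 0.10, 0.20]  # normalized frequencies f_i from the figure text


def uniform_input(n, rng):
    # u(k) drawn from U[-1, 1], the distribution stated in the figure text
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def multisine_input(n, rng):
    # Random phases drawn uniformly from [0, 2*pi) (an assumption)
    phases = [rng.uniform(0.0, 2 * math.pi) for _ in FREQS]
    u = [sum(math.sin(2 * math.pi * f * k + p) for f, p in zip(FREQS, phases))
         for k in range(n)]
    peak = max(abs(v) for v in u)
    return [v / peak for v in u]  # peak normalization into [-1, 1] (an assumption)


def simulate_toy_system(u):
    # Hypothetical first-order nonlinear plant, used only to produce an output
    y = [0.0]
    for k in range(1, len(u)):
        y.append(0.5 * y[k - 1] + math.tanh(u[k - 1]))
    return y


def lag1_regressors(u, y):
    # One-lag ARX-style rows [y(k-1), u(k-1)] -> y(k) for k = 1, ..., n-1
    X = [[y[k - 1], u[k - 1]] for k in range(1, len(u))]
    t = [y[k] for k in range(1, len(u))]
    return X, t


rng = random.Random(0)
u_uni = uniform_input(N, rng)
u_ms = multisine_input(N, rng)
X, t = lag1_regressors(u_ms, simulate_toy_system(u_ms))
print(len(X), len(t))  # prints 1999 1999
```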

Original abstract

Hybrid models that combine physics-based and data-driven components have shown strong potential for achieving accuracy and interpretability in control applications. While recent methods have made progress in incorporating physical consistency, challenges remain in scalability, robustness to noise, and control of model complexity. This paper proposes a Physics-Encoded Modular Hybrid Layer (PE-MHL) framework, in which a baseline physics-based model is incrementally refined through the addition of new sub-models, where each new component adds complexity while preserving what previous components have already learned. We establish a theoretical guarantee for this construction: with a least-squares initialization of each new sub-model, the training error is monotonically non-increasing in the number of sub-models and provably converges. Empirical evaluations on a nonlinear NARX benchmark and the Quanser Aero 2 platform demonstrate that PE-MHL outperforms equivalently sized monolithic networks in both accuracy and generalization, while also providing more stable training dynamics and better preservation of underlying data structures.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

The main new piece is PE-MHL's incremental modular construction, with a claimed guarantee that, under least-squares initialization, the training error is monotonically non-increasing and converges; however, the guarantee looks shaky for nonlinear sub-models and the empirical evidence stays high-level.


The paper's core contribution is the PE-MHL setup: start with a physics baseline and add data-driven sub-models one by one, each initialized by least squares so that the training error never increases and converges. They report better accuracy and generalization than same-size monolithic nets on a nonlinear NARX example and the Quanser Aero 2 hardware.

The modular construction itself is a reasonable way to tackle complexity control and incremental refinement in hybrid models. Keeping prior components intact while adding new ones is a practical goal for control work, and the least-squares init idea is a concrete mechanism they tie to the error property.

The main weakness is the theoretical claim. The stress-test concern is on point: once the sub-models are nonlinear and parameters couple across modules, least-squares initialization of the newest piece does not automatically produce a monotonic error decrease unless earlier modules are frozen and the new features enter linearly. The abstract asserts the guarantee without showing steps, so it is unclear whether the proof restricts the setting enough to make it hold. The empirical side is also thin: only high-level summaries, with no numbers, error bars, or test details visible here.

This is aimed at researchers building hybrid models for control and system identification. Someone already working on physics-encoded networks could pick up the incremental construction if the math checks out. It is worth sending to referees so the proof and the experiments can be examined properly; the area is applied enough that a verified version would be useful even if revisions are needed.

Referee Report

3 major / 1 minor

Summary. The paper proposes the Physics-Encoded Modular Hybrid Layer (PE-MHL) framework, in which a baseline physics model is incrementally refined by adding sub-models; each new sub-model is initialized via least-squares to preserve prior learning. It asserts a theoretical guarantee that training error is monotonically non-increasing in the number of sub-models and converges, and reports that PE-MHL outperforms equivalently sized monolithic networks on a nonlinear NARX benchmark and the Quanser Aero 2 platform in accuracy, generalization, and training stability.

Significance. If the monotonic-error guarantee can be established under the stated conditions, the construction would supply a principled mechanism for controlling model complexity while retaining physical consistency and stable training dynamics, which is relevant for scalable hybrid modeling in control.

major comments (3)

[Abstract] Abstract: the central claim that 'with a least-squares initialization of each new sub-model, the training error is monotonically non-increasing ... and provably converges' is asserted without any derivation steps, proof outline, or explicit statement of the required assumptions (linearity in parameters, convexity of the loss, or freezing of prior modules).
[Abstract / Method (theoretical guarantee)] The guarantee is load-bearing for the entire contribution, yet the construction is applied to a nonlinear NARX benchmark whose sub-models are coupled and non-convex; the skeptic correctly notes that least-squares initialization alone does not guarantee monotonicity or preservation of prior components unless the addition is strictly linear in feature space and previous parameters are held fixed, and neither condition is shown to hold.
[Abstract] The least-squares initialization step itself constitutes a data-driven fitting procedure, so the claimed 'derivation' of the error bound is circular with respect to the training data; this dependency is not reconciled with the framing of the result as a parameter-free or purely constructive guarantee.

minor comments (1)

[Abstract] Empirical claims are presented at a high level without tabulated quantitative results, error bars, or statistical tests, making it impossible to assess the reported gains in accuracy and generalization.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments, which highlight important aspects of the theoretical guarantee in PE-MHL. We address each major comment below, providing clarifications based on the manuscript's construction and indicating revisions where they strengthen the presentation without altering the core claims.


Referee: [Abstract] Abstract: the central claim that 'with a least-squares initialization of each new sub-model, the training error is monotonically non-increasing ... and provably converges' is asserted without any derivation steps, proof outline, or explicit statement of the required assumptions (linearity in parameters, convexity of the loss, or freezing of prior modules).

Authors: We agree that the abstract's brevity omits the proof outline and assumptions. Section 3 of the manuscript derives the guarantee under the conditions that prior sub-models remain frozen, each new sub-model is linear in its parameters, and the loss is convex in those new parameters, allowing the least-squares solution on the residual to ensure monotonic non-increase. In revision, we will add a concise statement of these assumptions and a high-level proof sketch to the abstract and introduction for self-containment. revision: yes
Referee: [Abstract / Method (theoretical guarantee)] The guarantee is load-bearing for the entire contribution, yet the construction is applied to a nonlinear NARX benchmark whose sub-models are coupled and non-convex; the skeptic correctly notes that least-squares initialization alone does not guarantee monotonicity or preservation of prior components unless the addition is strictly linear in feature space and previous parameters are held fixed, and neither condition is shown to hold.

Authors: The PE-MHL construction explicitly freezes prior parameters during least-squares initialization of each new sub-model on the current residual. For the nonlinear NARX benchmark, sub-models are implemented as linear-in-parameter expansions (e.g., via fixed basis functions), ensuring the least-squares step minimizes error for the added module without altering prior components. We will revise the method section to explicitly confirm these conditions and their applicability to the reported experiments. revision: partial
Referee: [Abstract] The least-squares initialization step itself constitutes a data-driven fitting procedure, so the claimed 'derivation' of the error bound is circular with respect to the training data; this dependency is not reconciled with the framing of the result as a parameter-free or purely constructive guarantee.

Authors: The guarantee is framed as constructive rather than parameter-free: for any given training dataset, least-squares optimality on the residual ensures that the training error is non-increasing, because the new sub-model can always do at least as well as the zero function, which leaves the error unchanged. This follows directly from the projection property of least squares and holds for the specific data without circularity in the derivation; the result depends on the data only insofar as any empirical guarantee must. No revision is needed, as the manuscript does not claim independence from the data. revision: no
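The setting described in these responses (earlier modules frozen, each new sub-model linear in its trainable parameters through fixed basis functions, least squares on the current residual) can be sketched as follows. The random tanh features, the small ridge term added for numerical stability, the toy data and every name here are assumptions for illustration, not the paper's implementation. The ridge term keeps the non-increase property because $w = 0$ remains feasible: the fitted squared error is at most the regularized objective, which is at most the error at $w = 0$.

```python
import math
import random


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


class FixedBasisSubmodel:
    """Hypothetical sub-model: frozen random tanh features, trainable output weights."""

    def __init__(self, n_features, rng):
        self.hidden = [(rng.gauss(0.0, 2.0), rng.gauss(0.0, 1.0)) for _ in range(n_features)]
        self.weights = [0.0] * n_features

    def features(self, x):
        return [math.tanh(a * x + b) for a, b in self.hidden]

    def fit_residual(self, xs, residual, ridge=1e-6):
        # Ridge-regularized least squares on the residual:
        # (Phi^T Phi + ridge * I) w = Phi^T r
        Phi = [self.features(x) for x in xs]
        n = len(self.hidden)
        A = [[sum(p[i] * p[j] for p in Phi) + (ridge if i == j else 0.0) for j in range(n)]
             for i in range(n)]
        b = [sum(p[i] * r for p, r in zip(Phi, residual)) for i in range(n)]
        self.weights = solve(A, b)

    def __call__(self, x):
        return sum(w * f for w, f in zip(self.weights, self.features(x)))


def physics_baseline(x):
    return -0.5 * x  # hypothetical physics model, kept fixed


xs = [i / 40 - 1 for i in range(81)]   # 81 points on [-1, 1]
ys = [x ** 3 - 0.5 * x for x in xs]    # toy "true" system
ensemble = []


def predict(x):
    return physics_baseline(x) + sum(m(x) for m in ensemble)


def train_mse():
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


rng = random.Random(1)
errors = [train_mse()]
for step in range(3):
    residual = [y - predict(x) for x, y in zip(xs, ys)]
    sub = FixedBasisSubmodel(n_features=6, rng=rng)
    sub.fit_residual(xs, residual)  # earlier sub-models are not modified
    ensemble.append(sub)
    if step == 0:
        first_weights = list(sub.weights)
    errors.append(train_mse())
print([round(e, 6) for e in errors])
```

Because only the newest weights are solved for, `ensemble[0].weights` is identical before and after later additions, which is the preservation property the rebuttal relies on; jointly fine-tuning all modules afterwards would fall outside this argument.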

Circularity Check

0 steps flagged

No significant circularity in derivation chain


The paper presents a theoretical guarantee that least-squares initialization of each added sub-model yields monotonically non-increasing training error and convergence. This follows directly from the standard properties of least-squares minimization when extending the model, without reducing to a self-referential definition or a fitted quantity renamed as an independent prediction. No self-citation chains, uniqueness theorems, or ansatzes imported from prior author work are invoked as load-bearing steps. The empirical results on the NARX and Quanser Aero 2 benchmarks are separate from the theoretical claim and do not rely on it for validation. The derivation is therefore self-contained.
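The "standard properties of least-squares minimization" invoked here amount to one line. In this sketch the notation is assumed, not the paper's: $r_m$ is the training residual after $m$ sub-models, $\Phi$ is the feature matrix of the new sub-model (linear in its parameters, full column rank), earlier sub-models are frozen, and $P = \Phi(\Phi^\top \Phi)^{-1}\Phi^\top$ is the orthogonal projector onto the column space of $\Phi$.

```latex
\theta^\star = (\Phi^\top \Phi)^{-1} \Phi^\top r_m, \qquad
r_{m+1} = r_m - \Phi \theta^\star = (I - P)\, r_m, \qquad
\|r_{m+1}\|_2^2 = \|r_m\|_2^2 - \|P\, r_m\|_2^2 \le \|r_m\|_2^2 .
```

The bound concerns training error on the fixed dataset only; it says nothing about test error.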

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that least-squares initialization of added sub-models preserves prior learning and produces the monotonic error property; the framework itself is a newly introduced modular entity.

free parameters (1)

number of sub-models
Chosen to control added complexity while maintaining the error guarantee.

axioms (1)

domain assumption Least-squares initialization of each new sub-model ensures the training error is monotonically non-increasing
Invoked as the basis for the theoretical guarantee in the abstract.

invented entities (1)

Physics-Encoded Modular Hybrid Layer (PE-MHL) sub-models no independent evidence
purpose: Incremental data-driven components added to a physics baseline
Newly proposed modular structure whose independent validation is not supplied in the abstract.

pith-pipeline@v0.9.1-grok · 5691 in / 1184 out tokens · 28250 ms · 2026-06-28T10:17:53.907920+00:00 · methodology


Reference graph

Works this paper leans on

17 extracted references · 1 linked inside Pith

[1] K. Li and Y. Yu, “Editorial: Hybrid modeling–blending physics with data,” Frontiers in Mechanical Engineering, vol. 11, 2025.

[2] G. E. Karniadakis, I. G. Kevrekidis, L. Lu, P. Perdikaris, S. Wang, and L. Yang, “Physics-informed machine learning,” Nature Reviews Physics, vol. 3, pp. 422–440, 2021.

[3] S. Sivaranjani, Y. Shi, N. Atanasov, T. Duong, J. Feng, T. Martin, Y. Xu, V. Gupta, and F. Allgöwer, “Control-oriented system identification: Classical, learning, and physics-informed approaches,” 2025, arXiv:2512.06315.

[4] C. Rao, P. Ren, Q. Wang, O. Buyukozturk, H. Sun, and Y. Liu, “Encoding physics to learn reaction–diffusion processes,” Nature Machine Intelligence, vol. 5, pp. 765–779, 2023.

[5] M. Bolderman, M. Lazar, and H. Butler, “Physics–guided neural networks for inversion–based feedforward control applied to linear motors,” in Proceedings of the IEEE Conference on Control Technology and Applications (CCTA), 2021, pp. 1115–1120.

[6] M. Bolderman, M. Lazar, and H. Butler, “On feedforward control using physics–guided neural networks: Training cost regularization and optimized initialization,” in 2022 European Control Conference (ECC), 2022, pp. 1403–1408.

[7] X. Jia, J. D. Willard, A. Karpatne, J. S. Read, J. A. Zwart, M. Steinbach, and V. Kumar, “Physics-guided machine learning for scientific discovery: An application in simulating lake temperature profiles,” ACM/IMS Transactions on Data Science, vol. 2, no. 3, pp. 20–26, 2021.

[8] M. Haywood-Alexander, G. Arcieri, A. Kamariotis, and E. Chatzi, “Response estimation and system identification of dynamical systems via physics-informed neural networks,” Advanced Modeling and Simulation in Engineering Sciences, vol. 12, 2025.

[9] Y. Wu, B. Sicard, and S. A. Gadsden, “Physics-informed machine learning: A comprehensive review on applications in anomaly detection and condition monitoring,” Expert Systems with Applications, vol. 255, p. 124678, 2024.

[10] K. D. Gupta, S. Siddique, R. George, M. Kamal, R. H. Rifat, and M. A. Haque, “Physics guided neural networks with knowledge graph,” Digital, vol. 4, no. 4, pp. 846–865, 2024.

[11] J. C. Wong, P.-H. Chiu, C. C. Ooi, and M. H. Da, “Robustness of physics-informed neural networks to noise in sensor data,” arXiv preprint arXiv:2211.12042, 2022.

[12] R. E. Schapire, “The strength of weak learnability,” Machine Learning, vol. 5, pp. 197–227, 1990.

[13] J. H. Friedman, “Greedy function approximation: A gradient boosting machine,” The Annals of Statistics, vol. 29, no. 5, pp. 1189–1232, 2001.

[14] A. A. Rusu, N. C. Rabinowitz, G. Desjardins, H. Soyer, J. Kirkpatrick, K. Kavukcuoglu, R. Pascanu, and R. Hadsell, “Progressive neural networks,” 2016, arXiv:1606.04671.

[15] J. Kirkpatrick, R. Pascanu, N. Rabinowitz, J. Veness, G. Desjardins, A. A. Rusu, K. Milan, J. Quan, T. Ramalho, A. Grabska-Barwinska, D. Hassabis, C. Clopath, D. Kumaran, and R. Hadsell, “Overcoming catastrophic forgetting in neural networks,” Proceedings of the National Academy of Sciences, vol. 114, no. 13, pp. 3521–3526, 2017.

[16] G. Pillonetto, T. Chen, A. Chiuso, G. De Nicolao, and L. Ljung, “Regularization of linear regression models,” in Regularized System Identification, ser. Communications and Control Engineering. Springer, Cham, 2022.

[17] S. A. Billings, Nonlinear System Identification: NARMAX Methods in the Time, Frequency, and Spatio-Temporal Domains. John Wiley & Sons, 2013.
