Scalable Mechanistic Neural Networks for Differential Equations and Machine Learning

Adeel Pervez; Dan Alistarh; Dingling Yao; Francesco Locatello; Jiale Chen

arXiv: 2410.06074 · v3 · pith:I3R7HRJ4new · submitted 2024-10-08 · 💻 cs.LG · cs.NA · math.NA

Jiale Chen, Dingling Yao, Adeel Pervez, Dan Alistarh, Francesco Locatello

Pith reviewed 2026-05-23 19:31 UTC · model grok-4.3

classification 💻 cs.LG · cs.NA · math.NA

keywords mechanistic neural networks · scalable models · differential equations · scientific machine learning · long temporal sequences · dynamical systems · neural network reformulation

The pith

S-MNN reformulates Mechanistic Neural Networks to reduce time complexity from cubic, and space complexity from quadratic, to linear in sequence length.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents Scalable Mechanistic Neural Network (S-MNN) as a direct reformulation of the original Mechanistic Neural Network. The reformulation lowers computational time complexity from cubic and space complexity from quadratic in the length of temporal sequences down to linear scaling. Experiments confirm that S-MNN produces the same precision as the original while using far less compute and memory, allowing the same mechanistic interpretability to be applied to longer dynamical sequences. The result is positioned as a drop-in replacement that integrates mechanistic bottlenecks into neural models of complex systems without added cost.

Core claim

By reformulating the original Mechanistic Neural Network (MNN), S-MNN reduces computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. This enables efficient modeling of long-term dynamics without sacrificing accuracy or interpretability. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources, and S-MNN can therefore serve as a drop-in replacement in applications that integrate mechanistic bottlenecks into neural network models of complex dynamical systems.

What carries the argument

The reformulation of MNN into S-MNN that converts cubic/quadratic complexity in sequence length to linear complexity while preserving mechanistic structure.
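
This page does not spell out how the reformulation works. As a generic illustration of how exploiting structure in a linear system can bring solve cost down from cubic to linear without changing the answer, the sketch below solves a tridiagonal system with the Thomas algorithm and checks it against a dense solve. The tridiagonal system and the names `solve_tridiagonal` and `dense_from_bands` are assumptions made for illustration; this is not the S-MNN solver.

```python
import numpy as np

def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm: O(n) time and O(n) memory for a tridiagonal system.

    lower[i] sits at (i, i-1) (lower[0] is unused), diag[i] at (i, i),
    upper[i] at (i, i+1) (upper[-1] is unused). No pivoting is done, so this
    sketch assumes a diagonally dominant matrix.
    """
    n = len(diag)
    c = np.zeros(n)  # modified super-diagonal
    d = np.zeros(n)  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination, one pass
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = np.zeros(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):  # back substitution, one pass
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def dense_from_bands(lower, diag, upper):
    """Build the equivalent dense matrix (O(n^2) memory) for comparison."""
    return np.diag(diag) + np.diag(lower[1:], k=-1) + np.diag(upper[:-1], k=1)

# Illustrative data only: a random, diagonally dominant system.
rng = np.random.default_rng(0)
n = 200
lower, upper, rhs = rng.random(n), rng.random(n), rng.random(n)
diag = 4.0 + rng.random(n)

x_banded = solve_tridiagonal(lower, diag, upper, rhs)
x_dense = np.linalg.solve(dense_from_bands(lower, diag, upper), rhs)  # general dense solve
```

Both solves return the same vector up to floating-point error, but the banded version never stores or factorizes the full matrix. Whether S-MNN's system has this particular structure is not stated on this page; the released source code is the place to check.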

If this is right

S-MNN can be substituted directly into existing MNN pipelines for modeling longer temporal sequences in differential equations and dynamical systems.
Mechanistic bottlenecks can now be embedded in neural networks at scales previously limited by cubic or quadratic costs.
Interpretability features of the original MNN remain available for long-horizon scientific machine learning tasks.
Applications involving extended time-series data in scientific domains become computationally feasible without loss of the original model's properties.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The linear scaling may open the door to real-time or online adaptation of mechanistic models in control or simulation settings.
Similar complexity-reduction techniques could be tested on other neural architectures that embed differential-equation structure.
If the linear regime holds for sequences orders of magnitude longer than those tested, the method could support multi-scale or multi-physics simulations that combine many coupled dynamical systems.

Load-bearing premise

The reformulation preserves the mechanistic properties, accuracy, and interpretability of the original MNN exactly, with no hidden trade-offs introduced by the complexity reduction.

What would settle it

Run both MNN and S-MNN on the same long-sequence dynamical task and observe either a measurable drop in predictive precision for S-MNN or no reduction to linear scaling in measured runtime and memory use.
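
One way to read the outcome of such a run is to fit the slope of cost against sequence length on log-log axes: a slope near 1 indicates linear scaling, a slope near 3 cubic. The sketch below is a minimal, hypothetical helper (`scaling_exponent` is not from the paper or its code), and the cost arrays are synthetic placeholders rather than measurements.

```python
import numpy as np

def scaling_exponent(lengths, costs):
    """Least-squares slope of log(cost) against log(length).

    For cost ~ a * length**p the slope approximates p.
    """
    slope, _intercept = np.polyfit(np.log(lengths), np.log(costs), deg=1)
    return float(slope)

# Synthetic placeholder data, NOT measured results: exact power laws.
lengths = np.array([64.0, 128.0, 256.0, 512.0, 1024.0])
cubic_costs = 2e-9 * lengths**3   # stands in for a cubic-time solver
linear_costs = 5e-6 * lengths     # stands in for a linear-time solver

cubic_slope = scaling_exponent(lengths, cubic_costs)
linear_slope = scaling_exponent(lengths, linear_costs)
```

With real timings or peak-memory readings in place of the synthetic arrays, constant overheads at short lengths can flatten the slope, so the fit is more informative over the longest sequences available.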

Figures

Figures reproduced from arXiv: 2410.06074 by Adeel Pervez, Dan Alistarh, Dingling Yao, Francesco Locatello, Jiale Chen.

**Figure 2.** Standalone S-MNN solver validation results compared with the closed-form solutions.

**Figure 3.** Lorenz discovery loss over first 1,000 optimization steps (exponential moving average factor = 0.9) using S-MNN (ours) compared with the original MNN dense and sparse solvers (Pervez et al., 2024). In this experiment, we evaluate the capability of our S-MNN in discovering the coefficients of the governing equations for the Lorenz system following Section 5.1 in the original MNN paper (Pervez et al., 2024). … (Plot axes: Optimization Step vs. Lorenz Discovery Loss; series: S-MNN, MNN Dense, MNN Sparse.)

**Figure 4.** Error visualization for the S-MNN 4-year …

**Figure 5.** Testing error, training runtime and GPU memory usage comparisons between …

**Figure 6.** Visual comparisons between the ground truth, the MNN …

Original abstract

We propose Scalable Mechanistic Neural Network (S-MNN), an enhanced neural network framework designed for scientific machine learning applications involving long temporal sequences. By reformulating the original Mechanistic Neural Network (MNN) (Pervez et al., 2024), we reduce the computational time and space complexities from cubic and quadratic with respect to the sequence length, respectively, to linear. This significant improvement enables efficient modeling of long-term dynamics without sacrificing accuracy or interpretability. Extensive experiments demonstrate that S-MNN matches the original MNN in precision while substantially reducing computational resources. Consequently, S-MNN can drop-in replace the original MNN in applications, providing a practical and efficient tool for integrating mechanistic bottlenecks into neural network models of complex dynamical systems. Source code is available at https://github.com/IST-DASLab/ScalableMNN.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes Scalable Mechanistic Neural Network (S-MNN) obtained by reformulating the original Mechanistic Neural Network (MNN) of Pervez et al. (2024). The reformulation is claimed to reduce time and space complexity from cubic and quadratic (in sequence length) to linear while preserving exact equivalence, accuracy, and interpretability, enabling efficient modeling of long temporal sequences in differential equations and SciML. Experiments are stated to show matching precision to the original MNN, and the code is released publicly.

Significance. If the reformulation is exactly equivalent and the complexity reduction holds without hidden approximations or accuracy trade-offs, the result would be a practical, drop-in improvement for applying mechanistic bottlenecks to long-sequence problems. The public code release is a clear strength supporting reproducibility and independent verification of the linear-complexity claim.

major comments (1)

[Abstract and §3 (reformulation)] The central claim is that the change is an exact reformulation (not an approximation) that eliminates the cubic/quadratic terms while preserving all mechanistic properties. No explicit derivation, equivalence proof, or complexity analysis is visible that would allow verification that the new formulation is mathematically identical to the original MNN for arbitrary sequence lengths.

minor comments (2)

[Abstract] The abstract refers to 'extensive experiments' demonstrating matching precision but supplies no information on sequence lengths tested, datasets, error metrics, or runtime measurements; a table or figure with these quantitative comparisons would strengthen the claim.
Notation for the original MNN components (e.g., the mechanistic bottleneck operators) should be restated briefly when introducing the S-MNN reformulation to make the mapping between the two models immediately clear.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive assessment and recommendation of minor revision. We address the single major comment below.

Referee: [Abstract and §3 (reformulation)] The central claim is that the change is an exact reformulation (not an approximation) that eliminates the cubic/quadratic terms while preserving all mechanistic properties. No explicit derivation, equivalence proof, or complexity analysis is visible that would allow verification that the new formulation is mathematically identical to the original MNN for arbitrary sequence lengths.

Authors: We agree that an explicit derivation, equivalence proof, and complexity analysis are necessary for full verifiability. In the revised manuscript we will add a dedicated subsection to §3 that (i) derives the S-MNN equations directly from the original MNN formulation of Pervez et al. (2024), (ii) provides a formal inductive proof of exact equivalence for arbitrary sequence lengths, and (iii) presents the detailed big-O analysis confirming the reduction from cubic/quadratic to linear time and space complexity. These additions will be self-contained so that readers need not consult the original MNN paper to confirm identity and complexity claims.

revision: yes

Circularity Check

0 steps flagged

Minor self-citation to prior MNN; reformulation independently verifiable via code and benchmarks

The paper's central contribution is an explicit reformulation of the cited MNN (Pervez et al. 2024) that reduces complexity from O(n^3)/O(n^2) to O(n) while claiming exact equivalence. This self-citation is present but not load-bearing: correctness is asserted via mathematical reformulation plus external runtime/accuracy benchmarks that can be reproduced from the released code without relying on any fitted parameter or self-referential definition inside the present work. No step equates a prediction to its own input by construction, imports uniqueness from the authors, or renames a known result as a derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The central claim rests on the assumption that a reformulation exists that simultaneously achieves linear scaling and preserves all original MNN properties; no free parameters, axioms, or invented entities are identifiable from the abstract alone.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages · 3 internal anchors

Published as a conference paper at ICLR 2025.

[1] Nicolas Boullé and Alex Townsend. A mathematical guide to operator learning. arXiv preprint arXiv:2312.14688, …

[2] Johannes Brandstetter, Max Welling, and Daniel E. Worrall. Lie point symmetry data augmentation for neural PDE solvers. In Kamalika Chaudhuri, Stefanie Jegelka, Le Song, Csaba Szepesvári, Gang Niu, and Sivan Sabato (eds.), International Conference on Machine Learning, ICML 2022, 17-23 July 2022, Baltimore, Maryland, USA, volume 162 of Proceedings of M...

[3] Stéphane d’Ascoli, Sören Becker, Alexander Mathis, Philippe Schwaller, and Niki Kilbertus. Odeformer: Symbolic regression of dynamical systems with transformers. arXiv preprint arXiv:2310.05573, …

[4] Stéphane d’Ascoli, Sören Becker, Philippe Schwaller, Alexander Mathis, and Niki Kilbertus. Odeformer: Symbolic regression of dynamical systems with transformers. In The Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, … URL https://openreview.net/forum?id=TzoHLiGVMo.

[5] J.R. Dormand and P.J. Prince. A family of embedded Runge-Kutta formulae. Journal of Computational and Applied Mathematics, 6(1):19–26, … ISSN 0377-0427. doi: https://doi.org/10.1016/0771-050X(80)90013-3. URL https://www.sciencedirect.com/science/article/pii/0771050X80900133.

[6] A. C. Hindmarsh. ODEPACK, a systematized collection of ODE solvers. In R. S. Stepleman (ed.), Scientific Computing, pp. 55–64, Amsterdam, …

[7] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: Graph kernel network for partial differential equations. arXiv preprint arXiv:2003.03485, 2020a. Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Andrew Stuart, Kaushik Bhattacharya, and Anim...

[8] Adeel Pervez, Francesco Locatello, and Stratis Gavves. Mechanistic neural networks for scientific machine learning. In Forty-first International Conference on Machine Learning, ICML 2024, Vienna, Austria, July 21-27, …

[9] … doi: 10.1137/0904010. URL https://doi.org/10.1137/0904010. Christopher Rackauckas, Yingbo Ma, Julius Martensen, Collin Warner, Kirill Zubov, Rohit Supekar, Dominic Skinner, and Ali Jasim Ramadhan. Universal differential equations for scientific machine learning. CoRR, abs/2001.04385, … URL https://arxiv.org/abs/2001.04385.

[10] M. Raissi, P. Perdikaris, and G.E. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, … ISSN 0021-9991. doi: https://doi.org/10.1016/j.jcp.2018.10.045. URL https://www.sciencedirect.com/science/article/pii/S0021999118307125.

[11] Samuel H Rudy, Steven L Brunton, Joshua L Proctor, and J Nathan Kutz. Data-driven discovery of partial differential equations. Science Advances, 3(4):e1602614, …

[12] Yogesh Verma, Markus Heinonen, and Vikas Garg. Climode: Climate and weather forecasting with physics-informed neural odes. arXiv preprint arXiv:2404.10024, …
[13] A THEORETICAL DERIVATIONS. A.1 DEFINITIONS OF $A$, $b$, $W$, AND $y$. $A$, $b$, $W$, and $y$ are defined as follows. $A$, $b$, and $W$ are only theoretical and they are not explicitly constructed during computation. Only $y$ is computed. Let $\bar{\bar{\bar{A}}}_{t,q,v} = [c_{t,q,v,0}, \ldots, c_{t,q,v,R}]^\top \in \mathbb{R}^{R+1}$. Let $\bar{\bar{A}}_{t,q} = \big[\bar{\bar{\bar{A}}}^\top_{t,q,1}, \ldots$ ...

[14] … mod $(R + 1)$. Define constant matrix $F \in \mathbb{R}^{(R+1)\times(R+1)}$ such that $[F]_{i,j} = \begin{cases} 0 & \text{if } i > j, \\ 1/(j-i)! & \text{otherwise.} \end{cases}$ (24) Define matrix $S_t^{+} = \mathrm{diag}\left(s_t^0, s_t^1, \ldots, s_t^R\right) \in \mathbb{R}^{(R+1)\times(R+1)}$. Define matrix $S_t^{-} = \mathrm{diag}\left((-s_t)^0, (-s_t)^1, \ldots, (-s_t)^R\right) \in \mathbb{R}^{(R+1)\times(R+1)}$. Define matrix $S_t^{2} = \mathrm{diag}\left(s_t^0, s_t^2, \ldots, s_t^{2R}\right) \in \mathbb{R}^{(R+1)\times(R+1)}$. ...

[15] … and an additional third-order ODE. $c_0, c_1, c_2$ are constant numbers. $u_0 = y(0)$, $u_1 = y'(0)$, $u_2 = y''(0)$ are initial values. RC-circuit (charging capacitor), $(c_0, c_1, c_2) = (0.7, 1.2, 2.31)$, $(u_0) = (10)$: $\frac{y}{c_1} + c_2 \frac{dy}{dt} = c_0$, (39) $y = c_0 c_1 + (u_0 - c_0 c_1) \exp\left(-\frac{t}{c_1 c_2}\right)$. (40) Population growth (naive), $(c_0) = (0.23)$, $(u_0) = (4.78)$: $c_0 y - \frac{dy}{dt} =$ ...

[16] … solver approximates continuous-time dynamics through time discretization. Our modifications in S-MNN provide alternative approximation methods that improve efficiency without sacrificing accuracy. While our main focus is on presenting these improvements, for completeness, we briefly describe the components from the original MNN that we have modified or ...

[17] … models the smoothness constraints as inequalities (Eqs. 54, 55, and 56): the approximation errors bounded by a slack variable $\epsilon \in \mathbb{R}$. The forward and backward Taylor approximation errors in MNN are defined as: $E^{\mathrm{forward}}_{t,v,r} = y_{t+1,v,r} - \sum_{r'=r}^{R} \frac{s_t^{r'-r}}{(r'-r)!}\, y_{t,v,r'}$, (51) $E^{\mathrm{backward}}_{t,v,r} = y_{t,v,r} - \sum_{r'}$ ...

[18] … problems are ill-defined because the number of constraints $m'$ exceeds the number of variables $n + 1$ when $T$ is large, making the problem infeasible. The square matrix in Eq. 60 is not full rank and the problem cannot be solved directly. To circumvent this issue, the QP problem is transformed into its dual form: $\begin{bmatrix} \gamma I_{m' \times m'} & A' \\ A'^\top & 0_{(n+1)\times(n+1)} \end{bmatrix} \begin{bmatrix} -\lambda \\ y' \end{bmatrix} = \begin{bmatrix} b' \\ -\Delta \end{bmatrix}$. ...
