pith. sign in

arxiv: 2601.12334 · v2 · submitted 2026-01-18 · 📡 eess.SY · cs.SY

Worst-case Nonlinear Regression with Error Bounds

Pith reviewed 2026-05-16 13:38 UTC · model grok-4.3

classification 📡 eess.SY cs.SY
keywords minimax regressionactive learningworst-case error boundssurrogate modelingnonlinear approximationneural network trainingmodel predictive control
0
0 comments X

The pith

An active-learning method trains nonlinear surrogate models to minimize the maximum approximation error while providing explicit worst-case bounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a technique for fitting surrogate models like neural networks to nonlinear functions by directly minimizing the largest absolute error over a compact input set. A smooth approximation to the infinity-norm loss allows the use of standard gradient descent for training. The training dataset grows iteratively as global optimization identifies input locations where the current model deviates most from the true function. Both constant and input-dependent error bounds are proven to hold uniformly across the entire domain. This framework supports applications in approximating nonlinear dynamics, nonconvex sets, and explicit model predictive control policies.

Core claim

By minimizing a smooth L_infinity approximation to the worst-case loss and actively enriching the training set at points of maximum error found through global optimization, surrogate models such as neural networks can be obtained with guaranteed constant or input-dependent worst-case error bounds over the full input domain.

What carries the argument

A smooth L_infinity loss approximation that replaces the nonsmooth minimax objective, paired with global optimization to select new training points at the current maximum residual locations.

If this is right

  • Surrogate models achieve minimax optimality in the sense of smallest possible maximum error.
  • Explicit error bounds allow certification of approximation quality everywhere in the input space.
  • The method applies directly to approximating uncertain nonlinear dynamics and explicit MPC laws.
  • Gradient-based training becomes feasible despite the original nonsmooth loss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Such bounds could enable safer use of learned models in safety-critical control systems.
  • The active sampling strategy may require fewer evaluations than passive uniform sampling for the same accuracy.
  • Extensions to other surrogate types beyond neural networks are straightforward given the general formulation.

Load-bearing premise

Global optimization must reliably locate the inputs that maximize the current approximation error, and the smoothing must not significantly change the location of the minimax solution.

What would settle it

Running the method on a known nonlinear function where the true maximum-error points are analytically known; if the derived bounds are violated on a test point, the claim fails.

Figures

Figures reproduced from arXiv: 2601.12334 by Alberto Bemporad.

Figure 1
Figure 1. Figure 1: Uncertainty bounds obtained by running Algorithm 1 on Example 1. [PITH_FULL_IMAGE:figures/full_fig_p010_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Execution of Algorithm 1 on Example 1: maximum error [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Gaussian function. Top-left: function f(x); Top-right: learned leaky￾ReLU ˆf(x; θ); Bottom-left: pointwise error e(x) = f(x) − ˆf(x; θ) and envelope [¯e ⋆ min(x), e¯ ⋆ max(x)]; Bottom-right: WCE and MSE over active-learning iterations of Algorithm 1. with x1 ∈ [−2, 2], x2 ∈ [−2, 2]. Following the approach described in Section 4, we run Algorithm 1 with N0 = 50 for M = 50 active-learning steps, using η = 10… view at source ↗
Figure 4
Figure 4. Figure 4: Convex inner approximation of a nonconvex set [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Pendulum model example: reconstructed discrete-time PWA mapping [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Pendulum model example: regions of equivalent PWA representation of [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: mpQP example. Top-left: exact QP solution [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Linear MPC example: closed-loop simulations under exact (QP-based) [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗
read the original abstract

We propose an active-learning method for nonlinear minimax regression. Given a nonlinear function that can be arbitrarily evaluated over a compact set, we fit a surrogate model, such as a feedforward neural network, by minimizing the maximum absolute approximation error. To handle the nonsmoothness of this worst-case loss, we introduce a smooth $L_\infty$ approximation that enables efficient gradient-based training. The training set is iteratively enriched by querying points of largest error via global optimization. We also derive constant and input-dependent worst-case error bounds over the entire input domain. The approach is validated on approximations of nonlinear functions and nonconvex sets, uncertain models of nonlinear dynamics, and explicit model predictive control laws. A Python library is available at https://github.com/bemporad/maxfit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an active-learning method for nonlinear minimax regression: a surrogate (e.g., feedforward neural network) is trained by minimizing a smooth L∞ approximation to the maximum absolute error over a compact domain; the training set is iteratively enriched by querying points that maximize |f(x) − ŷ(x)| via global optimization; constant and input-dependent worst-case L∞ error bounds are derived over the entire input domain; the approach is demonstrated on nonlinear function approximation, nonconvex-set approximation, uncertain nonlinear dynamics, and explicit MPC laws, with an accompanying Python library.

Significance. If the derived bounds are valid and the active-learning procedure reliably identifies the true maximizers, the method would supply a practical route to surrogate models with explicit, non-asymptotic worst-case guarantees—particularly valuable for robust control and safety-critical approximation tasks. The open-source library is a concrete strength that supports reproducibility.

major comments (2)
  1. [§3] §3 (Active-learning loop): the validity of both the constant and input-dependent L∞ bounds over the whole domain rests on the global optimizer returning a point sufficiently close to the true maximizer of |f(x) − ŷ(x)| at each iteration. No convergence rate, failure-probability bound, or empirical verification on multimodal error surfaces (typical for neural-network residuals) is supplied; if the optimizer returns a local maximum, the subsequent analytic bounds become invalid.
  2. [§4] §4 (Error-bound derivations): the constant and input-dependent bounds are stated to hold over the entire compact domain, yet the text does not clarify whether they are computed solely from the trained surrogate parameters or whether they implicitly require the training set to contain the exact global maximizers; this relationship must be made explicit for the bounds to be usable in practice.
minor comments (2)
  1. [Abstract] The abstract asserts “bound tightness” and “training efficiency” but supplies no quantitative metrics; a short table of final L∞ errors versus training-set size would strengthen the validation section.
  2. [§2] Notation for the smooth L∞ approximation (e.g., the smoothing parameter ε) should be introduced once with a clear reference to its effect on the minimax solution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the active-learning procedure and error-bound derivations. We address each major comment below and indicate the planned revisions.

read point-by-point responses
  1. Referee: [§3] §3 (Active-learning loop): the validity of both the constant and input-dependent L∞ bounds over the whole domain rests on the global optimizer returning a point sufficiently close to the true maximizer of |f(x) − ŷ(x)| at each iteration. No convergence rate, failure-probability bound, or empirical verification on multimodal error surfaces (typical for neural-network residuals) is supplied; if the optimizer returns a local maximum, the subsequent analytic bounds become invalid.

    Authors: We agree that the bounds' validity is conditional on the global optimizer returning points sufficiently close to the true maximizers. The manuscript employs a global optimizer (such as differential evolution) for this purpose during active learning. We will revise §3 to state this assumption explicitly and add empirical verification showing the optimizer's behavior on multimodal error surfaces typical of neural-network residuals. A general convergence rate or failure-probability bound cannot be supplied for arbitrary multimodal functions, as global optimization lacks such guarantees in the nonconvex case; this limitation will be noted in the revision. revision: partial

  2. Referee: [§4] §4 (Error-bound derivations): the constant and input-dependent bounds are stated to hold over the entire compact domain, yet the text does not clarify whether they are computed solely from the trained surrogate parameters or whether they implicitly require the training set to contain the exact global maximizers; this relationship must be made explicit for the bounds to be usable in practice.

    Authors: The bounds are derived from the properties of the trained surrogate and the minimax objective; they hold over the domain for the final surrogate without requiring the training set to contain exact global maximizers at every iteration. The active-learning process enriches the set to support minimization of the worst-case error, after which the bounds apply to the resulting model. We will revise §4 to make this relationship explicit and clarify the conditions under which the bounds are valid. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations remain independent of fitted values

full rationale

The paper's core steps—smooth L∞ approximation of the minimax loss, iterative enrichment of the training set via global maximization of |f(x)−ŷ(x)|, and derivation of constant plus input-dependent L∞ error bounds—are presented as self-contained analytic constructions. No equation reduces a claimed prediction or bound to a fitted parameter by definition, no uniqueness theorem is imported from the author's prior work to force the method, and no ansatz is smuggled via self-citation. The global-optimization step is an algorithmic choice whose correctness is external to the bound formulas themselves; the bounds are stated to hold once the maximizers are located, without the bounds being used to define the maximizers. This matches the reader's assessment of score 2.0 and yields an overall circularity score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities beyond standard neural-network training assumptions.

pith-pipeline@v0.9.0 · 5415 in / 969 out tokens · 34953 ms · 2026-05-16T13:38:00.325608+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

  1. [1]

    Alsmeier, L

    H. Alsmeier, L. Theiner, A. Savchenko, A. Mesbah, and R. Findeisen. Imita- tion learning of MPC with neural networks: Error guarantees and sparsifica- tion. InIEEE 63rd Conference on Decision and Control, pages 4777–4782, Milano, Italy, 2024

  2. [2]

    Angelopoulos and S

    A.N. Angelopoulos and S. Bates. Conformal prediction: A gentle introduc- tion.Foundations and Trends in Machine Learning, 16(4):494–591, 2023

  3. [3]

    A High-Performant Multi-Parametric Quadratic Programming Solver,

    D. Arnström and D. Axehill. A high-performant multi-parametric quadratic programming solver. 2024. arXiv preprint 2404.05511. 20

  4. [4]

    Arnström, A

    D. Arnström, A. Bemporad, and D. Axehill. A dual active-set solver for em- bedded quadratic programming using recursive LDLT updates.IEEE Trans- actions on Automatic Control, 67(8):4362–4369, 2022

  5. [5]

    Barber, E.J

    R.F. Barber, E.J. Candès, A. Ramdas, and R.J. Tibshirani. Predictive infer- ence with the jackknife+.The Annals of Statistics, 49(1):486–507, 2021

  6. [6]

    Bemporad

    A. Bemporad. Explicit model predictive control. In J. Baillieul and T. Samad, editors,Encyclopedia of Systems and Control, pages 744–751. 2021

  7. [7]

    Bemporad

    A. Bemporad. Active learning for regression by inverse distance weighting. Information Sciences, 626:275–292, May 2023

  8. [8]

    Bemporad

    A. Bemporad. An L-BFGS-B approach for linear and nonlinear system iden- tification underℓ 1 and group-lasso regularization.IEEE Transactions on Au- tomatic Control, 70(7):4857–4864, 2025

  9. [9]

    Bemporad, M

    A. Bemporad, M. Morari, V . Dua, and E.N. Pistikopoulos. The explicit linear quadratic regulator for constrained systems.Automatica, 38(1):3–20, 2002

  10. [10]

    Bemporad, M

    A. Bemporad, M. Morari, and N.L. Ricker.Model Predictive Control Toolbox for MATLAB – User’s Guide. The Mathworks, Inc., 2004

  11. [11]

    Canale, L

    M. Canale, L. Fagiano, and M. Milanese. Fast nonlinear model predictive control via set membership approximation: an overview. InProc. 17th IFAC World Congress, pages 12165–12170. Seoul, Korea, 2008

  12. [12]

    S. Chen, K. Saulnier, N. Atanasov, D. Lee, V . Kumar, G. Pappas, and M. Morari. Approximating explicit model predictive control using con- strained neural networks. pages 1520–1527. American Control Conference, 2018

  13. [13]

    Fazlyab, M

    M. Fazlyab, M. Morari, and G.J. Pappas. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite pro- gramming.IEEE Transactions on Automatic Control, 67(1):1–15, 2022

  14. [14]

    Huber and E.M

    P.J. Huber and E.M. Ronchetti.Robust Statistics. Wiley, Hoboken, NJ, 2nd edition, 2009

  15. [15]

    Jones, M

    D.R. Jones, M. Schonlau, and W.J. Matthias. Efficient global optimization of expensive black-box functions.Journal of Global Optimization, 13(4):455– 492, 1998

  16. [16]

    Karg and S

    B. Karg and S. Lucia. Efficient representation and approximation of model predictive control laws via deep learning.IEEE Transactions on Cybernetics, 50(9):3866–3878, 2020

  17. [17]

    Kidger.On Neural Differential Equations

    P. Kidger.On Neural Differential Equations. PhD thesis, University of Ox- ford, 2021. 21

  18. [18]

    Liu and J

    D.C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization.Mathematical programming, 45(1-3):503–528, 1989

  19. [19]

    Masti, F

    D. Masti, F. Fabiani, G. Gnecco, and A. Bemporad. Counter-example guided inductive synthesis of control Lyapunov functions for uncertain systems. IEEE Control Systems Letters, 7:2047–2052, 2023

  20. [20]

    McKay, R.J

    M.D. McKay, R.J. Beckman, and W.J. Conover. Comparison of three meth- ods for selecting values of input variables in the analysis of output from a computer code.Technometrics, 21(2):239–245, 1979

  21. [21]

    Milanese and C

    M. Milanese and C. Novara. Set membership identification of nonlinear sys- tems.Automatica, 40(6):957–975, 2004

  22. [22]

    Parisini and R

    T. Parisini and R. Zoppoli. A receding-horizon regulator for nonlinear sys- tems and a neural approximation.Automatica, 31(10):1443–1451, 1995

  23. [23]

    Powell.Approximation Theory and Methods

    M.J.D. Powell.Approximation Theory and Methods. Cambridge University Press, Cambridge, UK, 1981

  24. [24]

    Rasmussen and C.K.I

    C.E. Rasmussen and C.K.I. Williams.Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006

  25. [25]

    Rios and N.V

    L.M. Rios and N.V . Sahinidis. Derivative-free optimization: a review of al- gorithms and comparison of software implementations.Journal of Global Optimization, 56(3):1247–1293, 2013

  26. [26]

    Schaller, A

    M. Schaller, A. Bemporad, and S. Boyd. Learning parametric convex func- tions. 2025. arXiv preprint 2506.04183

  27. [27]

    B. Settles. Active learning. InSynthesis Lectures on Artificial Intelligence and Machine Learning, number 18. Morgan & Claypool Publishers, 2012. A Appendix A.1 Proof of Proposition 1 Consider any given vectorθ∈R nθ, lete k(θ)≜y k−f(x k;θ),˜e(θ)≜max k |ek(θ)|, and let ˜K(θ)be the set of indicesksuch that|e k(θ)|= ˜e(θ). Denote by# ˜K(θ)the cardinality of ...