Worst-case Nonlinear Regression with Error Bounds

Alberto Bemporad

arxiv: 2601.12334 · v2 · submitted 2026-01-18 · 📡 eess.SY · cs.SY

Worst-case Nonlinear Regression with Error Bounds

Alberto Bemporad This is my paper

Pith reviewed 2026-05-16 13:38 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords minimax regressionactive learningworst-case error boundssurrogate modelingnonlinear approximationneural network trainingmodel predictive control

0 comments

The pith

An active-learning method trains nonlinear surrogate models to minimize the maximum approximation error while providing explicit worst-case bounds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a technique for fitting surrogate models like neural networks to nonlinear functions by directly minimizing the largest absolute error over a compact input set. A smooth approximation to the infinity-norm loss allows the use of standard gradient descent for training. The training dataset grows iteratively as global optimization identifies input locations where the current model deviates most from the true function. Both constant and input-dependent error bounds are proven to hold uniformly across the entire domain. This framework supports applications in approximating nonlinear dynamics, nonconvex sets, and explicit model predictive control policies.

Core claim

By minimizing a smooth L_infinity approximation to the worst-case loss and actively enriching the training set at points of maximum error found through global optimization, surrogate models such as neural networks can be obtained with guaranteed constant or input-dependent worst-case error bounds over the full input domain.

What carries the argument

A smooth L_infinity loss approximation that replaces the nonsmooth minimax objective, paired with global optimization to select new training points at the current maximum residual locations.

If this is right

Surrogate models achieve minimax optimality in the sense of smallest possible maximum error.
Explicit error bounds allow certification of approximation quality everywhere in the input space.
The method applies directly to approximating uncertain nonlinear dynamics and explicit MPC laws.
Gradient-based training becomes feasible despite the original nonsmooth loss.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such bounds could enable safer use of learned models in safety-critical control systems.
The active sampling strategy may require fewer evaluations than passive uniform sampling for the same accuracy.
Extensions to other surrogate types beyond neural networks are straightforward given the general formulation.

Load-bearing premise

Global optimization must reliably locate the inputs that maximize the current approximation error, and the smoothing must not significantly change the location of the minimax solution.

What would settle it

Running the method on a known nonlinear function where the true maximum-error points are analytically known; if the derived bounds are violated on a test point, the claim fails.

Figures

Figures reproduced from arXiv: 2601.12334 by Alberto Bemporad.

**Figure 2.** Figure 2: Execution of Algorithm 1 on Example 1: maximum error [PITH_FULL_IMAGE:figures/full_fig_p011_2.png] view at source ↗

**Figure 3.** Figure 3: Gaussian function. Top-left: function f(x); Top-right: learned leakyReLU ˆf(x; θ); Bottom-left: pointwise error e(x) = f(x) − ˆf(x; θ) and envelope [¯e ⋆ min(x), e¯ ⋆ max(x)]; Bottom-right: WCE and MSE over active-learning iterations of Algorithm 1. with x1 ∈ [−2, 2], x2 ∈ [−2, 2]. Following the approach described in Section 4, we run Algorithm 1 with N0 = 50 for M = 50 active-learning steps, using η = 10… view at source ↗

**Figure 4.** Figure 4: Convex inner approximation of a nonconvex set [PITH_FULL_IMAGE:figures/full_fig_p016_4.png] view at source ↗

**Figure 5.** Figure 5: Pendulum model example: reconstructed discrete-time PWA mapping [PITH_FULL_IMAGE:figures/full_fig_p017_5.png] view at source ↗

**Figure 6.** Figure 6: Pendulum model example: regions of equivalent PWA representation of [PITH_FULL_IMAGE:figures/full_fig_p018_6.png] view at source ↗

**Figure 7.** Figure 7: mpQP example. Top-left: exact QP solution [PITH_FULL_IMAGE:figures/full_fig_p019_7.png] view at source ↗

**Figure 8.** Figure 8: Linear MPC example: closed-loop simulations under exact (QP-based) [PITH_FULL_IMAGE:figures/full_fig_p020_8.png] view at source ↗

read the original abstract

We propose an active-learning method for nonlinear minimax regression. Given a nonlinear function that can be arbitrarily evaluated over a compact set, we fit a surrogate model, such as a feedforward neural network, by minimizing the maximum absolute approximation error. To handle the nonsmoothness of this worst-case loss, we introduce a smooth $L_\infty$ approximation that enables efficient gradient-based training. The training set is iteratively enriched by querying points of largest error via global optimization. We also derive constant and input-dependent worst-case error bounds over the entire input domain. The approach is validated on approximations of nonlinear functions and nonconvex sets, uncertain models of nonlinear dynamics, and explicit model predictive control laws. A Python library is available at https://github.com/bemporad/maxfit.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a practical active-learning method for minimax nonlinear regression with derived error bounds, but the global optimization step for finding max-error points lacks the guarantees needed to make those bounds reliable.

read the letter

This paper gives a workable active-learning procedure for fitting nonlinear models under worst-case error, along with derived bounds that depend on the input. It fits a surrogate such as a neural net by minimizing a smooth approximation to the maximum absolute error, then repeatedly adds new points by solving a global optimization problem to locate the current largest error, and finally supplies both constant and input-dependent worst-case bounds over the whole domain. A Python library is released with the work, which is a concrete help for anyone who wants to test the approach on their own problems. The validation examples cover nonlinear function approximation, nonconvex sets, uncertain dynamics, and explicit MPC laws, so the method has clear reach inside control and optimization. The combination of the smooth L_infty training, the active enrichment loop, and the explicit bounds is not directly in the cited prior work, so that part is new. The soft spot is the dependence on global optimization to find the true maximum-error locations at each iteration. When the error surface is multimodal, as it typically is with neural-net surrogates of non-convex maps, a local optimizer can miss the global peak. In that case the training set is incomplete, the surrogate is not minimized at the right places, and the analytic bounds no longer hold over the full domain. No convergence rates or failure-probability statements are given for the global search step, so the guarantees remain conditional on perfect maximization. Without detailed numerical results showing how tight the bounds actually are in the examples, it is also hard to judge practical value. This is aimed at control engineers and optimization researchers who need surrogate models with explicit worst-case error control rather than average-case fits. A reader working on robust MPC or uncertainty quantification could get direct use from the method and the code. It deserves a serious referee because the core procedure is coherent and the implementation is public, even if the optimization guarantees need tightening.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes an active-learning method for nonlinear minimax regression: a surrogate (e.g., feedforward neural network) is trained by minimizing a smooth L∞ approximation to the maximum absolute error over a compact domain; the training set is iteratively enriched by querying points that maximize |f(x) − ŷ(x)| via global optimization; constant and input-dependent worst-case L∞ error bounds are derived over the entire input domain; the approach is demonstrated on nonlinear function approximation, nonconvex-set approximation, uncertain nonlinear dynamics, and explicit MPC laws, with an accompanying Python library.

Significance. If the derived bounds are valid and the active-learning procedure reliably identifies the true maximizers, the method would supply a practical route to surrogate models with explicit, non-asymptotic worst-case guarantees—particularly valuable for robust control and safety-critical approximation tasks. The open-source library is a concrete strength that supports reproducibility.

major comments (2)

[§3] §3 (Active-learning loop): the validity of both the constant and input-dependent L∞ bounds over the whole domain rests on the global optimizer returning a point sufficiently close to the true maximizer of |f(x) − ŷ(x)| at each iteration. No convergence rate, failure-probability bound, or empirical verification on multimodal error surfaces (typical for neural-network residuals) is supplied; if the optimizer returns a local maximum, the subsequent analytic bounds become invalid.
[§4] §4 (Error-bound derivations): the constant and input-dependent bounds are stated to hold over the entire compact domain, yet the text does not clarify whether they are computed solely from the trained surrogate parameters or whether they implicitly require the training set to contain the exact global maximizers; this relationship must be made explicit for the bounds to be usable in practice.

minor comments (2)

[Abstract] The abstract asserts “bound tightness” and “training efficiency” but supplies no quantitative metrics; a short table of final L∞ errors versus training-set size would strengthen the validation section.
[§2] Notation for the smooth L∞ approximation (e.g., the smoothing parameter ε) should be introduced once with a clear reference to its effect on the minimax solution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the active-learning procedure and error-bound derivations. We address each major comment below and indicate the planned revisions.

read point-by-point responses

Referee: [§3] §3 (Active-learning loop): the validity of both the constant and input-dependent L∞ bounds over the whole domain rests on the global optimizer returning a point sufficiently close to the true maximizer of |f(x) − ŷ(x)| at each iteration. No convergence rate, failure-probability bound, or empirical verification on multimodal error surfaces (typical for neural-network residuals) is supplied; if the optimizer returns a local maximum, the subsequent analytic bounds become invalid.

Authors: We agree that the bounds' validity is conditional on the global optimizer returning points sufficiently close to the true maximizers. The manuscript employs a global optimizer (such as differential evolution) for this purpose during active learning. We will revise §3 to state this assumption explicitly and add empirical verification showing the optimizer's behavior on multimodal error surfaces typical of neural-network residuals. A general convergence rate or failure-probability bound cannot be supplied for arbitrary multimodal functions, as global optimization lacks such guarantees in the nonconvex case; this limitation will be noted in the revision. revision: partial
Referee: [§4] §4 (Error-bound derivations): the constant and input-dependent bounds are stated to hold over the entire compact domain, yet the text does not clarify whether they are computed solely from the trained surrogate parameters or whether they implicitly require the training set to contain the exact global maximizers; this relationship must be made explicit for the bounds to be usable in practice.

Authors: The bounds are derived from the properties of the trained surrogate and the minimax objective; they hold over the domain for the final surrogate without requiring the training set to contain exact global maximizers at every iteration. The active-learning process enriches the set to support minimization of the worst-case error, after which the bounds apply to the resulting model. We will revise §4 to make this relationship explicit and clarify the conditions under which the bounds are valid. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivations remain independent of fitted values

full rationale

The paper's core steps—smooth L∞ approximation of the minimax loss, iterative enrichment of the training set via global maximization of |f(x)−ŷ(x)|, and derivation of constant plus input-dependent L∞ error bounds—are presented as self-contained analytic constructions. No equation reduces a claimed prediction or bound to a fitted parameter by definition, no uniqueness theorem is imported from the author's prior work to force the method, and no ansatz is smuggled via self-citation. The global-optimization step is an algorithmic choice whose correctness is external to the bound formulas themselves; the bounds are stated to hold once the maximizers are located, without the bounds being used to define the maximizers. This matches the reader's assessment of score 2.0 and yields an overall circularity score of 0.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no explicit free parameters, axioms, or invented entities beyond standard neural-network training assumptions.

pith-pipeline@v0.9.0 · 5415 in / 969 out tokens · 34953 ms · 2026-05-16T13:38:00.325608+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

27 extracted references · 27 canonical work pages

[1]

Alsmeier, L

H. Alsmeier, L. Theiner, A. Savchenko, A. Mesbah, and R. Findeisen. Imita- tion learning of MPC with neural networks: Error guarantees and sparsifica- tion. InIEEE 63rd Conference on Decision and Control, pages 4777–4782, Milano, Italy, 2024

work page 2024
[2]

Angelopoulos and S

A.N. Angelopoulos and S. Bates. Conformal prediction: A gentle introduc- tion.Foundations and Trends in Machine Learning, 16(4):494–591, 2023

work page 2023
[3]

A High-Performant Multi-Parametric Quadratic Programming Solver,

D. Arnström and D. Axehill. A high-performant multi-parametric quadratic programming solver. 2024. arXiv preprint 2404.05511. 20

work page arXiv 2024
[4]

Arnström, A

D. Arnström, A. Bemporad, and D. Axehill. A dual active-set solver for em- bedded quadratic programming using recursive LDLT updates.IEEE Trans- actions on Automatic Control, 67(8):4362–4369, 2022

work page 2022
[5]

Barber, E.J

R.F. Barber, E.J. Candès, A. Ramdas, and R.J. Tibshirani. Predictive infer- ence with the jackknife+.The Annals of Statistics, 49(1):486–507, 2021

work page 2021
[6]

Bemporad

A. Bemporad. Explicit model predictive control. In J. Baillieul and T. Samad, editors,Encyclopedia of Systems and Control, pages 744–751. 2021

work page 2021
[7]

Bemporad

A. Bemporad. Active learning for regression by inverse distance weighting. Information Sciences, 626:275–292, May 2023

work page 2023
[8]

Bemporad

A. Bemporad. An L-BFGS-B approach for linear and nonlinear system iden- tification underℓ 1 and group-lasso regularization.IEEE Transactions on Au- tomatic Control, 70(7):4857–4864, 2025

work page 2025
[9]

Bemporad, M

A. Bemporad, M. Morari, V . Dua, and E.N. Pistikopoulos. The explicit linear quadratic regulator for constrained systems.Automatica, 38(1):3–20, 2002

work page 2002
[10]

Bemporad, M

A. Bemporad, M. Morari, and N.L. Ricker.Model Predictive Control Toolbox for MATLAB – User’s Guide. The Mathworks, Inc., 2004

work page 2004
[11]

Canale, L

M. Canale, L. Fagiano, and M. Milanese. Fast nonlinear model predictive control via set membership approximation: an overview. InProc. 17th IFAC World Congress, pages 12165–12170. Seoul, Korea, 2008

work page 2008
[12]

S. Chen, K. Saulnier, N. Atanasov, D. Lee, V . Kumar, G. Pappas, and M. Morari. Approximating explicit model predictive control using con- strained neural networks. pages 1520–1527. American Control Conference, 2018

work page 2018
[13]

Fazlyab, M

M. Fazlyab, M. Morari, and G.J. Pappas. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite pro- gramming.IEEE Transactions on Automatic Control, 67(1):1–15, 2022

work page 2022
[14]

Huber and E.M

P.J. Huber and E.M. Ronchetti.Robust Statistics. Wiley, Hoboken, NJ, 2nd edition, 2009

work page 2009
[15]

Jones, M

D.R. Jones, M. Schonlau, and W.J. Matthias. Efficient global optimization of expensive black-box functions.Journal of Global Optimization, 13(4):455– 492, 1998

work page 1998
[16]

Karg and S

B. Karg and S. Lucia. Efficient representation and approximation of model predictive control laws via deep learning.IEEE Transactions on Cybernetics, 50(9):3866–3878, 2020

work page 2020
[17]

Kidger.On Neural Differential Equations

P. Kidger.On Neural Differential Equations. PhD thesis, University of Ox- ford, 2021. 21

work page 2021
[18]

Liu and J

D.C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization.Mathematical programming, 45(1-3):503–528, 1989

work page 1989
[19]

Masti, F

D. Masti, F. Fabiani, G. Gnecco, and A. Bemporad. Counter-example guided inductive synthesis of control Lyapunov functions for uncertain systems. IEEE Control Systems Letters, 7:2047–2052, 2023

work page 2047
[20]

McKay, R.J

M.D. McKay, R.J. Beckman, and W.J. Conover. Comparison of three meth- ods for selecting values of input variables in the analysis of output from a computer code.Technometrics, 21(2):239–245, 1979

work page 1979
[21]

Milanese and C

M. Milanese and C. Novara. Set membership identification of nonlinear sys- tems.Automatica, 40(6):957–975, 2004

work page 2004
[22]

Parisini and R

T. Parisini and R. Zoppoli. A receding-horizon regulator for nonlinear sys- tems and a neural approximation.Automatica, 31(10):1443–1451, 1995

work page 1995
[23]

Powell.Approximation Theory and Methods

M.J.D. Powell.Approximation Theory and Methods. Cambridge University Press, Cambridge, UK, 1981

work page 1981
[24]

Rasmussen and C.K.I

C.E. Rasmussen and C.K.I. Williams.Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006

work page 2006
[25]

Rios and N.V

L.M. Rios and N.V . Sahinidis. Derivative-free optimization: a review of al- gorithms and comparison of software implementations.Journal of Global Optimization, 56(3):1247–1293, 2013

work page 2013
[26]

Schaller, A

M. Schaller, A. Bemporad, and S. Boyd. Learning parametric convex func- tions. 2025. arXiv preprint 2506.04183

work page arXiv 2025
[27]

B. Settles. Active learning. InSynthesis Lectures on Artificial Intelligence and Machine Learning, number 18. Morgan & Claypool Publishers, 2012. A Appendix A.1 Proof of Proposition 1 Consider any given vectorθ∈R nθ, lete k(θ)≜y k−f(x k;θ),˜e(θ)≜max k |ek(θ)|, and let ˜K(θ)be the set of indicesksuch that|e k(θ)|= ˜e(θ). Denote by# ˜K(θ)the cardinality of ...

work page 2012

[1] [1]

Alsmeier, L

H. Alsmeier, L. Theiner, A. Savchenko, A. Mesbah, and R. Findeisen. Imita- tion learning of MPC with neural networks: Error guarantees and sparsifica- tion. InIEEE 63rd Conference on Decision and Control, pages 4777–4782, Milano, Italy, 2024

work page 2024

[2] [2]

Angelopoulos and S

A.N. Angelopoulos and S. Bates. Conformal prediction: A gentle introduc- tion.Foundations and Trends in Machine Learning, 16(4):494–591, 2023

work page 2023

[3] [3]

A High-Performant Multi-Parametric Quadratic Programming Solver,

D. Arnström and D. Axehill. A high-performant multi-parametric quadratic programming solver. 2024. arXiv preprint 2404.05511. 20

work page arXiv 2024

[4] [4]

Arnström, A

D. Arnström, A. Bemporad, and D. Axehill. A dual active-set solver for em- bedded quadratic programming using recursive LDLT updates.IEEE Trans- actions on Automatic Control, 67(8):4362–4369, 2022

work page 2022

[5] [5]

Barber, E.J

R.F. Barber, E.J. Candès, A. Ramdas, and R.J. Tibshirani. Predictive infer- ence with the jackknife+.The Annals of Statistics, 49(1):486–507, 2021

work page 2021

[6] [6]

Bemporad

A. Bemporad. Explicit model predictive control. In J. Baillieul and T. Samad, editors,Encyclopedia of Systems and Control, pages 744–751. 2021

work page 2021

[7] [7]

Bemporad

A. Bemporad. Active learning for regression by inverse distance weighting. Information Sciences, 626:275–292, May 2023

work page 2023

[8] [8]

Bemporad

A. Bemporad. An L-BFGS-B approach for linear and nonlinear system iden- tification underℓ 1 and group-lasso regularization.IEEE Transactions on Au- tomatic Control, 70(7):4857–4864, 2025

work page 2025

[9] [9]

Bemporad, M

A. Bemporad, M. Morari, V . Dua, and E.N. Pistikopoulos. The explicit linear quadratic regulator for constrained systems.Automatica, 38(1):3–20, 2002

work page 2002

[10] [10]

Bemporad, M

A. Bemporad, M. Morari, and N.L. Ricker.Model Predictive Control Toolbox for MATLAB – User’s Guide. The Mathworks, Inc., 2004

work page 2004

[11] [11]

Canale, L

M. Canale, L. Fagiano, and M. Milanese. Fast nonlinear model predictive control via set membership approximation: an overview. InProc. 17th IFAC World Congress, pages 12165–12170. Seoul, Korea, 2008

work page 2008

[12] [12]

S. Chen, K. Saulnier, N. Atanasov, D. Lee, V . Kumar, G. Pappas, and M. Morari. Approximating explicit model predictive control using con- strained neural networks. pages 1520–1527. American Control Conference, 2018

work page 2018

[13] [13]

Fazlyab, M

M. Fazlyab, M. Morari, and G.J. Pappas. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite pro- gramming.IEEE Transactions on Automatic Control, 67(1):1–15, 2022

work page 2022

[14] [14]

Huber and E.M

P.J. Huber and E.M. Ronchetti.Robust Statistics. Wiley, Hoboken, NJ, 2nd edition, 2009

work page 2009

[15] [15]

Jones, M

D.R. Jones, M. Schonlau, and W.J. Matthias. Efficient global optimization of expensive black-box functions.Journal of Global Optimization, 13(4):455– 492, 1998

work page 1998

[16] [16]

Karg and S

B. Karg and S. Lucia. Efficient representation and approximation of model predictive control laws via deep learning.IEEE Transactions on Cybernetics, 50(9):3866–3878, 2020

work page 2020

[17] [17]

Kidger.On Neural Differential Equations

P. Kidger.On Neural Differential Equations. PhD thesis, University of Ox- ford, 2021. 21

work page 2021

[18] [18]

Liu and J

D.C. Liu and J. Nocedal. On the limited memory BFGS method for large scale optimization.Mathematical programming, 45(1-3):503–528, 1989

work page 1989

[19] [19]

Masti, F

D. Masti, F. Fabiani, G. Gnecco, and A. Bemporad. Counter-example guided inductive synthesis of control Lyapunov functions for uncertain systems. IEEE Control Systems Letters, 7:2047–2052, 2023

work page 2047

[20] [20]

McKay, R.J

M.D. McKay, R.J. Beckman, and W.J. Conover. Comparison of three meth- ods for selecting values of input variables in the analysis of output from a computer code.Technometrics, 21(2):239–245, 1979

work page 1979

[21] [21]

Milanese and C

M. Milanese and C. Novara. Set membership identification of nonlinear sys- tems.Automatica, 40(6):957–975, 2004

work page 2004

[22] [22]

Parisini and R

T. Parisini and R. Zoppoli. A receding-horizon regulator for nonlinear sys- tems and a neural approximation.Automatica, 31(10):1443–1451, 1995

work page 1995

[23] [23]

Powell.Approximation Theory and Methods

M.J.D. Powell.Approximation Theory and Methods. Cambridge University Press, Cambridge, UK, 1981

work page 1981

[24] [24]

Rasmussen and C.K.I

C.E. Rasmussen and C.K.I. Williams.Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006

work page 2006

[25] [25]

Rios and N.V

L.M. Rios and N.V . Sahinidis. Derivative-free optimization: a review of al- gorithms and comparison of software implementations.Journal of Global Optimization, 56(3):1247–1293, 2013

work page 2013

[26] [26]

Schaller, A

M. Schaller, A. Bemporad, and S. Boyd. Learning parametric convex func- tions. 2025. arXiv preprint 2506.04183

work page arXiv 2025

[27] [27]

B. Settles. Active learning. InSynthesis Lectures on Artificial Intelligence and Machine Learning, number 18. Morgan & Claypool Publishers, 2012. A Appendix A.1 Proof of Proposition 1 Consider any given vectorθ∈R nθ, lete k(θ)≜y k−f(x k;θ),˜e(θ)≜max k |ek(θ)|, and let ˜K(θ)be the set of indicesksuch that|e k(θ)|= ˜e(θ). Denote by# ˜K(θ)the cardinality of ...

work page 2012