Worst-case Nonlinear Regression with Error Bounds
Pith reviewed 2026-05-16 13:38 UTC · model grok-4.3
The pith
An active-learning method trains nonlinear surrogate models to minimize the maximum approximation error while providing explicit worst-case bounds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By minimizing a smooth L_infinity approximation to the worst-case loss and actively enriching the training set at points of maximum error found through global optimization, surrogate models such as neural networks can be obtained with guaranteed constant or input-dependent worst-case error bounds over the full input domain.
What carries the argument
A smooth L_infinity loss approximation that replaces the nonsmooth minimax objective, paired with global optimization to select new training points at the current maximum residual locations.
If this is right
- Surrogate models achieve minimax optimality in the sense of smallest possible maximum error.
- Explicit error bounds allow certification of approximation quality everywhere in the input space.
- The method applies directly to approximating uncertain nonlinear dynamics and explicit MPC laws.
- Gradient-based training becomes feasible despite the original nonsmooth loss.
Where Pith is reading between the lines
- Such bounds could enable safer use of learned models in safety-critical control systems.
- The active sampling strategy may require fewer evaluations than passive uniform sampling for the same accuracy.
- Extensions to other surrogate types beyond neural networks are straightforward given the general formulation.
Load-bearing premise
Global optimization must reliably locate the inputs that maximize the current approximation error, and the smoothing must not significantly change the location of the minimax solution.
What would settle it
Running the method on a known nonlinear function where the true maximum-error points are analytically known; if the derived bounds are violated on a test point, the claim fails.
Figures
read the original abstract
We propose an active-learning method for nonlinear minimax regression. Given a nonlinear function that can be arbitrarily evaluated over a compact set, we fit a surrogate model, such as a feedforward neural network, by minimizing the maximum absolute approximation error. To handle the nonsmoothness of this worst-case loss, we introduce a smooth $L_\infty$ approximation that enables efficient gradient-based training. The training set is iteratively enriched by querying points of largest error via global optimization. We also derive constant and input-dependent worst-case error bounds over the entire input domain. The approach is validated on approximations of nonlinear functions and nonconvex sets, uncertain models of nonlinear dynamics, and explicit model predictive control laws. A Python library is available at https://github.com/bemporad/maxfit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an active-learning method for nonlinear minimax regression: a surrogate (e.g., feedforward neural network) is trained by minimizing a smooth L∞ approximation to the maximum absolute error over a compact domain; the training set is iteratively enriched by querying points that maximize |f(x) − ŷ(x)| via global optimization; constant and input-dependent worst-case L∞ error bounds are derived over the entire input domain; the approach is demonstrated on nonlinear function approximation, nonconvex-set approximation, uncertain nonlinear dynamics, and explicit MPC laws, with an accompanying Python library.
Significance. If the derived bounds are valid and the active-learning procedure reliably identifies the true maximizers, the method would supply a practical route to surrogate models with explicit, non-asymptotic worst-case guarantees—particularly valuable for robust control and safety-critical approximation tasks. The open-source library is a concrete strength that supports reproducibility.
major comments (2)
- [§3] §3 (Active-learning loop): the validity of both the constant and input-dependent L∞ bounds over the whole domain rests on the global optimizer returning a point sufficiently close to the true maximizer of |f(x) − ŷ(x)| at each iteration. No convergence rate, failure-probability bound, or empirical verification on multimodal error surfaces (typical for neural-network residuals) is supplied; if the optimizer returns a local maximum, the subsequent analytic bounds become invalid.
- [§4] §4 (Error-bound derivations): the constant and input-dependent bounds are stated to hold over the entire compact domain, yet the text does not clarify whether they are computed solely from the trained surrogate parameters or whether they implicitly require the training set to contain the exact global maximizers; this relationship must be made explicit for the bounds to be usable in practice.
minor comments (2)
- [Abstract] The abstract asserts “bound tightness” and “training efficiency” but supplies no quantitative metrics; a short table of final L∞ errors versus training-set size would strengthen the validation section.
- [§2] Notation for the smooth L∞ approximation (e.g., the smoothing parameter ε) should be introduced once with a clear reference to its effect on the minimax solution.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on the active-learning procedure and error-bound derivations. We address each major comment below and indicate the planned revisions.
read point-by-point responses
-
Referee: [§3] §3 (Active-learning loop): the validity of both the constant and input-dependent L∞ bounds over the whole domain rests on the global optimizer returning a point sufficiently close to the true maximizer of |f(x) − ŷ(x)| at each iteration. No convergence rate, failure-probability bound, or empirical verification on multimodal error surfaces (typical for neural-network residuals) is supplied; if the optimizer returns a local maximum, the subsequent analytic bounds become invalid.
Authors: We agree that the bounds' validity is conditional on the global optimizer returning points sufficiently close to the true maximizers. The manuscript employs a global optimizer (such as differential evolution) for this purpose during active learning. We will revise §3 to state this assumption explicitly and add empirical verification showing the optimizer's behavior on multimodal error surfaces typical of neural-network residuals. A general convergence rate or failure-probability bound cannot be supplied for arbitrary multimodal functions, as global optimization lacks such guarantees in the nonconvex case; this limitation will be noted in the revision. revision: partial
-
Referee: [§4] §4 (Error-bound derivations): the constant and input-dependent bounds are stated to hold over the entire compact domain, yet the text does not clarify whether they are computed solely from the trained surrogate parameters or whether they implicitly require the training set to contain the exact global maximizers; this relationship must be made explicit for the bounds to be usable in practice.
Authors: The bounds are derived from the properties of the trained surrogate and the minimax objective; they hold over the domain for the final surrogate without requiring the training set to contain exact global maximizers at every iteration. The active-learning process enriches the set to support minimization of the worst-case error, after which the bounds apply to the resulting model. We will revise §4 to make this relationship explicit and clarify the conditions under which the bounds are valid. revision: yes
Circularity Check
No significant circularity; derivations remain independent of fitted values
full rationale
The paper's core steps—smooth L∞ approximation of the minimax loss, iterative enrichment of the training set via global maximization of |f(x)−ŷ(x)|, and derivation of constant plus input-dependent L∞ error bounds—are presented as self-contained analytic constructions. No equation reduces a claimed prediction or bound to a fitted parameter by definition, no uniqueness theorem is imported from the author's prior work to force the method, and no ansatz is smuggled via self-citation. The global-optimization step is an algorithmic choice whose correctness is external to the bound formulas themselves; the bounds are stated to hold once the maximizers are located, without the bounds being used to define the maximizers. This matches the reader's assessment of score 2.0 and yields an overall circularity score of 0.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
H. Alsmeier, L. Theiner, A. Savchenko, A. Mesbah, and R. Findeisen. Imita- tion learning of MPC with neural networks: Error guarantees and sparsifica- tion. InIEEE 63rd Conference on Decision and Control, pages 4777–4782, Milano, Italy, 2024
work page 2024
-
[2]
A.N. Angelopoulos and S. Bates. Conformal prediction: A gentle introduc- tion.Foundations and Trends in Machine Learning, 16(4):494–591, 2023
work page 2023
-
[3]
A High-Performant Multi-Parametric Quadratic Programming Solver,
D. Arnström and D. Axehill. A high-performant multi-parametric quadratic programming solver. 2024. arXiv preprint 2404.05511. 20
-
[4]
D. Arnström, A. Bemporad, and D. Axehill. A dual active-set solver for em- bedded quadratic programming using recursive LDLT updates.IEEE Trans- actions on Automatic Control, 67(8):4362–4369, 2022
work page 2022
-
[5]
R.F. Barber, E.J. Candès, A. Ramdas, and R.J. Tibshirani. Predictive infer- ence with the jackknife+.The Annals of Statistics, 49(1):486–507, 2021
work page 2021
- [6]
- [7]
- [8]
-
[9]
A. Bemporad, M. Morari, V . Dua, and E.N. Pistikopoulos. The explicit linear quadratic regulator for constrained systems.Automatica, 38(1):3–20, 2002
work page 2002
-
[10]
A. Bemporad, M. Morari, and N.L. Ricker.Model Predictive Control Toolbox for MATLAB – User’s Guide. The Mathworks, Inc., 2004
work page 2004
- [11]
-
[12]
S. Chen, K. Saulnier, N. Atanasov, D. Lee, V . Kumar, G. Pappas, and M. Morari. Approximating explicit model predictive control using con- strained neural networks. pages 1520–1527. American Control Conference, 2018
work page 2018
-
[13]
M. Fazlyab, M. Morari, and G.J. Pappas. Safety verification and robustness analysis of neural networks via quadratic constraints and semidefinite pro- gramming.IEEE Transactions on Automatic Control, 67(1):1–15, 2022
work page 2022
-
[14]
P.J. Huber and E.M. Ronchetti.Robust Statistics. Wiley, Hoboken, NJ, 2nd edition, 2009
work page 2009
- [15]
-
[16]
B. Karg and S. Lucia. Efficient representation and approximation of model predictive control laws via deep learning.IEEE Transactions on Cybernetics, 50(9):3866–3878, 2020
work page 2020
-
[17]
Kidger.On Neural Differential Equations
P. Kidger.On Neural Differential Equations. PhD thesis, University of Ox- ford, 2021. 21
work page 2021
- [18]
- [19]
-
[20]
M.D. McKay, R.J. Beckman, and W.J. Conover. Comparison of three meth- ods for selecting values of input variables in the analysis of output from a computer code.Technometrics, 21(2):239–245, 1979
work page 1979
-
[21]
M. Milanese and C. Novara. Set membership identification of nonlinear sys- tems.Automatica, 40(6):957–975, 2004
work page 2004
-
[22]
T. Parisini and R. Zoppoli. A receding-horizon regulator for nonlinear sys- tems and a neural approximation.Automatica, 31(10):1443–1451, 1995
work page 1995
-
[23]
Powell.Approximation Theory and Methods
M.J.D. Powell.Approximation Theory and Methods. Cambridge University Press, Cambridge, UK, 1981
work page 1981
-
[24]
C.E. Rasmussen and C.K.I. Williams.Gaussian Processes for Machine Learning. MIT Press, Cambridge, MA, 2006
work page 2006
-
[25]
L.M. Rios and N.V . Sahinidis. Derivative-free optimization: a review of al- gorithms and comparison of software implementations.Journal of Global Optimization, 56(3):1247–1293, 2013
work page 2013
-
[26]
M. Schaller, A. Bemporad, and S. Boyd. Learning parametric convex func- tions. 2025. arXiv preprint 2506.04183
-
[27]
B. Settles. Active learning. InSynthesis Lectures on Artificial Intelligence and Machine Learning, number 18. Morgan & Claypool Publishers, 2012. A Appendix A.1 Proof of Proposition 1 Consider any given vectorθ∈R nθ, lete k(θ)≜y k−f(x k;θ),˜e(θ)≜max k |ek(θ)|, and let ˜K(θ)be the set of indicesksuch that|e k(θ)|= ˜e(θ). Denote by# ˜K(θ)the cardinality of ...
work page 2012
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.