The Error in Multivariate Linear Extrapolation with Applications to Derivative-Free Optimization

Liyuan Cao; Ya-Xiang Yuan; Zaiwen Wen

arxiv: 2307.00358 · v3 · pith:EESGJGUInew · submitted 2023-07-01 · 🧮 math.OC

The Error in Multivariate Linear Extrapolation with Applications to Derivative-Free Optimization

Liyuan Cao , Zaiwen Wen , Ya-xiang Yuan This is my paper

Pith reviewed 2026-05-24 08:00 UTC · model grok-4.3

classification 🧮 math.OC

keywords multivariate linear extrapolationerror boundssharp boundsderivative-free optimizationLipschitz gradientsimplicial searchaffinely independent pointsinterpolation error

0 comments

The pith

A numerical method computes the sharp error bound for multivariate linear extrapolation under Lipschitz gradient assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a procedure to find the smallest possible upper bound on the error made when a linear function fitted to several points is evaluated outside their convex hull. It supplies both this numerical procedure and closed-form expressions that achieve the same bound for particular point geometries. The resulting estimates are inserted into an iteration-complexity argument for a basic simplicial search algorithm used when gradients are unavailable. Readers should care because linear extrapolation appears routinely in derivative-free routines, and a tight, computable error constant directly controls the number of function evaluations needed for convergence.

Core claim

The authors introduce a method to numerically compute the sharp bound on the error of linear extrapolation for functions with Lipschitz continuous gradients when the interpolation points form an affinely independent set. They also derive analytical bounds that are sharp under certain conditions and apply the bounds to obtain complexity results for a simplicial search method.

What carries the argument

Numerical maximization of the extrapolation residual over all admissible functions whose gradients satisfy a unit Lipschitz condition, subject to the given affinely independent sample set.

Load-bearing premise

The target function has a Lipschitz continuous gradient and the sample points are affinely independent.

What would settle it

A concrete function whose gradient is Lipschitz continuous, together with an affinely independent point set, for which the observed linear extrapolation error exceeds the numerically computed sharp bound.

Figures

Figures reproduced from arXiv: 2307.00358 by Liyuan Cao, Ya-Xiang Yuan, Zaiwen Wen.

**Figure 2.** Figure 2: The sharp error bound on | ˆf(x) − f(x)| for each x on the 100 × 100 grid that covers the area [−2.5, 2.5] × [−1.5, 2.5] evenly. The sample set and the Lipschitz constant are chosen as Θ = {(−0.3, 1),(−1.1, −0.5),(1, 0)} and ν = 1. In (f-EEP), the point x, its derivative, and its function value are represented by the triplet (x0, g0, y0), whereas {(xi , gi , yi)} n+1 i=1 are used for the interpolation poin… view at source ↗

**Figure 3.** Figure 3: A visualization of results in Section 4 for bivariate interpolat [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗

**Figure 2.** Figure 2: However, this does not mean (21) is a formula to the optima [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 4.** Figure 4: The areas to which if x belongs, (21) is not an upper bound on the function approximation error for bivariate interpolation. The dashed line on the left is perpendicular to the line going through x1 and x2; and the one on the right is perpendicular to the line going through x3 and x1. 12 [PITH_FULL_IMAGE:figures/full_fig_p012_4.png] view at source ↗

**Figure 5.** Figure 5: Two configurations of Θ and x where (21) is an invalid error bound for bivariate extrapolation. The case in Figure 5a can be defined mathematically as ℓ2 > 0, ℓ3 < 0, and ℓ1[x2 −x1]·[x3 −x1]−ℓ3[x2 − x3] · [x1 − x3] < 0. The following lemma shows the point w, as defined in (30), is the intersection of the line going through x1 and x3 and the line going through x and x2. 17 [PITH_FULL_IMAGE:figures/full_fig… view at source ↗

read the original abstract

We study in this paper the function approximation error of multivariate linear extrapolation. The sharp error bound of linear interpolation already exists in the literature. However, linear extrapolation is used far more often in applications such as derivative-free optimization, while its error is not well-studied. We introduce in this paper a method to numerically compute the sharp bound on the error, and then present several analytical bounds along with the conditions under which they are sharp. We also provide a complexity analysis of a basic simplicial search method to illustrate an application of these error bounds in derivative-free optimization. All results are under the assumptions that the function being interpolated has Lipschitz continuous gradient and is interpolated on an affinely independent sample set.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper supplies a numerical method for sharp linear extrapolation error bounds under Lipschitz gradients plus an illustrative DFO complexity result.

read the letter

The main thing to know is that the authors shift the focus from the already-studied interpolation error to extrapolation, which they note is more common in practice, and they give both a numerical procedure to compute the sharp constant and some analytical forms with explicit sharpness conditions. They then feed the bounds into a basic simplicial-search complexity argument to show an application in derivative-free optimization. All of this sits on the standard assumptions of Lipschitz-continuous gradient and affinely independent sample points, with no circularity in the derivations. The numerical approach looks like the most reusable piece because it avoids manual case analysis for different geometries. The analytical bounds are presented cleanly with the conditions under which they attain the worst case. The complexity corollary is straightforward and serves its illustrative purpose without claiming new rates. Minor limitations include that the computational expense of the numerical method in higher dimensions is not explored, and the DFO example stays at the level of a basic method rather than a competitive algorithm. The work is aimed at researchers who already work with error bounds in derivative-free optimization and want tighter constants for their analyses. A reader outside that niche will not find broad new theory. It is solid enough on its own terms to merit peer review rather than a desk reject, even if the scope remains narrow.

Referee Report

0 major / 4 minor

Summary. The paper studies the error of multivariate linear extrapolation for functions with Lipschitz continuous gradients, sampled on affinely independent sets. It introduces a numerical procedure to compute the sharp worst-case error bound, derives several analytical bounds together with conditions under which each is sharp, and applies the bounds to obtain a complexity result for a basic simplicial search method in derivative-free optimization.

Significance. If the derivations hold, the work addresses a practical gap: linear extrapolation appears frequently in DFO yet lacks the sharp error analysis already available for interpolation. The manuscript supplies an explicit numerical formulation for the sharp bound, proves sharpness conditions for the analytical expressions, and carries the bounds through to a falsifiable complexity corollary under standard assumptions, all without circularity or hidden parameters.

minor comments (4)

[§2] §2 (Preliminaries): the definition of the extrapolation point relative to the simplex could be accompanied by a short diagram or explicit coordinate example to aid readers unfamiliar with the geometry.
[§3] The numerical procedure in §3 is described at a high level; adding a short pseudocode listing or explicit statement of the optimization problem solved would improve reproducibility.
[Table 1] Table 1 (numerical sharpness examples): the column headers for the analytical bounds could be aligned with the theorem numbers that state the corresponding conditions.
[§5] The complexity corollary in §5 inherits the O(h²) scaling from the error bound; a one-sentence reminder of the precise constant dependence on the Lipschitz constant L would help readers trace the result.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary, significance assessment, and recommendation of minor revision. No major comments appear in the report, so we have no specific points to address point-by-point. We will incorporate any minor editorial or presentation changes in the revised version.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The derivation starts from external, explicitly stated assumptions (Lipschitz continuous gradient with constant L, affinely independent sample set) that are standard in approximation theory and DFO. The sharp error bound is obtained by a numerical procedure that solves an optimization problem over the unit ball in the gradient-Lipschitz seminorm; this is not a fit to data but a direct computation from the assumptions. Analytical bounds are then proved by relaxing or specializing that same optimization problem, with explicit sharpness conditions. The simplicial-search complexity corollary applies the derived O(h²) error directly to a standard DFO convergence argument. No equation reduces to a prior result by definition, no parameter is fitted and then relabeled as a prediction, and no load-bearing step rests on a self-citation whose content is itself unverified. The manuscript is therefore self-contained against the declared external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on two domain assumptions stated in the abstract; no free parameters or invented entities are mentioned.

axioms (2)

domain assumption The function has Lipschitz continuous gradient
Explicitly required for all error bounds and complexity results in the abstract.
domain assumption The sample set is affinely independent
Required so that the linear extrapolation model is uniquely defined.

pith-pipeline@v0.9.0 · 5646 in / 1299 out tokens · 38008 ms · 2026-05-24T08:00:57.121710+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1]

Computation of sparse low degree inter- polating polynomials and their application to derivative-free optimizat ion

Afonso S Bandeira, Katya Scheinberg, and Lu ´ ıs Nunes Vicente. Computation of sparse low degree inter- polating polynomials and their application to derivative-free optimizat ion. Mathematical programming, 134(1):223–257, 2012

work page 2012
[2]

C onvergence of trust-region methods based on probabilistic models

Afonso S Bandeira, Katya Scheinberg, and Luis Nunes Vicente. C onvergence of trust-region methods based on probabilistic models. SIAM Journal on Optimization , 24(3):1238–1264, 2014

work page 2014
[3]

Convergence rate analysis of a stochastic trust-region method via supermartingales

Jose Blanchet, Coralia Cartis, Matt Menickelly, and Katya Scheinb erg. Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS journal on optimization , 1(2):92–119, 2019

work page 2019
[4]

First-and second-order high probability com- plexity bounds for trust-region methods with noisy oracles

Liyuan Cao, Albert S Berahas, and Katya Scheinberg. First-and second-order high probability com- plexity bounds for trust-region methods with noisy oracles. Mathematical Programming, pages 1–52, 2023

work page 2023
[5]

Stochast ic optimization using a trust-region method and random models

Ruobing Chen, Matt Menickelly, and Katya Scheinberg. Stochast ic optimization using a trust-region method and random models. Mathematical Programming, 169:447–487, 2018

work page 2018
[6]

General Lagrange a nd Hermite interpolation in Rn with applications to ﬁnite element methods

Philippe G Ciarlet and Pierre-Arnaud Raviart. General Lagrange a nd Hermite interpolation in Rn with applications to ﬁnite element methods. Archive for Rational Mechanics and Analysis , 46(3):177–199, 1972

work page 1972
[7]

On the conver gence of derivative-free methods for unconstrained optimization

Andrew R Conn, Katya Scheinberg, and Ph L Toint. On the conver gence of derivative-free methods for unconstrained optimization. Approximation theory and optimization: tributes to MJD Pow ell, pages 83–108, 1997

work page 1997
[8]

Geometr y of interpolation sets in derivative free optimization

Andrew R Conn, Katya Scheinberg, and Lu ´ ıs N Vicente. Geometr y of interpolation sets in derivative free optimization. Mathematical programming, 111:141–172, 2008. 26

work page 2008
[9]

Global co nvergence of general derivative- free trust-region algorithms to ﬁrst-and second-order critical points

Andrew R Conn, Katya Scheinberg, and Lu ´ ıs N Vicente. Global co nvergence of general derivative- free trust-region algorithms to ﬁrst-and second-order critical points. SIAM Journal on Optimization , 20(1):387–415, 2009

work page 2009
[10]

Introduction to derivative-free optimization

Andrew R Conn, Katya Scheinberg, and Luis N Vicente. Introduction to derivative-free optimization . SIAM, 2009

work page 2009
[11]

Interpolation and Approximation

Philip J Davis. Interpolation and Approximation . Courier Corporation, 1975

work page 1975
[12]

Performance of ﬁrst-order me thods for smooth convex minimization: a novel approach

Yoel Drori and Marc Teboulle. Performance of ﬁrst-order me thods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1-2):451–482, 2014

work page 2014
[13]

Complexity and global rates of trust-region methods based on probabilistic models

Serge Gratton, Cl´ ement W Royer, Lu ´ ıs N Vicente, and Zaikun Zhang. Complexity and global rates of trust-region methods based on probabilistic models. IMA Journal of Numerical Analysis , 38(3):1579– 1597, 2018

work page 2018
[14]

Matrix Analysis

Roger A Horn and Charles R Johnson. Matrix Analysis . Cambridge university press, 2012

work page 2012
[15]

Con vergence of the restricted nelder–mead algorithm in two dimensions

Jeﬀrey C Lagarias, Bjorn Poonen, and Margaret H Wright. Con vergence of the restricted nelder–mead algorithm in two dimensions. SIAM Journal on Optimization , 22(2):501–532, 2012

work page 2012
[16]

Derivative-f ree optimization methods

Jeﬀrey Larson, Matt Menickelly, and Stefan M Wild. Derivative-f ree optimization methods. Acta Numerica, 28:287–404, 2019

work page 2019
[17]

Convergence of the nelder–mead simplex meth od to a nonstationary point

Ken IM McKinnon. Convergence of the nelder–mead simplex meth od to a nonstationary point. SIAM Journal on optimization , 9(1):148–158, 1998

work page 1998
[18]

A simplex method for function minim ization

John A Nelder and Roger Mead. A simplex method for function minim ization. The computer journal , 7(4):308–313, 1965

work page 1965
[19]

Introductory Lectures on Convex Optimization: A Basic Cour se, volume 87

Y Nesterov. Introductory Lectures on Convex Optimization: A Basic Cour se, volume 87. Springer Science & Business Media, 2013

work page 2013
[20]

A direct search optimization method that mode ls the objective and constraint functions by linear interpolation

Michael JD Powell. A direct search optimization method that mode ls the objective and constraint functions by linear interpolation. In Advances in optimization and numerical analysis , pages 51–67. Springer, 1994

work page 1994
[21]

On the Lagrange functions of quadratic models that are deﬁned by interpolation

MJD Powell. On the Lagrange functions of quadratic models that are deﬁned by interpolation. Opti- mization Methods and Software , 16(1-4):289–309, 2001

work page 2001
[22]

The calculus of simplex gradients

Rommel G Regis. The calculus of simplex gradients. Optimization Letters , 9:845–865, 2015

work page 2015
[23]

Sequ ential application of simplex designs in optimisation and evolutionary operation

WGRFR Spendley, George R Hext, and Francis R Himsworth. Sequ ential application of simplex designs in optimisation and evolutionary operation. Technometrics, 4(4):441–461, 1962

work page 1962
[24]

Optimal estimates for the linear interpolation error on simplices

Martin St¨ ampﬂe. Optimal estimates for the linear interpolation error on simplices. Journal of Approx- imation Theory, 103(1):78–90, 2000

work page 2000
[25]

James Joseph Sylvester. Xix. a demonstration of the theorem that every homogeneous quadratic polyno- mial is reducible by real orthogonal substitutions to the form of a s um of positive and negative squares. The London, Edinburgh, and Dublin Philosophical Magazine a nd Journal of Science , 4(23):138–142, 1852

work page
[26]

Exact worst-case performance of ﬁrst-order methods for composite convex optimization

Adrien B Taylor, Julien M Hendrickx, and Fran¸ cois Glineur. Exact worst-case performance of ﬁrst-order methods for composite convex optimization. SIAM Journal on Optimization , 27(3):1283–1313, 2017

work page 2017
[27]

Smoot h strongly convex interpolation and exact worst-case performance of ﬁrst-order methods

Adrien B Taylor, Julien M Hendrickx, and Fran¸ cois Glineur. Smoot h strongly convex interpolation and exact worst-case performance of ﬁrst-order methods. Mathematical Programming, 161:307–345, 2017. 27

work page 2017
[28]

Fortiﬁed-descent simplicial search method: A gen eral approach

Paul Tseng. Fortiﬁed-descent simplicial search method: A gen eral approach. SIAM Journal on Opti- mization, 10(1):269–288, 1999

work page 1999
[29]

The error in linear interpolation at the vertices of a simplex

Shayne Waldron. The error in linear interpolation at the vertices of a simplex. SIAM Journal on Numerical Analysis, 35(3):1191–1200, 1998

work page 1998
[30]

An interactive approach for solving multi-ob jective optimization problems

Daniel John Woods. An interactive approach for solving multi-ob jective optimization problems. 1985. 28

work page 1985

[1] [1]

Computation of sparse low degree inter- polating polynomials and their application to derivative-free optimizat ion

Afonso S Bandeira, Katya Scheinberg, and Lu ´ ıs Nunes Vicente. Computation of sparse low degree inter- polating polynomials and their application to derivative-free optimizat ion. Mathematical programming, 134(1):223–257, 2012

work page 2012

[2] [2]

C onvergence of trust-region methods based on probabilistic models

Afonso S Bandeira, Katya Scheinberg, and Luis Nunes Vicente. C onvergence of trust-region methods based on probabilistic models. SIAM Journal on Optimization , 24(3):1238–1264, 2014

work page 2014

[3] [3]

Convergence rate analysis of a stochastic trust-region method via supermartingales

Jose Blanchet, Coralia Cartis, Matt Menickelly, and Katya Scheinb erg. Convergence rate analysis of a stochastic trust-region method via supermartingales. INFORMS journal on optimization , 1(2):92–119, 2019

work page 2019

[4] [4]

First-and second-order high probability com- plexity bounds for trust-region methods with noisy oracles

Liyuan Cao, Albert S Berahas, and Katya Scheinberg. First-and second-order high probability com- plexity bounds for trust-region methods with noisy oracles. Mathematical Programming, pages 1–52, 2023

work page 2023

[5] [5]

Stochast ic optimization using a trust-region method and random models

Ruobing Chen, Matt Menickelly, and Katya Scheinberg. Stochast ic optimization using a trust-region method and random models. Mathematical Programming, 169:447–487, 2018

work page 2018

[6] [6]

General Lagrange a nd Hermite interpolation in Rn with applications to ﬁnite element methods

Philippe G Ciarlet and Pierre-Arnaud Raviart. General Lagrange a nd Hermite interpolation in Rn with applications to ﬁnite element methods. Archive for Rational Mechanics and Analysis , 46(3):177–199, 1972

work page 1972

[7] [7]

On the conver gence of derivative-free methods for unconstrained optimization

Andrew R Conn, Katya Scheinberg, and Ph L Toint. On the conver gence of derivative-free methods for unconstrained optimization. Approximation theory and optimization: tributes to MJD Pow ell, pages 83–108, 1997

work page 1997

[8] [8]

Geometr y of interpolation sets in derivative free optimization

Andrew R Conn, Katya Scheinberg, and Lu ´ ıs N Vicente. Geometr y of interpolation sets in derivative free optimization. Mathematical programming, 111:141–172, 2008. 26

work page 2008

[9] [9]

Global co nvergence of general derivative- free trust-region algorithms to ﬁrst-and second-order critical points

Andrew R Conn, Katya Scheinberg, and Lu ´ ıs N Vicente. Global co nvergence of general derivative- free trust-region algorithms to ﬁrst-and second-order critical points. SIAM Journal on Optimization , 20(1):387–415, 2009

work page 2009

[10] [10]

Introduction to derivative-free optimization

Andrew R Conn, Katya Scheinberg, and Luis N Vicente. Introduction to derivative-free optimization . SIAM, 2009

work page 2009

[11] [11]

Interpolation and Approximation

Philip J Davis. Interpolation and Approximation . Courier Corporation, 1975

work page 1975

[12] [12]

Performance of ﬁrst-order me thods for smooth convex minimization: a novel approach

Yoel Drori and Marc Teboulle. Performance of ﬁrst-order me thods for smooth convex minimization: a novel approach. Mathematical Programming, 145(1-2):451–482, 2014

work page 2014

[13] [13]

Complexity and global rates of trust-region methods based on probabilistic models

Serge Gratton, Cl´ ement W Royer, Lu ´ ıs N Vicente, and Zaikun Zhang. Complexity and global rates of trust-region methods based on probabilistic models. IMA Journal of Numerical Analysis , 38(3):1579– 1597, 2018

work page 2018

[14] [14]

Matrix Analysis

Roger A Horn and Charles R Johnson. Matrix Analysis . Cambridge university press, 2012

work page 2012

[15] [15]

Con vergence of the restricted nelder–mead algorithm in two dimensions

Jeﬀrey C Lagarias, Bjorn Poonen, and Margaret H Wright. Con vergence of the restricted nelder–mead algorithm in two dimensions. SIAM Journal on Optimization , 22(2):501–532, 2012

work page 2012

[16] [16]

Derivative-f ree optimization methods

Jeﬀrey Larson, Matt Menickelly, and Stefan M Wild. Derivative-f ree optimization methods. Acta Numerica, 28:287–404, 2019

work page 2019

[17] [17]

Convergence of the nelder–mead simplex meth od to a nonstationary point

Ken IM McKinnon. Convergence of the nelder–mead simplex meth od to a nonstationary point. SIAM Journal on optimization , 9(1):148–158, 1998

work page 1998

[18] [18]

A simplex method for function minim ization

John A Nelder and Roger Mead. A simplex method for function minim ization. The computer journal , 7(4):308–313, 1965

work page 1965

[19] [19]

Introductory Lectures on Convex Optimization: A Basic Cour se, volume 87

Y Nesterov. Introductory Lectures on Convex Optimization: A Basic Cour se, volume 87. Springer Science & Business Media, 2013

work page 2013

[20] [20]

A direct search optimization method that mode ls the objective and constraint functions by linear interpolation

Michael JD Powell. A direct search optimization method that mode ls the objective and constraint functions by linear interpolation. In Advances in optimization and numerical analysis , pages 51–67. Springer, 1994

work page 1994

[21] [21]

On the Lagrange functions of quadratic models that are deﬁned by interpolation

MJD Powell. On the Lagrange functions of quadratic models that are deﬁned by interpolation. Opti- mization Methods and Software , 16(1-4):289–309, 2001

work page 2001

[22] [22]

The calculus of simplex gradients

Rommel G Regis. The calculus of simplex gradients. Optimization Letters , 9:845–865, 2015

work page 2015

[23] [23]

Sequ ential application of simplex designs in optimisation and evolutionary operation

WGRFR Spendley, George R Hext, and Francis R Himsworth. Sequ ential application of simplex designs in optimisation and evolutionary operation. Technometrics, 4(4):441–461, 1962

work page 1962

[24] [24]

Optimal estimates for the linear interpolation error on simplices

Martin St¨ ampﬂe. Optimal estimates for the linear interpolation error on simplices. Journal of Approx- imation Theory, 103(1):78–90, 2000

work page 2000

[25] [25]

James Joseph Sylvester. Xix. a demonstration of the theorem that every homogeneous quadratic polyno- mial is reducible by real orthogonal substitutions to the form of a s um of positive and negative squares. The London, Edinburgh, and Dublin Philosophical Magazine a nd Journal of Science , 4(23):138–142, 1852

work page

[26] [26]

Exact worst-case performance of ﬁrst-order methods for composite convex optimization

Adrien B Taylor, Julien M Hendrickx, and Fran¸ cois Glineur. Exact worst-case performance of ﬁrst-order methods for composite convex optimization. SIAM Journal on Optimization , 27(3):1283–1313, 2017

work page 2017

[27] [27]

Smoot h strongly convex interpolation and exact worst-case performance of ﬁrst-order methods

Adrien B Taylor, Julien M Hendrickx, and Fran¸ cois Glineur. Smoot h strongly convex interpolation and exact worst-case performance of ﬁrst-order methods. Mathematical Programming, 161:307–345, 2017. 27

work page 2017

[28] [28]

Fortiﬁed-descent simplicial search method: A gen eral approach

Paul Tseng. Fortiﬁed-descent simplicial search method: A gen eral approach. SIAM Journal on Opti- mization, 10(1):269–288, 1999

work page 1999

[29] [29]

The error in linear interpolation at the vertices of a simplex

Shayne Waldron. The error in linear interpolation at the vertices of a simplex. SIAM Journal on Numerical Analysis, 35(3):1191–1200, 1998

work page 1998

[30] [30]

An interactive approach for solving multi-ob jective optimization problems

Daniel John Woods. An interactive approach for solving multi-ob jective optimization problems. 1985. 28

work page 1985