pith. machine review for the scientific record.

arxiv: 2605.09361 · v1 · submitted 2026-05-10 · 🧮 math.OC

Recognition: no theorem link

Newton Method for Soft Quadratic Surface Support Vector Machine with 0-1 Loss Function

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 02:04 UTC · model grok-4.3

classification 🧮 math.OC
keywords 0-1 loss function · quadratic surface SVM · Newton method · proximal operator · non-convex optimization · binary classification · local quadratic convergence · stationary equation

The pith

A Newton method solves the non-convex L0/1-SQSSVM model and converges quadratically locally under the second-order sufficient condition.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes the L0/1-SQSSVM, a kernel-free soft quadratic surface support vector machine that uses the 0-1 loss for binary classification. This model is non-convex and discontinuous, prompting the derivation of first- and second-order optimality conditions. The authors construct a stationary equation from the proximal operator of the 0-1 loss function and design a Newton method to solve the resulting optimization problem. They prove local quadratic convergence of the iterates whenever the second-order sufficient condition holds at the solution. Experiments on artificial and benchmark datasets show the method attains higher classification accuracy at lower computational cost than existing solvers.
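
Since the model is kernel-free, the separating surface is parameterized directly in input space, typically as f(x) = ½ xᵀWx + bᵀx + c with W symmetric and the label given by sign(f(x)). A minimal sketch of that decision rule; the variable names and toy data below are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def quadratic_surface_decision(X, W, b, c):
    """Evaluate f(x) = 0.5 * x^T W x + b^T x + c for each row x of X.

    W : (d, d) symmetric matrix, b : (d,) vector, c : scalar.
    Labels are assigned by sign(f(x)), as in SVM-type classifiers.
    """
    quad = 0.5 * np.einsum("ni,ij,nj->n", X, W, X)   # 0.5 * x^T W x per sample
    return quad + X @ b + c

# toy usage: f(x) = 0.5*||x||^2 - 1 separates points inside / outside a circle
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
W, b, c = np.eye(2), np.zeros(2), -1.0
print(np.sign(quadratic_surface_decision(X, W, b, c)))
```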

Core claim

We propose the L0/1-SQSSVM model and establish its first- and second-order optimality conditions. Using properties of the proximal operator of the 0-1 loss, we obtain a stationary equation and design a Newton method based on it. We prove that this Newton method has local quadratic convergence under the second-order sufficient condition. Numerical experiments demonstrate higher classification accuracy with less CPU time than other state-of-the-art methods.

What carries the argument

The Newton method constructed from the stationary equation of the L0/1-SQSSVM model, obtained via the proximal operator of the 0-1 loss function, which enables iterative solution of the non-convex discontinuous problem.
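
The 0-1 loss here counts margin violations, and in the L0/1-SVM literature the proximal operator of λ‖(·)₊‖₀ is usually cited in a simple closed componentwise form: entries in (0, √(2λ)) are hard-thresholded to zero and all others are kept. A sketch of that commonly cited form, offered as background; the paper's exact operator and stationary equation are not reproduced here:

```python
import numpy as np

def prox_01_loss(a, lam):
    """Componentwise prox of z -> lam * ||max(z, 0)||_0 evaluated at a.

    For a_i <= 0 the loss is already zero, so a_i is kept.
    For 0 < a_i < sqrt(2*lam), zeroing the entry is cheaper than paying
    the lam penalty, so it is hard-thresholded to 0.
    For a_i > sqrt(2*lam), keeping a_i is cheaper.
    (At a_i = sqrt(2*lam) the operator is set-valued; this sketch keeps a_i.)
    """
    a = np.asarray(a, dtype=float)
    out = a.copy()
    thresh = np.sqrt(2.0 * lam)
    out[(a > 0) & (a < thresh)] = 0.0
    return out

print(prox_01_loss([-0.5, 0.3, 2.0], lam=0.5))   # -> [-0.5, 0.0, 2.0]
```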

If this is right

  • The non-convex L0/1-SQSSVM model becomes solvable by an iterative procedure that reaches quadratic speed near the optimum.
  • Binary classification tasks gain access to a kernel-free quadratic surface separator with the 0-1 loss.
  • Practical performance improves in accuracy and CPU time on both synthetic and real benchmark data relative to prior solvers.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The stationary-equation approach using the proximal operator may extend to other non-convex discontinuous losses in support vector models.
  • Because the surface is quadratic and kernel-free, the method could scale more readily to high-dimensional data than kernel-based alternatives.
  • Checking the second-order sufficient condition numerically on new problems would indicate when quadratic convergence can be expected in practice.
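
The last item above can be made concrete: at a candidate solution, one checks the smallest eigenvalue of the Hessian of the smooth part of the objective, restricted to the relevant subspace, for strict positivity. A rough diagnostic sketch; which subspace the paper's second-order sufficient condition uses is an assumption here, so the full space is taken as the default:

```python
import numpy as np

def sosc_diagnostic(hessian, basis=None, tol=1e-8):
    """Report the smallest eigenvalue of the (restricted) Hessian.

    hessian : (n, n) symmetric matrix at the candidate stationary point.
    basis   : optional (n, k) matrix whose columns span the subspace on
              which the second-order condition is tested; if None, the
              full space is used.  A strictly positive result is
              consistent with a second-order sufficient condition.
    """
    H = np.asarray(hessian, dtype=float)
    if basis is not None:
        B = np.asarray(basis, dtype=float)
        H = B.T @ H @ B
    lam_min = np.linalg.eigvalsh(H).min()
    return lam_min, lam_min > tol

# toy usage with a positive-definite Hessian
H = np.array([[2.0, 0.3], [0.3, 1.0]])
print(sosc_diagnostic(H))   # smallest eigenvalue > 0 -> condition plausible
```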

Load-bearing premise

The second-order sufficient condition holds at the solution point of the L0/1-SQSSVM model.

What would settle it

A concrete instance where the Newton iterates exhibit only linear or slower convergence even though the second-order sufficient condition is satisfied at the limit point.
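
Such an instance could be detected by estimating the observed order of convergence from the iterate history: with errors e_k = ‖x_k − x*‖, the ratio log(e_{k+1}/e_k) / log(e_k/e_{k−1}) approaches 2 under quadratic convergence and 1 under linear convergence. A small sketch; the error sequences below are synthetic, for illustration only:

```python
import numpy as np

def observed_orders(errors):
    """Estimate convergence orders from successive errors e_k = ||x_k - x*||.

    Returns p_k ~ log(e_{k+1}/e_k) / log(e_k/e_{k-1}); values near 2 indicate
    quadratic convergence, values near 1 indicate linear convergence.
    """
    e = np.asarray(errors, dtype=float)
    return np.log(e[2:] / e[1:-1]) / np.log(e[1:-1] / e[:-2])

quadratic_like = [1e-1, 1e-2, 1e-4, 1e-8, 1e-16]
linear_like = [1e-1, 5e-2, 2.5e-2, 1.25e-2, 6.25e-3]
print(observed_orders(quadratic_like))   # ~ [2, 2, 2]
print(observed_orders(linear_like))      # ~ [1, 1, 1]
```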

Figures

Figures reproduced from arXiv: 2605.09361 by Guoping Li, Wen Song.

Figure 1
Figure 1: Linearly separable data set with 100 points. [PITH_FULL_IMAGE:figures/full_fig_p008_1.png]
Figure 2
Figure 2: 2d-convex data set with 100 points. Panels: (a) QSSVM0/1-ADMM, Acc = 1.00, nSV = 7; (b) L0/1-SQSSVM-Newton, Acc = 1.00, nSV = 6; (c) SVM-kernel, Acc = 1.00, nSV = 6; (d) QSSVM, Acc = 1.00, nSV = 9. [PITH_FULL_IMAGE:figures/full_fig_p008_2.png]
Figure 3
Figure 3: 2d-convex data set with 100 points. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png]
Figure 4
Figure 4: The accuracy of L0/1-QSSVM versus the parameters λ and τ on artificial data sets. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png]
read the original abstract

A nonlinear kernel-free soft quadratic surface support vector machine model with 0-1 loss function ($L_{0/1}$-SQSSVM) is proposed for binary classification problems, which is non-convex discontinuous. We are devoted to establishing the first and the second-order optimality conditions for the $L_{0/1}$-SQSSVM. We establish a stationary equation using the properties of proximal operator of 0-1 loss function. We design a Newton method based on the stationary equation to solve $L_{0/1}$-SQSSVM model and prove that the Newton method has local quadratic convergence under the second-order sufficient condition. Numerical experience on artificial datasets and benchmark datasets demonstrate that the Newton method for $L_{0/1}$-SQSSVM achieves higher classification accuracy with less CPU time cost than other state-of-the-art methods.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes the L_{0/1}-SQSSVM model, a kernel-free soft quadratic surface SVM using the 0-1 loss for binary classification. It derives first- and second-order optimality conditions, constructs a stationary equation from the proximal operator of the 0-1 loss, develops a Newton method on this equation, and proves local quadratic convergence under the second-order sufficient condition (SOSC) at the limit point. Numerical tests on artificial and benchmark datasets claim superior accuracy and speed versus state-of-the-art solvers.
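
For orientation, the algorithmic template the summary describes is a Newton iteration on a residual (stationary) equation F(v) = 0: linearize, solve J(v_k) d_k = −F(v_k), and update v_{k+1} = v_k + d_k. A generic sketch with a finite-difference Jacobian on a toy residual; the paper's actual stationary map and its generalized Jacobian are not reproduced here:

```python
import numpy as np

def newton_on_residual(F, v0, tol=1e-10, max_iter=50, h=1e-7):
    """Plain Newton iteration on F(v) = 0 with a forward-difference Jacobian."""
    v = np.asarray(v0, dtype=float)
    for _ in range(max_iter):
        r = F(v)
        if np.linalg.norm(r) < tol:
            break
        n = v.size
        J = np.empty((r.size, n))
        for j in range(n):                      # finite-difference Jacobian column by column
            e = np.zeros(n); e[j] = h
            J[:, j] = (F(v + e) - r) / h
        v = v + np.linalg.solve(J, -r)          # Newton step: J d = -F(v)
    return v

# toy residual with root (1, 2): F(v) = [v0^2 + v1 - 3, v0 + v1^2 - 5]
F = lambda v: np.array([v[0]**2 + v[1] - 3.0, v[0] + v[1]**2 - 5.0])
print(newton_on_residual(F, np.array([1.0, 1.0])))
```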

Significance. If the stationary equation is correctly derived and the Newton iteration is well-defined, the work supplies a concrete algorithmic framework for a non-convex discontinuous SVM variant together with a local convergence guarantee. The numerical evidence of improved accuracy and reduced CPU time, if reproducible, would indicate practical utility. However, the conditional nature of the quadratic rate limits the theoretical contribution unless the SOSC hypothesis is shown to hold generically or in the regimes tested.

major comments (2)
  1. [Convergence analysis / Theorem on quadratic convergence] Convergence analysis (likely §4 or Theorem on local quadratic rate): local quadratic convergence is proved only under the assumption that the second-order sufficient condition holds at the stationary point obtained from the proximal-operator equation. For the non-convex discontinuous L_{0/1}-SQSSVM, no theorem, generic condition (e.g., strict complementarity or non-degeneracy), or numerical verification is supplied to establish that SOSC is satisfied at solutions of the stationary equation; the rate therefore remains conditional on an unverified hypothesis.
  2. [Stationary equation / proximal-operator section] Stationary equation derivation (likely §3): while the proximal-operator characterization is invoked to obtain the stationary equation, the manuscript does not address whether every solution of this equation satisfies the first-order optimality conditions already derived earlier, nor does it discuss possible spurious stationary points introduced by the proximal mapping of the discontinuous 0-1 loss.
minor comments (2)
  1. [Model formulation] Notation for the quadratic surface parameters and the soft-margin variables should be introduced once with explicit dimensions to avoid ambiguity when the Newton system is written.
  2. [Numerical experiments] The numerical section would benefit from reporting the fraction of runs in which the SOSC was observed to hold (via Hessian eigenvalue check) at termination, even if only as a diagnostic.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for the careful reading and constructive comments. Below we respond point by point to the major comments and indicate planned revisions.

read point-by-point responses
  1. Referee: Convergence analysis (likely §4 or Theorem on local quadratic rate): local quadratic convergence is proved only under the assumption that the second-order sufficient condition holds at the stationary point obtained from the proximal-operator equation. For the non-convex discontinuous L_{0/1}-SQSSVM, no theorem, generic condition (e.g., strict complementarity or non-degeneracy), or numerical verification is supplied to establish that SOSC is satisfied at solutions of the stationary equation; the rate therefore remains conditional on an unverified hypothesis.

    Authors: We agree that the local quadratic convergence result is conditional on the second-order sufficient condition (SOSC) at the limit point. Such an assumption is standard for Newton methods on non-convex and discontinuous problems, where unconditional rates are rarely available. In the revision we will add a remark in the convergence section together with numerical verification: we will report the observed convergence orders from the artificial and benchmark experiments, which consistently exhibit quadratic rates and thereby provide empirical support that SOSC holds in the tested regimes. A generic proof that SOSC is satisfied at every stationary point lies beyond the scope of the present work. revision: partial

  2. Referee: Stationary equation derivation (likely §3): while the proximal-operator characterization is invoked to obtain the stationary equation, the manuscript does not address whether every solution of this equation satisfies the first-order optimality conditions already derived earlier, nor does it discuss possible spurious stationary points introduced by the proximal mapping of the discontinuous 0-1 loss.

    Authors: The stationary equation is obtained by substituting the closed-form proximal operator of the 0-1 loss into the first-order optimality condition. By construction, solutions of the equation are stationary points of the original problem. We acknowledge that an explicit equivalence statement and a brief discussion of possible spurious points would strengthen the exposition. In the revised manuscript we will insert a short proposition in Section 3 establishing that every solution of the stationary equation satisfies both the first- and second-order optimality conditions derived earlier, and we will note that no spurious solutions were observed across all numerical instances. revision: yes

standing simulated objections not resolved
  • Proving that the second-order sufficient condition holds generically at every stationary point of the non-convex L_{0/1}-SQSSVM model.

Circularity Check

0 steps flagged

No circularity: derivation uses standard proximal and Newton analysis with explicit conditional convergence

full rationale

The paper constructs the L_{0/1}-SQSSVM model, derives first- and second-order optimality conditions, builds a stationary equation from proximal-operator properties of the 0-1 loss, and applies a standard Newton iteration to that equation. Local quadratic convergence is proven only under the explicit assumption of the second-order sufficient condition (SOSC) at the limit point. This is a conventional conditional result for Newton methods on nonsmooth problems and does not reduce any claimed prediction or result to the inputs by construction. No self-citations, fitted parameters presented as predictions, self-definitional steps, or ansatz smuggling appear in the chain. The derivation remains self-contained against external optimization benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard properties of the proximal operator for the 0-1 loss and on the second-order sufficient condition for quadratic convergence; both are treated as background assumptions rather than derived inside the paper.

axioms (2)
  • domain assumption Properties of the proximal operator of the 0-1 loss function allow construction of a stationary equation
    Invoked to obtain the equation that the Newton method solves.
  • standard math Second-order sufficient condition implies local quadratic convergence of Newton iteration
    Standard result from nonlinear optimization used to complete the convergence proof.
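
For reference, the second axiom is the textbook statement that, near a root x* where the Jacobian is nonsingular and locally Lipschitz, Newton iterates contract quadratically; the nonsmooth setting of the paper adapts this under its second-order sufficient condition:

```latex
% Classical local convergence of Newton's method (textbook form), assuming
% F(x^*) = 0, F'(x^*) nonsingular, and F' Lipschitz near x^*:
\[
  x_{k+1} = x_k - F'(x_k)^{-1} F(x_k), \qquad
  \|x_{k+1} - x^*\| \;\le\; C\,\|x_k - x^*\|^{2}
  \quad \text{for all } x_k \text{ sufficiently close to } x^*.
\]
```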

pith-pipeline@v0.9.0 · 5440 in / 1284 out tokens · 48085 ms · 2026-05-12T02:04:46.050999+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

45 extracted references · 45 canonical work pages
