
arXiv: 2605.01446 · v2 · submitted 2026-05-02 · 🧮 math.NA · cs.NA · stat.ML

Sequential Minimal Optimization for ε-SVR with MAPE Loss and Sample-Dependent Box Constraints

Pith reviewed 2026-05-11 00:42 UTC · model grok-4.3

classification 🧮 math.NA · cs.NA · stat.ML
keywords Sequential Minimal Optimization · ε-SVR · MAPE loss · sample-dependent box constraints · quadratic programming · support vector regression · shrinking heuristic · symmetric kernel

The pith

SMO for ε-SVR with MAPE loss modifies only feasibility sets and clipping bounds while keeping curvature and gradient updates identical to the standard algorithm.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to adapt the Sequential Minimal Optimization solver to an ε-support vector regression formulation that directly minimizes mean absolute percentage error. The central change is that the box constraints on the dual variables become sample-dependent, specifically α_k and α_k* lying in [0, 100C/y_k]. This alteration touches only the working-set feasibility sets Iup and Idown plus the analytic clipping bounds used in the two-variable subproblem. All other components of the SMO procedure, including the curvature formula and the gradient maintenance rule, remain structurally unchanged from the classical version. The same modified solver also covers the symmetric-kernel variant after a simple matrix replacement, and numerical checks against a general interior-point quadratic program solver confirm matching solutions to solver tolerance.
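The working-set change described above can be sketched in a few lines. This is a minimal illustration, not the paper's or the psvr package's code: it assumes the common stacked layout β = [α; α*] with signs s = [+1, …, −1, …], strictly positive targets, and the usual "can still move up / can still move down" definition of Iup and Idown. The function names `upper_bounds` and `feasibility_sets` are hypothetical.

```python
def upper_bounds(y, C):
    """Per-sample caps U_k = 100*C/y_k, repeated for the alpha and alpha* halves."""
    if any(v <= 0 for v in y):
        # Assumption of this sketch: targets are strictly positive.
        raise ValueError("this sketch assumes strictly positive targets")
    caps = [100.0 * C / v for v in y]
    return caps + caps


def feasibility_sets(beta, s, U, tol=1e-12):
    """Indices that may still move in the 'up' and 'down' directions.

    beta: stacked multipliers [alpha; alpha*] (assumed layout)
    s:    +1 for the alpha half, -1 for the alpha* half
    U:    sample-dependent caps from upper_bounds
    """
    i_up, i_down = [], []
    for t, (b, sign, cap) in enumerate(zip(beta, s, U)):
        can_grow = b < cap - tol   # strictly below its own cap
        can_shrink = b > tol       # strictly above zero
        if (sign > 0 and can_grow) or (sign < 0 and can_shrink):
            i_up.append(t)
        if (sign < 0 and can_grow) or (sign > 0 and can_shrink):
            i_down.append(t)
    return i_up, i_down


# Three samples with targets 50, 100, 200 and C = 1: the caps shrink as y_k grows.
U = upper_bounds([50.0, 100.0, 200.0], C=1.0)   # [2.0, 1.0, 0.5, 2.0, 1.0, 0.5]
s = [1, 1, 1, -1, -1, -1]
print(feasibility_sets([2.0, 0.5, 0.0, 0.0, 0.0, 0.0], s, U))
```

The only place the MAPE loss enters is the `cap` compared against each multiplier; with a constant cap C the same function gives the standard sets.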

Core claim

The key structural difference from standard ε-SVR is that the box constraints become sample-dependent: α_k, α_k^* ∈ [0, 100C/y_k]. We show that this modification affects only (i) the feasibility sets Iup and Idown in the working-set selection and (ii) the clipping bounds in the analytic two-variable update, while leaving the curvature formula and gradient update structurally identical to the standard SMO. A shrinking heuristic adapted to the sample-dependent bounds is derived and shown to introduce an asymmetry between α- and α^*-variables controlled by the gap 2y_k ε/100. The same solver applies to the symmetric-kernel variant by replacing Ω with Ω_s = 1/2(Ω + aΩ^*).

What carries the argument

Sample-dependent box constraints α_k, α_k^* ∈ [0, 100C/y_k] that arise from the MAPE loss and require only local adjustments to the Iup/Idown sets and clipping bounds inside an otherwise unchanged SMO loop.
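To make the "clipping bounds" part concrete, here is a hedged sketch of the two-variable step. It assumes the stacked layout β = [α; α*], signs s, a move of the form β_i += s_i·t and β_j −= s_j·t (which preserves the equality constraint), and the optimality indicator τ_t = −s_t·g_t; these conventions are borrowed from the standard second-order SMO literature rather than taken from the paper's equations. `clipped_step` is a hypothetical name.

```python
def clipped_step(i, j, beta, s, U, g, K):
    """Step length t >= 0 for the pair (i, j), clipped to the per-sample boxes.

    K is the N x N kernel matrix; beta, s, U, g have length 2N.
    """
    n = len(K)
    ki, kj = i % n, j % n                      # sample index behind each variable
    # Curvature: same expression as in standard SMO, no dependence on y_k.
    eta = K[ki][ki] + K[kj][kj] - 2.0 * K[ki][kj]
    eta = max(eta, 1e-12)                      # guard for flat directions (assumed safeguard)
    tau_i, tau_j = -s[i] * g[i], -s[j] * g[j]
    t_free = (tau_i - tau_j) / eta             # unconstrained minimizer along the direction
    # Room left before each variable hits 0 or its own cap U_t = 100*C/y_k.
    room_i = U[i] - beta[i] if s[i] > 0 else beta[i]
    room_j = beta[j] if s[j] > 0 else U[j] - beta[j]
    return max(0.0, min(t_free, room_i, room_j))


K = [[1.0, 0.0], [0.0, 1.0]]
s = [1, 1, -1, -1]
g = [-45.0, -180.0, 55.0, 220.0]               # illustrative gradient at beta = 0
beta = [0.0, 0.0, 0.0, 0.0]
# Targets 50 and 200 with C = 1 give caps 2.0 and 0.5; the step is clipped by the smaller cap.
print(clipped_step(1, 2, beta, s, [2.0, 0.5, 2.0, 0.5], g, K))              # 0.5
# With C = 1000 the caps are loose and the unconstrained step survives.
print(clipped_step(1, 2, beta, s, [2000.0, 500.0, 2000.0, 500.0], g, K))    # 62.5
```

Note that `eta` and `t_free` never see the caps; only the two `room_*` values change relative to a constant-C solver.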

If this is right

  • The identical curvature and gradient formulas allow reuse of existing SMO code with only localized changes to working-set selection and the two-variable analytic step.
  • A shrinking heuristic can be applied with an asymmetry between α- and α^*-variables governed by the gap 2y_k ε/100.
  • The solver extends immediately to the symmetric-kernel variant (m2) by the matrix replacement Ω_s = 1/2(Ω + aΩ^*).
  • Numerical agreement to within solver tolerance holds across ten synthetic configurations spanning both kernel variants and symmetry types.
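The matrix replacement in the third bullet can be illustrated as follows. This sketch assumes, following the usual symmetric-kernel construction, that Ω*_ij = K(x_i, −x_j) and that a = +1 imposes even symmetry while a = −1 imposes odd symmetry; the page itself does not define Ω* or a, so treat both as assumptions. The RBF kernel and the names `rbf` and `symmetric_kernel_matrix` are illustrative choices.

```python
import math


def rbf(x, z, gamma):
    """Gaussian RBF kernel between two points given as lists of floats."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def symmetric_kernel_matrix(X, a, gamma=1.0):
    """Omega_s = 0.5 * (Omega + a * Omega_star).

    Assumed definitions: Omega[i][j] = K(x_i, x_j) and
    Omega_star[i][j] = K(x_i, -x_j); a is +1 (even) or -1 (odd).
    """
    n = len(X)
    omega_s = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            reflected = [-v for v in X[j]]
            omega_s[i][j] = 0.5 * (rbf(X[i], X[j], gamma)
                                   + a * rbf(X[i], reflected, gamma))
    return omega_s


# Two mirrored points: with a = +1 every entry equals 0.5 * (1 + exp(-4)).
print(symmetric_kernel_matrix([[1.0], [-1.0]], a=1))
```

The resulting matrix would simply be passed to the solver in place of Ω; nothing else in the loop needs to know which variant is in use.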

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same localized modification pattern may carry over to other percentage-error losses inside the broader SVR family, such as least-squares MAPE variants.
  • Because relative-error emphasis is common in forecasting, an efficient direct MAPE solver could reduce the need for post-hoc scaling or custom loss tuning in time-series applications.
  • The documented asymmetry in the shrinking rule suggests that convergence speed may differ between positive and negative residual regimes when y_k varies widely.

Load-bearing premise

The MAPE-modified dual remains a convex quadratic program whose only structural change is the sample-dependent upper bounds on the multipliers, with no hidden effects on the Hessian or gradient that would invalidate the standard SMO update formulas.

What would settle it

Running the modified SMO and an interior-point reference QP solver on the same MAPE dual problem and finding optimal objective values or KKT residuals that differ by more than the declared termination tolerance.
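One way to run that comparison is to evaluate the same KKT gap on both solvers' outputs. The sketch below assumes the stacked layout β = [α; α*], the indicator τ_t = −s_t·g_t, and the gap Δ = max over Iup of τ minus min over Idown of τ; the target-scaled threshold Δ ≤ ε_tol·ȳ with ε_tol = 10⁻³ is the one reported in the Figure 1 caption. The function names are hypothetical.

```python
def kkt_gap(beta, s, U, g, tol=1e-12):
    """Maximal KKT violation: max tau over I_up minus min tau over I_down."""
    up, down = [], []
    for b, sign, cap, grad in zip(beta, s, U, g):
        tau = -sign * grad
        can_grow, can_shrink = b < cap - tol, b > tol
        if (sign > 0 and can_grow) or (sign < 0 and can_shrink):
            up.append(tau)
        if (sign < 0 and can_grow) or (sign > 0 and can_shrink):
            down.append(tau)
    if not up or not down:
        return 0.0                     # no admissible pair: nothing can be violated
    return max(up) - min(down)


def has_converged(gap, y, eps_tol=1e-3):
    """Target-scaled stopping rule: gap <= eps_tol * mean(y)."""
    return gap <= eps_tol * (sum(y) / len(y))


s = [1, 1, -1, -1]
U = [2.0, 0.5, 2.0, 0.5]               # caps for targets 50 and 200 with C = 1
g = [-45.0, -180.0, 55.0, 220.0]       # illustrative gradient at beta = 0
gap = kkt_gap([0.0, 0.0, 0.0, 0.0], s, U, g)
print(gap, has_converged(gap, [50.0, 200.0]))   # 125.0 False
```

If the SMO output and the reference solver's output both pass `has_converged` but their objective values differ by more than the tolerance implies, that would point to a problem in the adapted bounds rather than in the stopping rule.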

Figures

Figures reproduced from arXiv: 2605.01446 by Juan Diego Sánchez-Torres, Pablo Benavides-Herrera, Riemann Ruiz-Cruz.

Figure 1: KKT violation Δ = τ_{i*} − τ_{j*} as a function of iteration for two representative configurations. Left: C1 (N = 50, σ = 0.1, ε = 5%, 76 iterations); right: C8 (N = 300, σ = 2.0, ε = 10%, 33,048 iterations). The red dashed line marks the target-scaled stopping threshold Δ ≤ ε_tol · ȳ (ε_tol = 10^−3); the dotted vertical line marks the convergence iteration. The transient spike visible in C8 near iteration 27,000…
read the original abstract

We derive a Sequential Minimal Optimization (SMO) algorithm for the quadratic dual problem arising from ε-SVR [1, 2, 3] modified to minimize the Mean Absolute Percentage Error (MAPE) [4, 5] directly in the loss function [6]. This formulation is part of a broader family of SVR models with percentage-error losses that also includes least-squares variants [7] and symmetric-kernel extensions [8], whose unified structure is studied in [9]. The key structural difference from standard ε-SVR is that the box constraints become sample-dependent: α_k, α_k^* ∈ [0, 100C/y_k]. We show that this modification affects only (i) the feasibility sets Iup and Idown in the working-set selection and (ii) the clipping bounds in the analytic two-variable update, while leaving the curvature formula and gradient update structurally identical to the standard SMO [10, 11, 12]. A shrinking heuristic adapted to the sample-dependent bounds is derived and shown to introduce an asymmetry between α- and α^*-variables controlled by the gap 2y_k ε/100. The same solver applies to the symmetric-kernel variant (m2) by replacing Ω with Ω_s = 1/2(Ω + aΩ^*) [8]. Numerical validation against an interior-point QP reference solver confirms solution agreement to within solver termination tolerance across ten synthetic configurations spanning both kernel variants and symmetry types. An implementation is available in the open-source psvr R package [13].

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript derives a Sequential Minimal Optimization (SMO) solver for the dual quadratic program of ε-SVR with MAPE loss. The key innovation is handling sample-dependent box constraints α_k, α_k^* ∈ [0, 100C/y_k]. The authors show that this only modifies the working-set selection sets I_up and I_down and the clipping in the two-variable analytic solution, while the curvature η = K_ii + K_jj - 2K_ij and gradient maintenance remain unchanged from standard SMO. They derive an adapted shrinking heuristic and validate the solver numerically against an interior-point method on ten synthetic datasets for both kernel variants.

Significance. If the central claim holds, this provides a practical and efficient implementation for MAPE-SVR, which is useful in applications where percentage errors are preferred. The availability of open-source code in the psvr R package is a strength. The work fits within the numerical analysis of optimization algorithms for machine learning models and extends prior SVR variants.

major comments (2)
  1. The central claim that the Hessian (kernel matrix with sign pattern) and linear term are unchanged by the MAPE modification is load-bearing for the entire SMO adaptation. The manuscript states this follows from per-sample weighting of the absolute ε-insensitive loss, but an explicit expansion of the dual objective (showing the quadratic term is identical to standard ε-SVR) should be added to the derivation section to remove any ambiguity.
  2. Numerical validation section: agreement to within solver termination tolerance is reported across ten synthetic configurations, but without tabulated maximum or average deviations, or sensitivity to C and ε values, it is difficult to assess whether the adapted clipping and shrinking rules introduce any systematic bias.
minor comments (2)
  1. Abstract: the citation format for the R package (BenavidesHerrera2026Rpsvr) should be standardized with the other references.
  2. The description of the symmetric-kernel variant (m2) via Ω_s = 1/2(Ω + aΩ*) is clear, but a brief reminder of how the sign pattern in the Hessian changes for this case would aid readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the positive assessment, the recommendation for minor revision, and the constructive comments. We address each major comment below.

read point-by-point responses
  1. Referee: The central claim that the Hessian (kernel matrix with sign pattern) and linear term are unchanged by the MAPE modification is load-bearing for the entire SMO adaptation. The manuscript states this follows from per-sample weighting of the absolute ε-insensitive loss, but an explicit expansion of the dual objective (showing the quadratic term is identical to standard ε-SVR) should be added to the derivation section to remove any ambiguity.

    Authors: We agree that an explicit expansion strengthens the presentation and removes ambiguity. The MAPE formulation weights the ε-insensitive loss by 100/y_k in the primal; the resulting dual quadratic term is identical to standard ε-SVR, (1/2)∑∑(α_i−α_i*)(α_j−α_j*)K(x_i,x_j), while the linear term and box constraints incorporate the per-sample weights. We will insert this derivation in the revised Section 2. revision: yes

  2. Referee: Numerical validation section: agreement to within solver termination tolerance is reported across ten synthetic configurations, but without tabulated maximum or average deviations, or sensitivity to C and ε values, it is difficult to assess whether the adapted clipping and shrinking rules introduce any systematic bias.

    Authors: We acknowledge the value of additional quantitative detail. In the revision we will add a table reporting maximum and average absolute deviations (for both α and α*) between the SMO and interior-point solutions across all ten datasets. We will also include a sensitivity study for representative ranges of C and ε. The open-source psvr package already permits independent verification. revision: yes
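For readers who want to see the shape of the expansion promised in the first response, the following is a hedged sketch of how such a derivation could go. It assumes a primal in which the slack variables are weighted by 100/y_k and the tube half-width is ε_k = y_k ε/100 (inferred from the gap 2y_k ε/100 quoted above, not copied from the paper), with y_k > 0 and feature map φ. The multiplier name μ_k is introduced here for illustration only.

```latex
% Sketch under stated assumptions; not reproduced from the paper.
% Assumed primal (y_k > 0, \varepsilon_k = y_k \varepsilon / 100):
\begin{aligned}
\min_{w,\,b,\,\xi,\,\xi^*}\quad
  & \tfrac{1}{2}\lVert w\rVert^2
    + C \sum_{k=1}^{N} \frac{100}{y_k}\,(\xi_k + \xi_k^*) \\
\text{s.t.}\quad
  & y_k - w^\top \varphi(x_k) - b \le \varepsilon_k + \xi_k, \\
  & w^\top \varphi(x_k) + b - y_k \le \varepsilon_k + \xi_k^*, \\
  & \xi_k,\ \xi_k^* \ge 0 .
\end{aligned}

% Stationarity in \xi_k with multiplier \mu_k \ge 0 for \xi_k \ge 0:
%   100C/y_k - \alpha_k - \mu_k = 0
%   \Rightarrow 0 \le \alpha_k \le 100C/y_k  (same for \alpha_k^*).
% Stationarity in w and b:
%   w = \sum_k (\alpha_k - \alpha_k^*)\,\varphi(x_k), \qquad
%   \sum_k (\alpha_k - \alpha_k^*) = 0 .

% Resulting dual: the quadratic term carries no y_k-dependent weight.
\begin{aligned}
\max_{\alpha,\,\alpha^*}\quad
  & -\tfrac{1}{2} \sum_{i,j} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\,K(x_i, x_j)
    - \sum_k \varepsilon_k\,(\alpha_k + \alpha_k^*)
    + \sum_k y_k\,(\alpha_k - \alpha_k^*) \\
\text{s.t.}\quad
  & \sum_k (\alpha_k - \alpha_k^*) = 0, \qquad
    0 \le \alpha_k,\ \alpha_k^* \le \frac{100\,C}{y_k} .
\end{aligned}
```

Under these assumptions the weights 100/y_k appear only in the stationarity condition for the slacks, which is why they surface as box bounds and leave the kernel term untouched. The sum of the partial derivatives of the (negated) dual with respect to α_k and α_k^* equals 2ε_k = 2y_k ε/100, consistent with the asymmetry gap mentioned in the abstract.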

Circularity Check

0 steps flagged

Derivation self-contained; no circularity in SMO adaptation

full rationale

The paper states the MAPE-SVR dual from prior self-citations but then derives the SMO updates directly from the standard Platt/Fan structure. The central claim (only I_up/I_down and clipping bounds change; curvature η = K_ii + K_jj − 2K_ij and gradient g ← g + Δα · K_column remain identical) follows immediately from the dual's explicit form (Hessian and linear term unchanged except for sample-dependent [0, 100C/y_k] bounds) and is confirmed by numerical agreement with an independent interior-point QP solver on ten synthetic instances. No equation reduces to a fit, no uniqueness theorem is imported from self-work, and no ansatz is smuggled; the self-citations supply only the problem statement, not the algorithmic steps.
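The unchanged pieces named above (curvature η and the incremental gradient update) can be seen next to the changed pieces (caps in the sets and in the clipping) in one compact loop. This is a minimal sketch under stated assumptions, not the psvr implementation: it uses maximal-violating-pair selection, the stacked layout β = [α; α*], a linear term built from an assumed tube half-width y_k ε/100, strictly positive targets, and no shrinking. `smo_mape_svr` is a hypothetical name.

```python
import math


def smo_mape_svr(K, y, C, eps_pct, tol=1e-6, max_iter=10000):
    """Toy SMO loop for min 0.5*b'Qb + p'b, s'b = 0, 0 <= b_t <= U_t."""
    if any(v <= 0 for v in y):
        raise ValueError("this sketch assumes strictly positive targets")
    n = len(y)
    s = [1.0] * n + [-1.0] * n
    caps = [100.0 * C / v for v in y]              # sample-dependent upper bounds
    U = caps + caps
    half = [v * eps_pct / 100.0 for v in y]        # ASSUMED tube half-width per sample
    p = [h - v for h, v in zip(half, y)] + [h + v for h, v in zip(half, y)]
    beta = [0.0] * (2 * n)
    g = list(p)                                    # gradient at beta = 0

    def tau(t):
        return -s[t] * g[t]

    for _ in range(max_iter):
        # Changed piece 1: feasibility sets test against U[t], not a constant C.
        up = [t for t in range(2 * n)
              if (s[t] > 0 and beta[t] < U[t]) or (s[t] < 0 and beta[t] > 0.0)]
        down = [t for t in range(2 * n)
                if (s[t] < 0 and beta[t] < U[t]) or (s[t] > 0 and beta[t] > 0.0)]
        if not up or not down:
            break
        i, j = max(up, key=tau), min(down, key=tau)
        gap = tau(i) - tau(j)
        if gap <= tol:
            break
        ki, kj = i % n, j % n
        # Unchanged piece: curvature eta = K_ii + K_jj - 2 K_ij.
        eta = max(K[ki][ki] + K[kj][kj] - 2.0 * K[ki][kj], 1e-12)
        # Changed piece 2: clipping uses each variable's own cap.
        room_i = U[i] - beta[i] if s[i] > 0 else beta[i]
        room_j = beta[j] if s[j] > 0 else U[j] - beta[j]
        step = min(gap / eta, room_i, room_j)
        d_i, d_j = s[i] * step, -s[j] * step
        beta[i] = min(max(beta[i] + d_i, 0.0), U[i])   # clamp away rounding noise
        beta[j] = min(max(beta[j] + d_j, 0.0), U[j])
        # Unchanged piece: g <- g + (change in beta) * (column of Q).
        for r in range(2 * n):
            kr = r % n
            g[r] += s[r] * (s[i] * K[kr][ki] * d_i + s[j] * K[kr][kj] * d_j)
    return beta, g, U, s, p


X = [1.0, 2.0, 3.0]
K = [[math.exp(-0.5 * (a - b) ** 2) for b in X] for a in X]
beta, g, U, s, p = smo_mape_svr(K, [50.0, 100.0, 200.0], C=1.0, eps_pct=5.0)
print(beta)
```

A quick sanity check on any run is that the incrementally maintained `g` matches a full recomputation of Qβ + p, and that every β_t stays inside its own box; neither check depends on the loop having converged.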

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The derivation rests on the convexity of the MAPE dual (inherited from standard SVR) and on the correctness of the standard SMO two-variable analytic solution when bounds are adjusted. No new entities are postulated.

free parameters (2)
  • C
    Regularization parameter that scales the sample-dependent upper bound 100C/y_k.
  • epsilon
    Tube width parameter appearing in the loss and in the shrinking asymmetry 2 y_k epsilon / 100.
axioms (2)
  • domain assumption The MAPE loss yields a convex quadratic dual problem whose Hessian is identical to standard epsilon-SVR.
    Invoked when claiming that only feasibility sets and clipping bounds change.
  • standard math The two-variable subproblem remains analytically solvable after bound adjustment.
    Standard SMO analytic update is reused without re-derivation.




Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages

  [1] V. N. Vapnik, The Nature of Statistical Learning Theory. New York, NY: Springer-Verlag, 1995.

  [2] H. Drucker, C. J. C. Burges, L. Kaufman, A. J. Smola, and V. N. Vapnik, “Support vector regression machines,” in Advances in Neural Information Processing Systems 9 (NIPS 1996), M. C. Mozer, M. I. Jordan, and T. Petsche, Eds. Cambridge, MA: MIT Press, 1997, pp. 155–161.

  [3] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” Statistics and Computing, vol. 14, no. 3, pp. 199–222, 2004.

  [4] S. Makridakis, “Accuracy measures: Theoretical and practical concerns,” International Journal of Forecasting, vol. 9, no. 4, pp. 527–529, 1993.

  [5] R. J. Hyndman and A. B. Koehler, “Another look at measures of forecast accuracy,” International Journal of Forecasting, vol. 22, no. 4, pp. 679–688, 2006.

  [6] P. Benavides-Herrera, S. Rodríguez-Reyes, G. Álvarez-Álvarez, R. Ruiz-Cruz, and J. D. Sánchez-Torres, “Support vector regression under percentage-error loss,” in 2025 22nd International Conference on Electrical Engineering, Computing Science and Automatic Control (CCE). IEEE, 2025, pp. 1–5.

  [7] J. A. K. Suykens, T. Van Gestel, J. De Brabanter, B. De Moor, and J. Vandewalle, Least Squares Support Vector Machines. Singapore: World Scientific, 2002.

  [8] M. Espinoza, J. A. K. Suykens, and B. De Moor, “Imposing symmetry in least squares support vector machines regression,” in Proceedings of the 44th IEEE Conference on Decision and Control (CDC 2005). Seville, Spain: IEEE, 2005, pp. 5716–5721.

  [9] P. Benavides-Herrera, G. Álvarez-Álvarez, R. Ruiz-Cruz, and J. D. Sánchez-Torres, “A unified framework for support vector regression with percentage-error loss functions,” Mathematics, 2026, under review.

  [10] J. C. Platt, “Sequential minimal optimization: A fast algorithm for training support vector machines,” Microsoft Research, Tech. Rep. MSR-TR-98-14, 1998. [Online]. Available: https://www.microsoft.com/en-us/research/publication/sequential-minimal-optimization-a-fast-algorithm-for-training-support-vector-machines/

  [11] J. C. Platt, “Fast training of support vector machines using sequential minimal optimization,” in Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. J. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1999, pp. 185–208.

  [12] R.-E. Fan, P.-H. Chen, and C.-J. Lin, “Working set selection using second order information for training support vector machines,” Journal of Machine Learning Research, vol. 6, pp. 1889–1918, 2005. [Online]. Available: https://jmlr.org/papers/v6/fan05a.html

  [13] P. Benavides-Herrera, psvr: Percentage-Error Support Vector Regression, 2026. [Online]. Available: https://doi.org/10.5281/zenodo.19935781

  [14] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector machines,” ACM Transactions on Intelligent Systems and Technology, vol. 2, no. 3, pp. 27:1–27:27, 2011, software available at http://www.csie.ntu.edu.tw/~cjlin/libsvm

  [15] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd, “OSQP: An operator splitting solver for quadratic programs,” Mathematical Programming Computation, vol. 12, no. 4, pp. 637–672, 2020.

  [16] G. W. Flake and S. Lawrence, “Efficient SVM regression training with SMO,” Machine Learning, vol. 46, no. 1–3, pp. 271–290, 2002.

  [17] T. Joachims, “Making large-scale SVM learning practical,” in Advances in Kernel Methods: Support Vector Learning, B. Schölkopf, C. J. C. Burges, and A. J. Smola, Eds. Cambridge, MA: MIT Press, 1999, pp. 169–184.

  [18] P. Anand, R. Rastogi, and S. Chandra, “A new asymmetric ε-insensitive pinball loss function based support vector quantile regression model,” Applied Soft Computing, vol. 94, p. 106478, 2020.

  [19] V. N. Vapnik, Statistical Learning Theory. New York, NY: Wiley-Interscience, 1998.

  [20] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge, UK: Cambridge University Press, 2004. [Online]. Available: https://web.stanford.edu/~boyd/cvxbook/