pith. sign in

arxiv: 2501.06350 · v2 · pith:HSPXSZ7Dnew · submitted 2025-01-10 · 🧮 math.OC

SMOP: Stochastic trust region method for multi-objective problems

Pith reviewed 2026-05-23 05:15 UTC · model grok-4.3

classification 🧮 math.OC
keywords multi-objective optimizationtrust region methodsstochastic optimizationPareto critical pointsprobabilistic modelsnoisy functionsconvergence analysismachine learning
0
0 comments X

The pith

A stochastic trust region method converges almost surely to Pareto critical points for noisy multi-objective problems.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an algorithm for multi-objective optimization that uses trust regions with probabilistic models to manage noise and inaccurate evaluations. Each objective is approximated by a probabilistically fully linear model, and these are combined using the max operator to approximate the nonsmooth scalarized objective. The authors establish almost sure convergence of the iterates to a Pareto critical point. This approach is relevant for problems like training fair machine learning models where multiple objectives and data noise are common.

Core claim

The proposed algorithm approximates each function in the multi-objective problem with a probabilistically fully linear model. This yields a composite model via the max operator that satisfactorily approximates the nonsmooth scalarized objective. The method is proven to converge almost surely to a Pareto critical point.

What carries the argument

The composite model obtained by applying the max operator to the probabilistically fully linear models of each objective function.

If this is right

  • The algorithm handles noisy function and gradient information in multi-objective settings.
  • It guarantees convergence to points where no direction simultaneously improves all objectives.
  • Numerical experiments show it competes with other stochastic multi-objective solvers on test problems.
  • It applies to supervised machine learning tasks such as training non-discriminatory logistic regression models.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The probabilistic modeling technique might extend to other scalarization methods if the approximation properties are preserved.
  • Similar ideas could improve single-objective stochastic trust region methods under noise.

Load-bearing premise

Each objective function admits a probabilistically fully linear model whose max-based combination provides a satisfactory approximation to the scalarized objective.

What would settle it

Constructing a noisy multi-objective problem where the probabilistic models do not satisfy the fully linear property sufficiently often and checking whether the algorithm's iterates fail to approach a Pareto critical point.

Figures

Figures reproduced from arXiv: 2501.06350 by Luka Rute\v{s}i\'c, Nata\v{s}a Kreji\'c, Nata\v{s}a Krklec Jerinki\'c.

Figure 1
Figure 1. Figure 1: Pareto front for problem (51) for different levels of noise [PITH_FULL_IMAGE:figures/full_fig_p023_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Pareto front for problem (52) for different levels of noise [PITH_FULL_IMAGE:figures/full_fig_p024_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Comparison of SMOP with DMOP [30] and SMG [21], in terms [PITH_FULL_IMAGE:figures/full_fig_p026_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Algorithm comparison. Model parameters: x0 = (0, .., 0), kmax = 150, θ = 0.01, γ1 = 0.5, γ2 = 2, η1 = 0.25 5.5 Finding Pareto front As previously mentioned, the goal of the SMOP algorithm is to identify a Pareto critical point. However, in some cases, we may wish to better under￾stand the structure of the entire Pareto front, or even find a specific optimal point within it. Hence, approximating the Pareto … view at source ↗
Figure 5
Figure 5. Figure 5: Pareto front for logistic regression (left), and test example 2 (right) [PITH_FULL_IMAGE:figures/full_fig_p028_5.png] view at source ↗
read the original abstract

The problem we consider is a multi-objective optimization problem, in which the goal is to find an optimal value of a vector function representing various criteria. The aim of this work is to develop an algorithm which utilizes the trust region framework with probabilistic model functions, able to cope with noisy problems, using inaccurate functions and gradients. The key novelty is approximation of each function in the multiobjective problem with probabilistically fully linear model which yields the composite model defined by max operator as a satisfactory approximation for the nonsmooth scalarized objective function. We prove the almost sure convergence of the proposed algorithm to a Pareto critical point. Numerical results demonstrate effectiveness of the probabilistic trust region by comparing it to competitive stochastic multi-objective solvers. The application in supervised machine learning is showcased by training non discriminatory Logistic Regression models on different size data groups. Additionally, we use several test examples with irregularly shaped fronts to exhibit the efficiency of the algorithm

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes SMOP, a stochastic trust-region algorithm for multi-objective optimization that approximates each objective with a probabilistically fully linear model; these are combined via the max operator to form a model for the nonsmooth scalarized objective. The central claim is an almost-sure convergence proof to a Pareto critical point under noisy or inaccurate function/gradient information. Numerical results compare the method against other stochastic multi-objective solvers on test problems with irregular Pareto fronts and demonstrate an application to training non-discriminatory logistic regression models.

Significance. If the convergence result is rigorously supported, the work would extend trust-region methods to the stochastic multi-objective setting with explicit probabilistic modeling of the composite nonsmooth scalarization, which is a non-trivial step. The numerical experiments and ML application provide practical context. Credit is due for targeting the composite-model issue under the max operator rather than avoiding it.

major comments (2)
  1. [modeling assumption and convergence proof] The key modeling step (abstract and likely §3): the assertion that probabilistically fully linear models for each objective yield a probabilistically fully linear composite under the max operator for the nonsmooth scalarized objective is load-bearing for the a.s. convergence theorem. Because the max is nonsmooth, pointwise error bounds and success probabilities do not transfer automatically; an explicit uniform bound, active-set argument, or separate probability estimate is required. Without this derivation the central claim rests on an unverified transfer.
  2. [convergence theorem] Convergence theorem (likely Theorem 4.1 or equivalent): the almost-sure convergence statement to a Pareto critical point depends on the composite model satisfying the fully-linear property with positive probability bounded away from zero. The paper must state the precise constants and probability lower bound for the composite; if these are only asserted rather than derived from the per-objective assumptions, the result is not yet supported.
minor comments (2)
  1. [§2] Notation for the scalarized objective and the max-composite model should be introduced with explicit definitions early in §2 to avoid ambiguity when reading the modeling and convergence sections.
  2. [numerical experiments] Numerical section: clarify how the probabilistic models are constructed in practice (e.g., sampling strategy or noise model) so that the experiments can be reproduced.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful and constructive report. The two major comments correctly identify that the transfer of the probabilistically fully linear property through the max operator requires an explicit argument; we will supply this derivation in the revision.

read point-by-point responses
  1. Referee: [modeling assumption and convergence proof] The key modeling step (abstract and likely §3): the assertion that probabilistically fully linear models for each objective yield a probabilistically fully linear composite under the max operator for the nonsmooth scalarized objective is load-bearing for the a.s. convergence theorem. Because the max is nonsmooth, pointwise error bounds and success probabilities do not transfer automatically; an explicit uniform bound, active-set argument, or separate probability estimate is required. Without this derivation the central claim rests on an unverified transfer.

    Authors: We agree that the nonsmoothness of the max operator prevents automatic transfer and that an explicit argument is required. The manuscript asserts the composite property but does not contain the detailed derivation. In the revision we will insert a new lemma that (i) identifies the active set of objectives at the current point, (ii) obtains a uniform error bound over that set, and (iii) derives a strictly positive lower bound on the success probability of the composite model directly from the per-objective probabilities. This lemma will be placed immediately before the convergence theorem. revision: yes

  2. Referee: [convergence theorem] Convergence theorem (likely Theorem 4.1 or equivalent): the almost-sure convergence statement to a Pareto critical point depends on the composite model satisfying the fully-linear property with positive probability bounded away from zero. The paper must state the precise constants and probability lower bound for the composite; if these are only asserted rather than derived from the per-objective assumptions, the result is not yet supported.

    Authors: We will revise the statement of the convergence theorem to display the explicit lower bound on the success probability of the composite model and the associated constants (expressed in terms of the individual model parameters and the number of objectives). The proof will invoke the new lemma described above to obtain the bound from the per-objective assumptions, thereby removing any unsupported assertion. revision: yes

Circularity Check

0 steps flagged

No circularity; convergence proof is self-contained under stated modeling assumptions

full rationale

The paper states a novel algorithm that approximates each objective with a probabilistically fully linear model, forms a composite via the max operator for the scalarized nonsmooth objective, and proves almost-sure convergence to a Pareto critical point. No quoted step reduces a claimed prediction or theorem to a fitted parameter, self-definition, or load-bearing self-citation chain; the derivation rests on explicit probabilistic model assumptions and standard trust-region analysis that are independent of the target result. The central claim therefore does not collapse to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the existence of probabilistically fully linear models for each objective and on standard assumptions of stochastic trust-region analysis; no free parameters or invented entities are visible in the abstract.

axioms (1)
  • domain assumption Each objective admits a probabilistically fully linear model whose composite via the max operator approximates the nonsmooth scalarized function.
    This modeling assumption is invoked to justify both the algorithm and the convergence analysis.

pith-pipeline@v0.9.0 · 5702 in / 1172 out tokens · 41783 ms · 2026-05-23T05:15:56.691403+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

33 extracted references · 33 canonical work pages · 1 internal anchor

  1. [1]

    & Vicente, L.N

    Bandeira, A.S., Scheinberg, K. & Vicente, L.N. (2014) Con- vergence of trust-region methods based on probabilistic models, SIAM Journal on Optimization, 24(3), 1238-1264

  2. [2]

    & Narayanan, A

    Barocas, S., Hardt, M. & Narayanan, A. (2017) Fairness in ma- chine learning. NIPS Tutorial, 1

  3. [3]

    & Morini, B

    Bellavia, S., Kreji ´c, N. & Morini, B. (2020) Inexact restora- tion with subsampled trust-region methods for finite-sum minimization. Comput Optim Appl 76, 701–736

  4. [4]

    & Peitz, S

    Berkemeier, M. & Peitz, S. (2021) Derivative-Free Multiobjective Trust Region Descent Method Using Radial Basis Function Surrogate Models, Math. Comput. Appl., 26, 31

  5. [5]

    S., Bollapragada R

    Berahas A. S., Bollapragada R. & Nocedal J. (2020) An In- vestigation of Newton-Sketch and Subsampled Newton Methods, Opti- mization Methods and Software, 35(4), 661–680

  6. [6]

    & Scheinberg, K

    Blanchet, J., Cartis, C., Menickelly, M. & Scheinberg, K. (2019) Convergence Rate Analysis of a Stochastic Trust-Region Method via Supermartingales, INFORMS Journal on Optimization, 1(2), 92- 119

  7. [7]

    (1998) Covertype [Dataset]

    Blackard, J. (1998) Covertype [Dataset]. UCI Machine Learning Repository

  8. [8]

    & Nocedal, J

    Bollapragada, R., Byrd, R. & Nocedal, J. (2019) Exact and Inexact Subsampled Newton Methods for Optimization, IMA Journal of Numerical Analysis, 39(20), 545-578

  9. [9]

    (2018) Optimization Meth- ods for LargeScale Machine Learning, SIAM Review, 60(2), 223-311

    Bottou, L., Curtis F.E., Nocedal, J. (2018) Optimization Meth- ods for LargeScale Machine Learning, SIAM Review, 60(2), 223-311. 29

  10. [10]

    & Singer, Y

    Byrd, R.H., Hansen, S.L., Nocedal, J. & Singer, Y. (2016) A Stochastic QuasiNewton Method for Large-Scale Optimization, SIAM Journal on Optimization, 26(2), 1008-1021

  11. [11]

    (2012) Sample size selection in optimization methods for machine learning, Mathematical Programming, 134(1), 127-155

    Byrd, R.H., Chin, G.M., Nocedal, J.& Wu, Y. (2012) Sample size selection in optimization methods for machine learning, Mathematical Programming, 134(1), 127-155

  12. [12]

    & Scheinberg, K

    Chen, R., Menickelly, M. & Scheinberg, K. (2018) Stochastic optimization using a trust-region method and random models, Math. Program., 169, 447–487

  13. [13]

    Curtis, F.E., Scheinberg, K. & Shi R. (2018) A Stochastic Trust Region Algorithm Based on Careful Step Normalization, INFORMS Journal on Optimization, 1(3), 200-220

  14. [14]

    & Vicente, L

    Custodio, A.L., Madeira, J.A., Vaz, A.I.F. & Vicente, L. N. (2011) Direct multisearch for multiobjective optimization.SIAM J. Op- tim, 21, 1109-1140

  15. [15]

    (2010) Probability: Theory and Examples

    Durrett, R. (2010) Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathemat- ics. Cambridge Uni- versity Press, Cambridge, fourth edition

  16. [16]

    & Svaiter, B.F

    Fliege, J. & Svaiter, B.F. (2000) Steepest Descent Methods for Multicriteria Optimization, Mathematical Methods of Operations Re- search, 51, 479-494

  17. [17]

    Fukuda, E. H. & Drummond, L. M. G. (2014) A survey on multi- objective descent methods. Pesq. Oper. 34 (3)

  18. [18]

    & Srebro, N

    Hardt, M., Price, E. & Srebro, N. (2016) Equality of opportu- nity in supervised learning. Advances in neural information processing systems, 3315-3323

  19. [19]

    & Detrano, R

    Janosi, A., Steinbrunn, W., Pfisterer, M. & Detrano, R. (1989) Heart Disease [Dataset]. UCI Machine Learning Repository

  20. [20]

    & Yousefi, M

    Kreji´c, N., Krklec Jerinki ´c, N., Mart´ınez, A. & Yousefi, M. (2024) A non-monotone trust-region method with noisy oracles and additional sampling. Comput Optim Appl 89, 247–278

  21. [21]

    & Vicente, L.N

    Liu, S. & Vicente, L.N. (2024) The stochastic multi-gradient algo- rithm for multi-objective optimization and its application to supervised machine learning, Ann Oper Res, 339, 1119–1148

  22. [22]

    & Monro, S

    Robbins, H. & Monro, S. (2011) A Stochastic Approximation Method SIAM J. Optim, 21, 1109-1140. 30

  23. [23]

    & Tanino, T

    Sawaragi, Y., Nakayama, H. & Tanino, T. (1985) Theory of mul- tiobjective optimization, Elsevier, MR 807529

  24. [24]

    & Siegmund, D

    Robbins, H. & Siegmund, D. (1971) A convergence theorem for non negative almost supermartingales and some applications, Optimizing Methods in Statistics, 233-257

  25. [25]

    Sub-Sampled Newton Methods I: Globally Convergent Algorithms

    Roosta-Khorasani, F. & Mahoney, M. W. (2016) Sub-Sampled Newton Methods I: Globally Convergent Algorithms, arXiv:1601.04737

  26. [26]

    & Yamashita, N.(2019) Proximal gra- dient methods for multiobjective optimization and their applications Comput

    Tanabe, H., Fukuda, E.H. & Yamashita, N.(2019) Proximal gra- dient methods for multiobjective optimization and their applications Comput. Optim. Appl. 72, 339–361

  27. [27]

    & Eichfelder, G

    Thomann, J. & Eichfelder, G. (2019) A Trust-Region Algorithm for Heterogeneous Multiobjective Optimization, SIAM Journal on Op- timization, 29, 1017 - 1047

  28. [28]

    & Eichfelder, G

    Thomann, J. & Eichfelder, G. (2019) Numerical results for the multiobjective trust region algorithm MHT, Data in Brief, 25

  29. [29]

    & Jordan, M.I

    Tripuraneni, N., Stern, M., Jin, C., Regier, J. & Jordan, M.I. (2018) Stochastic Cubic Regularization for Fast Nonconvex Optimiza- tion Advances in Neural Information Processing Systems 31

  30. [30]

    & Soubeyran, A, (2014) A Trust-Region Method for Unconstrained Multiobjective Problems with Applications in Satisficing Processes, J Optim Theory Appl 160, 865–889

    Villacorta, K.D.V., Oliveira, P.R. & Soubeyran, A, (2014) A Trust-Region Method for Unconstrained Multiobjective Problems with Applications in Satisficing Processes, J Optim Theory Appl 160, 865–889

  31. [31]

    & Sre- bro, N

    Woodworth, B., Gunasekar, S., Ohannessian, M.I. & Sre- bro, N. (2017) Learning non-discriminatory predictors. Conference on Learning Theory, 1920-1953

  32. [32]

    & Gummadi, K.P

    Zafar, M.B., Valera, I., Gomez Rodriguez, M. & Gummadi, K.P. (2017) Fairness constraints: Mechanisms for fair classiffication. Artifficial Intelligence and Statistics, 962-970

  33. [33]

    & Dwork, C

    Zemel, R., Wu, Y., Swersky, K., Pitassi, T. & Dwork, C. (2013) Learning fair representations. International Conference on Ma- chine Learning, 325-333. 31