SMOP: Stochastic trust region method for multi-objective problems
Pith reviewed 2026-05-23 05:15 UTC · model grok-4.3
The pith
A stochastic trust region method converges almost surely to Pareto critical points for noisy multi-objective problems.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed algorithm approximates each function in the multi-objective problem with a probabilistically fully linear model. This yields a composite model via the max operator that satisfactorily approximates the nonsmooth scalarized objective. The method is proven to converge almost surely to a Pareto critical point.
What carries the argument
The composite model obtained by applying the max operator to the probabilistically fully linear models of each objective function.
If this is right
- The algorithm handles noisy function and gradient information in multi-objective settings.
- It guarantees convergence to points where no direction simultaneously improves all objectives.
- Numerical experiments show it competes with other stochastic multi-objective solvers on test problems.
- It applies to supervised machine learning tasks such as training non-discriminatory logistic regression models.
Where Pith is reading between the lines
- The probabilistic modeling technique might extend to other scalarization methods if the approximation properties are preserved.
- Similar ideas could improve single-objective stochastic trust region methods under noise.
Load-bearing premise
Each objective function admits a probabilistically fully linear model whose max-based combination provides a satisfactory approximation to the scalarized objective.
What would settle it
Constructing a noisy multi-objective problem where the probabilistic models do not satisfy the fully linear property sufficiently often and checking whether the algorithm's iterates fail to approach a Pareto critical point.
Figures
read the original abstract
The problem we consider is a multi-objective optimization problem, in which the goal is to find an optimal value of a vector function representing various criteria. The aim of this work is to develop an algorithm which utilizes the trust region framework with probabilistic model functions, able to cope with noisy problems, using inaccurate functions and gradients. The key novelty is approximation of each function in the multiobjective problem with probabilistically fully linear model which yields the composite model defined by max operator as a satisfactory approximation for the nonsmooth scalarized objective function. We prove the almost sure convergence of the proposed algorithm to a Pareto critical point. Numerical results demonstrate effectiveness of the probabilistic trust region by comparing it to competitive stochastic multi-objective solvers. The application in supervised machine learning is showcased by training non discriminatory Logistic Regression models on different size data groups. Additionally, we use several test examples with irregularly shaped fronts to exhibit the efficiency of the algorithm
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes SMOP, a stochastic trust-region algorithm for multi-objective optimization that approximates each objective with a probabilistically fully linear model; these are combined via the max operator to form a model for the nonsmooth scalarized objective. The central claim is an almost-sure convergence proof to a Pareto critical point under noisy or inaccurate function/gradient information. Numerical results compare the method against other stochastic multi-objective solvers on test problems with irregular Pareto fronts and demonstrate an application to training non-discriminatory logistic regression models.
Significance. If the convergence result is rigorously supported, the work would extend trust-region methods to the stochastic multi-objective setting with explicit probabilistic modeling of the composite nonsmooth scalarization, which is a non-trivial step. The numerical experiments and ML application provide practical context. Credit is due for targeting the composite-model issue under the max operator rather than avoiding it.
major comments (2)
- [modeling assumption and convergence proof] The key modeling step (abstract and likely §3): the assertion that probabilistically fully linear models for each objective yield a probabilistically fully linear composite under the max operator for the nonsmooth scalarized objective is load-bearing for the a.s. convergence theorem. Because the max is nonsmooth, pointwise error bounds and success probabilities do not transfer automatically; an explicit uniform bound, active-set argument, or separate probability estimate is required. Without this derivation the central claim rests on an unverified transfer.
- [convergence theorem] Convergence theorem (likely Theorem 4.1 or equivalent): the almost-sure convergence statement to a Pareto critical point depends on the composite model satisfying the fully-linear property with positive probability bounded away from zero. The paper must state the precise constants and probability lower bound for the composite; if these are only asserted rather than derived from the per-objective assumptions, the result is not yet supported.
minor comments (2)
- [§2] Notation for the scalarized objective and the max-composite model should be introduced with explicit definitions early in §2 to avoid ambiguity when reading the modeling and convergence sections.
- [numerical experiments] Numerical section: clarify how the probabilistic models are constructed in practice (e.g., sampling strategy or noise model) so that the experiments can be reproduced.
Simulated Author's Rebuttal
We thank the referee for the careful and constructive report. The two major comments correctly identify that the transfer of the probabilistically fully linear property through the max operator requires an explicit argument; we will supply this derivation in the revision.
read point-by-point responses
-
Referee: [modeling assumption and convergence proof] The key modeling step (abstract and likely §3): the assertion that probabilistically fully linear models for each objective yield a probabilistically fully linear composite under the max operator for the nonsmooth scalarized objective is load-bearing for the a.s. convergence theorem. Because the max is nonsmooth, pointwise error bounds and success probabilities do not transfer automatically; an explicit uniform bound, active-set argument, or separate probability estimate is required. Without this derivation the central claim rests on an unverified transfer.
Authors: We agree that the nonsmoothness of the max operator prevents automatic transfer and that an explicit argument is required. The manuscript asserts the composite property but does not contain the detailed derivation. In the revision we will insert a new lemma that (i) identifies the active set of objectives at the current point, (ii) obtains a uniform error bound over that set, and (iii) derives a strictly positive lower bound on the success probability of the composite model directly from the per-objective probabilities. This lemma will be placed immediately before the convergence theorem. revision: yes
-
Referee: [convergence theorem] Convergence theorem (likely Theorem 4.1 or equivalent): the almost-sure convergence statement to a Pareto critical point depends on the composite model satisfying the fully-linear property with positive probability bounded away from zero. The paper must state the precise constants and probability lower bound for the composite; if these are only asserted rather than derived from the per-objective assumptions, the result is not yet supported.
Authors: We will revise the statement of the convergence theorem to display the explicit lower bound on the success probability of the composite model and the associated constants (expressed in terms of the individual model parameters and the number of objectives). The proof will invoke the new lemma described above to obtain the bound from the per-objective assumptions, thereby removing any unsupported assertion. revision: yes
Circularity Check
No circularity; convergence proof is self-contained under stated modeling assumptions
full rationale
The paper states a novel algorithm that approximates each objective with a probabilistically fully linear model, forms a composite via the max operator for the scalarized nonsmooth objective, and proves almost-sure convergence to a Pareto critical point. No quoted step reduces a claimed prediction or theorem to a fitted parameter, self-definition, or load-bearing self-citation chain; the derivation rests on explicit probabilistic model assumptions and standard trust-region analysis that are independent of the target result. The central claim therefore does not collapse to its inputs by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Each objective admits a probabilistically fully linear model whose composite via the max operator approximates the nonsmooth scalarized function.
Reference graph
Works this paper leans on
-
[1]
Bandeira, A.S., Scheinberg, K. & Vicente, L.N. (2014) Con- vergence of trust-region methods based on probabilistic models, SIAM Journal on Optimization, 24(3), 1238-1264
work page 2014
-
[2]
Barocas, S., Hardt, M. & Narayanan, A. (2017) Fairness in ma- chine learning. NIPS Tutorial, 1
work page 2017
-
[3]
Bellavia, S., Kreji ´c, N. & Morini, B. (2020) Inexact restora- tion with subsampled trust-region methods for finite-sum minimization. Comput Optim Appl 76, 701–736
work page 2020
-
[4]
Berkemeier, M. & Peitz, S. (2021) Derivative-Free Multiobjective Trust Region Descent Method Using Radial Basis Function Surrogate Models, Math. Comput. Appl., 26, 31
work page 2021
-
[5]
Berahas A. S., Bollapragada R. & Nocedal J. (2020) An In- vestigation of Newton-Sketch and Subsampled Newton Methods, Opti- mization Methods and Software, 35(4), 661–680
work page 2020
-
[6]
Blanchet, J., Cartis, C., Menickelly, M. & Scheinberg, K. (2019) Convergence Rate Analysis of a Stochastic Trust-Region Method via Supermartingales, INFORMS Journal on Optimization, 1(2), 92- 119
work page 2019
-
[7]
Blackard, J. (1998) Covertype [Dataset]. UCI Machine Learning Repository
work page 1998
-
[8]
Bollapragada, R., Byrd, R. & Nocedal, J. (2019) Exact and Inexact Subsampled Newton Methods for Optimization, IMA Journal of Numerical Analysis, 39(20), 545-578
work page 2019
-
[9]
(2018) Optimization Meth- ods for LargeScale Machine Learning, SIAM Review, 60(2), 223-311
Bottou, L., Curtis F.E., Nocedal, J. (2018) Optimization Meth- ods for LargeScale Machine Learning, SIAM Review, 60(2), 223-311. 29
work page 2018
-
[10]
Byrd, R.H., Hansen, S.L., Nocedal, J. & Singer, Y. (2016) A Stochastic QuasiNewton Method for Large-Scale Optimization, SIAM Journal on Optimization, 26(2), 1008-1021
work page 2016
-
[11]
Byrd, R.H., Chin, G.M., Nocedal, J.& Wu, Y. (2012) Sample size selection in optimization methods for machine learning, Mathematical Programming, 134(1), 127-155
work page 2012
-
[12]
Chen, R., Menickelly, M. & Scheinberg, K. (2018) Stochastic optimization using a trust-region method and random models, Math. Program., 169, 447–487
work page 2018
-
[13]
Curtis, F.E., Scheinberg, K. & Shi R. (2018) A Stochastic Trust Region Algorithm Based on Careful Step Normalization, INFORMS Journal on Optimization, 1(3), 200-220
work page 2018
-
[14]
Custodio, A.L., Madeira, J.A., Vaz, A.I.F. & Vicente, L. N. (2011) Direct multisearch for multiobjective optimization.SIAM J. Op- tim, 21, 1109-1140
work page 2011
-
[15]
(2010) Probability: Theory and Examples
Durrett, R. (2010) Probability: Theory and Examples. Cambridge Series in Statistical and Probabilistic Mathemat- ics. Cambridge Uni- versity Press, Cambridge, fourth edition
work page 2010
-
[16]
Fliege, J. & Svaiter, B.F. (2000) Steepest Descent Methods for Multicriteria Optimization, Mathematical Methods of Operations Re- search, 51, 479-494
work page 2000
-
[17]
Fukuda, E. H. & Drummond, L. M. G. (2014) A survey on multi- objective descent methods. Pesq. Oper. 34 (3)
work page 2014
-
[18]
Hardt, M., Price, E. & Srebro, N. (2016) Equality of opportu- nity in supervised learning. Advances in neural information processing systems, 3315-3323
work page 2016
-
[19]
Janosi, A., Steinbrunn, W., Pfisterer, M. & Detrano, R. (1989) Heart Disease [Dataset]. UCI Machine Learning Repository
work page 1989
-
[20]
Kreji´c, N., Krklec Jerinki ´c, N., Mart´ınez, A. & Yousefi, M. (2024) A non-monotone trust-region method with noisy oracles and additional sampling. Comput Optim Appl 89, 247–278
work page 2024
-
[21]
Liu, S. & Vicente, L.N. (2024) The stochastic multi-gradient algo- rithm for multi-objective optimization and its application to supervised machine learning, Ann Oper Res, 339, 1119–1148
work page 2024
-
[22]
Robbins, H. & Monro, S. (2011) A Stochastic Approximation Method SIAM J. Optim, 21, 1109-1140. 30
work page 2011
-
[23]
Sawaragi, Y., Nakayama, H. & Tanino, T. (1985) Theory of mul- tiobjective optimization, Elsevier, MR 807529
work page 1985
-
[24]
Robbins, H. & Siegmund, D. (1971) A convergence theorem for non negative almost supermartingales and some applications, Optimizing Methods in Statistics, 233-257
work page 1971
-
[25]
Sub-Sampled Newton Methods I: Globally Convergent Algorithms
Roosta-Khorasani, F. & Mahoney, M. W. (2016) Sub-Sampled Newton Methods I: Globally Convergent Algorithms, arXiv:1601.04737
work page internal anchor Pith review Pith/arXiv arXiv 2016
-
[26]
Tanabe, H., Fukuda, E.H. & Yamashita, N.(2019) Proximal gra- dient methods for multiobjective optimization and their applications Comput. Optim. Appl. 72, 339–361
work page 2019
-
[27]
Thomann, J. & Eichfelder, G. (2019) A Trust-Region Algorithm for Heterogeneous Multiobjective Optimization, SIAM Journal on Op- timization, 29, 1017 - 1047
work page 2019
-
[28]
Thomann, J. & Eichfelder, G. (2019) Numerical results for the multiobjective trust region algorithm MHT, Data in Brief, 25
work page 2019
-
[29]
Tripuraneni, N., Stern, M., Jin, C., Regier, J. & Jordan, M.I. (2018) Stochastic Cubic Regularization for Fast Nonconvex Optimiza- tion Advances in Neural Information Processing Systems 31
work page 2018
-
[30]
Villacorta, K.D.V., Oliveira, P.R. & Soubeyran, A, (2014) A Trust-Region Method for Unconstrained Multiobjective Problems with Applications in Satisficing Processes, J Optim Theory Appl 160, 865–889
work page 2014
-
[31]
Woodworth, B., Gunasekar, S., Ohannessian, M.I. & Sre- bro, N. (2017) Learning non-discriminatory predictors. Conference on Learning Theory, 1920-1953
work page 2017
-
[32]
Zafar, M.B., Valera, I., Gomez Rodriguez, M. & Gummadi, K.P. (2017) Fairness constraints: Mechanisms for fair classiffication. Artifficial Intelligence and Statistics, 962-970
work page 2017
-
[33]
Zemel, R., Wu, Y., Swersky, K., Pitassi, T. & Dwork, C. (2013) Learning fair representations. International Conference on Ma- chine Learning, 325-333. 31
work page 2013
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.