
arxiv: 2605.19306 · v1 · pith:4B4FX6OAnew · submitted 2026-05-19 · 💻 cs.LG · math.OC

A Two-Phase Adaptive Balanced Penalty Method for Controllable Pareto Front Learning under Split Feasibility Conditions

Pith reviewed 2026-05-20 06:56 UTC · model grok-4.3

classification 💻 cs.LG math.OC
keywords: Controllable Pareto Front Learning, Hypernetworks, Split Feasibility, Adaptive Penalty Method, Multi-Objective Optimization, Convergence Analysis, Bi-Level Optimization

The pith

A new adaptive penalty method trains hypernetworks to learn controllable Pareto fronts that satisfy split feasibility constraints, and the paper proves full-sequence convergence for it.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the open problem of training hypernetworks for controllable Pareto front learning when split feasibility conditions must hold, by giving the first rigorous convergence guarantees for the task. It reformulates the constrained problem as a Bi-Level Scalarized Split Problem and introduces the Adaptive Balanced Penalty algorithm, whose gradients for optimality, set feasibility, and image feasibility are combined adaptively using a computable lower bound. A convex surrogate technique then establishes full-sequence convergence under standard convexity and Robbins-Monro step-size rules. The same penalty structure is translated into a two-phase feasibility-first training procedure for Hyper-MLP and HyperTrans networks, and a new Expected Feasible Hypervolume metric jointly scores solution quality and constraint satisfaction. On five multi-objective benchmarks the solver matches ground truth, and on three multi-task datasets the method achieves up to 2.3 times higher Expected Feasible Hypervolume than unconstrained baselines by raising feasibility rates from 36-49 percent to 87-100 percent.

Core claim

The Adaptive Balanced Penalty algorithm, when applied to the Bi-Level Scalarized Split Problem reformulation of constrained Pareto optimization, achieves full-sequence convergence for hypernetwork training in Controllable Pareto Front Learning under standard convexity and Robbins-Monro step-size assumptions, which in turn supports a two-phase feasibility-first training strategy that demonstrably raises constraint satisfaction rates to 87-100 percent.
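
For orientation, the classical split feasibility problem (Censor and Elfving, 1994; Byrne, 2002, both in the reference list) can be written as below. The symbols C, Q, and A are the textbook ones and are an assumption here: this page does not reproduce the paper's own BSSP formulation, which may well use a nonlinear objective map in place of the linear operator A.

```latex
\text{find } x \in C \quad \text{such that} \quad A x \in Q,
\qquad C \subseteq \mathbb{R}^n,\ Q \subseteq \mathbb{R}^m \text{ closed and convex},\ A \in \mathbb{R}^{m \times n}.
```

Under this reading, "set feasibility" plausibly refers to the condition on x itself and "image feasibility" to the condition on its image, which matches Figure 1's description of Q as a region of the objective space.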

What carries the argument

The Adaptive Balanced Penalty (ABP) algorithm, which blends optimality, set feasibility, and image feasibility gradient components through an adaptive indicator driven by a computable lower bound.
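
To make the idea of blending three gradient components concrete, here is a minimal sketch of a generic penalty-gradient iteration on a toy problem. It is not the paper's ABP update: the threshold switch `indicator`, the penalty weight `beta`, the tolerance `tol`, the toy sets, and every function name are assumptions made for illustration, and the paper's indicator is driven by a computable lower bound that this page does not specify.

```python
# Illustrative penalty-gradient sketch; NOT the paper's ABP update rule.
# Toy problem (all values are assumptions chosen for illustration):
#   minimize 0.5 * ||x - target||^2
#   subject to x in C = [0, 1]^2        (set feasibility)
#              A x in Q = [0, 0.25]^2   (image feasibility)

def project_box(v, lo, hi):
    """Euclidean projection of v onto the box [lo, hi]^n."""
    return [min(max(vi, lo), hi) for vi in v]

def matvec(A, v):
    """Compute A v for a matrix stored as a list of rows."""
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]

def mat_t_vec(A, v):
    """Compute A^T v for a matrix stored as a list of rows."""
    return [sum(A[i][j] * v[i] for i in range(len(A))) for j in range(len(A[0]))]

def norm(v):
    return sum(vi * vi for vi in v) ** 0.5

def penalty_step(x, k, target, A, beta=4.0, tol=1e-3):
    """One blended step; returns the new iterate and the current violation."""
    g_opt = [xi - ti for xi, ti in zip(x, target)]            # optimality gradient
    g_set = [xi - pi for xi, pi in zip(x, project_box(x, 0.0, 1.0))]  # grad of 0.5*dist(x, C)^2
    y = matvec(A, x)
    resid = [yi - qi for yi, qi in zip(y, project_box(y, 0.0, 0.25))]
    g_img = mat_t_vec(A, resid)                               # grad of 0.5*dist(Ax, Q)^2
    violation = norm(g_set) + norm(resid)
    # Hypothetical feasibility-first switch: the optimality gradient is used
    # only once the iterate is feasible up to `tol`.
    indicator = 1.0 if violation <= tol else 0.0
    alpha = 1.0 / (k + 1)                                     # diminishing step size
    new_x = [xi - alpha * (indicator * go + beta * (gs + gi))
             for xi, go, gs, gi in zip(x, g_opt, g_set, g_img)]
    return new_x, violation

def solve(x0, target, A, iters=5000):
    x = list(x0)
    for k in range(iters):
        x, _ = penalty_step(x, k, target, A)
    return x

A = [[0.5, 0.0], [0.0, 0.5]]
x_final = solve([1.0, 1.0], [2.0, 2.0], A)
```

On this toy instance the feasible region is [0, 0.5]^2, so the constrained minimizer is (0.5, 0.5), and `x_final` ends up close to it. The hard threshold makes the iterate hover just outside the boundary with an amplitude that shrinks with the step size; an adaptive rule of the kind the paper describes would presumably replace that crude switch.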

If this is right

  • The two-phase ABP-HyperNet training strategy produces hypernetworks whose generated Pareto fronts satisfy the split feasibility conditions at rates of 87-100 percent.
  • The Expected Feasible Hypervolume metric provides a joint measure of solution quality and constraint satisfaction that can be used to compare constrained CPFL methods.
  • The ABP solver matches ground-truth solutions on standard multi-objective benchmarks while enforcing the feasibility constraints.
  • Hyper-MLP and HyperTrans architectures trained with the translated ABP penalty structure outperform unconstrained baselines by up to 2.3 times in EFHV.
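
The idea of scoring quality and feasibility together can be sketched with a simple proxy. The paper's exact EFHV definition is not reproduced on this page, so the code below is an assumption-laden stand-in: it computes the 2D hypervolume (minimization) of only those points that fall inside a box-shaped region Q, together with the feasibility rate. The function names, the box form of Q, and the reference point are all hypothetical.

```python
# Proxy for a "feasible hypervolume" score in two objectives (minimization).
# This is NOT the paper's EFHV formula, only an illustration of the idea.

def hypervolume_2d(points, ref):
    """Area dominated by `points` and bounded above by the reference point `ref`."""
    pts = sorted(p for p in points if p[0] < ref[0] and p[1] < ref[1])
    area, best_f2 = 0.0, ref[1]
    for f1, f2 in pts:                 # sweep in order of increasing f1
        if f2 < best_f2:               # point is non-dominated so far
            area += (ref[0] - f1) * (best_f2 - f2)
            best_f2 = f2
    return area

def in_box(p, lo, hi):
    """True if objective vector p lies in the box Q = [lo, hi]."""
    return all(l <= v <= h for v, l, h in zip(p, lo, hi))

def feasible_hypervolume(points, ref, lo, hi):
    """Hypervolume of the feasible points only, plus the feasibility rate."""
    feasible = [p for p in points if in_box(p, lo, hi)]
    rate = len(feasible) / len(points) if points else 0.0
    return hypervolume_2d(feasible, ref), rate

front = [(0.2, 0.8), (0.5, 0.5), (0.8, 0.2)]
hv_all = hypervolume_2d(front, (1.0, 1.0))
hv_feasible, feas_rate = feasible_hypervolume(front, (1.0, 1.0), (0.0, 0.0), (0.6, 0.9))
```

In this example one of the three points lies outside Q, so the feasible hypervolume is smaller than the unconstrained one. A score of this kind rewards a method only for the part of the front that respects the constraints.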

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same adaptive blending mechanism could be tested on problems where only approximate convexity holds, to see whether practical performance remains strong even if the formal proof does not apply.
  • The Expected Feasible Hypervolume could serve as an evaluation tool in other constrained multi-objective settings such as resource allocation or neural architecture search with hard constraints.
  • Because the method separates feasibility and optimality phases, it may reduce the need for heavy constraint-handling machinery in related hypernetwork applications.

Load-bearing premise

The problems satisfy standard convexity assumptions and Robbins-Monro step-size conditions required for the convergence proof.
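
For reference, the Robbins-Monro step-size conditions are usually stated as follows. The symbol \alpha_k for the step size at iteration k is an assumption, since this page does not show the paper's notation.

```latex
\alpha_k > 0, \qquad \sum_{k=0}^{\infty} \alpha_k = \infty, \qquad \sum_{k=0}^{\infty} \alpha_k^{2} < \infty .
```

A standard schedule that satisfies both conditions is \alpha_k = 1/(k+1): the harmonic series diverges, while the sum of its squares converges.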

What would settle it

A counter-example in which the ABP algorithm diverges or fails to reach a feasible solution on a convex Bi-Level Scalarized Split Problem instance that obeys the Robbins-Monro step-size schedule would disprove the full-sequence convergence claim.

Figures

Figures reproduced from arXiv: 2605.19306 by Dung D. Le, Nguyen Viet Hoang, Tran Ngoc Thang.

Figure 1: Conceptual overview of Controllable Pareto Front Learning. (a) Existing CPFL methods (Navon et al., 2021; Tuan et al., 2024a,b) approximate the entire unconstrained Pareto front; solutions may lie anywhere in the objective space. (b) Our BSSP two-phase training strategy restricts solutions to a decision-maker-specified region 𝑄, systematically driving them into 𝑄 while optimizing the trade-off.

Figure 2: Training pipeline for hypernetwork-based CPFL under split feasibility conditions. When 𝜀₀ = 0: 𝓁∗ = 0, all three bounds vanish, and 𝑥̂ ∈ Ω (exact optimality and feasibility). (Proof in Appendix A.4.) Remark 4.22.1 (Summary of convergence conclusions)

Figure 3: Multi-LeNet target network for Multi-MNIST, Multi-Fashion, and Fashion+MNIST. All weights 𝜽 are generated by the hypernetwork; no learnable parameters reside in the target network itself. The two task heads produce logits for the left-image task (Task 0) and right-image task (Task 1), respectively.

Figure 4: Hyper-MLP architecture. A three-layer shared MLP trunk maps 𝒓 to 𝒉MLP(𝒓) ∈ ℝ^𝑑; separate linear heads project this representation onto each target parameter tensor 𝜃𝑗. 5.2.2. HyperTrans. The key limitation of Hyper-MLP is that 𝒓 is processed as a monolithic input, obscuring per-objective contributions. This matters for the BSSP penalty structure because the image feasibility residual 𝜌 𝑘 has a per-objectiv…

Figure 5: HyperTrans architecture: each 𝑟𝑖 is embedded into a 𝑑-dimensional token; a single Transformer block captures pairwise interactions; mean-pooling aggregates the tokens into a shared state projected via linear heads to produce 𝜽. The attention scores capture the pairwise trade-off structure, providing HyperTrans with an inductive bias for modelling the coupling between objectives, precisely what the image-fea…

Figure 6: Constrained Pareto front approximation on two convex benchmarks with 50 rays. Left: CVX1 (dim 𝑥 = 1). Right: CVX2 (dim 𝑥 = 2, Binh & Korn). ABP-HyperTrans places its output points tightly along the ground-truth constrained front inside 𝑄+.

Figure 7: Constrained Pareto front approximation on two non-convex ZDT benchmarks with 50 rays. Left: ZDT1 (dim 𝑥 = 30, convex front 𝑓₂ = 1 − √𝑓₁). Right: ZDT2 (dim 𝑥 = 30, concave front 𝑓₂ = 1 − 𝑓₁²). Layout as in

Figure 8: Multi-MNIST: Pareto fronts of all four methods under Box (left) and Sphere (right) constraints for a representative fold. Shaded regions: constraint set 𝑄 / 𝑄+; legend shows ray-level feasibility.

Figure 9: Multi-Fashion: Pareto fronts of all four methods. Layout as in

Figure 10: Fashion+MNIST: Pareto fronts of all four methods. Layout as in

Figure 11: Multi-MNIST, ABP-HyperMLP: Pareto front under None (blue), Box (red), and Sphere (green) constraints. Dashed lines: constraint boundaries; shaded: 𝑄 or 𝑄+.

Figure 12: Multi-MNIST, ABP-HyperTrans: layout as in

Figure 13: Multi-Fashion, ABP-HyperMLP: layout as in

Figure 14: Multi-Fashion, ABP-HyperTrans: layout as in

Figure 15: Fashion+MNIST, ABP-HyperMLP: layout as in

Figure 16: Fashion+MNIST, ABP-HyperTrans: layout as in

Figure 17: ABP-HyperTrans predictions on CVX2 at 10, 20, 50, and 100 preference rays. Grey curve: unconstrained Pareto front; blue circle: boundary of 𝑄; shaded region: 𝑄+; red stars: predicted solutions; cyan cross: 𝑧∗. N.V. Hoang, D.D. Le and T.N. Thang: Preprint submitted to Elsevier Page 31 of 34

Figure 18: Hyperparameter sensitivity on Multi-MNIST (Hyper-MLP, Box constraint) across four key parameters. Bars represent the mean over 5 independent seeds for (a) Phase-1 weight 𝜀, (b) penalty growth 𝜌, (c) penalty cap 𝛽max, and (d) initial penalty 𝛽₀.

original abstract

We address the open problem of training hypernetworks for Controllable Pareto Front Learning (CPFL) under split feasibility conditions with rigorous theoretical guarantees. We reformulate the constrained Pareto problem as a Bi-Level Scalarized Split Problem (BSSP) and propose the Adaptive Balanced Penalty (ABP) algorithm, whose three gradient components -- optimality, set feasibility, and image feasibility -- are blended through an adaptive indicator driven by a computable lower bound. Using a novel convex surrogate technique, we prove full-sequence convergence under standard convexity and Robbins-Monro step-size assumptions. The ABP penalty structure is then translated into a two-phase, feasibility-first training strategy for Hyper-MLP and HyperTrans architectures (ABP-HyperNet). To evaluate constrained CPFL, we introduce the Expected Feasible Hypervolume (EFHV), which jointly captures solution quality and constraint satisfaction. Experiments on five multi-objective benchmarks validate the ABP solver against ground truth, while three multi-task learning datasets demonstrate that ABP-HyperNet achieves up to 2.3x higher EFHV than unconstrained baselines by raising feasibility from 36-49% to 87-100%.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper addresses controllable Pareto front learning (CPFL) under split feasibility by reformulating the problem as a Bi-Level Scalarized Split Problem (BSSP). It introduces the Adaptive Balanced Penalty (ABP) algorithm that blends optimality, set feasibility, and image feasibility gradients via an adaptive indicator. A novel convex surrogate technique is used to prove full-sequence convergence under standard convexity and Robbins-Monro step-size conditions. The ABP structure is translated into a two-phase feasibility-first training procedure for Hyper-MLP and HyperTrans hypernetworks (ABP-HyperNet). A new Expected Feasible Hypervolume (EFHV) metric is proposed to evaluate both quality and feasibility. Experiments on five multi-objective benchmarks and three multi-task datasets report improved feasibility rates (87-100%) and up to 2.3x higher EFHV versus unconstrained baselines.

Significance. If the convergence result holds and the guarantees transfer to the hypernetwork setting, the work would advance constrained multi-objective optimization by providing the first rigorous full-sequence convergence for controllable Pareto front learning with split feasibility. The convex surrogate technique, the two-phase training heuristic, and the EFHV metric are potentially useful contributions for practical hypernetwork-based Pareto approximation in multi-task learning.

major comments (2)
  1. The convergence proof (abstract and theoretical section) establishes full-sequence convergence for the ABP solver on the BSSP under convexity and Robbins-Monro assumptions via the convex surrogate. However, the central application translates ABP into two-phase training of Hyper-MLP and HyperTrans architectures, whose parameter spaces are non-convex. No argument shows that the surrogate technique or the two-phase heuristic inherits the same guarantees; the reported EFHV gains remain purely empirical. This gap is load-bearing for the claim of 'rigorous theoretical guarantees' for ABP-HyperNet.
  2. The weakest assumption listed (standard convexity of the BSSP) is invoked for the proof, yet the manuscript does not verify or relax this assumption when the BSSP is instantiated inside a hypernetwork whose outer optimization is non-convex. A concrete test or counter-example analysis for the non-convex regime would be required to support the transfer.
minor comments (2)
  1. The definition and computability of the 'computable lower bound' driving the adaptive indicator should be stated explicitly with pseudocode or an equation reference.
  2. Clarify whether the five benchmark experiments validate the ABP solver in isolation or already include the hypernetwork training; the distinction affects how the theoretical guarantees are claimed to support the empirical results.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments and for highlighting the potential impact of our contributions to constrained CPFL. We address each major comment below, clarifying the scope of our theoretical results and the practical nature of the hypernetwork extension.

point-by-point responses
  1. Referee: The convergence proof (abstract and theoretical section) establishes full-sequence convergence for the ABP solver on the BSSP under convexity and Robbins-Monro assumptions via the convex surrogate. However, the central application translates ABP into two-phase training of Hyper-MLP and HyperTrans architectures, whose parameter spaces are non-convex. No argument shows that the surrogate technique or the two-phase heuristic inherits the same guarantees; the reported EFHV gains remain purely empirical. This gap is load-bearing for the claim of 'rigorous theoretical guarantees' for ABP-HyperNet.

    Authors: We agree that the full-sequence convergence proof via the convex surrogate applies specifically to the ABP solver on the convex BSSP under the stated assumptions. ABP-HyperNet translates the ABP penalty structure into a two-phase feasibility-first training procedure for the hypernetworks, but this is presented as a practical heuristic rather than a direct application of the convergence result. The manuscript does not claim that the guarantees transfer to the non-convex hypernetwork parameter space, and the EFHV improvements are empirical. We will revise the abstract, Section 1, and the conclusion to explicitly separate the theoretical guarantees (ABP on BSSP) from the empirical results (ABP-HyperNet). A dedicated limitations paragraph will be added to discuss this distinction. revision: yes

  2. Referee: The weakest assumption listed (standard convexity of the BSSP) is invoked for the proof, yet the manuscript does not verify or relax this assumption when the BSSP is instantiated inside a hypernetwork whose outer optimization is non-convex. A concrete test or counter-example analysis for the non-convex regime would be required to support the transfer.

    Authors: The convexity assumption is required for the BSSP convergence analysis. In the hypernetwork setting the outer optimization over network parameters is non-convex, and the manuscript does not verify, relax, or provide counter-example analysis for this regime. We will add a discussion subsection noting this limitation and clarifying that the two-phase procedure is motivated by the ABP structure to prioritize feasibility in practice, without inheriting the convexity-based guarantees. A full non-convex analysis or counter-example study lies outside the current scope. revision: partial

standing simulated objections not resolved
  • A concrete test or counter-example analysis for the non-convex regime in hypernetwork training

Circularity Check

0 steps flagged

No circularity: derivation relies on external standard assumptions and independent definitions

full rationale

The paper reformulates the constrained Pareto problem as BSSP, introduces the ABP algorithm with three gradient components blended via an adaptive indicator, and proves full-sequence convergence via a novel convex surrogate technique under explicitly stated standard convexity and Robbins-Monro step-size assumptions. The two-phase feasibility-first training for Hyper-MLP and HyperTrans, along with the EFHV metric, are defined directly from the penalty structure without reducing to fitted inputs or prior self-citations. No load-bearing step equates a claimed result to its own inputs by construction; the proof chain is self-contained against external mathematical benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 2 invented entities

The central claims rest on standard optimization assumptions for convergence and the introduction of new algorithmic structures and evaluation metric without independent external validation in the abstract.

free parameters (1)
  • adaptive indicator parameters
    Used to blend the optimality, set feasibility, and image feasibility gradients based on a computable lower bound
axioms (2)
  • [domain assumption] Standard convexity assumptions
    Invoked for proving full-sequence convergence of the ABP algorithm
  • [standard math] Robbins-Monro step-size assumptions
    Required for stochastic convergence guarantees in the proof
invented entities (2)
  • Adaptive Balanced Penalty (ABP) algorithm [no independent evidence]
    purpose: To solve the Bi-Level Scalarized Split Problem for constrained CPFL
    New penalty structure with adaptive indicator proposed in the paper
  • Expected Feasible Hypervolume (EFHV) [no independent evidence]
    purpose: To jointly capture solution quality and constraint satisfaction in constrained CPFL
    New evaluation metric introduced for the constrained setting

pith-pipeline@v0.9.0 · 5738 in / 1445 out tokens · 68252 ms · 2026-05-20T06:56:09.801242+00:00 · methodology



Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 1 internal anchor

  1. [1] Achiam, J., Held, D., Tamar, A., Abbeel, P., 2017. Constrained policy optimization, in: International Conference on Machine Learning (ICML), pp. 22–31

  2. [2] Agarwal, A., Beygelzimer, A., Dudík, M., Langford, J., Wallach, H., 2018. A reductions approach to fair classification, in: International Conference on Machine Learning (ICML), pp. 60–69

  3. [3] Altman, E., 1999. Constrained Markov decision processes. Chapman & Hall/CRC, Boca Raton, FL

  4. [4] Bauschke, H.H., Combettes, P.L., 2017. Convex analysis and monotone operator theory in Hilbert spaces. 2nd ed., Springer, Cham

  5. [5] Bertsekas, D.P., 1999. Nonlinear programming. 2nd ed., Athena Scientific, Belmont, MA

  6. [6] Binh, T.T., Korn, U., 1997. Mobes: A multiobjective evolution strategy for constrained optimization problems, in: The third international conference on genetic algorithms (Mendel 97), p. 27

  7. [7] Brooke, M., Censor, Y., Gibali, A., 2021. Dynamic string-averaging cq-methods for the split feasibility problem with percentage violation constraints arising in radiation therapy treatment planning. International Transactions in Operational Research 30, 181–205

  8. [8] Byrne, C., 2002. Iterative oblique projection onto convex sets and the split feasibility problem. Inverse Problems 18, 441

  9. [9] Byrne, C., 2004. A unified treatment of some iterative algorithms in signal processing and image reconstruction. Inverse Problems 20, 103–120

  10. [10] Cao, X., Jia, S., Luo, Y., Yuan, X., Qi, Z., Yu, K.T., 2019. Multi-objective optimization method for enhancing chemical reaction process. Chemical Engineering Science 195, 494–506

  11. [11] Censor, Y., Elfving, T., 1994. A multiprojection algorithm using Bregman projections in a product space. Numerical Algorithms 8, 221–239

  12. [12] Censor, Y., Elfving, T., Kopf, N., Bortfeld, T., 2005. The multiple-sets split feasibility problem and its applications for inverse problems. Inverse Problems 21, 2071

  13. [13] Censor, Y., Gibali, A., Reich, S., 2012. Algorithms for the split variational inequality problem. Numerical Algorithms 59, 301–323

  14. [14] Ehrgott, M., 2005. Multicriteria optimization. volume 491. Springer Science & Business Media

  15. [15] Emmerich, M.T.M., Giannakoglou, K.C., Naujoks, B., 2006. Single- and multiobjective evolutionary optimization assisted by Gaussian random field metamodels. IEEE Transactions on Evolutionary Computation 10, 421–439

  16. [16] Gelbart, M.A., Snoek, J., Adams, R.P., 2014. Bayesian optimization with unknown constraints, in: Proceedings of the 30th Conference on Uncertainty in Artificial Intelligence (UAI), pp. 250–259

  17. [17] Hiriart-Urruty, J.-B., Lemaréchal, C., 2004. Fundamentals of convex analysis. Springer, Berlin

  18. [18] Jangir, P., Heidari, A.A., Chen, H., 2021. Elitist non-dominated sorting harris hawks optimization: Framework and developments for multi-objective problems. Expert Systems with Applications 186, 115747

  19. [19] Kim, N.T.B., Thang, T.N., 2013. Optimization over the efficient set of a bicriteria convex programming problem. Pac. J. Optim. 9, 103–115

  20. [20] Lan, G., Monteiro, R.D.C., 2013. Iteration-complexity of first-order penalty methods for convex programming. Mathematical Programming 138, 115–139

  21. [21] Lin, X., Zhen, H.L., Li, Z., Zhang, Q., Kwong, S., 2019. Pareto multi-task learning, in: Thirty-third Conference on Neural Information Processing Systems (NeurIPS), pp. 12037–12047

  22. [22] Lin, X., Yang, Z., Zhang, Q., 2022. Pareto set learning for expensive multi-objective optimization. Advances in Neural Information Processing Systems 35, 16298–16310

  23. [23] Miettinen, K., 1999. Nonlinear multiobjective optimization. Kluwer Academic Publishers, Boston

  24. [24] Navon, A., Shamsian, A., Chechik, G., Fetaya, E., 2021. Learning the Pareto front with hypernetworks, in: International Conference on Learning Representations (ICLR)

  25. [25] Nesterov, Y., 2004. Introductory lectures on convex optimization: a basic course. Kluwer Academic Publishers, Boston

  26. [26] Raissi, M., Perdikaris, P., Karniadakis, G.E., 2019. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics 378, 686–707

  27. [27] Robbins, H., Monro, S., 1951. A stochastic approximation method. Annals of Mathematical Statistics 22, 400–407

  28. [28] Robbins, H., Siegmund, D., 1971. A convergence theorem for non-negative almost supermartingales and some applications, in: Rustagi, J.S. (Ed.), Optimizing methods in statistics. Academic Press, New York, pp. 233–257

  29. [29] Rockafellar, R.T., Wets, R.J.-B., 2009. Variational analysis. Springer, Berlin

  30. [30] Rockafellar, R.T., 1970. Convex analysis. Princeton University Press, Princeton, NJ

  31. [31] Sabour, S., Frosst, N., Hinton, G.E., 2017. Dynamic routing between capsules, in: Advances in Neural Information Processing Systems (NeurIPS), pp. 3859–3869

  32. [32] Sener, O., Koltun, V., 2018. Multi-task learning as multi-objective optimization. Advances in Neural Information Processing Systems 31

  33. [33] Thang, T.N., Solanki, V.K., Dao, T.A., Thi Ngoc Anh, N., Van Hai, P., 2020. A monotonic optimization approach for solving strictly quasiconvex multiobjective programming problems. Journal of Intelligent & Fuzzy Systems 38, 6053–6063

  34. [34] Tuan, T.A., Hoang, L.P., Le, D.D., Thang, T.N., 2024. A framework for controllable Pareto front learning with completed scalarization functions and its applications. Neural Networks 169, 257–273

  35. [35] Tuan, T.A., Dung, N.V., Thang, T.N., 2024. A HyperTrans model for controllable Pareto front learning with split feasibility constraints. Neural Networks 179, 106571

  36. [36] Vuong, N.D., Thang, T.N., 2023. Optimizing over Pareto set of semistrictly quasiconcave vector maximization and application to stochastic portfolio selection. Journal of Industrial and Management Optimization 19, 1999–2019

  37. [37] Xiao, H., Rasul, K., Vollgraf, R., 2017. Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747

  38. [38] Yun, C., Bhojanapalli, S., Rawat, A.S., Reddi, S.J., Kumar, S., 2020. Are transformers universal approximators of sequence-to-sequence functions?, in: International Conference on Learning Representations (ICLR)

  39. [39] Zitzler, E., Thiele, L., 1999. Multiobjective evolutionary algorithms: a comparative case study and the strength Pareto approach. IEEE Transactions on Evolutionary Computation 3, 257–271

  40. [40] Zitzler, E., Deb, K., Thiele, L., 2000. Comparison of multiobjective evolutionary algorithms: Empirical results. Evolutionary Computation 8, 173–195