A Single Deep Preference-Conditioned Policy for Learning Pareto Coverage Sets
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-12 02:17 UTC · model grok-4.3
The pith
Under mild conditions, each preference maps to a unique Pareto-optimal return vector that varies Lipschitz-continuously with the preference, enabling one policy to cover the front.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Under mild interior conditions on the preference set, smooth Tchebycheff scalarization induces, for each preference, a unique Pareto-optimal return vector that depends Lipschitz-continuously on that preference. The problem is formulated over occupancy measures and solved by Concave Mirror Descent Policy Iteration, which achieves O(1/k) objective-suboptimality and whose every step is equivalent to solving a Kullback-Leibler-regularized MDP with the previous policy as reference; the resulting deep actor-critic instantiation covers the Pareto set on MO-Gymnasium benchmarks.
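For orientation, the smooth Tchebycheff scalarization referenced here has, following Lin et al. [19], the general form below; the exact sign and reference-point conventions are assumptions on our part, not the paper's verbatim definition:

$$u_\mu(J(\pi), \omega) \;=\; -\,\mu \log \sum_{i=1}^{m} \exp\!\left(\frac{\omega_i \big(z_i^{*} - J_i(\pi)\big)}{\mu}\right),$$

where $J(\pi)$ is the vector return, $\omega$ the preference, $z^{*}$ an ideal point with $z_i^{*} \ge J_i(\pi)$, and $\mu > 0$ the smoothing parameter. As $\mu \to 0$ this recovers the (negated) weighted Tchebycheff term $\max_i \omega_i (z_i^{*} - J_i(\pi))$; for $\mu > 0$ it is smooth in $J$, which is what the uniqueness and Lipschitz arguments lean on.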
What carries the argument
Concave Mirror Descent Policy Iteration (CMDPI) over occupancy measures, which equates each update to a KL-regularized MDP with the previous policy as reference.
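A minimal tabular sketch of what that equivalence could look like in code, assuming the update linearizes the concave utility at the current occupancy measure and then applies a multiplicative-weights (KL-regularized) policy step; the function names, the step size eta, and the smooth-Tchebycheff gradient form are our assumptions, not the paper's exact algorithm:

```python
import numpy as np

def vector_returns(pi, P, R, rho, gamma):
    """Per-objective discounted returns J of a tabular policy pi (S x A).
    P: (S, A, S) transitions, R: (S, A, m) vector rewards, rho: (S,) start dist."""
    S = P.shape[0]
    P_pi = np.einsum("sa,sat->st", pi, P)          # state kernel under pi
    r_pi = np.einsum("sa,sam->sm", pi, R)          # per-state vector reward
    V = np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)
    return rho @ V                                  # J in R^m

def stch_weights(J, w, z_star, mu):
    """Gradient of u(J) = -mu * logsumexp(w * (z* - J) / mu): a softmax over
    objectives times the preference weights (assumed form)."""
    t = w * (z_star - J) / mu
    t -= t.max()                                    # numerical stability
    p = np.exp(t); p /= p.sum()
    return p * w

def cmdpi_step(pi, P, R, rho, w, z_star, gamma=0.95, mu=0.1, eta=1.0):
    """One sketch iteration: scalarize rewards with the local utility gradient,
    evaluate Q under pi, then take the KL-regularized (multiplicative) update
    with the previous policy as reference."""
    J = vector_returns(pi, P, R, rho, gamma)
    alpha = stch_weights(J, w, z_star, mu)          # local scalarization
    r = R @ alpha                                   # scalarized reward (S, A)
    P_pi = np.einsum("sa,sat->st", pi, P)
    V = np.linalg.solve(np.eye(P.shape[0]) - gamma * P_pi, (pi * r).sum(-1))
    Q = r + gamma * np.einsum("sat,t->sa", P, V)
    new_pi = pi * np.exp(eta * (Q - Q.max(-1, keepdims=True)))
    return new_pi / new_pi.sum(-1, keepdims=True)
```

The KL-regularized reading is visible in the last two lines: the closed-form maximizer of $\sum_a \pi(a|s)\,\eta Q(s,a) - \mathrm{KL}(\pi(\cdot|s)\,\|\,\pi_k(\cdot|s))$ is exactly $\pi_{k+1}(\cdot|s) \propto \pi_k(\cdot|s)\exp(\eta Q(s,\cdot))$.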
If this is right
- CMDPI attains an O(1/k) rate of objective-suboptimality.
- Each policy update is exactly equivalent to solving a KL-regularized MDP.
- The learned policy is continuous in the preference parameter across finite iterations.
- The deep instantiation achieves the best average hypervolume rank among recent baselines on eight MO-Gymnasium tasks.
Where Pith is reading between the lines
- The continuity result may allow nearby preferences to share policy parameters without full retraining (a sketch of such preference conditioning follows this list).
- The same occupancy-measure formulation could be applied to other monotone scalarizations that satisfy analogous interior conditions.
- Gains observed in continuous-control experiments suggest the method scales beyond discrete actions when the actor-critic approximation remains faithful to the KL-regularized update.
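A minimal sketch of what such parameter sharing could look like, assuming the policy conditions on the preference by simple concatenation; the class name and architecture here are hypothetical, not the paper's actual network:

```python
import torch
import torch.nn as nn

class PreferenceConditionedActor(nn.Module):
    """Hypothetical discrete-action actor conditioned on the preference omega."""
    def __init__(self, obs_dim, n_objectives, n_actions, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + n_objectives, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, omega):
        # A single network covers the whole preference simplex; if the learned
        # map omega -> policy is continuous, nearby preferences reuse weights
        # instead of requiring retraining.
        logits = self.net(torch.cat([obs, omega], dim=-1))
        return torch.distributions.Categorical(logits=logits)
```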
Load-bearing premise
Mild interior conditions on the preference set are needed to guarantee that each preference produces a unique Pareto-optimal return vector that changes continuously with the preference.
What would settle it
An explicit preference vector inside the interior region for which two distinct Pareto-optimal return vectors yield the same scalarized value, or a sequence of preferences converging to a limit preference whose optimal return vectors fail to converge.
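A toy numeric illustration of the failure mode the interior condition is meant to exclude, using an assumed smooth-Tchebycheff form: at a boundary preference with a zero component, a dominated return vector ties with the vector that dominates it, so the scalarized argmax is not unique; any interior preference breaks the tie.

```python
import numpy as np

def stch(J, w, z_star=np.array([2.0, 2.0]), mu=0.1):
    """Smooth Tchebycheff utility, -mu * logsumexp(w * (z* - J) / mu);
    the exact form is an assumption, larger is better."""
    return -mu * np.logaddexp.reduce(w * (z_star - J) / mu)

J1, J2 = np.array([1.0, 0.0]), np.array([1.0, 0.5])  # J2 Pareto-dominates J1
print(stch(J1, np.array([1.0, 0.0])))   # boundary weight: ties with ...
print(stch(J2, np.array([1.0, 0.0])))   # ... the dominating vector
print(stch(J1, np.array([0.9, 0.1])))   # interior weight: strictly worse
print(stch(J2, np.array([0.9, 0.1])))   # interior weight: strictly better
```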
Original abstract
Preference-conditioned multi-objective reinforcement learning aims to learn a single policy that captures trade-offs across preferences, but under nonlinear scalarization the uniqueness and continuity of the preference-to-solution correspondence remain unclear. We study this problem in tabular multi-objective Markov decision processes (MDPs) using smooth Tchebycheff scalarization as a monotone utility. Under mild interior conditions on the preference set, we prove that each preference induces a unique Pareto-optimal return vector and that this vector depends Lipschitz-continuously on the preference, providing a principled foundation for preference sweeping toward dense Pareto-front coverage. To compute these targets, we formulate the problem over occupancy measures and derive Concave Mirror Descent Policy Iteration (CMDPI), which achieves an $O(1/k)$ objective-suboptimality rate. We further show that each update is equivalent to solving a Kullback-Leibler-regularized MDP with the previous policy as reference, yielding a policy-iteration interpretation and finite-iterate policy continuity across preferences. We instantiate the update as a deep actor-critic algorithm preserving previous-policy regularization. On eight MO-Gymnasium tasks, it achieves the best average hypervolume rank among recent baselines and strong expected-utility performance. Continuous-control experiments indicate gains beyond the discrete-action setting.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper studies preference-conditioned multi-objective RL in tabular MDPs using smooth Tchebycheff scalarization. Under mild interior conditions on the preference set, it proves that each preference maps to a unique Pareto-optimal return vector and that the mapping is Lipschitz-continuous. It introduces Concave Mirror Descent Policy Iteration (CMDPI), which achieves O(1/k) suboptimality, shows that each update is equivalent to solving a KL-regularized MDP with the prior policy as reference, and implements the scheme as a deep actor-critic algorithm that ranks best in average hypervolume on eight MO-Gymnasium tasks.
Significance. If the mapping theorem and convergence hold, the work supplies a principled basis for learning a single policy that densely covers Pareto fronts via preference sweeping, which is useful for applications requiring explicit trade-offs. The KL-regularized MDP equivalence offers a practical policy-iteration view and finite-iterate continuity, while the empirical results on discrete and continuous control tasks demonstrate competitive performance against recent baselines.
major comments (2)
- [Abstract and theoretical analysis] The 'mild interior conditions' guaranteeing uniqueness and Lipschitz continuity of the preference-to-Pareto mapping are invoked but never stated precisely (e.g., whether they require all preference components to be strictly positive, or the vector to lie in the relative interior of the simplex); a candidate formal statement follows this list. The assumption is load-bearing for the CMDPI rate and the policy-iteration interpretation, yet its necessity is not demonstrated by a counter-example or boundary case.
- [CMDPI derivation and equivalence claim] The O(1/k) objective-suboptimality rate and the claim that each update is equivalent to a KL-regularized MDP with the previous policy as reference are presented as consequences of the occupancy-measure formulation, but the manuscript provides no explicit derivation steps, error bounds, or verification that smooth Tchebycheff scalarization preserves the monotonicity and concavity the mirror-descent analysis requires.
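A candidate formal statement of the missing condition (our reading, not the paper's verbatim assumption): the preference is required to lie in the relative interior of the probability simplex,

$$\omega \in \operatorname{relint}\,\Delta^{m-1} \;=\; \Big\{\,\omega \in \mathbb{R}^{m} \;:\; \sum_{i=1}^{m} \omega_i = 1,\;\; \omega_i > 0 \text{ for all } i \,\Big\},$$

that is, every preference component strictly positive, possibly strengthened to $\omega_i \ge \epsilon > 0$ so that the Lipschitz constant of the preference-to-Pareto map can be taken uniform over the preference set.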
minor comments (2)
- [Experiments] The experimental section would benefit from an explicit statement of how preferences are sampled during training and evaluation to support reproducibility of the hypervolume results.
- [Notation] Notation for occupancy measures and the scalarization function should be introduced once and used consistently; occasional re-use of symbols for different quantities appears in the background and method sections.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which help clarify the presentation of our theoretical results. We address each major comment below and will revise the manuscript accordingly to improve precision and completeness.
Point-by-point responses
- Referee: [Abstract and theoretical analysis] The 'mild interior conditions' guaranteeing uniqueness and Lipschitz continuity of the preference-to-Pareto mapping are invoked but never stated precisely (e.g., whether they require all preference components to be strictly positive, or the vector to lie in the relative interior of the simplex). The assumption is load-bearing for the CMDPI rate and the policy-iteration interpretation, yet its necessity is not demonstrated by a counter-example or boundary case.
Authors: We agree that the precise statement of the interior conditions is missing and should be made explicit. These conditions require the preference vector to lie in the relative interior of the probability simplex (i.e., all components strictly positive). This ensures that the smooth Tchebycheff scalarization yields a strictly concave objective, which in turn gives uniqueness and Lipschitz continuity of the preference-to-Pareto mapping. In the revision we will add the exact definition to the abstract and theory section, explain its role in the CMDPI analysis, and include a brief discussion of boundary cases (with a simple counter-example sketch) where uniqueness can fail when a component is zero. Revision: yes.
- Referee: [CMDPI derivation and equivalence claim] The O(1/k) objective-suboptimality rate and the claim that each update is equivalent to a KL-regularized MDP with the previous policy as reference are presented as consequences of the occupancy-measure formulation, but the manuscript provides no explicit derivation steps, error bounds, or verification that smooth Tchebycheff scalarization preserves the monotonicity and concavity the mirror-descent analysis requires.
Authors: We acknowledge that the current manuscript omits the full step-by-step derivation. The O(1/k) rate follows from applying concave mirror descent to the occupancy-measure formulation of the smooth Tchebycheff objective, and each CMDPI update is exactly equivalent to solving a KL-regularized MDP whose reference policy is the previous iterate. Smooth Tchebycheff scalarization preserves the required monotonicity and concavity under the interior conditions. In the revision we will add the explicit derivation (including the key lemmas on concavity preservation and the error-bound analysis) to the main text or a dedicated appendix subsection, making the policy-iteration view and finite-iterate continuity fully rigorous. Revision: yes.
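For reference, the shape of bound such a derivation would typically instantiate is the relative-smoothness mirror-descent guarantee of Lu, Freund, and Nesterov [23], stated here for a concave objective $f$ over occupancy measures $d$ with negative-entropy mirror map $h$; the constants and conditions are assumptions on our part:

$$f(d^{*}) - f(d_{k}) \;\le\; \frac{L\, D_{h}(d^{*}, d_{0})}{k},$$

where $L$ is the relative-smoothness constant of $f$ with respect to $h$ and $D_h$ is the Bregman divergence of $h$, here a KL divergence between occupancy measures, which is also what makes each step solvable as a KL-regularized MDP.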
Circularity Check
No significant circularity; central claims rest on standard MDP occupancy-measure and mirror-descent analysis
full rationale
The paper proves uniqueness and Lipschitz continuity of the preference-to-Pareto mapping under explicitly stated mild interior conditions on the preference set, using smooth Tchebycheff scalarization as a monotone utility. CMDPI and its O(1/k) rate are derived from concave mirror descent over occupancy measures, a standard technique independent of any fitted parameters or self-referential predictions. The KL-regularized MDP equivalence follows directly from the update rule without reducing to prior self-citations or ansatzes. No load-bearing steps match the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Mild interior conditions on the preference set ensure uniqueness and Lipschitz continuity of the preference-to-Pareto mapping.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Unclear: the relation between the paper passage and the cited Recognition theorem.
Passage: "Under mild interior conditions on the preference set, we prove that each preference induces a unique Pareto-optimal return vector and that this vector depends Lipschitz-continuously on the preference."
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Unclear: the relation between the paper passage and the cited Recognition theorem.
Passage: "we derive Concave Mirror Descent Policy Iteration (CMDPI), which achieves an O(1/k) objective-suboptimality rate."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] D. J. White. Multi-objective infinite-horizon discounted Markov decision processes. Journal of Mathematical Analysis and Applications, 89(2):639–647, 1982. doi: 10.1016/0022-247X(82)90122-6.
- [2] Kristof Van Moffaert and Ann Nowé. Multi-objective reinforcement learning using sets of Pareto dominating policies. J. Mach. Learn. Res., 15(107):3483–3512, 2014.
- [3] Diederik M. Roijers, Peter Vamplew, Shimon Whiteson, and Richard Dazeley. A survey of multi-objective sequential decision-making. Journal of Artificial Intelligence Research, 48:67–113, 2013.
- [4] Hossam Mossalam, Yannis M. Assael, Diederik M. Roijers, and Shimon Whiteson. Multi-objective deep reinforcement learning. arXiv:1610.02707, 2016. URL https://arxiv.org/abs/1610.02707.
- [5] Jie Xu, Yunsheng Tian, Pingchuan Ma, Daniela Rus, Shinjiro Sueda, and Wojciech Matusik. Prediction-guided multi-objective reinforcement learning for continuous robot control. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 10607–106..., 2020.
- [6] Runzhe Yang, Xingyuan Sun, and Karthik Narasimhan. A generalized algorithm for multi-objective reinforcement learning and policy adaptation. Advances in Neural Information Processing Systems, 32, 2019.
- [7] Mathieu Reymond, Eugenio Bargiacchi, and Ann Nowé. Pareto conditioned networks. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pages 1110–1118, 2022.
- [8] Toygun Basaklar, Suat Gumussoy, and Umit Y. Ogras. PD-MORL: Preference-driven multi-objective reinforcement learning algorithm. arXiv preprint arXiv:2208.07914, 2022.
- [9] Ruohong Liu, Yuxin Pan, Linjie Xu, Lei Song, Pengcheng You, Yize Chen, and Jiang Bian. Efficient discovery of Pareto front for multi-objective reinforcement learning. In International Conference on Learning Representations (ICLR), 2025. URL https://openreview.net/forum?id=fDGPIuCdGi.
- [10] Erlong Liu, Yu-Chang Wu, Xiaobin Huang, Chengrui Gao, Ren-Jian Wang, Ke Xue, and Chao Qian. Pareto set learning for multi-objective reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 18789–18797, 2025.
- [11] Matteo Pirotta, Simone Parisi, and Marcello Restelli. Multi-objective reinforcement learning with continuous Pareto frontier approximation. In Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI), 2015.
- [12] Mathieu Reymond and Ann Nowé. Pareto-DQN: Approximating the Pareto front in complex multi-objective decision problems. In Proceedings of the Adaptive and Learning Agents Workshop 2019 (ALA-19) at AAMAS, May 2019. URL https://ala2019.vub.ac.be.
- [13] Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, et al. A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems, 36(1):26, 2022.
- [14] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, 2004.
- [15] Axel Abels, Diederik Roijers, Tom Lenaerts, Ann Nowé, and Denis Steckelmacher. Dynamic weights in multi-objective deep reinforcement learning. In International Conference on Machine Learning, pages 11–20. PMLR, 2019.
- [16] Matthieu Geist, Bruno Scherrer, and Olivier Pietquin. A theory of regularized Markov decision processes. In International Conference on Machine Learning, pages 2160–2169. PMLR, 2019.
- [17] Haoye Lu, Daniel Herman, and Yaoliang Yu. Multi-objective reinforcement learning: Convexity, stationarity and Pareto optimality. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=TjEzIsyEsQ6.
- [18] Nianli Peng, Muhang Tian, and Brandon Fain. Multi-objective reinforcement learning with nonlinear preferences: Provable approximation for maximizing expected scalarized return. In Proceedings of the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), pages 1632–1640, 2025.
- [19] Xi Lin, Xiaoyuan Zhang, Zhiyuan Yang, Fei Liu, Zhenkun Wang, and Qingfu Zhang. Smooth Tchebycheff scalarization for multi-objective optimization. In Ruslan Salakhutdinov, Zico Kolter, Katherine Heller, Adrian Weller, Nuria Oliver, Jonathan Scarlett, and Felix Berkenkamp, editors, Proceedings of the 41st International Conference on Machine Learning, volume ..., 2024.
- [20] Amir Beck and Marc Teboulle. Smoothing and first order methods: A unified framework. SIAM Journal on Optimization, 22(2):557–580, 2012.
- [21] V. Joseph Bowman Jr. On the relationship of the Tchebycheff norm and the efficient frontier of multiple-criteria objectives. In Multiple Criteria Decision Making: Proceedings of a Conference, Jouy-en-Josas, France, May 21–23, 1975, pages 76–86. Springer, 1976.
- [22] Yining Li, Peizhong Ju, and Ness Shroff. How to find the exact Pareto front for multi-objective MDPs? In The Thirteenth International Conference on Learning Representations, 2025.
- [23] Haihao Lu, Robert M. Freund, and Yurii Nesterov. Relatively-smooth convex optimization by first-order methods, and applications, 2017. URL https://arxiv.org/abs/1610.05708.
- [24] Bianca M. Moreno, Margaux Brégère, Pierre Gaillard, and Nadia Oudjane. Efficient model-based concave utility reinforcement learning through greedy mirror descent. In International Conference on Artificial Intelligence and Statistics, pages 2206–2214. PMLR, 2024.
- [25] Mridul Agarwal, Vaneet Aggarwal, and Tian Lan. Multi-objective reinforcement learning with non-linear scalarization. In Adaptive Agents and Multi-Agent Systems, pages 9–17, 2022.
- [26] Matthieu Geist, Julien Pérolat, Mathieu Laurière, Romuald Elie, Sarah Perrin, Oliver Bachem, Rémi Munos, and Olivier Pietquin. Concave utility reinforcement learning: The mean-field game viewpoint. In Proceedings of the 21st International Conference on Autonomous Agents and Multiagent Systems, pages 489–497, 2022.
- [27] Florian Felten, Lucas Nunes Alegre, Ann Nowé, Ana L. C. Bazzan, El Ghazali Talbi, Grégoire Danoy, and Bruno Castro da Silva. A toolkit for reliable benchmarking and research in multi-objective reinforcement learning. In NeurIPS Datasets and Benchmarks Track, 2023. URL https://openreview.net/forum?id=jfwRLudQyj.
- [28] Conor F. Hayes, Roxana Rădulescu, Eugenio Bargiacchi, Johan Källström, Matthew Macfarlane, Mathieu Reymond, Timothy Verstraeten, Luisa M. Zintgraf, Richard Dazeley, Fredrik Heintz, et al. A practical guide to multi-objective reinforcement learning and planning. arXiv preprint arXiv:2103.09568, 2021.
- [29]
- [30] Thomas Degris, Martha White, and Richard S. Sutton. Off-policy actor-critic. In Proceedings of the 29th International Conference on Machine Learning, ICML '12, pages 179–186, Madison, WI, USA, 2012. Omnipress. ISBN 9781450312851.
- [31] Tuomas Haarnoja, Aurick Zhou, Kristian Hartikainen, George Tucker, Sehoon Ha, Jie Tan, Vikash Kumar, Henry Zhu, Abhishek Gupta, Pieter Abbeel, et al. Soft actor-critic algorithms and applications. arXiv preprint arXiv:1812.05905, 2018.
- [32] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, Stig Petersen, Charles Beattie, Amir Sadik, Ioannis Antonoglou, Helen King, Dharshan Kumaran, Daan Wierstra, Shane Legg, and Demis Hassabis. Human-level control through deep reinforcement learning. Nature, 518(7540):529–533, February 2015.
- [33] Scott Fujimoto, Herke van Hoof, and David Meger. Addressing function approximation error in actor-critic methods. In International Conference on Machine Learning, pages 1587–1596. PMLR, 2018.
- [34] Florian Felten, Lucas N. Alegre, Ann Nowé, Ana L. C. Bazzan, El Ghazali Talbi, Grégoire Danoy, and Bruno C. da Silva. A toolkit for reliable benchmarking and research in multi-objective reinforcement learning. In Proceedings of the 37th Conference on Neural Information Processing Systems (NeurIPS 2023), 2023.
- [35] Yucheng Yang, Tianyi Zhou, Mykola Pechenizkiy, and Meng Fang. Preference controllable reinforcement learning with advanced multi-objective optimization. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=49g4c8MWHy.
- [36] Pengyi Li, Hongyao Tang, Yifu Yuan, Jianye Hao, Zibin Dong, and Yan Zheng. COLA: Towards efficient multi-objective reinforcement learning with conflict objective regularization in latent space. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
- [37] URL https://openreview.net/forum?id=Cldpn7H3NN.
- [38] Lucas N. Alegre, Ana L. C. Bazzan, Diederik M. Roijers, Ann Nowé, and Bruno C. da Silva. Sample-efficient multi-objective learning via generalized policy improvement prioritization. In Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems, pages 2003–2012, 2023.
- [39] Diederik M. Roijers, Shimon Whiteson, and Frans A. Oliehoek. Computing convex coverage sets for multi-objective coordination graphs. In International Conference on Algorithmic Decision Theory, pages 309–323. Springer, 2013.
- [40] Diederik M. Roijers, Shimon Whiteson, and Frans A. Oliehoek. Linear support for multi-objective coordination graphs. In Proceedings of the 13th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2014. URL https://www.cs.ox.ac.uk/people/shimon.whiteson/pubs/roijers-aamas14.bib.
- [41] Ronald A. Howard and James E. Matheson. Risk-sensitive Markov decision processes. Management Science, 18(7):356–369, 1972. doi: 10.1287/mnsc.18.7.356.
- [42] Andrzej Ruszczyński. Risk-averse dynamic programming for Markov decision processes. Mathematical Programming, 125(2):235–261, 2010. doi: 10.1007/s10107-010-0393-3.
- [43] Gergely Neu, Vicenç Gómez, and Anders Jonsson. A unified view of entropy-regularized Markov decision processes. In NeurIPS Workshop: Deep Reinforcement Learning Symposium.
- [44]
- [45] Rémi Munos. Error bounds for approximate policy iteration. In Proceedings of the Twentieth International Conference on Machine Learning, pages 560–567, 2003.
- [46] Yan Li, Guanghui Lan, and Tuo Zhao. Homotopic policy mirror descent: policy convergence, algorithmic regularization, and improved sample complexity. Mathematical Programming, 207(1):457–513, 2024. doi: 10.1007/s10107-023-02017-4.
- [47] Akihiro Kubo, Paavo Parmas, and Shin Ishii. Double horizon model-based policy optimization. Transactions on Machine Learning Research, 2025.
- [48] Michael Janner, Justin Fu, Marvin Zhang, and Sergey Levine. When to trust your model: Model-based policy optimization. Advances in Neural Information Processing Systems, 32, 2019.
- [49] Brandon Amos, Samuel Stanton, Denis Yarats, and Andrew Gordon Wilson. On the model-based stochastic value gradient for continuous reinforcement learning. In Proceedings of the 3rd Conference on Learning for Dynamics and Control, volume 144 of Proceedings of Machine Learning Research, pages 6–20. PMLR, 2021. URL https://proceedings.ml...
- [50] Kurtland Chua, Roberto Calandra, Rowan McAllister, and Sergey Levine. Deep reinforcement learning in a handful of trials using probabilistic dynamics models. Advances in Neural Information Processing Systems, 31, 2018.
- [51] Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. In International Conference on Learning Representations, 2019. URL https://openreview.net/forum?id=Bkg6RiCqY7.