pith. machine review for the scientific record.

arxiv: 2605.10585 · v1 · submitted 2026-05-11 · 💻 cs.LG

Recognition: 1 theorem link · Lean Theorem

Controllability in preference-conditioned multi-objective reinforcement learning

Authors on Pith: no claims yet

Pith reviewed 2026-05-12 05:02 UTC · model grok-4.3

classification 💻 cs.LG
keywords multi-objective reinforcement learning · preference-conditioned agents · controllability · evaluation metrics · MORL · reinforcement learning

The pith

Standard MORL metrics let agents pass tests while ignoring user preference inputs, requiring a dedicated controllability check.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that typical performance measures in multi-objective reinforcement learning can be satisfied by agents whose behavior stays fixed even when the user changes the relative importance of objectives. This leaves the preference input without real effect, so the intended link between what a person wants and what the agent does is not guaranteed. A new metric focused on controllability is therefore proposed to test whether preference changes produce the expected shifts in policy. Without this, evaluation protocols cannot confirm that preference-conditioned agents are actually steerable.

Core claim

Preference-conditioned agents can record high scores on mainstream MORL metrics while remaining insensitive to the preference input, which means their behavior does not change reliably when the user alters the trade-off among objectives. The authors state that this breaks the symbolic interface between user intent and agent action, so a complementary metric is needed to measure controllability directly.
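
To make the failure mode concrete, here is a minimal editorial sketch (hypothetical numbers and policies, not the paper's construction) of a preference-insensitive agent that still earns a high scalarized utility for every preference weight:

    # Editorial sketch: a fixed policy can score well under linear
    # scalarization for every preference w while ignoring w entirely.
    # All return values here are hypothetical.
    import numpy as np

    def fixed_return(w):
        # Preference-insensitive agent: the same two-objective return
        # vector no matter how w is set.
        return np.array([0.9, 0.9])

    def conditioned_return(w):
        # Idealized controllable agent: trades objectives with w.
        return np.array([w[0], w[1]])

    for a in np.linspace(0.0, 1.0, 5):
        w = np.array([a, 1.0 - a])
        print(w, round(w @ fixed_return(w), 2), round(w @ conditioned_return(w), 2))
    # The fixed agent scores 0.9 for every w (and beats the conditioned
    # agent at mid-range weights) despite zero sensitivity to w.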

What carries the argument

Controllability: the property that changes in the preference input produce reliable, intended changes in the agent's behavior.
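
The paper's figures point to per-objective rank correlation as the operational test (Figures 3 and 7). A minimal sketch of such a check, assuming a hypothetical rollout_return(agent, w) helper that reports the per-objective discounted return induced by preference w:

    # Hedged sketch of a rank-correlation controllability check, in the
    # spirit of Figures 3 and 7; not the paper's exact definition.
    import numpy as np
    from scipy.stats import spearmanr

    def controllability_scores(agent, rollout_return, n_objectives,
                               n_weights=50, seed=0):
        rng = np.random.default_rng(seed)
        # Sample preference vectors uniformly from the simplex.
        W = rng.dirichlet(np.ones(n_objectives), size=n_weights)
        R = np.array([rollout_return(agent, w) for w in W])
        scores = []
        for k in range(n_objectives):
            # Does raising weight k reliably raise objective k's return?
            rho, _ = spearmanr(W[:, k], R[:, k])
            scores.append(rho)
        return np.array(scores)  # near +1: controllable; near 0: insensitive

A per-objective breakdown like this is what would let the metric expose cases such as Snake's inverted corpse objective (Figure 8), which aggregate metrics average away.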

If this is right

  • Agents that appear successful on standard MORL metrics may still not be controllable by user preferences.
  • Evaluation protocols for preference-conditioned MORL must incorporate direct tests of sensitivity to preference changes.
  • Progress on preference adaptation in MORL cannot be consolidated without controllability assessment.
  • The symbolic user interface in MORL remains broken until controllability is routinely measured.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • A controllability metric could guide the design of new algorithms that explicitly optimize for responsiveness to preferences.
  • The same gap between aggregate scores and input sensitivity may appear in other conditional reinforcement-learning settings.
  • Applying the metric to larger, more complex environments would test whether it scales without introducing measurement artifacts.

Load-bearing premise

That a controllability metric can be defined and computed reliably across environments in a way that accurately flags when preferences fail to influence behavior.

What would settle it

Finding a set of high-scoring agents on existing MORL benchmarks that nevertheless show identical behavior across widely varying preference inputs would confirm the gap the new metric aims to close.
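
An editorial sketch of how that settling experiment could be run (the rollout interface is an assumption): sweep widely separated preference vectors and flag high-scoring agents whose induced returns barely move.

    # Editorial sketch of the settling test: measure how much an agent's
    # induced returns move across widely varying preferences. The
    # rollout_return(w) interface is a hypothetical assumption.
    import numpy as np

    def preference_sensitivity(rollout_return, n_objectives,
                               n_weights=20, seed=0):
        rng = np.random.default_rng(seed)
        # Concentration < 1 pushes samples toward the simplex corners,
        # i.e., widely varying preferences.
        W = rng.dirichlet(0.2 * np.ones(n_objectives), size=n_weights)
        R = np.array([rollout_return(w) for w in W])
        diffs = R[:, None, :] - R[None, :, :]
        # Mean pairwise distance between induced return vectors;
        # a value near 0 means behavior is identical across preferences.
        return np.linalg.norm(diffs, axis=-1).mean()

An agent with strong hypervolume or expected-utility scores but near-zero preference_sensitivity would exhibit exactly the gap the paper describes.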

Figures

Figures reproduced from arXiv: 2605.10585 by Beyazit Yalcinkaya, David Fridovich-Keil, Georgios Bakirtzis, Lasse Peters, Pau de las Heras Molins.

Figure 1: MOPPO extends PPO with weight-conditioning and a multi-objective value head.
Figure 2: Mainstream MORL metrics all have limitations. Dashed lines are linear preference weights; colored solutions are induced by the corresponding preference. Hypervolume is biased towards inner regions. Expected utility assumes perfect rationality (max over utilities instead of actual (w, v^{π_w}) pairs). Cosine similarity outputs unsigned misalignment.
Figure 3: Per-objective discounted returns in Tetris. As the conditioning weight for an objective increases, so does the corresponding induced return by MOPPO, showing positive correlation. Shaded bands represent the mean ± one standard deviation of returns for non-conditioned algorithms (PPO and MOPPO without conditioning).
Figure 4: Preference conditioning trades peak performance for solution diversity. Each point is the average return of a batch of episodes; solid-bordered points are globally non-dominated. While PPO reaches higher-quality solutions, MOPPO's conditioning yields a wider spread across the objective space.
Figure 5: Hypervolume and sparsity do not consistently identify MOPPO as the only controllable algorithm. (a) Hypervolume (↑ better) is dominated by PPO in most environments. (b) Sparsity (↓ better) does flag MOPPO's wider spread, but cannot assess whether individual solutions comply with their inducing preference.
Figure 6: Expected utility and cosine similarity also fail to reliably flag MOPPO as the only controllable agent. (a) Expected utility (↑ better) reflects overall solution quality, not whether each solution complies with its inducing preference. (b) Cosine similarity (↑ better) shows only marginal differences between algorithms.
Figure 7: Rank correlation uniquely characterizes MOPPO's controllability. The per-objective breakdown exposes which objectives the agent has better learned to trade off, an insight unavailable from any mainstream metric.
Figure 8: Controllability varies heterogeneously across objectives and environments, revealing structural limits of preference adaptation. The corpse objective in Snake shows an inverted relationship, hinting at reward design issues. Shaded bands show the negligible variance in returns of non-conditioned baselines.
Figure 9: Preference-conditioned MOPPO successfully adapts its behavior mid-episode without any retraining. (a) After the preference switches at step 500, the agent visibly rotates pieces more frequently. (b) A brief lag in the reward response reflects the inertia of the LSTM layer in the policy network.
Figure 10: Three environments of diverse complexity serve as a testbed for evaluating controllability. MOBA: high-dimensional, partially observable multi-agent battle arena. Snake: competitive multi-agent grid world. Tetris: fully observable single-agent puzzle game.
Figure 11: MOPPO achieves training performance comparable to PPO, validating the multi-objective extension. Means are smoothed with an exponential moving average (α = 0.95). Shaded areas represent one standard deviation.
Original abstract

Multi-objective reinforcement learning (MORL) allows a user to express preference over outcomes in terms of the relative importance of the objectives, but standard metrics cannot capture whether changes in preference reliably change the agent's behavior in the intended way, a property termed controllability. As a result, preference-conditioned agents can score well on standard MORL metrics while being insensitive to the preference input. If the ability to control agents cannot be reliably assessed, the symbolic interface that MORL provides between user intent and agent behavior is broken. Mainstream MORL metrics alone fail to measure the controllability of preference-conditioned agents, motivating a complementary metric specifically designed to that end. We hope the results spur discussion in the community on existing evaluation protocols to consolidate advances in preference adaptation in MORL to larger and more complex problems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that standard MORL metrics (e.g., hypervolume and scalarized returns) can be satisfied by preference-insensitive agents, failing to measure controllability—the reliable influence of preference inputs on agent behavior. This breaks the symbolic user-agent interface in preference-conditioned MORL. The work motivates a complementary controllability metric designed specifically to detect such insensitivity and calls for improved evaluation protocols to support advances on larger problems.

Significance. If the proposed metric can be rigorously defined, shown to be computable without introducing its own biases, and empirically validated to distinguish controllable from insensitive agents where standard metrics cannot, the contribution would be meaningful. It would strengthen evaluation practices in preference-conditioned MORL and help ensure that user preferences actually translate into behavioral control, addressing a practical limitation in current assessment methods.

major comments (1)
  1. Abstract: The manuscript motivates a new controllability metric as the core response to the identified gap, yet provides neither its definition, derivation, nor any experimental results or validation. This is load-bearing for the central claim, as the motivation and call for community discussion rest on the metric's ability to complement existing measures without circularity or new computational issues.
minor comments (1)
  1. The abstract refers to 'the results' spurring discussion but does not summarize any concrete findings, environments tested, or comparisons performed; adding a brief overview of these in the abstract or introduction would improve clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading and for identifying a key area where the manuscript can be strengthened. We address the major comment below and outline the planned revisions.

Point-by-point responses
  1. Referee: Abstract: The manuscript motivates a new controllability metric as the core response to the identified gap, yet provides neither its definition, derivation, nor any experimental results or validation. This is load-bearing for the central claim, as the motivation and call for community discussion rest on the metric's ability to complement existing measures without circularity or new computational issues.

    Authors: We agree that the current abstract and manuscript focus on motivating the need for a controllability metric and on demonstrating that standard MORL metrics (hypervolume, scalarized returns) can be satisfied by preference-insensitive agents, without supplying an explicit definition, derivation, or empirical validation of the new metric. The manuscript is structured as a position piece whose primary goal is to expose the broken link between user preference inputs and agent behavior under existing evaluation protocols and to initiate community discussion on improved protocols. The conceptual argument, that controllability must be measured separately, stands on its own and does not rely on a specific formula. Nevertheless, the referee is correct that a concrete, computable definition would make the central claim more actionable and would allow readers to assess potential biases or computational costs. In the revised manuscript we will therefore (i) add a dedicated section that formally defines the controllability metric, (ii) derive it directly from the requirement that changes in the preference vector must produce statistically detectable changes in the induced policy, and (iii) include a small set of controlled experiments on standard MORL environments that contrast controllable and preference-insensitive agents, confirming that the new metric flags the latter while hypervolume does not. These additions will be kept concise so that the paper retains its discussion-oriented character while addressing the load-bearing concern.

    revision: yes
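
An editorial illustration of what "statistically detectable changes" could mean in practice (the rebuttal does not specify a test; this permutation test and its sampling interface are assumptions):

    # Editorial sketch of a "statistically detectable change" test:
    # a permutation test on scalarized episode returns gathered under
    # two preference vectors. Inputs are hypothetical 1-D arrays.
    import numpy as np

    def detectable_change(returns_w1, returns_w2, n_perm=10_000, seed=0):
        rng = np.random.default_rng(seed)
        observed = abs(np.mean(returns_w1) - np.mean(returns_w2))
        pooled = np.concatenate([returns_w1, returns_w2])
        n1 = len(returns_w1)
        count = 0
        for _ in range(n_perm):
            rng.shuffle(pooled)  # relabel episodes at random
            if abs(pooled[:n1].mean() - pooled[n1:].mean()) >= observed:
                count += 1
        return count / n_perm  # small p-value: the preference change moved behavior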

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper's argument is conceptual and definitional: standard MORL metrics (hypervolume, scalarized returns) can be satisfied by preference-insensitive agents, which directly follows from the problem setup without any equations, fitted parameters, or derivations. No load-bearing self-citations, self-definitional reductions, or ansatzes are invoked in the provided text. The motivation for a complementary controllability metric is logically independent and can be checked against external benchmarks of agent behavior.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The contribution rests on the domain assumption that controllability is a distinct and desirable property not captured by existing metrics; no free parameters are introduced, and the only invented entity is the proposed controllability metric itself.

axioms (1)
  • domain assumption: Standard MORL metrics cannot capture whether preference changes reliably alter agent behavior. This is the core motivation stated in the abstract.
invented entities (1)
  • controllability metric (no independent evidence). Purpose: to quantify whether preference inputs control agent behavior; a new evaluation tool proposed to complement existing MORL metrics.

pith-pipeline@v0.9.0 · 5448 in / 1224 out tokens · 34879 ms · 2026-05-12T05:02:41.590495+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

40 extracted references · 40 canonical work pages · 3 internal anchors

  1. [1] A. Abels, D. Roijers, T. Lenaerts, A. Nowé, and D. Steckelmacher. Dynamic weights in multi-objective deep reinforcement learning. In ICML, 2019. URL https://proceedings.mlr.press/v97/abels19a.html

  2. [2] L. N. Alegre, A. L. C. Bazzan, D. M. Roijers, A. Nowé, and B. C. da Silva. Sample-efficient multi-objective learning via generalized policy improvement prioritization. In AAMAS, 2023. doi:10.5555/3545946.3598872

  3. [3] L. N. Alegre, A. Serifi, R. Grandia, D. Müller, E. Knoop, and M. Bächer. AMOR: Adaptive character control through multi-objective reinforcement learning. In SIGGRAPH, 2025. doi:10.1145/3721238.3730656

  4. [4] C. Audet, J. Bigeon, D. Cartier, S. Le Digabel, and L. Salomon. Performance indicators in multiobjective optimization. European Journal of Operational Research, 2021. doi:10.1016/j.ejor.2020.11.016

  5. [5] T. Basaklar, S. Gumussoy, and U. Y. Ogras. PD-MORL: Preference-driven multi-objective reinforcement learning algorithm. In ICLR, 2023. URL https://openreview.net/pdf?id=zS9sRyaPFlJ

  6. [6] K. C. Border. Introductory notes on preference and rational choice. Technical report, California Institute of Technology, 2020. URL https://healy.econ.ohio-state.edu/kcb/Notes/Choice.pdf

  7. [7] P. S. Castro. The formalism-implementation gap in reinforcement learning research. arXiv:2510.16175 [cs.LG], 2025

  8. [8] D. Cornelisse, S. Cheng, P. Mandavilli, J. Hunt, K. Joseph, W. Doulazmi, V. Charraut, A. Gupta, J. Suarez, and E. Vinitsky. PufferDrive: A fast and friendly driving simulator for training and evaluating RL agents, 2025. URL https://github.com/Emerge-Lab/PufferDrive

  9. [9] P. de las Heras Molins, E. Roy-Almonacid, D. H. Lee, L. Peters, D. Fridovich-Keil, and G. Bakirtzis. Approximate solutions to games of ordered preference. In ITSC, 2025a. doi:10.1109/ITSC60802.2025.11423775

  10. [10] P. de las Heras Molins, B. Yalcinkaya, L. Peters, D. Fridovich-Keil, and G. Bakirtzis. PufferMO. Zenodo, 2025b. doi:10.5281/zenodo.19889214. URL https://zenodo.org/records/19889214

  11. [11] F. Felten, U. Ucak, H. Azmani, G. Peng, W. Röpke, H. Baier, P. Mannion, D. M. Roijers, J. K. Terry, E. G. Talbi, G. Danoy, A. Nowé, and R. Rădulescu. MOMAland: A set of benchmarks for multi-objective multi-agent reinforcement learning. arXiv:2407.16312 [cs.MA], 2024

  12. [12] A. P. Guerreiro, C. M. Fonseca, and L. Paquete. The hypervolume indicator: Problems and algorithms. ACM Computing Surveys, 2022. doi:10.1145/3453474

  13. [13] C. F. Hayes, R. Rădulescu, E. Bargiacchi, J. Källström, M. Macfarlane, M. Reymond, T. Verstraeten, L. M. Zintgraf, R. Dazeley, F. Heintz, E. Howley, A. A. Irissappane, P. Mannion, A. Nowé, G. Ramos, M. Restelli, P. Vamplew, and D. M. Roijers. A practical guide to multi-objective reinforcement learning and planning. Autonomous Agents and Multi-Agent Systems, 2022

  14. [14] M. Jackermeier and A. Abate. DeepLTL: Learning to efficiently satisfy complex LTL specifications for multi-task RL. In ICLR, 2025. URL https://openreview.net/pdf?id=9pW2J49flQ

  15. [15] Z. Jiang, Y. Wang, R. Marr, E. Novoseller, B. T. Files, and V. Ustun. GraphAllocBench: A flexible benchmark for preference-conditioned multi-objective policy learning. arXiv:2601.20753 [cs.LG], 2026

  16. [16] K. Jothimurugan, S. Bansal, O. Bastani, and R. Alur. Specification-guided reinforcement learning. In NeuS, 2025. URL https://proceedings.mlr.press/v288/jothimurugan25a.html

  17. [17] J. Knowles and D. Corne. On metrics for comparing nondominated sets. In CEC, 2002. doi:10.1109/CEC.2002.1007013

  18. [18] D. H. Lee, L. Peters, and D. Fridovich-Keil. You can't always get what you want: Games of ordered preference. IEEE Robotics and Automation Letters, 2025. doi:10.1109/LRA.2025.3575324

  19. [19] X. Lin, X. Zhang, Z. Yang, F. Liu, Z. Wang, and Q. Zhang. Smooth Tchebycheff scalarization for multi-objective optimization. In ICML, 2024. URL https://proceedings.mlr.press/v235/lin24y.html

  20. [20] M. Liu, M. Zhu, and W. Zhang. Goal-conditioned reinforcement learning: Problems and solutions. In IJCAI, 2022. URL https://www.ijcai.org/proceedings/2022/0770.pdf

  21. [21] S. Natarajan and P. Tadepalli. Dynamic preferences in multi-criteria reinforcement learning. In ICML, 2005. doi:10.1145/1102351.1102427

  22. [22] OpenAI, C. Berner, G. Brockman, B. Chan, V. Cheung, P. Dębiak, C. Dennison, D. Farhi, Q. Fischer, S. Hashme, C. Hesse, R. Józefowicz, S. Gray, C. Olsson, J. Pachocki, M. Petrov, H. P. d. O. Pinto, J. Raiman, T. Salimans, J. Schlatter, J. Schneider, S. Sidor, I. Sutskever, J. Tang, F. Wolski, and S. Zhang. Dota 2 with large scale deep reinforcement learning. arXiv:1912.06680 [cs.LG], 2019

  23. [23] P. Rustagi, Y. Anand, and S. Saisubramanian. Multi-objective planning with contextual lexicographic reward preferences. In AAMAS, 2025. URL https://dl.acm.org/doi/10.5555/3709347.3743816

  24. [24] J. Schulman, P. Moritz, S. Levine, M. Jordan, and P. Abbeel. High-dimensional continuous control using generalized advantage estimation. In ICLR, 2016. doi:10.48550/arXiv.1506.02438

  25. [25] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov. Proximal policy optimization algorithms. arXiv:1707.06347 [cs.LG], 2017

  26. [26] J. Suarez. The full reinforcement learning iceberg, 2024. URL https://www.youtube.com/watch?v=RIkse0tJ0hE

  27. [27] J. Suarez. PufferLib 2.0: Reinforcement learning at 1M steps/s. In RLC, 2025. URL https://openreview.net/pdf?id=qRyteMTgn0

  28. [28] M. Terekhov and C. Gulcehre. In search for architectures and loss functions in multi-objective reinforcement learning. arXiv:2407.16807 [cs.LG], 2024

  29. [29] P. Vaezipoor, A. C. Li, R. T. Icarte, and S. A. McIlraith. LTL2Action: Generalizing LTL instructions for multi-task RL. In ICML, 2021. URL https://proceedings.mlr.press/v139/vaezipoor21a.html

  30. [30] B. Wang, H. K. Singh, and T. Ray. Adjusting normalization bounds to improve hypervolume based search for expensive multi-objective optimization. Complex & Intelligent Systems, 2023. doi:10.1007/s40747-021-00590-9

  31. [31] K. H. Wray, S. Zilberstein, and A. Mouaddib. Multi-objective MDPs with conditional lexicographic reward preferences. In AAAI, 2015. doi:10.1609/aaai.v29i1.9647

  32. [32] J. Xu, Y. Tian, P. Ma, D. Rus, S. Sueda, and W. Matusik. Prediction-guided multi-objective reinforcement learning for continuous robot control. In ICML, 2020. URL https://proceedings.mlr.press/v119/xu20h.html

  33. [33] B. Yalcinkaya, N. Lauffer, M. Vazquez-Chanlatte, and S. A. Seshia. Compositional automata embeddings for goal-conditioned reinforcement learning. In NeurIPS, 2024. URL https://proceedings.neurips.cc/paper_files/paper/2024/hash/d8e4dad4af33dcb5d3bfd6b8e3a67a88-Abstract-Conference.html

  34. [34] B. Yalcinkaya, N. Lauffer, M. Vazquez-Chanlatte, and S. A. Seshia. Provably correct automata embeddings for optimal automata-conditioned reinforcement learning. In NeuS, 2025. URL https://proceedings.mlr.press/v288/yalcinkaya25a.html

  35. [35] Y. Yang, T. Zhou, M. Pechenizkiy, and M. Fang. Preference controllable reinforcement learning with advanced multi-objective optimization. In ICML, 2025. URL https://proceedings.mlr.press/v267/yang25ax.html

  36. [36] A. Zanardi, G. Zardini, S. Srinivasan, S. Bolognani, A. Censi, F. Dörfler, and E. Frazzoli. Posetal games: Efficiency, existence, and refinement of equilibria in games with prioritized metrics. IEEE Robotics and Automation Letters, 2022. doi:10.1109/LRA.2021.3135030

  37. [37] L. Zintgraf, T. Kanters, D. Roijers, F. Oliehoek, and P. Beau. Quality assessment of MORL algorithms: A utility-based approach. In BeNeLearn, 2015. URL https://livrepository.liverpool.ac.uk/2039202/

  38. [38] E. Zitzler and L. Thiele. Multiobjective optimization using evolutionary algorithms - a comparative case study. In PPSN, 1998. doi:10.1007/BFb0056872

  39. [39] E. Zitzler, L. Thiele, M. Laumanns, C. M. Fonseca, and V. G. Da Fonseca. Performance assessment of multiobjective optimizers: An analysis and review. IEEE Transactions on Evolutionary Computation, 2003. doi:10.1109/TEVC.2003.810758

  40. [40] E. Zitzler, D. Brockhoff, and L. Thiele. The hypervolume indicator revisited: On the design of Pareto-compliant indicators via weighted integration. In EMO, 2007. doi:10.1007/978-3-540-70928-2_64