A Survey of Multi-Objective Sequential Decision-Making

Diederik Marijn Roijers; Peter Vamplew; Richard Dazeley; Shimon Whiteson

arxiv: 1402.0590 · v1 · pith:DZPUEXKOnew · submitted 2014-02-04 · 💻 cs.AI


classification 💻 cs.AI

keywords multi-objective, methods, decision-making, problems, sequential, learning, literature, multiple


Sequential decision-making problems with multiple objectives arise naturally in practice and pose unique challenges for research in decision-theoretic planning and learning, which has largely focused on single-objective settings. This article surveys algorithms designed for sequential decision-making problems with multiple objectives. Though there is a growing body of literature on this subject, little of it makes explicit under what circumstances special methods are needed to solve multi-objective problems. Therefore, we identify three distinct scenarios in which converting such a problem to a single-objective one is impossible, infeasible, or undesirable. Furthermore, we propose a taxonomy that classifies multi-objective methods according to the applicable scenario, the nature of the scalarization function (which projects multi-objective values to scalar ones), and the type of policies considered. We show how these factors determine the nature of an optimal solution, which can be a single policy, a convex hull, or a Pareto front. Using this taxonomy, we survey the literature on multi-objective methods for planning and learning. Finally, we discuss key applications of such methods and outline opportunities for future work.
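The distinction the abstract draws between a convex hull and a Pareto front can be illustrated with a small sketch. This is not code from the survey: the policy value vectors, the function names, and the choice of a linear (weighted-sum) scalarization are all assumptions made for illustration. It shows a policy whose value is Pareto-optimal yet is never the best choice under any linear weighting, which is why the two solution sets can differ.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[float, ...]


def linear_scalarize(value: Sequence[float], weights: Sequence[float]) -> float:
    """Project a multi-objective value to a scalar with a weighted sum."""
    return sum(w * v for w, v in zip(weights, value))


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a is at least as good as b everywhere and strictly better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(values: List[Vector]) -> List[Vector]:
    """Keep only the value vectors that no other vector dominates."""
    return [v for v in values if not any(dominates(u, v) for u in values)]


def best_under_weights(values: List[Vector], weights: Sequence[float]) -> Vector:
    """Pick the value vector that maximizes the linear scalarization."""
    return max(values, key=lambda v: linear_scalarize(v, weights))


# Hypothetical values of four policies on two objectives (higher is better).
policy_values: List[Vector] = [(0.0, 10.0), (10.0, 0.0), (4.0, 4.0), (3.0, 3.0)]

front = pareto_front(policy_values)

# Sweep linear weights (w, 1 - w) and record which policies are ever optimal.
ever_optimal = {
    best_under_weights(policy_values, (i / 10, 1 - i / 10)) for i in range(11)
}

print("Pareto front:", front)
print("Optimal for some linear weighting:", sorted(ever_optimal))
```

In this toy data, (3, 3) is dominated by (4, 4) and drops out of the Pareto front. The vector (4, 4) stays in the front, but any weighted sum gives it a score of 4 while one of the two extreme policies scores at least 5, so it is never selected by the weight sweep. A nonlinear scalarization function could still prefer it, which is the kind of case where the Pareto front rather than the convex hull is the relevant solution set.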


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Not Every Rubric Teaches Equally: Policy-Aware Rubric Rewards for RLVR
cs.AI 2026-05 unverdicted novelty 6.0

POW3R adapts rubric criterion weights via rollout contrast in RLVR to improve mean reward, strict completion rates, and training speed over static rubric aggregation on multimodal and text tasks.