Assael, Diederik M

Hossam Mossalam, Yannis M · 2016 · cs.AI · arXiv 1610.02707

3 Pith papers cite this work.

abstract

We propose Deep Optimistic Linear Support Learning (DOL) to solve high-dimensional multi-objective decision problems where the relative importances of the objectives are not known a priori. Using features from the high-dimensional inputs, DOL computes the convex coverage set containing all potential optimal solutions of the convex combinations of the objectives. To our knowledge, this is the first time that deep reinforcement learning has succeeded in learning multi-objective policies. In addition, we provide a testbed with two experiments to be used as a benchmark for deep multi-objective reinforcement learning.
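The abstract's central object is the convex coverage set (CCS): the set of value vectors V that maximize the linearly scalarized return w · V for at least one weight vector w. The Python sketch below is only a rough illustration of that definition. It computes a CCS for two objectives over a small list of value vectors that are assumed to be already known. It is not DOL or Optimistic Linear Support and involves no learning, and the functions `scalarize` and `convex_coverage_set` and the example vectors are hypothetical. With two objectives, each scalarized value is a line in w ∈ [0, 1]. The maximum over all vectors is therefore a piecewise-linear upper envelope whose breakpoints can only occur where two lines cross, so probing one weight inside each interval between crossings finds every vector that is optimal somewhere.

```python
from typing import List, Sequence, Tuple

# Hypothetical setup: each candidate policy is summarized by a
# 2-objective value vector (V1, V2) that is assumed to be known already.
Vector = Tuple[float, float]


def scalarize(v: Vector, w: float) -> float:
    """Linear scalarization w*V1 + (1 - w)*V2 for a weight w in [0, 1]."""
    return w * v[0] + (1.0 - w) * v[1]


def convex_coverage_set(values: Sequence[Vector]) -> List[int]:
    """Indices of vectors that maximize the scalarized value on some
    open interval of weights w in [0, 1] (a minimal CCS)."""
    if not values:
        return []

    # Critical weights: the endpoints plus every crossing of two lines.
    critical = {0.0, 1.0}
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            d0 = values[i][0] - values[j][0]
            d1 = values[i][1] - values[j][1]
            # Difference of the two lines is d1 + w*(d0 - d1); find its root.
            denom = d0 - d1
            if denom != 0.0:
                w = -d1 / denom
                if 0.0 < w < 1.0:
                    critical.add(w)

    # Between consecutive critical weights the argmax cannot change,
    # so one probe at each interval's midpoint is enough.
    ws = sorted(critical)
    ccs = set()
    for lo, hi in zip(ws, ws[1:]):
        probe = 0.5 * (lo + hi)
        best = max(range(n), key=lambda k: scalarize(values[k], probe))
        ccs.add(best)
    return sorted(ccs)


if __name__ == "__main__":
    vs = [(1.0, 0.0), (0.0, 1.0), (0.6, 0.6), (0.4, 0.4)]
    print(convex_coverage_set(vs))  # [0, 1, 2]
```

Vectors that are optimal only at a single tie weight are left out, and among exact duplicates only the first index is kept, so the result is a minimal CCS. With the vectors [(1.0, 0.0), (0.0, 1.0), (0.4, 0.4)], the third vector is Pareto-optimal yet never maximizes w · V, which shows why a CCS can be smaller than the full Pareto front.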

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning

cs.LG · 2026-04-27 · unverdicted · novelty 7.0

Adapting RFRL objectives as auxiliary tasks with preference-guided exploration outperforms prior MORL methods in performance and data efficiency on MO-Gymnasium tasks.

A Single Deep Preference-Conditioned Policy for Learning Pareto Coverage Sets

cs.LG · 2026-05-09 · unverdicted · novelty 6.0

A single preference-conditioned policy achieves unique and Lipschitz-continuous Pareto coverage in multi-objective MDPs via a new mirror-descent policy iteration algorithm with O(1/k) convergence.

A Production-Ready RL Framework for Personalized Utility Tuning with Pareto Sweeping in Pinterest Recommender Systems

cs.IR · 2026-05-08 · unverdicted · novelty 4.0

PRL-PUTS casts utility-weight tuning as a one-step value-based RL task and uses scalarization-parameter Pareto sweeping at inference time to generate and govern a family of policies, reporting +0.13% lift in successful sessions on Pinterest Homefeed.

citing papers explorer

A Reward-Free Viewpoint on Multi-Objective Reinforcement Learning · cs.LG · 2026-04-27 · unverdicted · none · ref 6
A Single Deep Preference-Conditioned Policy for Learning Pareto Coverage Sets · cs.LG · 2026-05-09 · unverdicted · none · ref 4
A Production-Ready RL Framework for Personalized Utility Tuning with Pareto Sweeping in Pinterest Recommender Systems · cs.IR · 2026-05-08 · unverdicted · none · ref 20