Autonomous Agents and Multi-Agent Systems

A practical guide to multi-objective reinforcement learning · 2022 · DOI 10.1007/s10458-022-09552-y

4 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Controllability in preference-conditioned multi-objective reinforcement learning

cs.LG · 2026-05-11 · unverdicted · novelty 7.0

Standard MORL metrics do not measure whether preference inputs reliably control agent behavior, so a new controllability metric is introduced to restore the link between user intent and agent output.

Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization

cs.LG · 2026-04-14 · unverdicted · novelty 7.0

STOMP extends direct preference optimization to the multi-objective setting via smooth Tchebysheff scalarization and standardization of observed rewards, achieving highest hypervolume in eight of nine protein engineering evaluations.

Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control

cs.RO · 2026-04-03 · unverdicted · novelty 6.0

A behavior-constrained RL framework with receding-horizon credit assignment learns high-performance control policies that stay aligned with expert behavior in race car simulation.

A Roadmap to Pluralistic Alignment

cs.AI · 2024-02-07 · unverdicted · novelty 6.0

The paper formalizes three types of pluralistic AI models and three benchmark classes, arguing that current alignment techniques may reduce rather than increase distributional pluralism.

citing papers explorer

Showing 4 of 4 citing papers.

Controllability in preference-conditioned multi-objective reinforcement learning cs.LG · 2026-05-11 · unverdicted · none · ref 13
Pareto-Optimal Offline Reinforcement Learning via Smooth Tchebysheff Scalarization cs.LG · 2026-04-14 · unverdicted · none · ref 35
Behavior-Constrained Reinforcement Learning with Receding-Horizon Credit Assignment for High-Performance Control cs.RO · 2026-04-03 · unverdicted · none · ref 15
A Roadmap to Pluralistic Alignment cs.AI · 2024-02-07 · unverdicted · none · ref 198
