Reward-Conditioned Reinforcement Learning

2026 · cs.LG · arXiv 2603.05066

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

Single-task RL agents are typically trained under a fixed reward function, which limits their robustness to reward misspecification and their ability to adapt to changing preferences. We introduce Reward-Conditioned Reinforcement Learning (RCRL), an off-policy method that conditions agents on reward parameterizations while collecting experience under a single nominal objective. By recomputing counterfactual rewards from shared replay data, RCRL exposes the agent to multiple reward objectives without additional environment interaction, connecting single-task RL with ideas from multi-objective and multi-task learning. Across single-task, multi-task, and vision-based benchmarks, RCRL improves sample efficiency under the nominal reward parameterization, enables efficient adaptation to new parameterizations, and supports zero-shot behavioral adjustment at deployment. Our results show that RCRL provides a scalable mechanism for learning robust, steerable policies without sacrificing the simplicity of single-task training.
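The counterfactual relabeling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes, purely for illustration, that the reward is a weighted sum of hand-written features, and the names `reward_features`, `relabel`, `nominal_w`, and `alt_w` are hypothetical. It shows one stored transition being re-scored under two reward parameterizations, with the parameters appended to the state so that a policy or critic could be conditioned on them.

```python
import numpy as np

def reward_features(state, action, next_state):
    """Hypothetical reward features: forward progress and negative control effort."""
    progress = next_state[0] - state[0]
    effort = -float(np.sum(np.square(action)))
    return np.array([progress, effort])

def relabel(transitions, weights):
    """Recompute rewards for stored transitions under the given reward weights.

    Returns reward-conditioned inputs (state concatenated with the weights)
    and the counterfactual rewards. No new environment steps are taken.
    Assumption: the reward is linear in the features above.
    """
    weights = np.asarray(weights, dtype=float)
    feats = np.stack([reward_features(s, a, s2) for s, a, s2 in transitions])
    rewards = feats @ weights
    cond_states = np.stack(
        [np.concatenate([s, weights]) for s, _, _ in transitions]
    )
    return cond_states, rewards

# One transition (state, action, next_state) collected under the nominal objective.
replay = [(np.array([0.0, 0.0]), np.array([1.0]), np.array([0.5, 0.0]))]

nominal_w = np.array([1.0, 0.1])  # nominal parameterization (assumed values)
alt_w = np.array([1.0, 1.0])      # alternative that penalizes effort more heavily

_, r_nominal = relabel(replay, nominal_w)
cond_alt, r_alt = relabel(replay, alt_w)
```

In an off-policy learner, batches relabeled this way would be fed to the usual update, with the sampled weights treated as an extra input alongside the state; how the parameterizations are sampled is left open here.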

representative citing papers

Coachable agents for interactive gameplay

cs.AI · 2026-07-01 · unverdicted · novelty 6.0

A framework combining universal value function approximators with targeted training scenarios and data augmentation produces RL agents that adapt to user-specified styles in real time across video games and humanoid domains while preserving core task performance.

citing papers explorer

Coachable agents for interactive gameplay cs.AI · 2026-07-01 · unverdicted · none · ref 40 · internal anchor
