Reward-Conditioned Reinforcement Learning
1 Pith paper cites this work.
Abstract
Single-task RL agents are typically trained under a fixed reward function, which limits their robustness to reward misspecification and their ability to adapt to changing preferences. We introduce Reward-Conditioned Reinforcement Learning (RCRL), an off-policy method that conditions agents on reward parameterizations while collecting experience under a single nominal objective. By recomputing counterfactual rewards from shared replay data, RCRL exposes the agent to multiple reward objectives without additional environment interaction, connecting single-task RL with ideas from multi-objective and multi-task learning. Across single-task, multi-task, and vision-based benchmarks, RCRL improves sample efficiency under the nominal reward parameterization, enables efficient adaptation to new parameterizations, and supports zero-shot behavioral adjustment at deployment. Our results show that RCRL provides a scalable mechanism for learning robust, steerable policies without sacrificing the simplicity of single-task training.
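The counterfactual-reward idea in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes three things: the reward is linear in per-transition features stored in the replay buffer, r_w = w · φ(s, a, s'); alternative parameterizations are drawn by Gaussian perturbation of the nominal weights; and a fraction `p_nominal` of samples keep the nominal weights. All names (`Transition`, `relabel_batch`, `td_target`, and so on) are hypothetical.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

State = Tuple[float, ...]


@dataclass(frozen=True)
class Transition:
    """One replay entry, collected once under the nominal objective."""
    state: State
    action: int
    next_state: State
    done: bool
    features: Tuple[float, ...]  # ASSUMPTION: stored reward features phi(s, a, s')


def reward(features: Sequence[float], w: Sequence[float]) -> float:
    """ASSUMED linear parameterization: r_w(s, a, s') = w . phi(s, a, s')."""
    return sum(f * wi for f, wi in zip(features, w))


def sample_weights(w_nominal: Sequence[float], noise: float,
                   rng: random.Random) -> List[float]:
    """Hypothetical sampler: Gaussian perturbation around the nominal weights."""
    return [wi + rng.gauss(0.0, noise) for wi in w_nominal]


def relabel_batch(batch: Sequence[Transition], w_nominal: Sequence[float],
                  rng: random.Random, p_nominal: float = 0.5,
                  noise: float = 0.1):
    """Pair each stored transition with a reward parameterization w and
    recompute its reward under w. No new environment steps are taken."""
    out = []
    for t in batch:
        if rng.random() < p_nominal:
            w = list(w_nominal)  # keep training on the nominal objective
        else:
            w = sample_weights(w_nominal, noise, rng)  # counterfactual objective
        out.append((t, w, reward(t.features, w)))
    return out


def td_target(t: Transition, w: Sequence[float], r_w: float,
              q_fn: Callable[[State, int, Sequence[float]], float],
              actions: Sequence[int], gamma: float = 0.99) -> float:
    """One-step target for a critic that takes w as an extra input, Q(s, a, w)."""
    if t.done:
        return r_w
    return r_w + gamma * max(q_fn(t.next_state, a, w) for a in actions)


if __name__ == "__main__":
    rng = random.Random(0)
    w_nominal = [1.0, -0.1]  # hypothetical: progress term and control-cost term
    buffer = [
        Transition((0.0,), 0, (1.0,), False, (1.0, 0.5)),
        Transition((1.0,), 1, (2.0,), True, (0.8, 2.0)),
    ]
    q_fn = lambda s, a, w: 0.0  # placeholder critic for illustration
    for t, w, r in relabel_batch(buffer, w_nominal, rng):
        print(w, r, td_target(t, w, r, q_fn, actions=[0, 1]))
```

Because only the stored features are reweighted, every sampled w reuses the same transitions, which is what lets one buffer serve many reward objectives. Passing w to the critic as an input follows the universal-value-function pattern; at deployment, changing w would then shift behavior without retraining, provided the policy was trained over a range of w that covers it.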
Citing papers by field: cs.AI (1). By year: 2026 (1). By verdict: UNVERDICTED (1).

Representative citing papers:
Coachable agents for interactive gameplay
A framework combining universal value function approximators with targeted training scenarios and data augmentation produces RL agents that adapt to user-specified styles in real time across video games and humanoid domains while preserving core task performance.