Vicarious conditioning is proposed as a new intrinsic reward in RL that implements attention, retention, reproduction, and reinforcement via memory methods to enable low-shot learning from others without their policies or rewards, yielding longer episodes in tested environments.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
representative citing papers
LIMEN discovers effective RL interfaces by using LLMs to evolve observation and reward programs together from raw state, guided by policy training success, outperforming single-component optimization.
Primal-dual policy gradient algorithms achieve global non-asymptotic convergence for safe RLHF cast as infinite-horizon discounted CMDPs without fitting reward models.
JAXenstein ports the Wolfenstein 3D engine to JAX to create a fast, scalable benchmark for first-person visual RL that is several times quicker than existing vision-based alternatives.
citing papers explorer
- Intrinsic Vicarious Conditioning for Deep Reinforcement Learning
  Vicarious conditioning is proposed as a new intrinsic reward in RL that implements attention, retention, reproduction, and reinforcement via memory methods to enable low-shot learning from others without their policies or rewards, yielding longer episodes in tested environments.
- Discovering Reinforcement Learning Interfaces with Large Language Models
  LIMEN discovers effective RL interfaces by using LLMs to evolve observation and reward programs together from raw state, guided by policy training success, outperforming single-component optimization.
- Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback
  Primal-dual policy gradient algorithms achieve global non-asymptotic convergence for safe RLHF cast as infinite-horizon discounted CMDPs without fitting reward models.
- JAXenstein: Accelerated Benchmarking for First-Person Environments
  JAXenstein ports the Wolfenstein 3D engine to JAX to create a fast, scalable benchmark for first-person visual RL that is several times quicker than existing vision-based alternatives.