Title resolution pending

· 2000

4 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Intrinsic Vicarious Conditioning for Deep Reinforcement Learning

cs.LG · 2026-05-12 · unverdicted · novelty 7.0

Vicarious conditioning is proposed as a new intrinsic reward in RL that implements attention, retention, reproduction, and reinforcement via memory methods to enable low-shot learning from others without their policies or rewards, yielding longer episodes in tested environments.

Discovering Reinforcement Learning Interfaces with Large Language Models

cs.LG · 2026-05-05 · unverdicted · novelty 7.0

LIMEN discovers effective RL interfaces by using LLMs to evolve observation and reward programs together from raw state, guided by policy training success, outperforming single-component optimization.

Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback

cs.LG · 2026-04-21 · unverdicted · novelty 7.0

Primal-dual policy gradient algorithms achieve global non-asymptotic convergence for safe RLHF cast as infinite-horizon discounted CMDPs without fitting reward models.

JAXenstein: Accelerated Benchmarking for First-Person Environments

cs.LG · 2026-05-19 · unverdicted · novelty 6.0

JAXenstein ports the Wolfenstein 3D engine to JAX to create a fast, scalable benchmark for first-person visual RL that is several times quicker than existing vision-based alternatives.

citing papers explorer

Intrinsic Vicarious Conditioning for Deep Reinforcement Learning · cs.LG · 2026-05-12 · unverdicted · none · ref 25
Discovering Reinforcement Learning Interfaces with Large Language Models · cs.LG · 2026-05-05 · unverdicted · none · ref 2
Policy Gradient Primal-Dual Method for Safe Reinforcement Learning from Human Feedback · cs.LG · 2026-04-21 · unverdicted · none · ref 2
JAXenstein: Accelerated Benchmarking for First-Person Environments · cs.LG · 2026-05-19 · unverdicted · none · ref 2
