Leveraging procedural generation to benchmark reinforcement learning

· 2020

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

cs.AI · 2026-05-22 · unverdicted · novelty 6.0

pcsp is a shared RL policy using LLM persona embeddings, low-rank projection, and PPO+InfoNCE+KL training that delivers 17x above-chance zero-shot persona identification and 22x faster inference on a 300-persona benchmark.

citing papers explorer

One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents

cs.AI · 2026-05-22 · unverdicted · none · ref 39
