The surprising effectiveness of PPO in cooperative multi-agent games
2 Pith papers cite this work.
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- One Policy, Infinite NPCs: Persona-Traceable Shared RL Policies for Scalable Game Agents
  pcsp is a shared RL policy using LLM persona embeddings, low-rank projection, and PPO+InfoNCE+KL training that delivers 17x above-chance zero-shot persona identification and 22x faster inference on a 300-persona benchmark.
- Safe and Policy-Compliant Multi-Agent Orchestration for Enterprise AI
  CAMCO enforces policy constraints on multi-agent AI at deployment time via convex projection, risk-weighted Lagrangian shaping, and bounded-convergence negotiation, yielding zero violations and 92-97% utility in tested enterprise scenarios.