MOReL: Model-based offline reinforcement learning. Advances in Neural Information Processing Systems, 33:21810–21823.
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026: 3
verdicts
UNVERDICTED: 3
representative citing papers
PSPO combines Bayesian posterior sampling of transition dynamics with constrained policy optimization to trade off generalization and robustness in offline RL.
Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.
citing papers explorer
- Shaping Zero-Shot Coordination via State Blocking
  SBC generates virtual environments via state blocking to expose agents to diverse suboptimal partner policies, yielding superior zero-shot coordination performance including with humans.
- Offline Policy Optimization with Posterior Sampling
  PSPO combines Bayesian posterior sampling of transition dynamics with constrained policy optimization to trade off generalization and robustness in offline RL.
- Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer
  Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.