NEUBAY uses Bayesian posteriors over world models with long-horizon planning to match or exceed conservative offline RL methods without explicit conservatism.
Including this prior knowledge makes our algorithm directly comparable to model-based baselines, which are our main focus
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2025 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
Long-Horizon Model-Based Offline Reinforcement Learning Without Explicit Conservatism
NEUBAY uses Bayesian posteriors over world models with long-horizon planning to match or exceed conservative offline RL methods without explicit conservatism.