VIPO improves model-based offline RL by minimizing value function inconsistency between direct data estimates and model predictions, achieving SOTA results on D4RL and NeoRL benchmarks.
Similar to online RL, offline RL has been explored using both model-free and model-based algorithms, distinguished by whether or not they involve learning a dynamics model.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning