VIPO improves model-based offline RL by minimizing value function inconsistency between direct data estimates and model predictions, achieving SOTA results on D4RL and NeoRL benchmarks.
v0" datasets, we reference the experimental results provided in (Sun et al., 2023), which are based on the
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.LG (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning