Similar to online RL, offline RL has been explored using both model-free and model-based algorithms, distinguished by whether or not they involve learning a dynamics model.

Preprint, Supplementary Material. Table of Contents: A Related Work; B Proof of the Model Gradient Theorem; C Proof of Propositions; D Planner Details; E Experimental Details; F Ablation Study; G More Experiments on Adroit Tasks · 2021

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning

cs.LG · 2025-04-16 · unverdicted · novelty 5.0

VIPO improves model-based offline RL by minimizing value function inconsistency between direct data estimates and model predictions, achieving SOTA results on D4RL and NeoRL benchmarks.
