Off-policy deep reinforcement learning without exploration

Scott Fujimoto, David Meger, Doina Precup · 2052

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty

cs.LG · 2025-06-14 · unverdicted · novelty 7.0

DR-SAC is the first actor-critic distributionally robust RL algorithm for offline continuous control that derives a convergent robust soft policy iteration and reports up to 9.8x higher rewards than SAC under perturbations.

Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Target-Aligned Bellman Backup (TABB) improves cross-domain offline RL by selecting source transitions according to their contribution to accurate target-domain Bellman target estimation.

Shaping Zero-Shot Coordination via State Blocking

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

SBC generates virtual environments via state blocking to expose agents to diverse suboptimal partner policies, yielding superior zero-shot coordination performance including with humans.

Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

VGM²P achieves SOTA-comparable performance in offline MARL via value-guided conditional behavior cloning with MeanFlow, enabling efficient single-step action generation insensitive to regularization coefficients.

Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets

cs.LG · 2025-10-01 · conditional · novelty 6.0

Weighted BC estimates trajectory density ratios from a clean reference set via binary discrimination and reweights the BC loss to converge to the clean expert policy with finite-sample bounds independent of contamination rate.

When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited

cs.LG · 2026-05-16 · unverdicted · novelty 5.0

Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.

citing papers explorer

Showing 6 of 6 citing papers.

DR-SAC: Distributionally Robust Soft Actor-Critic for Reinforcement Learning under Uncertainty cs.LG · 2025-06-14 · unverdicted · none · ref 12
DR-SAC is the first actor-critic distributionally robust RL algorithm for offline continuous control that derives a convergent robust soft policy iteration and reports up to 9.8x higher rewards than SAC under perturbations.
Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning cs.LG · 2026-05-21 · unverdicted · none · ref 15
Target-Aligned Bellman Backup (TABB) improves cross-domain offline RL by selecting source transitions according to their contribution to accurate target-domain Bellman target estimation.
Shaping Zero-Shot Coordination via State Blocking cs.LG · 2026-05-12 · unverdicted · none · ref 46
SBC generates virtual environments via state blocking to expose agents to diverse suboptimal partner policies, yielding superior zero-shot coordination performance including with humans.
Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning cs.LG · 2026-04-09 · unverdicted · none · ref 38
VGM²P achieves SOTA-comparable performance in offline MARL via value-guided conditional behavior cloning with MeanFlow, enabling efficient single-step action generation insensitive to regularization coefficients.
Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets cs.LG · 2025-10-01 · conditional · none · ref 4
Weighted BC estimates trajectory density ratios from a clean reference set via binary discrimination and reweights the BC loss to converge to the clean expert policy with finite-sample bounds independent of contamination rate.
When Dynamics Shift, Robust Task Inference Wins: Offline Imitation Learning with Behavior Foundation Models Revisited cs.LG · 2026-05-16 · unverdicted · none · ref 23
Robust minimax task inference in BFMs achieves dynamics-shift robustness from nominal offline data alone and outperforms standard baselines.

Off-policy deep reinforcement learning without exploration

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer