Offline reinforcement learning with implicit Q-learning

Ilya Kostrikov, Ashvin Nair, Sergey Levine · 2022

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL via the FB π-Switch algorithm that extracts subgoal-selection and control policies from forward-backward representations.

Delightful Distributed Policy Gradient

cs.LG · 2026-03-20 · unverdicted · novelty 6.0

Delightful Policy Gradient gates updates with advantage times surprisal to suppress rare failures while preserving rare successes in distributed RL with stale or buggy data.

Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets

cs.LG · 2025-10-01 · conditional · novelty 6.0

Weighted BC estimates trajectory density ratios from a clean reference set via binary discrimination and reweights the BC loss to converge to the clean expert policy with finite-sample bounds independent of contamination rate.

citing papers explorer

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning · cs.LG · 2026-05-13 · unverdicted · none · ref 33
Delightful Distributed Policy Gradient · cs.LG · 2026-03-20 · unverdicted · none · ref 13
Density-Ratio Weighted Behavioral Cloning: Learning Control Policies from Corrupted Datasets · cs.LG · 2025-10-01 · conditional · none · ref 14
