CoRR

Aviral Kumar, Justin Fu, George Tucker, Sergey Levine · 2019 · arXiv 1906.00949

10 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Offline Reinforcement Learning with Implicit Q-Learning

cs.LG · 2021-10-12 · unverdicted · novelty 8.0

IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.

Decision Transformer: Reinforcement Learning via Sequence Modeling

cs.LG · 2021-06-02 · accept · novelty 8.0

Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.

D4RL: Datasets for Deep Data-Driven Reinforcement Learning

cs.LG · 2020-04-15 · accept · novelty 8.0

D4RL supplies new offline RL benchmarks and datasets from expert and mixed sources to expose weaknesses in existing algorithms and standardize evaluation.

Drift Q-Learning

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

DriftQL is a single-pass offline RL algorithm using drift regularization that outperforms diffusion and flow policies on standard benchmarks.

The hidden risks of temporal resampling in clinical reinforcement learning

cs.LG · 2026-02-06 · conditional · novelty 6.0

Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.

Behavior Regularized Offline Reinforcement Learning

cs.LG · 2019-11-26 · unverdicted · novelty 6.0

Behavior-regularized actor-critic methods achieve strong offline RL results with simple regularization, rendering many recent technical additions unnecessary.

Benchmarking Batch Deep Reinforcement Learning Algorithms

cs.LG · 2019-10-03 · unverdicted · novelty 6.0

Many batch RL algorithms underperform both online DQN and the behavioral policy on Atari; an adapted discrete-action BCQ outperforms the others tested.

Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

cs.LG · 2019-10-01 · conditional · novelty 6.0

AWR learns policies via advantage-weighted supervised regression on actions, achieving competitive off-policy performance on Gym tasks and strong results from static data alone.

ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization

cs.LG · 2026-05-18 · unverdicted · novelty 5.0

ISEP expands action support in offline RL via value interpolation between data and policy samples, then uses stochastic policy optimization to avoid mode collapse in the resulting multimodal objective.

RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking

cs.AI · 2026-05-11 · unverdicted · novelty 5.0 · 2 refs

RankQ augments temporal-difference Q-learning with a multi-term self-supervised ranking loss to enforce structured action ordering, yielding competitive or better results than prior methods on D4RL and large gains in vision-based robot fine-tuning.

citing papers explorer

Showing 10 of 10 citing papers.

Offline Reinforcement Learning with Implicit Q-Learning · cs.LG · 2021-10-12 · unverdicted · none · ref 8
Decision Transformer: Reinforcement Learning via Sequence Modeling · cs.LG · 2021-06-02 · accept · none · ref 18
D4RL: Datasets for Deep Data-Driven Reinforcement Learning · cs.LG · 2020-04-15 · accept · none · ref 14
Drift Q-Learning · cs.LG · 2026-05-29 · unverdicted · none · ref 90
The hidden risks of temporal resampling in clinical reinforcement learning · cs.LG · 2026-02-06 · conditional · none · ref 57
Behavior Regularized Offline Reinforcement Learning · cs.LG · 2019-11-26 · unverdicted · none · ref 11
Benchmarking Batch Deep Reinforcement Learning Algorithms · cs.LG · 2019-10-03 · unverdicted · none · ref 11
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning · cs.LG · 2019-10-01 · conditional · none · ref 9
ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization · cs.LG · 2026-05-18 · unverdicted · none · ref 9
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking · cs.AI · 2026-05-11 · unverdicted · none · ref 10 · 2 links