IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.
CoRR , volume =
10 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 1polarities
background 1representative citing papers
Decision Transformer casts RL as autoregressive sequence modeling conditioned on desired returns, past states and actions, matching or exceeding offline RL baselines on Atari, Gym and Key-to-Door tasks.
D4RL supplies new offline RL benchmarks and datasets from expert and mixed sources to expose weaknesses in existing algorithms and standardize evaluation.
DriftQL is a single-pass offline RL algorithm using drift regularization that outperforms diffusion and flow policies on standard benchmarks.
Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.
Behavior-regularized actor-critic methods achieve strong offline RL results with simple regularization, rendering many recent technical additions unnecessary.
Many batch RL algorithms underperform both online DQN and the behavioral policy on Atari; an adapted discrete-action BCQ outperforms the others tested.
AWR learns policies via advantage-weighted supervised regression on actions, achieving competitive off-policy performance on Gym tasks and strong results from static data alone.
ISEP expands action support in offline RL via value interpolation between data and policy samples, then uses stochastic policy optimization to avoid mode collapse in the resulting multimodal objective.
RankQ augments temporal-difference Q-learning with a multi-term self-supervised ranking loss to enforce structured action ordering, yielding competitive or better results than prior methods on D4RL and large gains in vision-based robot fine-tuning.
citing papers explorer
-
Drift Q-Learning
DriftQL is a single-pass offline RL algorithm using drift regularization that outperforms diffusion and flow policies on standard benchmarks.
-
The hidden risks of temporal resampling in clinical reinforcement learning
Resampling clinical time series into uniform bins for offline RL reduces performance by up to 60% and causes retrospective evaluations to overestimate returns by 1.5-3x versus unprocessed data.
-
ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization
ISEP expands action support in offline RL via value interpolation between data and policy samples, then uses stochastic policy optimization to avoid mode collapse in the resulting multimodal objective.
-
RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking
RankQ augments temporal-difference Q-learning with a multi-term self-supervised ranking loss to enforce structured action ordering, yielding competitive or better results than prior methods on D4RL and large gains in vision-based robot fine-tuning.