CoRR , volume =

10 Pith papers cite this work. Polarity classification is still indexing.

citation summary

fields: cs.LG 9, cs.AI 1

roles: background 1

polarities: background 1

representative citing papers

Offline Reinforcement Learning with Implicit Q-Learning

cs.LG · 2021-10-12 · unverdicted · novelty 8.0

IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.

Drift Q-Learning

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

DriftQL is a single-pass offline RL algorithm using drift regularization that outperforms diffusion and flow policies on standard benchmarks.

Behavior Regularized Offline Reinforcement Learning

cs.LG · 2019-11-26 · unverdicted · novelty 6.0

Behavior-regularized actor-critic methods achieve strong offline RL results with simple regularization, rendering many recent technical additions unnecessary.

citing papers explorer

Showing 6 of 6 citing papers after filters.

  • Offline Reinforcement Learning with Implicit Q-Learning · cs.LG · 2021-10-12 · unverdicted · none · ref 8

    IQL achieves policy improvement in offline RL by implicitly estimating optimal action values through state-conditional upper expectiles of value functions, without querying Q-functions on out-of-distribution actions.

  • Drift Q-Learning · cs.LG · 2026-05-29 · unverdicted · none · ref 90

    DriftQL is a single-pass offline RL algorithm using drift regularization that outperforms diffusion and flow policies on standard benchmarks.

  • Behavior Regularized Offline Reinforcement Learning · cs.LG · 2019-11-26 · unverdicted · none · ref 11

    Behavior-regularized actor-critic methods achieve strong offline RL results with simple regularization, rendering many recent technical additions unnecessary.

  • Benchmarking Batch Deep Reinforcement Learning Algorithms · cs.LG · 2019-10-03 · unverdicted · none · ref 11

    Many batch RL algorithms underperform both online DQN and the behavioral policy on Atari; an adapted discrete-action BCQ outperforms the others tested.

  • ISEP: Implicit Support Expansion for Offline Reinforcement Learning via Stochastic Policy Optimization · cs.LG · 2026-05-18 · unverdicted · none · ref 9

    ISEP expands action support in offline RL via value interpolation between data and policy samples, then uses stochastic policy optimization to avoid mode collapse in the resulting multimodal objective.

  • RankQ: Offline-to-Online Reinforcement Learning via Self-Supervised Action Ranking · cs.AI · 2026-05-11 · unverdicted · none · ref 10 · 2 links

    RankQ augments temporal-difference Q-learning with a multi-term self-supervised ranking loss to enforce structured action ordering, yielding competitive or better results than prior methods on D4RL and large gains in vision-based robot fine-tuning.