Advances in neural information processing systems , volume=

Conservative q-learning for offline reinforcement learning , author=

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

browse 7 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Learning When to Stop: Selective Imitation Learning Under Arbitrary Dynamics Shift

cs.LG · 2026-05-09 · unverdicted · novelty 7.0 · 2 refs

SeqRejectron constructs a stopping rule with a small set of validator policies to achieve horizon-free sample complexity for selective imitation learning under arbitrary dynamics shifts.

Beyond Penalization: Diffusion-based Out-of-Distribution Detection and Selective Regularization in Offline Reinforcement Learning

cs.LG · 2026-05-06 · unverdicted · novelty 7.0

DOSER detects OOD actions via diffusion-model denoising error and applies selective regularization based on predicted transitions, proving gamma-contraction with performance bounds and outperforming priors on offline RL benchmarks.

Discrete Flow Matching for Offline-to-Online Reinforcement Learning

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

DRIFT enables stable offline-to-online fine-tuning of CTMC policies in discrete RL via advantage-weighted discrete flow matching, path-space regularization, and candidate-set approximation.

Geometric Pareto Control: Riemannian Gradient Flow of Energy Function via Lie Group Homotopy

eess.SY · 2026-05-11 · unverdicted · novelty 6.0

Geometric Pareto Control embeds Pareto solutions in a Lie group submanifold and navigates via Riemannian gradient flow to achieve 100% feasibility and low suboptimality in control tasks without retraining.

When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning

cs.RO · 2026-05-06 · unverdicted · novelty 6.0

Q2RL extracts Q-values from a BC policy and applies Q-gating to enable efficient offline-to-online RL, outperforming baselines on D4RL/robomimic tasks and achieving up to 100% success on real-robot manipulation in 1-2 hours.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

FAN simplifies expressive flow policies and distributional critics in offline RL via single-iteration behavior regularization and single-sample noise conditioning to claim SOTA performance with lower training and inference time.

Offline Reinforcement Learning for Rotation Profile Control in Tokamaks

cs.LG · 2026-05-07 · unverdicted · novelty 4.0

Offline model-based RL policy trained solely on DIII-D historical data is deployed on the tokamak and yields promising real-world rotation profile control.

citing papers explorer

Showing 0 of 0 citing papers after filters.

No citing papers match the current filters.

Advances in neural information processing systems , volume=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer