Conservative Q-learning for offline reinforcement learning. Advances in Neural Information Processing Systems, 33:1179–1191, 2020

Aviral Kumar, Aurick Zhou, George Tucker, Sergey Levine · 2020

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

Target-Aligned Bellman Backup (TABB) improves cross-domain offline RL by selecting source transitions according to their contribution to accurate target-domain Bellman target estimation.

Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Injecting RTG into states outside the autoregressive sequence yields shorter, more efficient Decision Transformers that outperform the original on offline RL tasks.

Zero-Shot Signal Temporal Logic Planning with Disjunctive Branch Selection in Dynamic Semantic Maps

cs.AI · 2026-05-02 · unverdicted · novelty 6.0

A zero-shot STL planner combines a map-conditioned Transformer with a disjunctive heuristic and Transitive RL to achieve better generalization across dynamic semantic maps.

Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning

cs.LG · 2026-04-09 · unverdicted · novelty 6.0

VGM²P achieves SOTA-comparable performance in offline MARL via value-guided conditional behavior cloning with MeanFlow, enabling efficient single-step action generation insensitive to regularization coefficients.

citing papers explorer

Showing 4 of 4 citing papers.

Target-Aligned Bellman Backup for Cross-domain Offline Reinforcement Learning
cs.LG · 2026-05-21 · unverdicted · none · ref 17

Beyond Autoregressive RTG: Conditioning via Injection Outside Sequential Modeling in Decision Transformer
cs.LG · 2026-05-07 · unverdicted · none · ref 7

Zero-Shot Signal Temporal Logic Planning with Disjunctive Branch Selection in Dynamic Semantic Maps
cs.AI · 2026-05-02 · unverdicted · none · ref 7

Value-Guidance MeanFlow for Offline Multi-Agent Reinforcement Learning
cs.LG · 2026-04-09 · unverdicted · none · ref 39
