Offline Q-learning on diverse multi-task data both scales and generalizes

Aviral Kumar, Rishabh Agarwal, Xinyang Geng, George Tucker, Sergey Levine · 2023 · arXiv 2211.15144

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 1 · support 1

representative citing papers

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · novelty 7.0

UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning

cs.LG · 2025-12-23 · unverdicted · novelty 6.0

Multitask offline fitted Q-iteration achieves 1/sqrt(nT) generalization rates under shared low-rank structure and reduces complexity for new tasks by reusing the upstream representation.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03

Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

cs.RO · 2026-05-01

citing papers explorer

Learning Interactive Real-World Simulators

cs.AI · 2023-10-09 · conditional · none · ref 258

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

cs.LG · 2026-05-03 · unverdicted · none · ref 81

Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning

cs.LG · 2025-12-23 · unverdicted · none · ref 15

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03 · unreviewed · ref 38

Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies

cs.RO · 2026-05-01 · unreviewed · ref 55
