Offline Q-learning on diverse multi-task data both scales and generalizes
5 Pith papers cite this work.
Citation roles: background (2).
Citing papers:
- Learning Interactive Real-World Simulators
  UniSim learns a universal real-world simulator from orchestrated diverse datasets, enabling zero-shot deployment of policies trained purely in simulation.
- QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
  QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-Mamba backbone, reporting state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
- Generalisation in Multitask Fitted Q-Iteration and Offline Q-learning
  Multitask offline fitted Q-iteration achieves 1/√(nT) generalization rates under a shared low-rank structure, and reusing the upstream representation reduces the complexity of learning new tasks.
- Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning
- Learning While Deploying: Fleet-Scale Reinforcement Learning for Generalist Robot Policies
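The summaries above refer to offline Q-learning and fitted Q-iteration (FQI). As a rough illustration only, here is a minimal tabular FQI sketch, assuming a small discrete MDP and a fixed dataset of `(s, a, r, s_next, done)` transitions. The function name and data layout are hypothetical. It is not the method of any paper listed here, and it omits the function approximation, conservatism, and multitask low-rank structure those works study.

```python
import numpy as np


def fitted_q_iteration(transitions, n_states, n_actions, gamma=0.9, n_iters=200):
    """Tabular offline fitted Q-iteration on a fixed batch of transitions.

    Hypothetical helper for illustration. `transitions` is a list of
    (s, a, r, s_next, done) tuples collected beforehand; no new environment
    interaction happens during learning (the "offline" setting).
    """
    q = np.zeros((n_states, n_actions))
    for _ in range(n_iters):
        sums = np.zeros((n_states, n_actions))
        counts = np.zeros((n_states, n_actions))
        for s, a, r, s_next, done in transitions:
            # Bellman optimality target computed from the previous iterate.
            target = r if done else r + gamma * q[s_next].max()
            sums[s, a] += target
            counts[s, a] += 1
        # "Fit" step: with a tabular model, least-squares regression reduces
        # to averaging targets per (s, a); pairs absent from the data keep
        # their previous value.
        seen = counts > 0
        q_new = q.copy()
        q_new[seen] = sums[seen] / counts[seen]
        q = q_new
    return q


# Toy two-state chain: in state 0, action 1 moves to state 1;
# in state 1, action 0 yields reward 1 and ends the episode.
data = [(0, 0, 0.0, 0, False), (0, 1, 0.0, 1, False), (1, 0, 1.0, 1, True)]
q = fitted_q_iteration(data, n_states=2, n_actions=2)
greedy = q.argmax(axis=1)  # greedy action per state: [1, 0]
```

Because the targets always use the previous iterate rather than values updated mid-sweep, each iteration is one application of the empirical Bellman optimality operator, followed by a (here trivial) regression step.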