MapGo: Model-Assisted Policy Optimization for Goal-Oriented Tasks

Deheng Ye; Jian Shen; Menghui Zhu; Minghuan Liu; Qiang Fu; Sheng Chen; Weinan Zhang; Wei Yang; Yong Yu; Zhicheng Zhang

arxiv: 2105.06350 · v1 · pith:ML7WDL3Unew · submitted 2021-05-13 · 💻 cs.AI · cs.LG


Menghui Zhu, Minghuan Liu, Jian Shen, Zhicheng Zhang, Sheng Chen, Weinan Zhang, Deheng Ye, Yong Yu, Qiang Fu, Wei Yang



keywords: goal-oriented, goals, mapgo, policy, tasks, compared, dynamics, efficiency


In goal-oriented reinforcement learning, relabeling the raw goals in past experience to give agents hindsight ability is a major solution to the reward sparsity problem. In this paper, to enhance the diversity of relabeled goals, we develop FGI (Foresight Goal Inference), a new relabeling strategy that relabels goals by looking into the future with a learned dynamics model. In addition, to improve sample efficiency, we propose using the dynamics model to generate simulated trajectories for policy training. Integrating these two improvements, we introduce the MapGo framework (Model-Assisted Policy Optimization for Goal-oriented tasks). In our experiments, we first show the effectiveness of the FGI strategy compared with hindsight relabeling, and then show that MapGo achieves higher sample efficiency than model-free baselines on a set of complicated tasks.
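The two ideas in the abstract, relabeling goals by looking ahead with a dynamics model and reusing that model to produce simulated trajectories, can be illustrated with a minimal sketch. This is one plausible reading of the abstract, not the authors' procedure. Everything named here is hypothetical: `toy_model` stands in for a learned dynamics model, `toy_policy` for a goal-conditioned policy, goals are assumed to live in the same space as states, and the 0 / -1 sparse reward is an assumed convention.

```python
from typing import Callable, List, Tuple

State = Tuple[float, ...]
Model = Callable[[State, State], State]   # (state, action) -> next state
Policy = Callable[[State, State], State]  # (state, goal) -> action


def toy_model(state: State, action: State) -> State:
    """Hypothetical stand-in for a learned dynamics model: s' = s + a."""
    return tuple(s + a for s, a in zip(state, action))


def toy_policy(state: State, goal: State) -> State:
    """Hypothetical goal-conditioned policy: step toward the goal, at most 1 unit per axis."""
    return tuple(max(-1.0, min(1.0, g - s)) for s, g in zip(state, goal))


def imagine(state: State, goal: State, model: Model, policy: Policy,
            horizon: int) -> List[State]:
    """Roll the model forward under the policy.

    Returns the start state followed by `horizon` imagined states. The same
    rollout could also serve as simulated experience for policy training.
    """
    states = [state]
    for _ in range(horizon):
        state = model(state, policy(state, goal))
        states.append(state)
    return states


def foresight_goal(state: State, goal: State, model: Model, policy: Policy,
                   horizon: int) -> State:
    """Relabel with the last imagined state (assumes goal space == state space)."""
    return imagine(state, goal, model, policy, horizon)[-1]


def sparse_reward(state: State, goal: State, tol: float = 1e-6) -> float:
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    reached = all(abs(s - g) <= tol for s, g in zip(state, goal))
    return 0.0 if reached else -1.0


start, original_goal = (0.0, 0.0), (3.0, -2.0)
new_goal = foresight_goal(start, original_goal, toy_model, toy_policy, horizon=2)
print(new_goal)  # (2.0, -2.0)
```

Unlike hindsight relabeling, which can only pick goals from states the agent actually visited, the relabeled goal here comes from an imagined rollout, so it can be a state that never appears in the replay data. That is the sense in which foresight could increase goal diversity.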


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SVL: Goal-Conditioned Reinforcement Learning as Survival Learning
cs.LG 2026-04 unverdicted novelty 7.0

Survival value learning expresses the goal-conditioned value function as a discounted sum of survival probabilities and estimates it with maximum-likelihood hazard models on censored data, matching or exceeding TD bas...

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
cs.LG 2026-05 unverdicted novelty 6.0

QHyer achieves state-of-the-art results in offline goal-conditioned RL by replacing return-to-go with a state-conditioned Q-estimator and introducing a gated hybrid attention-mamba backbone for content-adaptive histor...

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
cs.LG 2026-05 unverdicted novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markov...