Balance Between Efficient and Effective Learning: Dense2Sparse Reward Shaping for Robot Manipulation with Environment Uncertainty

Bo Song; Chao Zhou; Kun Dong; Lili Zhao; Yongle Luo; Zhiyong Sun

arxiv: 2003.02740 · v1 · pith:HNZZBQCGnew · submitted 2020-03-05 · 💻 cs.LG · stat.ML

Yongle Luo, Kun Dong, Lili Zhao, Zhiyong Sun, Chao Zhou, Bo Song

classification: cs.LG, stat.ML

keywords: reward, learning, dense2sparse, robot, uncertainty, manipulation, method, system

Efficient and effective learning is one of the ultimate goals of deep reinforcement learning (DRL), although in practice a compromise between the two is made most of the time, especially in robot manipulation applications. Learning is always expensive for robot manipulation tasks, and learning effectiveness can be degraded by system uncertainty. To address these challenges, we propose a simple but powerful reward shaping method, Dense2Sparse. It combines the fast convergence of a dense reward with the noise isolation of a sparse reward to balance learning efficiency and effectiveness, which makes it suitable for robot manipulation tasks. We evaluated Dense2Sparse in a series of ablation experiments using a state representation model with system uncertainty. The experimental results show that Dense2Sparse obtains a higher expected reward than a standalone dense or sparse reward, and that it also tolerates system uncertainty better.
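
The abstract does not state how the dense and sparse rewards are combined, so the Python sketch below is only one plausible reading of the name: use a dense, possibly noisy, shaping reward early in training, then switch to a sparse success signal. Every name here (`dense_reward`, `sparse_reward`, `dense2sparse_reward`, `switch_episode`) and the fixed episode threshold are illustrative assumptions, not the authors' implementation.

```python
def dense_reward(estimated_dist):
    """Dense shaping (assumed form): negative estimated distance to the goal.

    The estimate may carry sensing noise, which is why relying on it for
    the whole of training could hurt the final policy.
    """
    return -estimated_dist


def sparse_reward(success):
    """Sparse signal (assumed form): 1.0 on task success, 0.0 otherwise."""
    return 1.0 if success else 0.0


def dense2sparse_reward(episode, estimated_dist, success, switch_episode=100):
    """Return the dense reward before `switch_episode`, the sparse one after.

    `switch_episode` is a hypothetical hyperparameter; the abstract does not
    say when or how the switch happens.
    """
    if episode < switch_episode:
        return dense_reward(estimated_dist)
    return sparse_reward(success)
```

Under these assumptions, `dense2sparse_reward(0, 0.5, False)` returns the dense value `-0.5`, while `dense2sparse_reward(150, 0.5, False)` ignores the distance estimate and returns `0.0`.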

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Temporal Self-Imitation Learning
cs.RO 2026-06 unverdicted novelty 6.0

TSIL mines temporally efficient successful trajectories during RL training to supply configuration-conditioned adaptive targets and efficiency-weighted self-imitation, improving efficiency and robustness across 15 lon...