
arxiv: 2301.08184 · v1 · pith:2KFU5XGInew · submitted 2023-01-19 · 💻 cs.RO

Keyframe Demonstration Seeded and Bayesian Optimized Policy Search

classification 💻 cs.RO
keywords exploration · approach · bayesian · policy · search · skills · bo-pi2 · demonstration

This paper introduces a novel Learning from Demonstration framework for learning robotic skills from keyframe demonstrations. It uses a Dynamic Bayesian Network (DBN) to learn the skills and a Bayesian Optimized Policy Search approach to improve the learned skills. The DBN learns the robot motion, the perceptual change in the object of interest (i.e., the skill sub-goals), and the relation between them. The rewards are also learned from the perceptual part of the DBN. The policy search part is a semi-black-box algorithm, which we call BO-PI2. It uses the action-perception relation to focus high-level exploration, uses Gaussian Processes to model the expected return, and performs Upper Confidence Bound (UCB)-type low-level exploration to sample the rollouts. BO-PI2 is compared against a state-of-the-art method on three different skills in a real-robot setting, with demonstrations from both expert and naive users. The results show that our approach successfully focuses exploration on the failed sub-goals, and that the addition of reward-predictive exploration outperforms the state-of-the-art approach on cumulative reward, skill success, and termination time metrics.
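The abstract pairs a Gaussian Process (GP) model of expected return with UCB-type exploration when sampling rollouts. The following is a minimal, generic GP-UCB sketch of that idea, not the authors' BO-PI2 implementation. It assumes a squared-exponential kernel, a fixed list of candidate policy-parameter vectors, and scalar returns observed from earlier rollouts. The function names (`gp_posterior`, `ucb_select`) and parameters (`beta`, `length_scale`, `noise_var`) are hypothetical, and the paper's sub-goal-focused high-level exploration is not modelled.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def rbf_kernel(a: Vector, b: Vector, length_scale: float = 1.0) -> float:
    """Squared-exponential kernel with unit signal variance (an assumed choice)."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq_dist / (2.0 * length_scale ** 2))


def cholesky(m: List[List[float]]) -> List[List[float]]:
    """Lower-triangular L such that L @ L^T == m (m symmetric positive definite)."""
    n = len(m)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][i] = math.sqrt(m[i][i] - s)
            else:
                low[i][j] = (m[i][j] - s) / low[j][j]
    return low


def solve_lower(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Forward substitution: solve L x = b."""
    x: List[float] = []
    for i in range(len(b)):
        x.append((b[i] - sum(low[i][k] * x[k] for k in range(i))) / low[i][i])
    return x


def solve_upper_t(low: List[List[float]], b: Sequence[float]) -> List[float]:
    """Back substitution: solve L^T x = b using the lower-triangular L."""
    n = len(b)
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(low[k][i] * x[k] for k in range(i + 1, n))
        x[i] = (b[i] - s) / low[i][i]
    return x


def gp_posterior(train_x: Sequence[Vector], train_y: Sequence[float], query: Vector,
                 noise_var: float = 1e-4, length_scale: float = 1.0) -> Tuple[float, float]:
    """Hypothetical helper: GP posterior mean and variance of the return at `query`.

    The prior mean is the empirical mean of the observed returns.
    """
    n = len(train_x)
    y_mean = sum(train_y) / n
    centred = [y - y_mean for y in train_y]
    k_mat = [[rbf_kernel(train_x[i], train_x[j], length_scale) + (noise_var if i == j else 0.0)
              for j in range(n)] for i in range(n)]
    low = cholesky(k_mat)
    alpha = solve_upper_t(low, solve_lower(low, centred))  # alpha = K^-1 (y - y_mean)
    k_star = [rbf_kernel(x, query, length_scale) for x in train_x]
    mean = y_mean + sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(low, k_star)
    var = rbf_kernel(query, query, length_scale) - sum(vi * vi for vi in v)
    return mean, max(var, 0.0)  # clamp tiny negative values caused by round-off


def ucb_select(train_x, train_y, candidates, beta: float = 2.0) -> Tuple[int, List[float]]:
    """Hypothetical helper: index of the candidate maximising mu + beta * sigma, plus all scores."""
    scores = []
    for cand in candidates:
        mu, var = gp_posterior(train_x, train_y, cand)
        scores.append(mu + beta * math.sqrt(var))
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores


if __name__ == "__main__":
    # Made-up 1-D policy parameters and returns of earlier rollouts (illustration only).
    xs = [[0.0], [1.0], [2.0]]
    ys = [0.1, 0.8, 0.3]
    cands = [[0.0], [1.0], [5.0]]
    print(ucb_select(xs, ys, cands, beta=2.0)[0])  # prints 2: the unexplored [5.0] wins
    print(ucb_select(xs, ys, cands, beta=0.0)[0])  # prints 1: best predicted mean wins
```

Raising `beta` favours candidates the GP is still uncertain about (exploration), while `beta = 0` simply picks the candidate with the highest predicted return (exploitation). How BO-PI2 sets this trade-off is not stated in the abstract.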
