Path Integral Policy Improvement with Covariance Matrix Adaptation

Freek Stulp (Ecole Nationale Superieure de Techniques Avancees); Olivier Sigaud (Universite Pierre et Marie Curie)

arxiv: 1206.4621 · v1 · pith:665KFKV5new · submitted 2012-06-18 · 💻 cs.LG

keywords: adaptation, covariance, derivation, family, improvement, integral, matrix, methods

There has been a recent focus in reinforcement learning on addressing continuous state and action problems by optimizing parameterized policies. PI² is a recent example of this approach. It combines a derivation from first principles of stochastic optimal control with tools from statistical estimation theory. In this paper, we consider PI² as a member of the wider family of methods which share the concept of probability-weighted averaging to iteratively update parameters to optimize a cost function. We compare PI² to other members of the same family (Cross-Entropy Methods and CMA-ES) at the conceptual level and in terms of performance. The comparison suggests the derivation of a novel algorithm which we call PI²-CMA, for "Path Integral Policy Improvement with Covariance Matrix Adaptation". PI²-CMA's main advantage is that it determines the magnitude of the exploration noise automatically.
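
The shared idea of probability-weighted averaging can be illustrated with a small sketch. It is not the authors' algorithm: it assumes a diagonal covariance, a single cost per sample rather than per-time-step costs, an exponential weighting of min-max normalised costs, and deviations measured from the updated mean. The names `weighted_update`, `optimize`, `h`, and `min_var` are hypothetical and chosen only for illustration.

```python
import math
import random


def weighted_update(samples, costs, h=10.0):
    """One probability-weighted averaging step (diagonal covariance, assumed)."""
    c_min, c_max = min(costs), max(costs)
    span = (c_max - c_min) or 1.0  # avoid division by zero if all costs are equal
    # Lower cost gives a higher weight; weights are normalised to sum to 1.
    raw = [math.exp(-h * (c - c_min) / span) for c in costs]
    total = sum(raw)
    probs = [r / total for r in raw]
    dim = len(samples[0])
    # New mean: probability-weighted average of the sampled parameters.
    mean = [sum(p * s[d] for p, s in zip(probs, samples)) for d in range(dim)]
    # New variance: weighted spread of the samples around the new mean.
    # This is what sets the exploration magnitude for the next iteration.
    var = [
        sum(p * (s[d] - mean[d]) ** 2 for p, s in zip(probs, samples))
        for d in range(dim)
    ]
    return mean, var, probs


def optimize(cost, mean, var, iterations=50, n_samples=20, h=10.0,
             min_var=1e-12, seed=0):
    """Repeat: sample around the mean, evaluate costs, re-estimate mean and variance."""
    rng = random.Random(seed)
    for _ in range(iterations):
        samples = [
            [rng.gauss(m, math.sqrt(v)) for m, v in zip(mean, var)]
            for _ in range(n_samples)
        ]
        costs = [cost(s) for s in samples]
        mean, var, _ = weighted_update(samples, costs, h)
        var = [max(v, min_var) for v in var]  # assumed floor to keep sampling alive
    return mean, var
```

For example, calling `optimize(lambda th: (th[0] - 1.0) ** 2 + (th[1] + 2.0) ** 2, [0.0, 0.0], [1.0, 1.0])` runs the loop on a simple quadratic cost. Because the variance is re-estimated from the weighted samples at every iteration, the exploration noise is adapted by the update itself rather than set by hand, which is the property the abstract highlights.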

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Industrial Dual-Arm Box Handling via Online Inertial Estimation and Convex Wrench Optimization
cs.RO · 2026-05 · unverdicted · novelty 5.0

A dual-arm robot framework performs online inertial estimation from contact wrenches and uses SOCP under ellipsoidal friction constraints to lift boxes with unknown properties while maintaining stable contact.