
arXiv: 1505.00274 · v2 · pith:5Y46OVVBnew · submitted 2015-05-01 · 💻 cs.AI · cs.SY · stat.ML

Stick-Breaking Policy Learning in Dec-POMDPs

classification: cs.AI · cs.SY · stat.ML
keywords: algorithm, fscs, policy, stick-breaking, dec-pomdps, dec-sbpr, decentralized, large

Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
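The abstract does not spell out the prior, so what follows is a minimal sketch of the generic truncated stick-breaking (GEM) construction, not the Dec-SBPR variational algorithm itself. It assumes Beta(1, α) stick proportions and a fixed truncation level K that bounds the number of controller nodes; the helper names `stick_breaking_weights` and `effective_nodes` and the 0.01 "active node" threshold are hypothetical choices made for illustration.

```python
import random
from typing import List


def stick_breaking_weights(alpha: float, truncation: int, rng: random.Random) -> List[float]:
    """Draw truncated stick-breaking weights pi_1..pi_K (generic GEM sketch).

    v_k ~ Beta(1, alpha);  pi_k = v_k * prod_{j<k} (1 - v_j).
    The last weight takes the remaining stick so the weights sum to 1.
    """
    if alpha <= 0 or truncation < 1:
        raise ValueError("alpha must be > 0 and truncation >= 1")
    weights: List[float] = []
    remaining = 1.0  # length of stick not yet broken off
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # truncation: final node absorbs leftover mass
    return weights


def effective_nodes(weights: List[float], threshold: float = 1e-2) -> int:
    """Count nodes whose weight exceeds a (hypothetical) activity threshold."""
    return sum(1 for w in weights if w > threshold)


if __name__ == "__main__":
    rng = random.Random(0)
    K = 50  # assumed truncation level: upper bound on FSC size
    for alpha in (0.5, 2.0, 10.0):
        w = stick_breaking_weights(alpha, K, rng)
        print(f"alpha={alpha:>4}: effective nodes = {effective_nodes(w)}")
```

Because each proportion v_k has mean 1/(1 + α), a smaller α tends to put most of the mass on the first few sticks, so fewer nodes carry non-negligible weight. This is the sense in which a stick-breaking prior lets the data, rather than a hand-picked size, govern how many controller nodes are actually used.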



Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Approximations and Learning for Decentralized Stochastic Control and Near Optimal Finite Window Policies

    math.OC · 2026-04 · unverdicted · novelty 7.0

    Finite sliding window policies achieve near-optimality and Q-learning converges to them for decentralized stochastic control under OSDISP and KSPISP information structures when a predictor stability condition holds in...