pith. machine review for the scientific record.

arxiv: 1301.6721 · v1 · submitted 2013-01-23 · 💻 cs.AI · cs.SY

Recognition: unknown

Learning Finite-State Controllers for Partially Observable Environments

Authors on Pith: no claims yet
classification: 💻 cs.AI · cs.SY
keywords: algorithm · descent · finite-state · gradient · observable · stochastic · automata · exact
0 comments
read the original abstract

Reactive (memoryless) policies are sufficient in completely observable Markov decision processes (MDPs), but some kind of memory is usually necessary for optimal control of a partially observable MDP. Policies with finite memory can be represented as finite-state automata. In this paper, we extend Baird and Moore's VAPS algorithm to the problem of learning general finite-state automata. Because it performs stochastic gradient descent, this algorithm can be shown to converge to a locally optimal finite-state controller. We provide the details of the algorithm and then consider the question of under what conditions stochastic gradient descent will outperform exact gradient descent. We conclude with empirical results comparing the performance of stochastic and exact gradient descent, and showing the ability of our algorithm to extract the useful information contained in the sequence of past observations to compensate for the lack of observability at each time-step.
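To make the abstract's idea concrete, here is a minimal, self-contained sketch of learning a stochastic finite-state controller by score-function (REINFORCE-style) stochastic gradient updates. This is an illustration of the general technique, not the paper's exact VAPS update rule; the toy task (constant observation, reward for alternating actions) is an assumption chosen so that a memoryless policy averages 0.5 reward per step while a 2-node controller can alternate perfectly.

```python
import math
import random

random.seed(0)

N_NODES, N_ACTIONS, N_OBS = 2, 2, 1  # tiny controller: 2 internal memory states

# Controller parameters (zero logits = uniform policies at the start):
#   action_logits[n][a]   -> logit of taking action a in node n
#   trans_logits[n][o][m] -> logit of moving to node m after observing o in node n
action_logits = [[0.0] * N_ACTIONS for _ in range(N_NODES)]
trans_logits = [[[0.0] * N_NODES for _ in range(N_OBS)] for _ in range(N_NODES)]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def sample(probs):
    r, acc = random.random(), 0.0
    for i, p in enumerate(probs):
        acc += p
        if r <= acc:
            return i
    return len(probs) - 1

def run_episode(horizon=10):
    """Toy partially observable task (an assumption, not from the paper):
    the observation is always 0, and reward 1 is paid whenever the action
    differs from the previous action, so memory is required to do well."""
    node, prev_a, total, traj = 0, None, 0.0, []
    for _ in range(horizon):
        obs = 0
        a = sample(softmax(action_logits[node]))
        nxt = sample(softmax(trans_logits[node][obs]))
        if prev_a is not None and a != prev_a:
            total += 1.0
        traj.append((node, obs, a, nxt))
        node, prev_a = nxt, a
    return traj, total

def reinforce_update(traj, ret, baseline, lr=0.05):
    """Stochastic gradient ascent on expected return.
    Gradient of log-softmax w.r.t. a logit is (one-hot - probs)."""
    adv = ret - baseline
    for node, obs, a, nxt in traj:
        pa = softmax(action_logits[node])
        for i in range(N_ACTIONS):
            action_logits[node][i] += lr * adv * ((i == a) - pa[i])
        pn = softmax(trans_logits[node][obs])
        for j in range(N_NODES):
            trans_logits[node][obs][j] += lr * adv * ((j == nxt) - pn[j])

baseline = 0.0
for episode in range(2000):
    traj, ret = run_episode()
    reinforce_update(traj, ret, baseline)
    baseline += 0.05 * (ret - baseline)  # running-average baseline to reduce variance
```

As in the paper's setting, each stochastic update uses a single sampled trajectory rather than the exact gradient, so it converges only to a locally optimal controller; the running-average baseline is one common variance-reduction choice.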

This paper has not been read by Pith yet.

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. UniVLA: Learning to Act Anywhere with Task-centric Latent Actions

    cs.RO · 2025-05 · unverdicted · novelty 6.0

    UniVLA trains cross-embodiment vision-language-action policies from unlabeled videos via a latent action model in DINO feature space, beating OpenVLA on benchmarks with 1/20th the pretraining compute and 1/10th the downstream data.