
arxiv: 1706.01417 · v1 · pith:QXLP7LLAnew · submitted 2017-06-05 · 💻 cs.AI

A method for the online construction of the set of states of a Markov Decision Process using Answer Set Programming

classification 💻 cs.AI
keywords domain · oasp · agent · answer · capable · decision · domains · markov

Non-stationary domains, which change in unpredicted ways, are a challenge for agents searching for optimal policies in sequential decision-making problems. This paper presents a combination of Markov Decision Processes (MDP) with Answer Set Programming (ASP), named Online ASP for MDP (oASP(MDP)), a method capable of constructing the set of domain states while the agent interacts with a changing environment. oASP(MDP) updates previously obtained policies, learnt by means of Reinforcement Learning (RL), using rules that represent the domain changes observed by the agent. These rules encode a set of domain constraints that are processed as ASP programs, reducing the search space. Results show that oASP(MDP) is capable of finding solutions for problems in non-stationary domains without interfering with the action-value function approximation process.
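
The idea summarised in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: it is plain tabular Q-learning in which the state set is grown as states are observed, and a Python set of forbidden (state, action) pairs stands in for the ASP rules. The grid world, the class names `GridWorld` and `OnlineAgent`, the reward values, and the `on_domain_change` hook (which assumes the change is signalled to the agent) are all hypothetical and chosen only for illustration.

```python
import random
from collections import defaultdict

ACTIONS = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


class GridWorld:
    """Hypothetical toy domain: a square grid whose walls may change."""

    def __init__(self, size, goal, walls):
        self.size, self.goal, self.walls = size, goal, set(walls)

    def step(self, state, action):
        dx, dy = ACTIONS[action]
        nxt = (state[0] + dx, state[1] + dy)
        inside = 0 <= nxt[0] < self.size and 0 <= nxt[1] < self.size
        if not inside or nxt in self.walls:
            return state, -1.0, False  # blocked: the agent stays put
        if nxt == self.goal:
            return nxt, 10.0, True
        return nxt, -0.1, False


class OnlineAgent:
    """Q-learning agent that discovers states and constraints online."""

    def __init__(self, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)
        self.q = defaultdict(float)  # (state, action) -> estimated value
        self.states = set()          # states observed so far
        self.forbidden = set()       # stand-in for ASP constraints (assumption)

    def allowed(self, state):
        # Constraints prune the actions considered in each state.
        acts = [a for a in ACTIONS if (state, a) not in self.forbidden]
        return acts or list(ACTIONS)

    def choose(self, state):
        self.states.add(state)
        acts = self.allowed(state)
        if self.rng.random() < self.epsilon:
            return self.rng.choice(acts)
        return max(acts, key=lambda a: self.q[(state, a)])

    def update(self, s, a, r, s2, done):
        self.states.add(s2)
        if s2 == s:
            # Observed that the action has no effect here: record a constraint.
            self.forbidden.add((s, a))
        best = 0.0 if done else max(self.q[(s2, b)] for b in self.allowed(s2))
        self.q[(s, a)] += self.alpha * (r + self.gamma * best - self.q[(s, a)])

    def on_domain_change(self):
        # Crude assumption: discard all constraints, keep the learnt Q-values.
        self.forbidden.clear()


def train(agent, env, start=(0, 0), episodes=200, max_steps=100):
    for _ in range(episodes):
        state = start
        for _ in range(max_steps):
            action = agent.choose(state)
            nxt, reward, done = env.step(state, action)
            agent.update(state, action, reward, nxt, done)
            state = nxt
            if done:
                break


if __name__ == "__main__":
    env = GridWorld(size=4, goal=(3, 3), walls={(1, 1), (2, 2)})
    agent = OnlineAgent()
    train(agent, env)
    env.walls = {(1, 1)}      # the domain changes: one wall disappears
    agent.on_domain_change()  # constraints are revised, Q-values are kept
    train(agent, env)
    print(len(agent.states), "states discovered")
```

In this sketch the constraints only restrict which actions are considered; the Q-learning update itself is unchanged, which loosely mirrors the abstract's claim that the rules reduce the search space without interfering with value approximation. A real implementation along the lines of the paper would express the constraints as an ASP program and pass them to an answer set solver rather than keep them in a Python set.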
