arxiv: 1704.01869 · v3 · pith:7XONRNLCnew · submitted 2017-04-06 · 🧮 math.OC · cs.DS

Randomized Linear Programming Solves the Discounted Markov Decision Problem In Nearly-Linear (Sometimes Sublinear) Running Time

classification 🧮 math.OC cs.DS
keywords algorithm, decision, linear, markov, optimal, policy, time, data
We propose a novel randomized linear programming algorithm for approximating the optimal policy of the discounted Markov decision problem. By leveraging the value-policy duality and binary-tree data structures, the algorithm adaptively samples state-action-state transitions and makes exponentiated primal-dual updates. We show that it finds an $\epsilon$-optimal policy using nearly-linear run time in the worst case. When the Markov decision process is ergodic and specified in some special data formats, the algorithm finds an $\epsilon$-optimal policy using run time linear in the total number of state-action pairs, which is sublinear in the input size. These results provide a new venue and complexity benchmarks for solving stochastic dynamic programs.
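
The abstract mentions binary-tree data structures that support adaptive sampling alongside exponentiated updates, but does not spell them out. As a rough illustration only, the sketch below shows a generic sum tree: leaves hold nonnegative weights, internal nodes hold subtree sums, and both sampling an index in proportion to its weight and rescaling one weight take O(log n) time. The class name `SumTree`, its methods, and the step size in the usage note are hypothetical and are not taken from the paper.

```python
import math
import random


class SumTree:
    """Binary tree over n nonnegative weights (hypothetical sketch).

    Leaves live at positions n..2n-1; node i stores the sum of its
    children 2i and 2i+1, so position 1 holds the total weight.
    """

    def __init__(self, weights):
        self.n = len(weights)
        self.tree = [0.0] * (2 * self.n)
        for i, w in enumerate(weights):
            self.tree[self.n + i] = float(w)
        # Fill internal nodes bottom-up.
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def total(self):
        return self.tree[1]

    def weight(self, index):
        return self.tree[self.n + index]

    def update(self, index, weight):
        """Set one weight and repair the sums on the path to the root."""
        i = self.n + index
        self.tree[i] = float(weight)
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def sample(self, u=None):
        """Return an index with probability weight / total.

        u is a number in [0, 1); it is drawn at random when omitted.
        """
        if u is None:
            u = random.random()
        target = u * self.tree[1]
        i = 1
        while i < self.n:  # descend until a leaf is reached
            left = self.tree[2 * i]
            if target < left:
                i = 2 * i
            else:
                target -= left
                i = 2 * i + 1
        return i - self.n
```

An exponentiated (multiplicative) update to a single entry could then be written as `tree.update(i, tree.weight(i) * math.exp(step * gain))`, where `step` and `gain` are placeholder values; each such update touches only the O(log n) nodes on the path to the root, so the full weight vector never needs to be renormalized.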

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Sample Complexity Bounds for Stochastic Shortest Path with a Generative Model

    cs.LG 2026-04 unverdicted novelty 7.0

    SSP requires $\Omega(S A B_\star^3 / (c_{\min} \epsilon^2))$ samples in the worst case, with matching upper bounds that hold even for $c_{\min} = 0$ under bounded optimal hitting time.