Max-Plus Matching Pursuit for Deterministic Markov Decision Processes

Francis Bach (SIERRA)

arxiv: 1906.08524 · v1 · pith:G3XCNHGVnew · submitted 2019-06-20 · 💻 cs.LG · stat.ML

Max-Plus Matching Pursuit for Deterministic Markov Decision Processes

Francis Bach (SIERRA) This is my paper

Pith reviewed 2026-05-25 19:55 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords deterministic MDPsmax-plus algebravalue iterationmatching pursuitcovering numberscurse of dimensionalityvalue function approximation

0 comments

The pith

In deterministic MDPs, max-plus approximation of the optimal value function with a fixed basis has complexity governed by covering numbers of the state space rather than its cardinality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies max-plus algebra to value iteration in deterministic Markov decision processes, representing value functions on a dictionary of basis functions. This leads to an iterative algorithm whose error and complexity for fixed bases depend on covering numbers instead of the number of states. For high-dimensional or factored spaces, an adaptive basis selection similar to matching pursuit is proposed, showing empirical success on low-dimensional control problems despite lacking theoretical guarantees. The framework extends to general MDPs via measure-based formulations. A sympathetic reader would care because it offers a way to mitigate the curse of dimensionality in planning problems.

Core claim

When a fixed finite basis is used, the computational complexity of approximating the optimal value function is not directly related to the number of states but to notions of covering numbers of the state space. Adaptive bases can be selected via a matching pursuit procedure to handle factored state spaces.

What carries the argument

Max-plus linear combinations of basis value functions, used to approximate the Bellman operator iteration.

If this is right

Approximation error and runtime scale with the covering number of the state space under the chosen basis.
Deterministic MDPs can be solved approximately without enumerating all states if a good dictionary exists.
Adaptive dictionary selection via matching pursuit empirically reduces error on continuous control MDPs.
The method applies to general MDPs by lifting to measures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar max-plus techniques might apply to stochastic MDPs without measure lifting if expectations can be handled.
If covering numbers are small for many real-world MDPs, this could enable scalable planning in robotics.
Combining with function approximation from deep learning could yield hybrid methods.

Load-bearing premise

The optimal value function can be closely approximated in the max-plus sense by linear combinations of a modest number of basis functions.

What would settle it

A deterministic MDP where the max-plus approximation error remains large even when the covering number is small, or where runtime grows with state count despite small covering number.

Figures

Figures reproduced from arXiv: 1906.08524 by Francis Bach (SIERRA).

**Figure 1.** Figure 1: Approximation of a function V with different finite basis with 16 elements. Onedimensional case. From left to right: piece-wise constant basis function, piecewise affine basis functions with well chosen value of c, then too small and too large. For all cases above, the optimization error (to approximate V∞ in Prop. 1) is range(r)τe−ρt/τ ; thus, in the sparse case, the number of iterations to achieve preci… view at source ↗

**Figure 2.** Figure 2: Approximation of a function V with different finite basis with 16 or 64 elements, within the MDP (i.e., leading to V∞ in Prop. 1), and with values of ρ that are 4 and 32. One-dimensional case. From left to right: (n = 16, ρ = 4), (n = 16, ρ = 32), (n = 64, ρ = 4), (n = 16, ρ = 32). Note that the approximation with a small number of basis functions here is the same for WW> and Z >+ Z > when Z = W (this will… view at source ↗

**Figure 3.** Figure 3: Approximation error kV − V∗k1 of a function V as a function of ρ and the number of basis functions, for piecewise constant or affine functions. Top: One-dimensional case, bottom: two-dimensional case. Left: fixed non adaptive basis, right: greedy (matching pursuit). the set of actions A = {−1, 1}, following [25], we consider the dynamical systems dx/dt = a (going left or right). The goal of the control pro… view at source ↗

**Figure 4.** Figure 4: Approximation of a function V with different finite basis with 16 elements. Onedimensional case (with a convex optimal value function). From left to right: piece-wise constant basis function, piecewise affine basis functions with well chosen value of c, then too small and too large. We can write the criterion that needs to be optimized from Section 5 as follows: max s∈S min U(s) − T V (s), −T V (s) − zn… view at source ↗

**Figure 5.** Figure 5: Approximation of a function V with different finite basis with 16 or 64 elements, within the MDP, and with values of ρ that are 4 and 32. One-dimensional case (with a convex optimal value function). From left to right: (n = 16, ρ = 4), (n = 16, ρ = 32), (n = 64, ρ = 4), (n = 16, ρ = 32) [PITH_FULL_IMAGE:figures/full_fig_p020_5.png] view at source ↗

**Figure 6.** Figure 6: Approximation error of a function V as a function of ρ and the number of basis functions, for piecewise constant functions and piecewise affine functions. One-dimensional case (with a convex optimal value function). Left: fixed, right: greedy. E.2 Two-dimensional We consider control problems on [0, 1]d , with d = 2, with 2d potential discrete actions (one in each coordinate direction). The corresponding Ha… view at source ↗

**Figure 3.** Figure 3: • V (x1, x2) = (1−3x1)+ + (6x1 −4)+ + (1−3x2)+ + (6x2 −4)+. This function, without potential for variable selection, is plotted in [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗

**Figure 7.** Figure 7: Value function with potential for variable selection. Left: optimal, right: approximation [PITH_FULL_IMAGE:figures/full_fig_p021_7.png] view at source ↗

**Figure 8.** Figure 8: Value function without potential for variable selection. Left: optimal, right: approxima [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗

**Figure 9.** Figure 9: Approximation error of a function V as a function of ρ and the number of basis functions, for piecewise constant functions and piecewise affine functions. Two-dimensional case corresponding to the function in [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗

read the original abstract

We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation on dictionaries of value functions. The setup naturally leads to novel theoretical results which are simply formulated due to the max-plus algebra structure. For example, when considering a fixed (non adaptive) finite basis, the computational complexity of approximating the optimal value function is not directly related to the number of states, but to notions of covering numbers of the state space. In order to break the curse of dimensionality in factored state-spaces, we consider adaptive basis that can adapt to particular problems leading to an algorithm similar to matching pursuit from signal processing. They currently come with no theoretical guarantees but work empirically well on simple deterministic MDPs derived from low-dimensional continuous control problems. We focus primarily on deterministic MDPs but note that the framework can be applied to all MDPs by considering measure-based formulations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper frames value iteration for deterministic MDPs via max-plus dictionaries, so that fixed-basis approximation complexity depends on covering numbers rather than |S|, but the adaptive matching-pursuit version has no theory and rests on an unproven closeness assumption.

read the letter

The main point is that max-plus algebra lets them run a reduced value iteration whose cost for a fixed dictionary scales with covering numbers of the state space instead of the raw number of states. That is the cleanest new piece, and it follows directly from the algebra once the representation is set up as max of basis functions plus offsets. The fixed-basis theory is stated simply and looks internally consistent on the terms given in the abstract. The link to covering numbers is explicit and avoids the usual state-space blowup in the deterministic setting. The adaptive part borrows matching pursuit to grow the dictionary on the fly and reports that it works on a few low-dimensional continuous-control MDPs; that is a reasonable empirical transfer and they are upfront that it currently has no guarantees. The soft spot is exactly the one the stress-test flags: the covering-number bound only delivers useful approximation error if the optimal value function sits close enough to the max-plus span of the chosen dictionary. The paper does not supply a general result showing that modest non-adaptive dictionaries achieve this for arbitrary deterministic MDPs, so the complexity claim stays conditional. Without that, the decoupling from |S| is more aspirational than proven for broad use. The work is aimed at people already thinking about dictionary or basis methods for value functions in deterministic or measure-based MDPs, especially those who want to import tools from signal processing. It is not a complete attack on the curse of dimensionality but gives a different handle on the fixed-basis case. I would send it to peer review; the fixed-basis results are novel enough to be worth referee time even if the adaptive side needs more development.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a max-plus algebra approach to approximate value iteration for deterministic MDPs using fixed and adaptive dictionaries of basis value functions. It claims that for a fixed finite basis, the complexity of approximation depends on covering numbers of the state space rather than the number of states. An adaptive algorithm inspired by matching pursuit is introduced for breaking the curse of dimensionality in factored spaces, showing empirical success on simple deterministic MDPs from low-dimensional continuous control, though without theoretical guarantees. The framework is suggested to apply to general MDPs via measure-based formulations.

Significance. The approach provides a structured way to approximate value functions in deterministic MDPs, potentially decoupling complexity from state space size through covering numbers and max-plus operations. The empirical performance on control problems indicates practical promise. However, the lack of guarantees for the adaptive case and the reliance on an unstated closeness assumption for the fixed basis case temper the significance. Strengths include the clean formulation enabled by max-plus algebra.

major comments (2)

[Abstract] Abstract: The assertion that computational complexity depends on covering numbers rather than |S| for a fixed basis presupposes that the optimal value function V* lies sufficiently close (in the max-plus sense) to the span of the dictionary so that the covering radius controls the error. No general result is supplied guaranteeing that a modest non-adaptive dictionary achieves small max-plus distance for arbitrary deterministic MDPs; without this, the covering-number argument does not establish the claimed decoupling from state cardinality.
[Abstract] Abstract: The adaptive basis algorithm is stated to 'currently come with no theoretical guarantees'. If the manuscript contains any partial bounds or conditions under which guarantees might hold, they should be explicitly stated and derived to strengthen the contribution.

minor comments (1)

The abstract mentions 'notions of covering numbers' without specifying which notions (e.g., epsilon-nets in what metric) are used in the complexity analysis; this should be clarified in the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and constructive comments on our manuscript. We address each major comment point by point below, with proposed revisions where appropriate.

read point-by-point responses

Referee: [Abstract] Abstract: The assertion that computational complexity depends on covering numbers rather than |S| for a fixed basis presupposes that the optimal value function V* lies sufficiently close (in the max-plus sense) to the span of the dictionary so that the covering radius controls the error. No general result is supplied guaranteeing that a modest non-adaptive dictionary achieves small max-plus distance for arbitrary deterministic MDPs; without this, the covering-number argument does not establish the claimed decoupling from state cardinality.

Authors: We agree with the referee that the complexity scaling with covering numbers (rather than |S|) is conditional on the max-plus distance from V* to the dictionary span being controlled. The manuscript derives error bounds for the approximated value iteration that depend on the covering radius when this approximation quality holds, but does not provide a general existence result guaranteeing small max-plus distance for modest fixed dictionaries across arbitrary deterministic MDPs. The claim in the abstract is intended to highlight that, for a given fixed dictionary achieving sufficient approximation quality, the per-iteration cost scales with covering numbers of the state space. We will revise the abstract to explicitly note this conditional aspect and avoid any implication of unconditional decoupling. revision: yes
Referee: [Abstract] Abstract: The adaptive basis algorithm is stated to 'currently come with no theoretical guarantees'. If the manuscript contains any partial bounds or conditions under which guarantees might hold, they should be explicitly stated and derived to strengthen the contribution.

Authors: The manuscript states in the abstract that the adaptive matching-pursuit-style algorithm 'currently come with no theoretical guarantees.' A review of the full text confirms that no partial bounds, sufficient conditions, or preliminary theoretical results are derived for the adaptive case; the contribution for this part is empirical, demonstrating effectiveness on low-dimensional continuous control MDPs. We do not have additional material to add on this point. revision: no

Circularity Check

0 steps flagged

No circularity: complexity claim follows directly from max-plus structure on fixed basis

full rationale

The derivation applies standard max-plus algebra to value iteration on a fixed finite dictionary, yielding covering-number dependence as a direct algebraic consequence rather than a fitted or self-referential quantity. No equations reduce the claimed complexity bound to parameters estimated from the target result itself, and the adaptive basis section is explicitly presented without theoretical guarantees. Self-citations on max-plus tools are not load-bearing for the central claim, which remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the algebraic properties of max-plus semirings and the assumption that value functions admit useful dictionary representations; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Max-plus algebra operations preserve the Bellman optimality structure for deterministic MDPs
Invoked when the paper states that the setup 'naturally leads to novel theoretical results which are simply formulated due to the max-plus algebra structure'.

pith-pipeline@v0.9.0 · 5684 in / 1102 out tokens · 24722 ms · 2026-05-25T19:55:54.187863+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1]

The max-plus ﬁnite element method for solving deterministic optimal control problems: basic properties and convergence analysis

Marianne Akian, Stéphane Gaubert, and Asma Lakhoua. The max-plus ﬁnite element method for solving deterministic optimal control problems: basic properties and convergence analysis. SIAM Journal on Control and Optimization, 47(2):817–848, 2008

work page 2008
[2]

John Wiley & Sons, 1992

François Baccelli, Guy Cohen, Geert Jan Olsder, and Jean-Pierre Quadrat.Synchronization and Linearity: an Algebra for Discrete Event Systems. John Wiley & Sons, 1992

work page 1992
[3]

Bauschke, Jérôme Bolte, and Marc Teboulle

Heinz H. Bauschke, Jérôme Bolte, and Marc Teboulle. A descent lemma beyond Lipschitz gradient continuity: ﬁrst-order methods revisited and applications.Mathematics of Operations Research, 42(2):330–348, 2016

work page 2016
[4]

Bertsekas

Dimitri P. Bertsekas. Weighted sup-norm contractions in dynamic programming: A review and some new applications. Technical Report LIDS-P-2884, Dept. Elect. Eng. Comput. Sci., Massachusetts Institute of Technology, 2012. 12

work page 2012
[5]

Bertsekas and David A

Dimitri P. Bertsekas and David A. Castanon. Adaptive aggregation methods for inﬁnite horizon dynamic programming.IEEE Transactions on Automatic Control, 34(6):589–598, 1989

work page 1989
[6]

Cambridge University Press, 2004

Stephen Boyd and Lieven Vandenberghe.Convex Optimization. Cambridge University Press, 2004

work page 2004
[7]

Cross-entropy optimization of control policies with adaptive basis functions.IEEE Transactions on Systems, Man, and Cybernetics, 41(1):196–209, 2011

Lucian Busoniu, Damien Ernst, Bart De Schutter, and Robert Babuska. Cross-entropy optimization of control policies with adaptive basis functions.IEEE Transactions on Systems, Man, and Cybernetics, 41(1):196–209, 2011

work page 2011
[8]

Springer Science & Business Media, 2010

Peter Butkovič.Max-linear Systems: Theory and Algorithms. Springer Science & Business Media, 2010

work page 2010
[9]

Extreme values for characteristic radii of a Poisson- Voronoi tessellation

Pierre Calka and Nicolas Chenavier. Extreme values for characteristic radii of a Poisson- Voronoi tessellation. Extremes, 17(3):359–385, 2014

work page 2014
[10]

Duality and separation theorems in idempotent semimodules.Linear Algebra and its Applications, 379(1):395–422, 2004

Guy Cohen, Stéphane Gaubert, and Jean-Pierre Quadrat. Duality and separation theorems in idempotent semimodules.Linear Algebra and its Applications, 379(1):395–422, 2004

work page 2004
[11]

Crandall and Pierre-Louis Lions

Michael G. Crandall and Pierre-Louis Lions. Viscosity solutions of Hamilton-Jacobi equations. Transactions of the American Mathematical Society, 277(1):1–42, 1983

work page 1983
[12]

Dictionary Learning Algorithms and Applications

Bogdan Dumitrescu and Paul Irofti. Dictionary Learning Algorithms and Applications. Springer, 2018

work page 2018
[13]

Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms

Stéphane Gaubert, William McEneaney, and Zheng Qu. Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms. In Conference on Decision and Control and European Control Conference, pages 1054–1061, 2011

work page 2011
[14]

A brief survey of parametric value function approxima- tion

Matthieu Geist and Olivier Pietquin. A brief survey of parametric value function approxima- tion. Technical report, Supélec, 2010

work page 2010
[15]

Eﬃcient solution algorithms for factored MDPs.Journal of Artiﬁcial Intelligence Research, 19:399–468, 2003

Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman. Eﬃcient solution algorithms for factored MDPs.Journal of Artiﬁcial Intelligence Research, 19:399–468, 2003

work page 2003
[16]

Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30

Onésimo Hernández-Lerma and Jean-Bernard Lasserre. Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30. Springer Science & Business Media, 2012

work page 2012
[17]

Sham Kakade, Mengdi Wang, and Lin F. Yang. Variance reduction methods for sublinear reinforcement learning. Technical Report 1802.09184, arXiv, 2018

work page arXiv 2018
[18]

Keller, Shie Mannor, and Doina Precup

Philipp W. Keller, Shie Mannor, and Doina Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. InProceedings of the International Conference on Machine Learning (ICML), 2006

work page 2006
[19]

Nonlinear optimal control via occupation measures and lmi-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

Jean-Bernard Lasserre, Didier Henrion, Christophe Prieur, and Emmanuel Trélat. Nonlinear optimal control via occupation measures and lmi-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

work page 2008
[20]

Feature selection and feature learning for high- dimensional batch reinforcement learning: A survey.International Journal of Automation and Computing, 12(3):229–242, 2015

De-Rong Liu, Hong-Liang Li, and Ding Wang. Feature selection and feature learning for high- dimensional batch reinforcement learning: A survey.International Journal of Automation and Computing, 12(3):229–242, 2015. 13

work page 2015
[21]

Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes.Journal of Machine Learning Research, 8(Oct):2169–2231, 2007

Sridhar Mahadevan and Mauro Maggioni. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes.Journal of Machine Learning Research, 8(Oct):2169–2231, 2007

work page 2007
[22]

Sparse modeling for image and vision processing

Julien Mairal, Francis Bach, and Jean Ponce. Sparse modeling for image and vision processing. Foundations and TrendsR⃝ in Computer Graphics and Vision, 8(2-3):85–283, 2014

work page 2014
[23]

A Wavelet Tour of Signal Processing: the Sparse Way

Stéphane Mallat. A Wavelet Tour of Signal Processing: the Sparse Way. Academic Press, 2008

work page 2008
[24]

McEneaney

William M. McEneaney. Error analysis of a max-plus algorithm for a ﬁrst-order HJB equation. In Stochastic Theory and Control, pages 335–351. Springer, 2002

work page 2002
[25]

A study of reinforcement learning in the continuous case by the means of viscosity solutions.Machine Learning, 40(3):265–299, 2000

Rémi Munos. A study of reinforcement learning in the continuous case by the means of viscosity solutions.Machine Learning, 40(3):265–299, 2000

work page 2000
[26]

Variable resolution discretization in optimal control

Rémi Munos and Andrew Moore. Variable resolution discretization in optimal control. Machine Learning, 49(2-3):291–323, 2002

work page 2002
[27]

Greedy linear value-approximation for factored Markov decision processes

Relu Patrascu, Pascal Poupart, Dale Schuurmans, Craig Boutilier, and Carlos Guestrin. Greedy linear value-approximation for factored Markov decision processes. InProc. AAAI, 2002

work page 2002
[28]

Puterman.Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L. Puterman.Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014

work page 2014
[29]

Enumeration of lattice points inℓ1-norm

Joan Serra-Sagristà. Enumeration of lattice points inℓ1-norm. Information Processing Letters, 76(1-2):39–44, 2000

work page 2000
[30]

Variance reduced value iteration and faster algorithms for solving Markov decision processes

Aaron Sidford, Mengdi Wang, Xian Wu, and Yinyu Ye. Variance reduced value iteration and faster algorithms for solving Markov decision processes. InProceedings of the Symposium on Discrete Algorithms (SODA), 2018

work page 2018
[31]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. MIT Press, 2018

work page 2018
[32]

Springer, 2015

Vladimir Temlyakov.Sparse Approximation with Bases. Springer, 2015. A Value iteration and reduced value-iteration convergence In this appendix, we provide short proofs for the lemmas and propositions of the main paper related to (reduced) value iteration. A.1 Approximation of the value function through approximate ﬁxed points This is taken from [4, Prop. ...

work page 2015

[1] [1]

The max-plus ﬁnite element method for solving deterministic optimal control problems: basic properties and convergence analysis

Marianne Akian, Stéphane Gaubert, and Asma Lakhoua. The max-plus ﬁnite element method for solving deterministic optimal control problems: basic properties and convergence analysis. SIAM Journal on Control and Optimization, 47(2):817–848, 2008

work page 2008

[2] [2]

John Wiley & Sons, 1992

François Baccelli, Guy Cohen, Geert Jan Olsder, and Jean-Pierre Quadrat.Synchronization and Linearity: an Algebra for Discrete Event Systems. John Wiley & Sons, 1992

work page 1992

[3] [3]

Bauschke, Jérôme Bolte, and Marc Teboulle

Heinz H. Bauschke, Jérôme Bolte, and Marc Teboulle. A descent lemma beyond Lipschitz gradient continuity: ﬁrst-order methods revisited and applications.Mathematics of Operations Research, 42(2):330–348, 2016

work page 2016

[4] [4]

Bertsekas

Dimitri P. Bertsekas. Weighted sup-norm contractions in dynamic programming: A review and some new applications. Technical Report LIDS-P-2884, Dept. Elect. Eng. Comput. Sci., Massachusetts Institute of Technology, 2012. 12

work page 2012

[5] [5]

Bertsekas and David A

Dimitri P. Bertsekas and David A. Castanon. Adaptive aggregation methods for inﬁnite horizon dynamic programming.IEEE Transactions on Automatic Control, 34(6):589–598, 1989

work page 1989

[6] [6]

Cambridge University Press, 2004

Stephen Boyd and Lieven Vandenberghe.Convex Optimization. Cambridge University Press, 2004

work page 2004

[7] [7]

Cross-entropy optimization of control policies with adaptive basis functions.IEEE Transactions on Systems, Man, and Cybernetics, 41(1):196–209, 2011

Lucian Busoniu, Damien Ernst, Bart De Schutter, and Robert Babuska. Cross-entropy optimization of control policies with adaptive basis functions.IEEE Transactions on Systems, Man, and Cybernetics, 41(1):196–209, 2011

work page 2011

[8] [8]

Springer Science & Business Media, 2010

Peter Butkovič.Max-linear Systems: Theory and Algorithms. Springer Science & Business Media, 2010

work page 2010

[9] [9]

Extreme values for characteristic radii of a Poisson- Voronoi tessellation

Pierre Calka and Nicolas Chenavier. Extreme values for characteristic radii of a Poisson- Voronoi tessellation. Extremes, 17(3):359–385, 2014

work page 2014

[10] [10]

Duality and separation theorems in idempotent semimodules.Linear Algebra and its Applications, 379(1):395–422, 2004

Guy Cohen, Stéphane Gaubert, and Jean-Pierre Quadrat. Duality and separation theorems in idempotent semimodules.Linear Algebra and its Applications, 379(1):395–422, 2004

work page 2004

[11] [11]

Crandall and Pierre-Louis Lions

Michael G. Crandall and Pierre-Louis Lions. Viscosity solutions of Hamilton-Jacobi equations. Transactions of the American Mathematical Society, 277(1):1–42, 1983

work page 1983

[12] [12]

Dictionary Learning Algorithms and Applications

Bogdan Dumitrescu and Paul Irofti. Dictionary Learning Algorithms and Applications. Springer, 2018

work page 2018

[13] [13]

Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms

Stéphane Gaubert, William McEneaney, and Zheng Qu. Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms. In Conference on Decision and Control and European Control Conference, pages 1054–1061, 2011

work page 2011

[14] [14]

A brief survey of parametric value function approxima- tion

Matthieu Geist and Olivier Pietquin. A brief survey of parametric value function approxima- tion. Technical report, Supélec, 2010

work page 2010

[15] [15]

Eﬃcient solution algorithms for factored MDPs.Journal of Artiﬁcial Intelligence Research, 19:399–468, 2003

Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman. Eﬃcient solution algorithms for factored MDPs.Journal of Artiﬁcial Intelligence Research, 19:399–468, 2003

work page 2003

[16] [16]

Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30

Onésimo Hernández-Lerma and Jean-Bernard Lasserre. Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30. Springer Science & Business Media, 2012

work page 2012

[17] [17]

Sham Kakade, Mengdi Wang, and Lin F. Yang. Variance reduction methods for sublinear reinforcement learning. Technical Report 1802.09184, arXiv, 2018

work page arXiv 2018

[18] [18]

Keller, Shie Mannor, and Doina Precup

Philipp W. Keller, Shie Mannor, and Doina Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. InProceedings of the International Conference on Machine Learning (ICML), 2006

work page 2006

[19] [19]

Nonlinear optimal control via occupation measures and lmi-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

Jean-Bernard Lasserre, Didier Henrion, Christophe Prieur, and Emmanuel Trélat. Nonlinear optimal control via occupation measures and lmi-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

work page 2008

[20] [20]

Feature selection and feature learning for high- dimensional batch reinforcement learning: A survey.International Journal of Automation and Computing, 12(3):229–242, 2015

De-Rong Liu, Hong-Liang Li, and Ding Wang. Feature selection and feature learning for high- dimensional batch reinforcement learning: A survey.International Journal of Automation and Computing, 12(3):229–242, 2015. 13

work page 2015

[21] [21]

Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes.Journal of Machine Learning Research, 8(Oct):2169–2231, 2007

Sridhar Mahadevan and Mauro Maggioni. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes.Journal of Machine Learning Research, 8(Oct):2169–2231, 2007

work page 2007

[22] [22]

Sparse modeling for image and vision processing

Julien Mairal, Francis Bach, and Jean Ponce. Sparse modeling for image and vision processing. Foundations and TrendsR⃝ in Computer Graphics and Vision, 8(2-3):85–283, 2014

work page 2014

[23] [23]

A Wavelet Tour of Signal Processing: the Sparse Way

Stéphane Mallat. A Wavelet Tour of Signal Processing: the Sparse Way. Academic Press, 2008

work page 2008

[24] [24]

McEneaney

William M. McEneaney. Error analysis of a max-plus algorithm for a ﬁrst-order HJB equation. In Stochastic Theory and Control, pages 335–351. Springer, 2002

work page 2002

[25] [25]

A study of reinforcement learning in the continuous case by the means of viscosity solutions.Machine Learning, 40(3):265–299, 2000

Rémi Munos. A study of reinforcement learning in the continuous case by the means of viscosity solutions.Machine Learning, 40(3):265–299, 2000

work page 2000

[26] [26]

Variable resolution discretization in optimal control

Rémi Munos and Andrew Moore. Variable resolution discretization in optimal control. Machine Learning, 49(2-3):291–323, 2002

work page 2002

[27] [27]

Greedy linear value-approximation for factored Markov decision processes

Relu Patrascu, Pascal Poupart, Dale Schuurmans, Craig Boutilier, and Carlos Guestrin. Greedy linear value-approximation for factored Markov decision processes. InProc. AAAI, 2002

work page 2002

[28] [28]

Puterman.Markov Decision Processes: Discrete Stochastic Dynamic Programming

Martin L. Puterman.Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014

work page 2014

[29] [29]

Enumeration of lattice points inℓ1-norm

Joan Serra-Sagristà. Enumeration of lattice points inℓ1-norm. Information Processing Letters, 76(1-2):39–44, 2000

work page 2000

[30] [30]

Variance reduced value iteration and faster algorithms for solving Markov decision processes

Aaron Sidford, Mengdi Wang, Xian Wu, and Yinyu Ye. Variance reduced value iteration and faster algorithms for solving Markov decision processes. InProceedings of the Symposium on Discrete Algorithms (SODA), 2018

work page 2018

[31] [31]

Sutton and Andrew G

Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. MIT Press, 2018

work page 2018

[32] [32]

Springer, 2015

Vladimir Temlyakov.Sparse Approximation with Bases. Springer, 2015. A Value iteration and reduced value-iteration convergence In this appendix, we provide short proofs for the lemmas and propositions of the main paper related to (reduced) value iteration. A.1 Approximation of the value function through approximate ﬁxed points This is taken from [4, Prop. ...

work page 2015