pith. sign in

arxiv: 1906.08524 · v1 · pith:G3XCNHGVnew · submitted 2019-06-20 · 💻 cs.LG · stat.ML

Max-Plus Matching Pursuit for Deterministic Markov Decision Processes

Pith reviewed 2026-05-25 19:55 UTC · model grok-4.3

classification 💻 cs.LG stat.ML
keywords deterministic MDPsmax-plus algebravalue iterationmatching pursuitcovering numberscurse of dimensionalityvalue function approximation
0
0 comments X

The pith

In deterministic MDPs, max-plus approximation of the optimal value function with a fixed basis has complexity governed by covering numbers of the state space rather than its cardinality.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper applies max-plus algebra to value iteration in deterministic Markov decision processes, representing value functions on a dictionary of basis functions. This leads to an iterative algorithm whose error and complexity for fixed bases depend on covering numbers instead of the number of states. For high-dimensional or factored spaces, an adaptive basis selection similar to matching pursuit is proposed, showing empirical success on low-dimensional control problems despite lacking theoretical guarantees. The framework extends to general MDPs via measure-based formulations. A sympathetic reader would care because it offers a way to mitigate the curse of dimensionality in planning problems.

Core claim

When a fixed finite basis is used, the computational complexity of approximating the optimal value function is not directly related to the number of states but to notions of covering numbers of the state space. Adaptive bases can be selected via a matching pursuit procedure to handle factored state spaces.

What carries the argument

Max-plus linear combinations of basis value functions, used to approximate the Bellman operator iteration.

If this is right

  • Approximation error and runtime scale with the covering number of the state space under the chosen basis.
  • Deterministic MDPs can be solved approximately without enumerating all states if a good dictionary exists.
  • Adaptive dictionary selection via matching pursuit empirically reduces error on continuous control MDPs.
  • The method applies to general MDPs by lifting to measures.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar max-plus techniques might apply to stochastic MDPs without measure lifting if expectations can be handled.
  • If covering numbers are small for many real-world MDPs, this could enable scalable planning in robotics.
  • Combining with function approximation from deep learning could yield hybrid methods.

Load-bearing premise

The optimal value function can be closely approximated in the max-plus sense by linear combinations of a modest number of basis functions.

What would settle it

A deterministic MDP where the max-plus approximation error remains large even when the covering number is small, or where runtime grows with state count despite small covering number.

Figures

Figures reproduced from arXiv: 1906.08524 by Francis Bach (SIERRA).

Figure 1
Figure 1. Figure 1: Approximation of a function V with different finite basis with 16 elements. One￾dimensional case. From left to right: piece-wise constant basis function, piecewise affine basis functions with well chosen value of c, then too small and too large. For all cases above, the optimization error (to approximate V∞ in Prop. 1) is range(r)τe−ρt/τ ; thus, in the sparse case, the number of iterations to achieve preci… view at source ↗
Figure 2
Figure 2. Figure 2: Approximation of a function V with different finite basis with 16 or 64 elements, within the MDP (i.e., leading to V∞ in Prop. 1), and with values of ρ that are 4 and 32. One-dimensional case. From left to right: (n = 16, ρ = 4), (n = 16, ρ = 32), (n = 64, ρ = 4), (n = 16, ρ = 32). Note that the approximation with a small number of basis functions here is the same for WW> and Z >+ Z > when Z = W (this will… view at source ↗
Figure 3
Figure 3. Figure 3: Approximation error kV − V∗k1 of a function V as a function of ρ and the number of basis functions, for piecewise constant or affine functions. Top: One-dimensional case, bottom: two-dimensional case. Left: fixed non adaptive basis, right: greedy (matching pursuit). the set of actions A = {−1, 1}, following [25], we consider the dynamical systems dx/dt = a (going left or right). The goal of the control pro… view at source ↗
Figure 4
Figure 4. Figure 4: Approximation of a function V with different finite basis with 16 elements. One￾dimensional case (with a convex optimal value function). From left to right: piece-wise constant basis function, piecewise affine basis functions with well chosen value of c, then too small and too large. We can write the criterion that needs to be optimized from Section 5 as follows: max s∈S min  U(s) − T V (s), −T V (s) − zn… view at source ↗
Figure 5
Figure 5. Figure 5: Approximation of a function V with different finite basis with 16 or 64 elements, within the MDP, and with values of ρ that are 4 and 32. One-dimensional case (with a convex optimal value function). From left to right: (n = 16, ρ = 4), (n = 16, ρ = 32), (n = 64, ρ = 4), (n = 16, ρ = 32) [PITH_FULL_IMAGE:figures/full_fig_p020_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Approximation error of a function V as a function of ρ and the number of basis functions, for piecewise constant functions and piecewise affine functions. One-dimensional case (with a convex optimal value function). Left: fixed, right: greedy. E.2 Two-dimensional We consider control problems on [0, 1]d , with d = 2, with 2d potential discrete actions (one in each coordinate direction). The corresponding Ha… view at source ↗
Figure 3
Figure 3. Figure 3: • V (x1, x2) = (1−3x1)+ + (6x1 −4)+ + (1−3x2)+ + (6x2 −4)+. This function, without potential for variable selection, is plotted in [PITH_FULL_IMAGE:figures/full_fig_p020_3.png] view at source ↗
Figure 7
Figure 7. Figure 7: Value function with potential for variable selection. Left: optimal, right: approximation [PITH_FULL_IMAGE:figures/full_fig_p021_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Value function without potential for variable selection. Left: optimal, right: approxima [PITH_FULL_IMAGE:figures/full_fig_p021_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Approximation error of a function V as a function of ρ and the number of basis functions, for piecewise constant functions and piecewise affine functions. Two-dimensional case corresponding to the function in [PITH_FULL_IMAGE:figures/full_fig_p021_9.png] view at source ↗
read the original abstract

We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation on dictionaries of value functions. The setup naturally leads to novel theoretical results which are simply formulated due to the max-plus algebra structure. For example, when considering a fixed (non adaptive) finite basis, the computational complexity of approximating the optimal value function is not directly related to the number of states, but to notions of covering numbers of the state space. In order to break the curse of dimensionality in factored state-spaces, we consider adaptive basis that can adapt to particular problems leading to an algorithm similar to matching pursuit from signal processing. They currently come with no theoretical guarantees but work empirically well on simple deterministic MDPs derived from low-dimensional continuous control problems. We focus primarily on deterministic MDPs but note that the framework can be applied to all MDPs by considering measure-based formulations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript presents a max-plus algebra approach to approximate value iteration for deterministic MDPs using fixed and adaptive dictionaries of basis value functions. It claims that for a fixed finite basis, the complexity of approximation depends on covering numbers of the state space rather than the number of states. An adaptive algorithm inspired by matching pursuit is introduced for breaking the curse of dimensionality in factored spaces, showing empirical success on simple deterministic MDPs from low-dimensional continuous control, though without theoretical guarantees. The framework is suggested to apply to general MDPs via measure-based formulations.

Significance. The approach provides a structured way to approximate value functions in deterministic MDPs, potentially decoupling complexity from state space size through covering numbers and max-plus operations. The empirical performance on control problems indicates practical promise. However, the lack of guarantees for the adaptive case and the reliance on an unstated closeness assumption for the fixed basis case temper the significance. Strengths include the clean formulation enabled by max-plus algebra.

major comments (2)
  1. [Abstract] Abstract: The assertion that computational complexity depends on covering numbers rather than |S| for a fixed basis presupposes that the optimal value function V* lies sufficiently close (in the max-plus sense) to the span of the dictionary so that the covering radius controls the error. No general result is supplied guaranteeing that a modest non-adaptive dictionary achieves small max-plus distance for arbitrary deterministic MDPs; without this, the covering-number argument does not establish the claimed decoupling from state cardinality.
  2. [Abstract] Abstract: The adaptive basis algorithm is stated to 'currently come with no theoretical guarantees'. If the manuscript contains any partial bounds or conditions under which guarantees might hold, they should be explicitly stated and derived to strengthen the contribution.
minor comments (1)
  1. The abstract mentions 'notions of covering numbers' without specifying which notions (e.g., epsilon-nets in what metric) are used in the complexity analysis; this should be clarified in the main text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful review and constructive comments on our manuscript. We address each major comment point by point below, with proposed revisions where appropriate.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The assertion that computational complexity depends on covering numbers rather than |S| for a fixed basis presupposes that the optimal value function V* lies sufficiently close (in the max-plus sense) to the span of the dictionary so that the covering radius controls the error. No general result is supplied guaranteeing that a modest non-adaptive dictionary achieves small max-plus distance for arbitrary deterministic MDPs; without this, the covering-number argument does not establish the claimed decoupling from state cardinality.

    Authors: We agree with the referee that the complexity scaling with covering numbers (rather than |S|) is conditional on the max-plus distance from V* to the dictionary span being controlled. The manuscript derives error bounds for the approximated value iteration that depend on the covering radius when this approximation quality holds, but does not provide a general existence result guaranteeing small max-plus distance for modest fixed dictionaries across arbitrary deterministic MDPs. The claim in the abstract is intended to highlight that, for a given fixed dictionary achieving sufficient approximation quality, the per-iteration cost scales with covering numbers of the state space. We will revise the abstract to explicitly note this conditional aspect and avoid any implication of unconditional decoupling. revision: yes

  2. Referee: [Abstract] Abstract: The adaptive basis algorithm is stated to 'currently come with no theoretical guarantees'. If the manuscript contains any partial bounds or conditions under which guarantees might hold, they should be explicitly stated and derived to strengthen the contribution.

    Authors: The manuscript states in the abstract that the adaptive matching-pursuit-style algorithm 'currently come with no theoretical guarantees.' A review of the full text confirms that no partial bounds, sufficient conditions, or preliminary theoretical results are derived for the adaptive case; the contribution for this part is empirical, demonstrating effectiveness on low-dimensional continuous control MDPs. We do not have additional material to add on this point. revision: no

Circularity Check

0 steps flagged

No circularity: complexity claim follows directly from max-plus structure on fixed basis

full rationale

The derivation applies standard max-plus algebra to value iteration on a fixed finite dictionary, yielding covering-number dependence as a direct algebraic consequence rather than a fitted or self-referential quantity. No equations reduce the claimed complexity bound to parameters estimated from the target result itself, and the adaptive basis section is explicitly presented without theoretical guarantees. Self-citations on max-plus tools are not load-bearing for the central claim, which remains self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on the algebraic properties of max-plus semirings and the assumption that value functions admit useful dictionary representations; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Max-plus algebra operations preserve the Bellman optimality structure for deterministic MDPs
    Invoked when the paper states that the setup 'naturally leads to novel theoretical results which are simply formulated due to the max-plus algebra structure'.

pith-pipeline@v0.9.0 · 5684 in / 1102 out tokens · 24722 ms · 2026-05-25T19:55:54.187863+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

  1. [1]

    The max-plus finite element method for solving deterministic optimal control problems: basic properties and convergence analysis

    Marianne Akian, Stéphane Gaubert, and Asma Lakhoua. The max-plus finite element method for solving deterministic optimal control problems: basic properties and convergence analysis. SIAM Journal on Control and Optimization, 47(2):817–848, 2008

  2. [2]

    John Wiley & Sons, 1992

    François Baccelli, Guy Cohen, Geert Jan Olsder, and Jean-Pierre Quadrat.Synchronization and Linearity: an Algebra for Discrete Event Systems. John Wiley & Sons, 1992

  3. [3]

    Bauschke, Jérôme Bolte, and Marc Teboulle

    Heinz H. Bauschke, Jérôme Bolte, and Marc Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications.Mathematics of Operations Research, 42(2):330–348, 2016

  4. [4]

    Bertsekas

    Dimitri P. Bertsekas. Weighted sup-norm contractions in dynamic programming: A review and some new applications. Technical Report LIDS-P-2884, Dept. Elect. Eng. Comput. Sci., Massachusetts Institute of Technology, 2012. 12

  5. [5]

    Bertsekas and David A

    Dimitri P. Bertsekas and David A. Castanon. Adaptive aggregation methods for infinite horizon dynamic programming.IEEE Transactions on Automatic Control, 34(6):589–598, 1989

  6. [6]

    Cambridge University Press, 2004

    Stephen Boyd and Lieven Vandenberghe.Convex Optimization. Cambridge University Press, 2004

  7. [7]

    Cross-entropy optimization of control policies with adaptive basis functions.IEEE Transactions on Systems, Man, and Cybernetics, 41(1):196–209, 2011

    Lucian Busoniu, Damien Ernst, Bart De Schutter, and Robert Babuska. Cross-entropy optimization of control policies with adaptive basis functions.IEEE Transactions on Systems, Man, and Cybernetics, 41(1):196–209, 2011

  8. [8]

    Springer Science & Business Media, 2010

    Peter Butkovič.Max-linear Systems: Theory and Algorithms. Springer Science & Business Media, 2010

  9. [9]

    Extreme values for characteristic radii of a Poisson- Voronoi tessellation

    Pierre Calka and Nicolas Chenavier. Extreme values for characteristic radii of a Poisson- Voronoi tessellation. Extremes, 17(3):359–385, 2014

  10. [10]

    Duality and separation theorems in idempotent semimodules.Linear Algebra and its Applications, 379(1):395–422, 2004

    Guy Cohen, Stéphane Gaubert, and Jean-Pierre Quadrat. Duality and separation theorems in idempotent semimodules.Linear Algebra and its Applications, 379(1):395–422, 2004

  11. [11]

    Crandall and Pierre-Louis Lions

    Michael G. Crandall and Pierre-Louis Lions. Viscosity solutions of Hamilton-Jacobi equations. Transactions of the American Mathematical Society, 277(1):1–42, 1983

  12. [12]

    Dictionary Learning Algorithms and Applications

    Bogdan Dumitrescu and Paul Irofti. Dictionary Learning Algorithms and Applications. Springer, 2018

  13. [13]

    Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms

    Stéphane Gaubert, William McEneaney, and Zheng Qu. Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms. In Conference on Decision and Control and European Control Conference, pages 1054–1061, 2011

  14. [14]

    A brief survey of parametric value function approxima- tion

    Matthieu Geist and Olivier Pietquin. A brief survey of parametric value function approxima- tion. Technical report, Supélec, 2010

  15. [15]

    Efficient solution algorithms for factored MDPs.Journal of Artificial Intelligence Research, 19:399–468, 2003

    Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman. Efficient solution algorithms for factored MDPs.Journal of Artificial Intelligence Research, 19:399–468, 2003

  16. [16]

    Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30

    Onésimo Hernández-Lerma and Jean-Bernard Lasserre. Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30. Springer Science & Business Media, 2012

  17. [17]

    Sham Kakade, Mengdi Wang, and Lin F. Yang. Variance reduction methods for sublinear reinforcement learning. Technical Report 1802.09184, arXiv, 2018

  18. [18]

    Keller, Shie Mannor, and Doina Precup

    Philipp W. Keller, Shie Mannor, and Doina Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. InProceedings of the International Conference on Machine Learning (ICML), 2006

  19. [19]

    Nonlinear optimal control via occupation measures and lmi-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

    Jean-Bernard Lasserre, Didier Henrion, Christophe Prieur, and Emmanuel Trélat. Nonlinear optimal control via occupation measures and lmi-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008

  20. [20]

    Feature selection and feature learning for high- dimensional batch reinforcement learning: A survey.International Journal of Automation and Computing, 12(3):229–242, 2015

    De-Rong Liu, Hong-Liang Li, and Ding Wang. Feature selection and feature learning for high- dimensional batch reinforcement learning: A survey.International Journal of Automation and Computing, 12(3):229–242, 2015. 13

  21. [21]

    Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes.Journal of Machine Learning Research, 8(Oct):2169–2231, 2007

    Sridhar Mahadevan and Mauro Maggioni. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes.Journal of Machine Learning Research, 8(Oct):2169–2231, 2007

  22. [22]

    Sparse modeling for image and vision processing

    Julien Mairal, Francis Bach, and Jean Ponce. Sparse modeling for image and vision processing. Foundations and TrendsR⃝ in Computer Graphics and Vision, 8(2-3):85–283, 2014

  23. [23]

    A Wavelet Tour of Signal Processing: the Sparse Way

    Stéphane Mallat. A Wavelet Tour of Signal Processing: the Sparse Way. Academic Press, 2008

  24. [24]

    McEneaney

    William M. McEneaney. Error analysis of a max-plus algorithm for a first-order HJB equation. In Stochastic Theory and Control, pages 335–351. Springer, 2002

  25. [25]

    A study of reinforcement learning in the continuous case by the means of viscosity solutions.Machine Learning, 40(3):265–299, 2000

    Rémi Munos. A study of reinforcement learning in the continuous case by the means of viscosity solutions.Machine Learning, 40(3):265–299, 2000

  26. [26]

    Variable resolution discretization in optimal control

    Rémi Munos and Andrew Moore. Variable resolution discretization in optimal control. Machine Learning, 49(2-3):291–323, 2002

  27. [27]

    Greedy linear value-approximation for factored Markov decision processes

    Relu Patrascu, Pascal Poupart, Dale Schuurmans, Craig Boutilier, and Carlos Guestrin. Greedy linear value-approximation for factored Markov decision processes. InProc. AAAI, 2002

  28. [28]

    Puterman.Markov Decision Processes: Discrete Stochastic Dynamic Programming

    Martin L. Puterman.Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014

  29. [29]

    Enumeration of lattice points inℓ1-norm

    Joan Serra-Sagristà. Enumeration of lattice points inℓ1-norm. Information Processing Letters, 76(1-2):39–44, 2000

  30. [30]

    Variance reduced value iteration and faster algorithms for solving Markov decision processes

    Aaron Sidford, Mengdi Wang, Xian Wu, and Yinyu Ye. Variance reduced value iteration and faster algorithms for solving Markov decision processes. InProceedings of the Symposium on Discrete Algorithms (SODA), 2018

  31. [31]

    Sutton and Andrew G

    Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. MIT Press, 2018

  32. [32]

    Springer, 2015

    Vladimir Temlyakov.Sparse Approximation with Bases. Springer, 2015. A Value iteration and reduced value-iteration convergence In this appendix, we provide short proofs for the lemmas and propositions of the main paper related to (reduced) value iteration. A.1 Approximation of the value function through approximate fixed points This is taken from [4, Prop. ...