Max-Plus Matching Pursuit for Deterministic Markov Decision Processes
Pith reviewed 2026-05-25 19:55 UTC · model grok-4.3
The pith
In deterministic MDPs, max-plus approximation of the optimal value function with a fixed basis has complexity governed by covering numbers of the state space rather than its cardinality.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
When a fixed finite basis is used, the computational complexity of approximating the optimal value function is not directly related to the number of states but to notions of covering numbers of the state space. Adaptive bases can be selected via a matching pursuit procedure to handle factored state spaces.
What carries the argument
Max-plus linear combinations of basis value functions, used to approximate the Bellman operator iteration.
If this is right
- Approximation error and runtime scale with the covering number of the state space under the chosen basis.
- Deterministic MDPs can be solved approximately without enumerating all states if a good dictionary exists.
- Adaptive dictionary selection via matching pursuit empirically reduces error on continuous control MDPs.
- The method applies to general MDPs by lifting to measures.
Where Pith is reading between the lines
- Similar max-plus techniques might apply to stochastic MDPs without measure lifting if expectations can be handled.
- If covering numbers are small for many real-world MDPs, this could enable scalable planning in robotics.
- Combining with function approximation from deep learning could yield hybrid methods.
Load-bearing premise
The optimal value function can be closely approximated in the max-plus sense by linear combinations of a modest number of basis functions.
What would settle it
A deterministic MDP where the max-plus approximation error remains large even when the covering number is small, or where runtime grows with state count despite small covering number.
Figures
read the original abstract
We consider deterministic Markov decision processes (MDPs) and apply max-plus algebra tools to approximate the value iteration algorithm by a smaller-dimensional iteration based on a representation on dictionaries of value functions. The setup naturally leads to novel theoretical results which are simply formulated due to the max-plus algebra structure. For example, when considering a fixed (non adaptive) finite basis, the computational complexity of approximating the optimal value function is not directly related to the number of states, but to notions of covering numbers of the state space. In order to break the curse of dimensionality in factored state-spaces, we consider adaptive basis that can adapt to particular problems leading to an algorithm similar to matching pursuit from signal processing. They currently come with no theoretical guarantees but work empirically well on simple deterministic MDPs derived from low-dimensional continuous control problems. We focus primarily on deterministic MDPs but note that the framework can be applied to all MDPs by considering measure-based formulations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a max-plus algebra approach to approximate value iteration for deterministic MDPs using fixed and adaptive dictionaries of basis value functions. It claims that for a fixed finite basis, the complexity of approximation depends on covering numbers of the state space rather than the number of states. An adaptive algorithm inspired by matching pursuit is introduced for breaking the curse of dimensionality in factored spaces, showing empirical success on simple deterministic MDPs from low-dimensional continuous control, though without theoretical guarantees. The framework is suggested to apply to general MDPs via measure-based formulations.
Significance. The approach provides a structured way to approximate value functions in deterministic MDPs, potentially decoupling complexity from state space size through covering numbers and max-plus operations. The empirical performance on control problems indicates practical promise. However, the lack of guarantees for the adaptive case and the reliance on an unstated closeness assumption for the fixed basis case temper the significance. Strengths include the clean formulation enabled by max-plus algebra.
major comments (2)
- [Abstract] Abstract: The assertion that computational complexity depends on covering numbers rather than |S| for a fixed basis presupposes that the optimal value function V* lies sufficiently close (in the max-plus sense) to the span of the dictionary so that the covering radius controls the error. No general result is supplied guaranteeing that a modest non-adaptive dictionary achieves small max-plus distance for arbitrary deterministic MDPs; without this, the covering-number argument does not establish the claimed decoupling from state cardinality.
- [Abstract] Abstract: The adaptive basis algorithm is stated to 'currently come with no theoretical guarantees'. If the manuscript contains any partial bounds or conditions under which guarantees might hold, they should be explicitly stated and derived to strengthen the contribution.
minor comments (1)
- The abstract mentions 'notions of covering numbers' without specifying which notions (e.g., epsilon-nets in what metric) are used in the complexity analysis; this should be clarified in the main text.
Simulated Author's Rebuttal
We thank the referee for the thoughtful review and constructive comments on our manuscript. We address each major comment point by point below, with proposed revisions where appropriate.
read point-by-point responses
-
Referee: [Abstract] Abstract: The assertion that computational complexity depends on covering numbers rather than |S| for a fixed basis presupposes that the optimal value function V* lies sufficiently close (in the max-plus sense) to the span of the dictionary so that the covering radius controls the error. No general result is supplied guaranteeing that a modest non-adaptive dictionary achieves small max-plus distance for arbitrary deterministic MDPs; without this, the covering-number argument does not establish the claimed decoupling from state cardinality.
Authors: We agree with the referee that the complexity scaling with covering numbers (rather than |S|) is conditional on the max-plus distance from V* to the dictionary span being controlled. The manuscript derives error bounds for the approximated value iteration that depend on the covering radius when this approximation quality holds, but does not provide a general existence result guaranteeing small max-plus distance for modest fixed dictionaries across arbitrary deterministic MDPs. The claim in the abstract is intended to highlight that, for a given fixed dictionary achieving sufficient approximation quality, the per-iteration cost scales with covering numbers of the state space. We will revise the abstract to explicitly note this conditional aspect and avoid any implication of unconditional decoupling. revision: yes
-
Referee: [Abstract] Abstract: The adaptive basis algorithm is stated to 'currently come with no theoretical guarantees'. If the manuscript contains any partial bounds or conditions under which guarantees might hold, they should be explicitly stated and derived to strengthen the contribution.
Authors: The manuscript states in the abstract that the adaptive matching-pursuit-style algorithm 'currently come with no theoretical guarantees.' A review of the full text confirms that no partial bounds, sufficient conditions, or preliminary theoretical results are derived for the adaptive case; the contribution for this part is empirical, demonstrating effectiveness on low-dimensional continuous control MDPs. We do not have additional material to add on this point. revision: no
Circularity Check
No circularity: complexity claim follows directly from max-plus structure on fixed basis
full rationale
The derivation applies standard max-plus algebra to value iteration on a fixed finite dictionary, yielding covering-number dependence as a direct algebraic consequence rather than a fitted or self-referential quantity. No equations reduce the claimed complexity bound to parameters estimated from the target result itself, and the adaptive basis section is explicitly presented without theoretical guarantees. Self-citations on max-plus tools are not load-bearing for the central claim, which remains self-contained against the stated assumptions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Max-plus algebra operations preserve the Bellman optimality structure for deterministic MDPs
Reference graph
Works this paper leans on
-
[1]
Marianne Akian, Stéphane Gaubert, and Asma Lakhoua. The max-plus finite element method for solving deterministic optimal control problems: basic properties and convergence analysis. SIAM Journal on Control and Optimization, 47(2):817–848, 2008
work page 2008
-
[2]
François Baccelli, Guy Cohen, Geert Jan Olsder, and Jean-Pierre Quadrat.Synchronization and Linearity: an Algebra for Discrete Event Systems. John Wiley & Sons, 1992
work page 1992
-
[3]
Bauschke, Jérôme Bolte, and Marc Teboulle
Heinz H. Bauschke, Jérôme Bolte, and Marc Teboulle. A descent lemma beyond Lipschitz gradient continuity: first-order methods revisited and applications.Mathematics of Operations Research, 42(2):330–348, 2016
work page 2016
- [4]
-
[5]
Dimitri P. Bertsekas and David A. Castanon. Adaptive aggregation methods for infinite horizon dynamic programming.IEEE Transactions on Automatic Control, 34(6):589–598, 1989
work page 1989
-
[6]
Cambridge University Press, 2004
Stephen Boyd and Lieven Vandenberghe.Convex Optimization. Cambridge University Press, 2004
work page 2004
-
[7]
Lucian Busoniu, Damien Ernst, Bart De Schutter, and Robert Babuska. Cross-entropy optimization of control policies with adaptive basis functions.IEEE Transactions on Systems, Man, and Cybernetics, 41(1):196–209, 2011
work page 2011
-
[8]
Springer Science & Business Media, 2010
Peter Butkovič.Max-linear Systems: Theory and Algorithms. Springer Science & Business Media, 2010
work page 2010
-
[9]
Extreme values for characteristic radii of a Poisson- Voronoi tessellation
Pierre Calka and Nicolas Chenavier. Extreme values for characteristic radii of a Poisson- Voronoi tessellation. Extremes, 17(3):359–385, 2014
work page 2014
-
[10]
Guy Cohen, Stéphane Gaubert, and Jean-Pierre Quadrat. Duality and separation theorems in idempotent semimodules.Linear Algebra and its Applications, 379(1):395–422, 2004
work page 2004
-
[11]
Crandall and Pierre-Louis Lions
Michael G. Crandall and Pierre-Louis Lions. Viscosity solutions of Hamilton-Jacobi equations. Transactions of the American Mathematical Society, 277(1):1–42, 1983
work page 1983
-
[12]
Dictionary Learning Algorithms and Applications
Bogdan Dumitrescu and Paul Irofti. Dictionary Learning Algorithms and Applications. Springer, 2018
work page 2018
-
[13]
Stéphane Gaubert, William McEneaney, and Zheng Qu. Curse of dimensionality reduction in max-plus based approximation methods: Theoretical estimates and improved pruning algorithms. In Conference on Decision and Control and European Control Conference, pages 1054–1061, 2011
work page 2011
-
[14]
A brief survey of parametric value function approxima- tion
Matthieu Geist and Olivier Pietquin. A brief survey of parametric value function approxima- tion. Technical report, Supélec, 2010
work page 2010
-
[15]
Carlos Guestrin, Daphne Koller, Ronald Parr, and Shobha Venkataraman. Efficient solution algorithms for factored MDPs.Journal of Artificial Intelligence Research, 19:399–468, 2003
work page 2003
-
[16]
Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30
Onésimo Hernández-Lerma and Jean-Bernard Lasserre. Discrete-Time Markov Control Processes: Basic Optimality Criteria, volume 30. Springer Science & Business Media, 2012
work page 2012
- [17]
-
[18]
Keller, Shie Mannor, and Doina Precup
Philipp W. Keller, Shie Mannor, and Doina Precup. Automatic basis function construction for approximate dynamic programming and reinforcement learning. InProceedings of the International Conference on Machine Learning (ICML), 2006
work page 2006
-
[19]
Jean-Bernard Lasserre, Didier Henrion, Christophe Prieur, and Emmanuel Trélat. Nonlinear optimal control via occupation measures and lmi-relaxations.SIAM Journal on Control and Optimization, 47(4):1643–1666, 2008
work page 2008
-
[20]
De-Rong Liu, Hong-Liang Li, and Ding Wang. Feature selection and feature learning for high- dimensional batch reinforcement learning: A survey.International Journal of Automation and Computing, 12(3):229–242, 2015. 13
work page 2015
-
[21]
Sridhar Mahadevan and Mauro Maggioni. Proto-value functions: A Laplacian framework for learning representation and control in Markov decision processes.Journal of Machine Learning Research, 8(Oct):2169–2231, 2007
work page 2007
-
[22]
Sparse modeling for image and vision processing
Julien Mairal, Francis Bach, and Jean Ponce. Sparse modeling for image and vision processing. Foundations and TrendsR⃝ in Computer Graphics and Vision, 8(2-3):85–283, 2014
work page 2014
-
[23]
A Wavelet Tour of Signal Processing: the Sparse Way
Stéphane Mallat. A Wavelet Tour of Signal Processing: the Sparse Way. Academic Press, 2008
work page 2008
- [24]
-
[25]
Rémi Munos. A study of reinforcement learning in the continuous case by the means of viscosity solutions.Machine Learning, 40(3):265–299, 2000
work page 2000
-
[26]
Variable resolution discretization in optimal control
Rémi Munos and Andrew Moore. Variable resolution discretization in optimal control. Machine Learning, 49(2-3):291–323, 2002
work page 2002
-
[27]
Greedy linear value-approximation for factored Markov decision processes
Relu Patrascu, Pascal Poupart, Dale Schuurmans, Craig Boutilier, and Carlos Guestrin. Greedy linear value-approximation for factored Markov decision processes. InProc. AAAI, 2002
work page 2002
-
[28]
Puterman.Markov Decision Processes: Discrete Stochastic Dynamic Programming
Martin L. Puterman.Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley & Sons, 2014
work page 2014
-
[29]
Enumeration of lattice points inℓ1-norm
Joan Serra-Sagristà. Enumeration of lattice points inℓ1-norm. Information Processing Letters, 76(1-2):39–44, 2000
work page 2000
-
[30]
Variance reduced value iteration and faster algorithms for solving Markov decision processes
Aaron Sidford, Mengdi Wang, Xian Wu, and Yinyu Ye. Variance reduced value iteration and faster algorithms for solving Markov decision processes. InProceedings of the Symposium on Discrete Algorithms (SODA), 2018
work page 2018
-
[31]
Richard S. Sutton and Andrew G. Barto.Reinforcement Learning: An Introduction. MIT Press, 2018
work page 2018
-
[32]
Vladimir Temlyakov.Sparse Approximation with Bases. Springer, 2015. A Value iteration and reduced value-iteration convergence In this appendix, we provide short proofs for the lemmas and propositions of the main paper related to (reduced) value iteration. A.1 Approximation of the value function through approximate fixed points This is taken from [4, Prop. ...
work page 2015
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.