An Error Bound for Aggregation in Approximate Dynamic Programming

Dimitri Bertsekas; Yuchao Li

arxiv: 2507.01324 · v2 · submitted 2025-07-02 · 🧮 math.OC · cs.SY· eess.SY

An Error Bound for Aggregation in Approximate Dynamic Programming

Yuchao Li , Dimitri Bertsekas This is my paper

Pith reviewed 2026-05-19 07:01 UTC · model grok-4.3

classification 🧮 math.OC cs.SYeess.SY

keywords aggregationdynamic programmingerror boundsapproximate dynamic programmingreinforcement learningoptimal cost functiondiscounted problems

0 comments

The pith

A general aggregation framework for dynamic programming yields an error bound on optimal costs that applies to soft and feature-based schemes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes an error bound between the optimal cost function obtained from an aggregated version of a discounted dynamic programming problem and the optimal cost of the original problem. This bound generalizes an earlier result that applied only to hard aggregation. A sympathetic reader would care because it justifies treating the solution of the smaller aggregate problem as a usable approximation for the true costs. If the bound holds, then more flexible aggregation choices become reliable for reducing problem size before applying reinforcement learning methods online.

Core claim

In a general aggregation framework for discounted finite-state infinite horizon dynamic programming problems, the optimal cost function of the aggregate problem can be obtained by exact dynamic programming and then used as a terminal cost approximation for an online reinforcement learning scheme, with a bound on the error relative to the original problem's optimal cost function. The bound applies to soft aggregation and feature-based aggregation schemes as well as the hard aggregation case treated in prior work.

What carries the argument

The error bound relating the optimal costs of the aggregate problem and the original problem, derived from the properties of the aggregation mapping.

If this is right

The optimal cost from the aggregate problem supplies a terminal cost approximation for online reinforcement learning with a known error guarantee.
Soft aggregation schemes, which blend states probabilistically, are covered by the same error analysis.
Feature-based aggregation methods are included under the bound without needing separate proofs.
Exact dynamic programming can be run on the reduced aggregate problem offline to support larger original problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Practitioners could select aggregation mappings that trade off computational savings against the size of the resulting error bound.
The framework may combine with modern function approximation techniques that learn features automatically.
Numerical checks on benchmark problems with known solutions could reveal how loose or tight the bound typically is.

Load-bearing premise

The aggregation framework produces an aggregate problem whose optimal cost function is computed exactly by dynamic programming.

What would settle it

Solve a small finite-state discounted DP problem both directly and after aggregation, then check whether the actual difference in optimal costs exceeds the value given by the derived bound.

Figures

Figures reproduced from arXiv: 2507.01324 by Dimitri Bertsekas, Yuchao Li.

read the original abstract

We consider a general aggregation framework for discounted finite-state infinite horizon dynamic programming (DP) problems. It defines an aggregate problem whose optimal cost function can be obtained off-line by exact DP and then used as a terminal cost approximation for an on-line reinforcement learning (RL) scheme. We derive a bound on the error between the optimal cost functions of the aggregate problem and the original problem. This bound was first derived by Tsitsiklis and van Roy [TvR96] for the special case of hard aggregation. Our bound is similar but applies far more broadly, including to soft aggregation and feature-based aggregation schemes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This generalizes the Tsitsiklis-van Roy error bound to soft and feature-based aggregation, but the contraction step for feature maps may need explicit conditions to hold.

read the letter

This paper generalizes the Tsitsiklis-van Roy error bound to soft and feature-based aggregation, but the contraction step for feature maps may need explicit conditions to hold. The authors set up a general aggregation framework for discounted finite-state infinite-horizon DP. They define an aggregate problem that can be solved exactly offline by standard DP methods, then use its optimal cost as a terminal approximation inside an online RL scheme. From this they derive a bound on the difference between the aggregate problem's optimal cost and the true optimal cost of the original problem. The form of the bound is the same as in the 1996 paper, but the claim is that it now covers soft aggregation, where the mapping from original states to aggregate states is probabilistic, and feature-based schemes that rely on a linear feature map. That extension is the main new element. The framework itself is cleanly stated and stays within standard discounted DP theory, so the basic contraction argument with factor alpha carries over without much trouble for the hard and soft cases. The soft spot sits in the feature-based part. A linear feature map acts like a projection, and projections do not automatically preserve the contraction property of the Bellman operator unless the map satisfies additional structure, such as being non-expansive with respect to the relevant norm or commuting with the transition kernel in a suitable way. The abstract does not list those conditions, so the full proof needs to show either that they follow from the setup or that they are assumed. If that step is missing or glossed over, the bound does not transfer as automatically as stated. The citations are appropriate and do not overclaim priority. The paper is aimed at specialists already working with aggregation in approximate DP and RL. Someone who needs error guarantees when moving beyond hard partitioning would find the result useful. It is worth sending to peer review so that referees can check the details of the feature case and suggest any missing assumptions.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces a general aggregation framework for discounted finite-state infinite-horizon DP problems. This framework defines an aggregate problem whose optimal cost function is computed exactly offline via DP and then employed as a terminal-cost approximation within an online RL scheme. The central result is an error bound between the optimal cost functions of the aggregate and original problems; the authors state that this bound generalizes the Tsitsiklis–van Roy (1996) result, which was limited to hard aggregation, to soft aggregation and feature-based schemes.

Significance. If the claimed generalization is rigorously established, the result would extend a classical error bound to a wider class of aggregation operators, thereby providing a uniform theoretical guarantee that could inform the choice of approximation architectures in large-scale approximate DP and RL. The manuscript does not report machine-checked proofs or reproducible code, but the derivation itself, if complete, would constitute a parameter-free theoretical contribution.

major comments (1)

[Abstract and main derivation] The central claim requires that the general aggregation operator Φ (which maps original costs to aggregate costs) preserves the contraction property of the discounted Bellman operator T, i.e., ||ΦTJ − ΦTJ′|| ≤ α||J − J′|| (or an equivalent inequality) for the feature-based case where Φ is typically a linear projection. The manuscript does not isolate or verify this step explicitly for feature-based schemes; without it, the bound does not follow automatically from the hard-aggregation case treated in TvR96.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and for identifying an important point about the explicit verification of the contraction-preserving property. We address the major comment below and will revise the manuscript to strengthen the presentation.

read point-by-point responses

Referee: [Abstract and main derivation] The central claim requires that the general aggregation operator Φ (which maps original costs to aggregate costs) preserves the contraction property of the discounted Bellman operator T, i.e., ||ΦTJ − ΦTJ′|| ≤ α||J − J′|| (or an equivalent inequality) for the feature-based case where Φ is typically a linear projection. The manuscript does not isolate or verify this step explicitly for feature-based schemes; without it, the bound does not follow automatically from the hard-aggregation case treated in TvR96.

Authors: We agree that isolating the contraction-preserving property of Φ is necessary for a fully rigorous and self-contained argument. In the general aggregation framework of the manuscript, Φ is constructed as a non-expansive operator (with respect to the norm in which T is an α-contraction), so that the composition ΦT remains an α-contraction; the error bound then follows by standard arguments. For feature-based schemes, where Φ is a linear projection onto a feature subspace, this non-expansiveness holds under the weighted sup-norm when the basis functions satisfy the usual normalization conditions. Nevertheless, we acknowledge that the manuscript does not extract this step as a separate lemma for the feature-based and soft-aggregation cases. In the revised version we will add an explicit lemma that states and proves the non-expansiveness of Φ for these schemes, thereby making the generalization from the hard-aggregation result of Tsitsiklis and van Roy fully transparent. revision: yes

Circularity Check

0 steps flagged

Error bound derivation is a direct mathematical generalization from aggregation definitions and contraction properties

full rationale

The paper defines a general aggregation framework (including soft and feature-based) and derives the error bound between optimal costs of aggregate and original problems using standard discounted DP contraction arguments. This extends the TvR96 result for hard aggregation without reducing any quantity to a fitted parameter, self-definition, or self-citation chain. The central claim follows from the problem setup and Bellman operator properties, which are independent of the target bound. No load-bearing step collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard assumptions for discounted finite-state infinite-horizon DP problems and the definition of an aggregation framework that permits exact solution of the aggregate problem.

axioms (1)

domain assumption Discounted finite-state infinite horizon DP problems
Stated in the abstract as the setting for the general aggregation framework.

pith-pipeline@v0.9.0 · 5626 in / 1138 out tokens · 24055 ms · 2026-05-19T07:01:53.983541+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We derive a bound on the error between the optimal cost functions of the aggregate problem and the original problem... applies far more broadly, including to soft aggregation and feature-based aggregation schemes.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

[1]

Bertsekas and David A

Dimitri P. Bertsekas and David A. Castanon. Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control , 34(6):589--598, 1989

work page 1989
[2]

Bertsekas

Dimitri P. Bertsekas. Dynamic Programming and Optimal Control: Vol. II . Athena Scientific Belmont, 4th edition, 2012

work page 2012
[3]

Bertsekas

Dimitri P. Bertsekas. Feature-based aggregation and deep reinforcement learning: A survey and some new implementations. IEEE/CAA Journal of Automatica Sinica , 6(1):1--31, 2018

work page 2018
[4]

Bertsekas

Dimitri P. Bertsekas. Biased aggregation, rollout, and enhanced policy improvement for reinforcement learning. arXiv preprint arXiv:1910.02426 , 2019

work page arXiv 1910
[5]

Bertsekas

Dimitri P. Bertsekas. Reinforcement Learning and Optimal Control . Athena Scientific, 2019

work page 2019
[6]

Bertsekas and John N

Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming . Athena Scientific, Belmont, MA, 1996

work page 1996
[7]

Geoffrey J. Gordon. Stable function approximation in dynamic programming. In Proceedings of the Twelfth International Conference on Machine Learning , pages 261--268. Elsevier, 1995

work page 1995
[8]

Reinforcement learning with soft state aggregation

Satinder Singh, Tommi Jaakkola, and Michael Jordan. Reinforcement learning with soft state aggregation. Advances in Neural Information Processing Systems , 7, 1994

work page 1994
[9]

Tsitsiklis and Benjamin van Roy

John N. Tsitsiklis and Benjamin van Roy. Feature-based methods for large scale dynamic programming . Machine Learning , 22(1):59--94, Mar 1996

work page 1996
[10]

Feature-based methods for large scale dynamic programming

Benjamin van Roy. Feature-based methods for large scale dynamic programming. Master's thesis, Massachusetts Institute of Technology, 1995. Lab. for Info. and Decision Systems Report LIDSTH-2289

work page 1995

[1] [1]

Bertsekas and David A

Dimitri P. Bertsekas and David A. Castanon. Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control , 34(6):589--598, 1989

work page 1989

[2] [2]

Bertsekas

Dimitri P. Bertsekas. Dynamic Programming and Optimal Control: Vol. II . Athena Scientific Belmont, 4th edition, 2012

work page 2012

[3] [3]

Bertsekas

Dimitri P. Bertsekas. Feature-based aggregation and deep reinforcement learning: A survey and some new implementations. IEEE/CAA Journal of Automatica Sinica , 6(1):1--31, 2018

work page 2018

[4] [4]

Bertsekas

Dimitri P. Bertsekas. Biased aggregation, rollout, and enhanced policy improvement for reinforcement learning. arXiv preprint arXiv:1910.02426 , 2019

work page arXiv 1910

[5] [5]

Bertsekas

Dimitri P. Bertsekas. Reinforcement Learning and Optimal Control . Athena Scientific, 2019

work page 2019

[6] [6]

Bertsekas and John N

Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming . Athena Scientific, Belmont, MA, 1996

work page 1996

[7] [7]

Geoffrey J. Gordon. Stable function approximation in dynamic programming. In Proceedings of the Twelfth International Conference on Machine Learning , pages 261--268. Elsevier, 1995

work page 1995

[8] [8]

Reinforcement learning with soft state aggregation

Satinder Singh, Tommi Jaakkola, and Michael Jordan. Reinforcement learning with soft state aggregation. Advances in Neural Information Processing Systems , 7, 1994

work page 1994

[9] [9]

Tsitsiklis and Benjamin van Roy

John N. Tsitsiklis and Benjamin van Roy. Feature-based methods for large scale dynamic programming . Machine Learning , 22(1):59--94, Mar 1996

work page 1996

[10] [10]

Feature-based methods for large scale dynamic programming

Benjamin van Roy. Feature-based methods for large scale dynamic programming. Master's thesis, Massachusetts Institute of Technology, 1995. Lab. for Info. and Decision Systems Report LIDSTH-2289

work page 1995