pith. sign in

arxiv: 2507.01324 · v2 · submitted 2025-07-02 · 🧮 math.OC · cs.SY· eess.SY

An Error Bound for Aggregation in Approximate Dynamic Programming

Pith reviewed 2026-05-19 07:01 UTC · model grok-4.3

classification 🧮 math.OC cs.SYeess.SY
keywords aggregationdynamic programmingerror boundsapproximate dynamic programmingreinforcement learningoptimal cost functiondiscounted problems
0
0 comments X

The pith

A general aggregation framework for dynamic programming yields an error bound on optimal costs that applies to soft and feature-based schemes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes an error bound between the optimal cost function obtained from an aggregated version of a discounted dynamic programming problem and the optimal cost of the original problem. This bound generalizes an earlier result that applied only to hard aggregation. A sympathetic reader would care because it justifies treating the solution of the smaller aggregate problem as a usable approximation for the true costs. If the bound holds, then more flexible aggregation choices become reliable for reducing problem size before applying reinforcement learning methods online.

Core claim

In a general aggregation framework for discounted finite-state infinite horizon dynamic programming problems, the optimal cost function of the aggregate problem can be obtained by exact dynamic programming and then used as a terminal cost approximation for an online reinforcement learning scheme, with a bound on the error relative to the original problem's optimal cost function. The bound applies to soft aggregation and feature-based aggregation schemes as well as the hard aggregation case treated in prior work.

What carries the argument

The error bound relating the optimal costs of the aggregate problem and the original problem, derived from the properties of the aggregation mapping.

If this is right

  • The optimal cost from the aggregate problem supplies a terminal cost approximation for online reinforcement learning with a known error guarantee.
  • Soft aggregation schemes, which blend states probabilistically, are covered by the same error analysis.
  • Feature-based aggregation methods are included under the bound without needing separate proofs.
  • Exact dynamic programming can be run on the reduced aggregate problem offline to support larger original problems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Practitioners could select aggregation mappings that trade off computational savings against the size of the resulting error bound.
  • The framework may combine with modern function approximation techniques that learn features automatically.
  • Numerical checks on benchmark problems with known solutions could reveal how loose or tight the bound typically is.

Load-bearing premise

The aggregation framework produces an aggregate problem whose optimal cost function is computed exactly by dynamic programming.

What would settle it

Solve a small finite-state discounted DP problem both directly and after aggregation, then check whether the actual difference in optimal costs exceeds the value given by the derived bound.

Figures

Figures reproduced from arXiv: 2507.01324 by Dimitri Bertsekas, Yuchao Li.

Figure 2
Figure 2. Figure 2: Illustration of the aggregate problem, and the corresponding transi [PITH_FULL_IMAGE:figures/full_fig_p008_2.png] view at source ↗
read the original abstract

We consider a general aggregation framework for discounted finite-state infinite horizon dynamic programming (DP) problems. It defines an aggregate problem whose optimal cost function can be obtained off-line by exact DP and then used as a terminal cost approximation for an on-line reinforcement learning (RL) scheme. We derive a bound on the error between the optimal cost functions of the aggregate problem and the original problem. This bound was first derived by Tsitsiklis and van Roy [TvR96] for the special case of hard aggregation. Our bound is similar but applies far more broadly, including to soft aggregation and feature-based aggregation schemes.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript introduces a general aggregation framework for discounted finite-state infinite-horizon DP problems. This framework defines an aggregate problem whose optimal cost function is computed exactly offline via DP and then employed as a terminal-cost approximation within an online RL scheme. The central result is an error bound between the optimal cost functions of the aggregate and original problems; the authors state that this bound generalizes the Tsitsiklis–van Roy (1996) result, which was limited to hard aggregation, to soft aggregation and feature-based schemes.

Significance. If the claimed generalization is rigorously established, the result would extend a classical error bound to a wider class of aggregation operators, thereby providing a uniform theoretical guarantee that could inform the choice of approximation architectures in large-scale approximate DP and RL. The manuscript does not report machine-checked proofs or reproducible code, but the derivation itself, if complete, would constitute a parameter-free theoretical contribution.

major comments (1)
  1. [Abstract and main derivation] The central claim requires that the general aggregation operator Φ (which maps original costs to aggregate costs) preserves the contraction property of the discounted Bellman operator T, i.e., ||ΦTJ − ΦTJ′|| ≤ α||J − J′|| (or an equivalent inequality) for the feature-based case where Φ is typically a linear projection. The manuscript does not isolate or verify this step explicitly for feature-based schemes; without it, the bound does not follow automatically from the hard-aggregation case treated in TvR96.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful review and for identifying an important point about the explicit verification of the contraction-preserving property. We address the major comment below and will revise the manuscript to strengthen the presentation.

read point-by-point responses
  1. Referee: [Abstract and main derivation] The central claim requires that the general aggregation operator Φ (which maps original costs to aggregate costs) preserves the contraction property of the discounted Bellman operator T, i.e., ||ΦTJ − ΦTJ′|| ≤ α||J − J′|| (or an equivalent inequality) for the feature-based case where Φ is typically a linear projection. The manuscript does not isolate or verify this step explicitly for feature-based schemes; without it, the bound does not follow automatically from the hard-aggregation case treated in TvR96.

    Authors: We agree that isolating the contraction-preserving property of Φ is necessary for a fully rigorous and self-contained argument. In the general aggregation framework of the manuscript, Φ is constructed as a non-expansive operator (with respect to the norm in which T is an α-contraction), so that the composition ΦT remains an α-contraction; the error bound then follows by standard arguments. For feature-based schemes, where Φ is a linear projection onto a feature subspace, this non-expansiveness holds under the weighted sup-norm when the basis functions satisfy the usual normalization conditions. Nevertheless, we acknowledge that the manuscript does not extract this step as a separate lemma for the feature-based and soft-aggregation cases. In the revised version we will add an explicit lemma that states and proves the non-expansiveness of Φ for these schemes, thereby making the generalization from the hard-aggregation result of Tsitsiklis and van Roy fully transparent. revision: yes

Circularity Check

0 steps flagged

Error bound derivation is a direct mathematical generalization from aggregation definitions and contraction properties

full rationale

The paper defines a general aggregation framework (including soft and feature-based) and derives the error bound between optimal costs of aggregate and original problems using standard discounted DP contraction arguments. This extends the TvR96 result for hard aggregation without reducing any quantity to a fitted parameter, self-definition, or self-citation chain. The central claim follows from the problem setup and Bellman operator properties, which are independent of the target bound. No load-bearing step collapses to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work rests on standard assumptions for discounted finite-state infinite-horizon DP problems and the definition of an aggregation framework that permits exact solution of the aggregate problem.

axioms (1)
  • domain assumption Discounted finite-state infinite horizon DP problems
    Stated in the abstract as the setting for the general aggregation framework.

pith-pipeline@v0.9.0 · 5626 in / 1138 out tokens · 24055 ms · 2026-05-19T07:01:53.983541+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

10 extracted references · 10 canonical work pages

  1. [1]

    Bertsekas and David A

    Dimitri P. Bertsekas and David A. Castanon. Adaptive aggregation methods for infinite horizon dynamic programming. IEEE Transactions on Automatic Control , 34(6):589--598, 1989

  2. [2]

    Bertsekas

    Dimitri P. Bertsekas. Dynamic Programming and Optimal Control: Vol. II . Athena Scientific Belmont, 4th edition, 2012

  3. [3]

    Bertsekas

    Dimitri P. Bertsekas. Feature-based aggregation and deep reinforcement learning: A survey and some new implementations. IEEE/CAA Journal of Automatica Sinica , 6(1):1--31, 2018

  4. [4]

    Bertsekas

    Dimitri P. Bertsekas. Biased aggregation, rollout, and enhanced policy improvement for reinforcement learning. arXiv preprint arXiv:1910.02426 , 2019

  5. [5]

    Bertsekas

    Dimitri P. Bertsekas. Reinforcement Learning and Optimal Control . Athena Scientific, 2019

  6. [6]

    Bertsekas and John N

    Dimitri P. Bertsekas and John N. Tsitsiklis. Neuro-Dynamic Programming . Athena Scientific, Belmont, MA, 1996

  7. [7]

    Geoffrey J. Gordon. Stable function approximation in dynamic programming. In Proceedings of the Twelfth International Conference on Machine Learning , pages 261--268. Elsevier, 1995

  8. [8]

    Reinforcement learning with soft state aggregation

    Satinder Singh, Tommi Jaakkola, and Michael Jordan. Reinforcement learning with soft state aggregation. Advances in Neural Information Processing Systems , 7, 1994

  9. [9]

    Tsitsiklis and Benjamin van Roy

    John N. Tsitsiklis and Benjamin van Roy. Feature-based methods for large scale dynamic programming . Machine Learning , 22(1):59--94, Mar 1996

  10. [10]

    Feature-based methods for large scale dynamic programming

    Benjamin van Roy. Feature-based methods for large scale dynamic programming. Master's thesis, Massachusetts Institute of Technology, 1995. Lab. for Info. and Decision Systems Report LIDSTH-2289