Nash Approximation Gap in Truncated Infinite-horizon Partially Observable Markov Games

Chinmay Maheshwari; Lan Sang

arxiv: 2604.05131 · v1 · submitted 2026-04-06 · 💻 cs.MA · cs.SY· eess.SY

Nash Approximation Gap in Truncated Infinite-horizon Partially Observable Markov Games

Lan Sang , Chinmay Maheshwari This is my paper

Pith reviewed 2026-05-10 18:51 UTC · model grok-4.3

classification 💻 cs.MA cs.SYeess.SY

keywords partially observable Markov gamesNash equilibriumtruncation approximationfilter stabilityinfinite horizonmulti-agent gamesbelief state games

0 comments

The pith

Any Nash equilibrium of a truncated finite-memory POMG approximates the original infinite-horizon game's Nash equilibrium with error vanishing as memory length grows.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a truncation approach for infinite-horizon partially observable Markov games by restricting agents to finite windows of past information. This turns the intractable infinite belief-state game into a finite Markov game that can be solved directly. Under filter stability conditions, where beliefs forget old information, the equilibria of the truncated game become arbitrarily close to those of the full game. This provides a way to compute approximate solutions for long-horizon multi-agent decision problems with hidden states and asymmetric observations.

Core claim

We propose a finite-memory truncation framework that approximates infinite-horizon POMGs by a finite-state, finite-action Markov game, where agents condition decisions only on finite windows of common and private information. Under suitable filter stability (forgetting) conditions, we show that any Nash equilibrium of the truncated game is an ε-Nash equilibrium of the original POMG, where ε → 0 as the truncation length increases.

What carries the argument

The finite-memory truncation framework, which limits conditioning to finite histories of common and private information, converting the POMG into a finite Markov game whose equilibria converge under forgetting.

Load-bearing premise

The POMG must have filter stability or forgetting properties so that the effect of information from far in the past fades away completely as time passes.

What would settle it

Construct a POMG without filter stability where the approximation error remains bounded away from zero no matter how long the truncation window is chosen.

read the original abstract

Partially Observable Markov Games (POMGs) provide a general framework for modeling multi-agent sequential decision-making under asymmetric information. A common approach is to reformulate a POMG as a fully observable Markov game over belief states, where the state is the conditional distribution of the system state and agents' private information given common information, and actions correspond to mappings (prescriptions) from private information to actions. However, this reformulation is intractable in infinite-horizon settings, as both the belief state and action spaces grow with the accumulation of information over time. We propose a finite-memory truncation framework that approximates infinite-horizon POMGs by a finite-state, finite-action Markov game, where agents condition decisions only on finite windows of common and private information. Under suitable filter stability (forgetting) conditions, we show that any Nash equilibrium of the truncated game is an $\varepsilon$-Nash equilibrium of the original POMG, where $\varepsilon \to 0$ as the truncation length increases.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a clean truncation scheme that turns infinite-horizon POMGs into finite games with an epsilon-Nash guarantee under filter stability, but the practical reach depends on how often those stability conditions actually hold.

read the letter

The central result here is straightforward: truncate the memory window on both common and private information, solve the resulting finite Markov game for Nash equilibria, and those equilibria become epsilon-Nash for the original infinite-horizon POMG as the window grows, provided the belief filters forget distant history. That epsilon-Nash convergence is the new piece. Prior work already used belief-state reformulations for POMGs, but the explicit finite-memory truncation plus the multi-agent approximation guarantee under forgetting conditions does not appear in the cited literature. The argument is direct and avoids extra fitting or self-referential steps, which keeps the circularity burden low. The math looks internally consistent on the stated assumptions. The soft spots are mostly about scope rather than internal gaps. The filter stability conditions are taken as given; the paper does not show how often they hold for typical POMG dynamics or give counter-examples when they fail. Without explicit rates or numerical checks on how quickly epsilon drops with truncation length, it is hard to judge whether the scheme is usable beyond toy cases. The abstract and setup suggest the proofs are there, but a referee would still need to verify the derivation steps and the precise statement of the forgetting conditions. This paper is aimed at researchers who already work on theoretical multi-agent POMGs and want a principled way to approximate infinite-horizon equilibria. A reader looking for immediate algorithmic recipes or broad empirical validation will find less. It is worth sending to peer review. The claim is precise, the framework is reproducible in principle, and the result fills a specific gap even if the conditions limit its range.

Referee Report

2 major / 2 minor

Summary. The paper proposes a finite-memory truncation scheme for infinite-horizon partially observable Markov games (POMGs). It reformulates the POMG as a belief-state Markov game and approximates it by a finite-state, finite-action game in which agents condition only on finite windows of common and private information. Under filter stability (forgetting) conditions, the central result states that any Nash equilibrium of the truncated game is an ε-Nash equilibrium of the original infinite-horizon POMG, with ε vanishing as the truncation length tends to infinity.

Significance. If the stated approximation result holds, the work supplies a rigorous justification for using finite truncations to compute approximate equilibria in otherwise intractable infinite-horizon POMGs. The reliance on standard filter-stability assumptions from the POMDP literature makes the approach credible and potentially useful for multi-agent reinforcement learning under asymmetric information. The explicit ε-Nash guarantee with vanishing gap is a concrete, falsifiable contribution.

major comments (2)

[§4.2, Assumption 4.1 and Theorem 4.3] §4.2, Assumption 4.1 and Theorem 4.3: the filter-stability condition is stated in total-variation distance, yet the proof of the ε bound only controls the one-step prediction error; an explicit telescoping argument or induction that accumulates the tail contribution over the truncation window is missing, leaving the dependence of ε on the truncation length implicit rather than quantitative.
[§5.1, Eq. (17)] §5.1, Eq. (17): the definition of the truncated strategy space restricts prescriptions to finite windows, but the paper does not show that the best-response operator remains a contraction under the same filter-stability metric used for the state approximation; without this, the existence of a Nash equilibrium in the truncated game is not guaranteed by the same fixed-point argument invoked for the original game.

minor comments (2)

Notation for the common-information belief and private-information prescription is introduced in §2 but reused without re-statement in the proof of Theorem 4.3; a short table of symbols would improve readability.
[§6] The numerical illustration in §6 reports average payoffs but does not include the realized ε values or the empirical decay rate versus truncation length; adding these quantities would directly illustrate the theoretical claim.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments on our manuscript. We address each major comment below.

read point-by-point responses

Referee: [§4.2, Assumption 4.1 and Theorem 4.3] §4.2, Assumption 4.1 and Theorem 4.3: the filter-stability condition is stated in total-variation distance, yet the proof of the ε bound only controls the one-step prediction error; an explicit telescoping argument or induction that accumulates the tail contribution over the truncation window is missing, leaving the dependence of ε on the truncation length implicit rather than quantitative.

Authors: We agree that the manuscript would benefit from an explicit accumulation argument. The one-step total-variation bound under Assumption 4.1 is used to establish that the belief approximation error vanishes with increasing truncation length, but the telescoping sum over the window is only implicit in the current proof of Theorem 4.3. In the revision we will add a short inductive lemma (new Lemma 4.4) showing that the accumulated error is bounded by a geometric series whose ratio is the filter-stability contraction factor; this yields the explicit quantitative dependence ε ≤ K·γ^L (L = truncation length, γ < 1). revision: yes
Referee: [§5.1, Eq. (17)] §5.1, Eq. (17): the definition of the truncated strategy space restricts prescriptions to finite windows, but the paper does not show that the best-response operator remains a contraction under the same filter-stability metric used for the state approximation; without this, the existence of a Nash equilibrium in the truncated game is not guaranteed by the same fixed-point argument invoked for the original game.

Authors: The existence of a Nash equilibrium for the truncated game does not rely on the contraction-mapping argument employed for the original infinite-horizon POMG. Because the truncated game is a finite-state, finite-action Markov game (prescriptions are restricted to finite windows), a mixed-strategy Nash equilibrium exists by Nash’s theorem for finite normal-form games. The contraction property is used only to guarantee existence in the infinite-dimensional belief-space formulation of the original game. We will insert a clarifying sentence in §5.1 stating this distinction. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper establishes an approximation result: under externally stated filter stability (forgetting) conditions on the POMG, any Nash equilibrium of a finite-memory truncation is an ε-Nash equilibrium of the infinite-horizon game with ε vanishing as truncation length grows. This is a standard conditional convergence theorem whose proof relies on the given stability assumptions rather than any self-referential definition, fitted parameter renamed as prediction, or load-bearing self-citation chain. The truncation framework and belief-state reformulation are introduced as modeling choices, not derived from the target result. No ansatz is smuggled via citation and no known empirical pattern is merely renamed. The derivation chain is therefore self-contained against the stated assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption of filter stability; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Filter stability (forgetting) conditions hold for the POMG
Invoked to ensure that the approximation error epsilon goes to zero with increasing truncation length.

pith-pipeline@v0.9.0 · 5471 in / 1075 out tokens · 29490 ms · 2026-05-10T18:51:52.395136+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Under suitable filter stability (forgetting) conditions, we show that any Nash equilibrium of the truncated game is an ε-Nash equilibrium of the original POMG, where ε→0 as the truncation length increases. (Theorem 1, Assumptions 3-4)
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Assumption 3 (Uniform filter stability): ... F^(ℓ)(μ, c̃) - F^(ℓ)(μ', c̃) TV ≤ f(ℓ) ||μ-μ'|| TV with f(ℓ)→0

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages

[1]

Partially observable multi-agent rl with (quasi- ) efficiency: The blessing of information sharing,

X. Liu and K. Zhang, “Partially observable multi-agent rl with (quasi- ) efficiency: The blessing of information sharing,” inInternational Conference on Machine Learning. PMLR, 2023, pp. 22 370–22 419

work page 2023
[2]

Common informa- tion based markov perfect equilibria for stochastic games with asym- metric information: Finite games,

A. Nayyar, A. Gupta, C. Langbort, and T. Bas ¸ar, “Common informa- tion based markov perfect equilibria for stochastic games with asym- metric information: Finite games,”IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 555–570, 2013

work page 2013
[3]

Common informa- tion based markov perfect equilibria for linear-gaussian games with asymmetric information,

A. Gupta, A. Nayyar, C. Langbort, and T. Basar, “Common informa- tion based markov perfect equilibria for linear-gaussian games with asymmetric information,”SIAM Journal on Control and Optimization, vol. 52, no. 5, pp. 3228–3260, 2014

work page 2014
[4]

Dynamic games with asymmetric information: Common information based perfect bayesian equilibria and sequential decomposition,

Y . Ouyang, H. Tavafoghi, and D. Teneketzis, “Dynamic games with asymmetric information: Common information based perfect bayesian equilibria and sequential decomposition,”IEEE Transactions on Au- tomatic Control, vol. 62, no. 1, pp. 222–237, 2016

work page 2016
[5]

Planning and learning in partially observable systems via filter stability,

N. Golowich, A. Moitra, and D. Rohatgi, “Planning and learning in partially observable systems via filter stability,” inProceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023, pp. 349–362

work page 2023
[6]

Exponential filter stability via do- brushin’s coefficient,

C. McDonald and S. Y ¨uksel, “Exponential filter stability via do- brushin’s coefficient,” 2020

work page 2020
[7]

Common information based approx- imate state representations in multi-agent reinforcement learning,

H. Kao and V . Subramanian, “Common information based approx- imate state representations in multi-agent reinforcement learning,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 6947–6967

work page 2022
[8]

Convergence of finite memory q learning for pomdps and near optimality of learned policies under filter stability,

A. D. Kara and S. Y ¨uksel, “Convergence of finite memory q learning for pomdps and near optimality of learned policies under filter stability,”Mathematics of Operations Research, vol. 48, no. 4, pp. 2066–2093, 2023

work page 2066
[9]

Near optimality of finite memory feedback policies in partially observed markov decision processes,

A. Kara and S. Yuksel, “Near optimality of finite memory feedback policies in partially observed markov decision processes,”Journal of Machine Learning Research, vol. 23, no. 11, pp. 1–46, 2022

work page 2022
[10]

Scalable policy-based rl algorithms for pomdps,

A. Anjarlekar, R. Etesami, and R. Srikant, “Scalable policy-based rl algorithms for pomdps,”arXiv preprint arXiv:2510.06540, 2025

work page arXiv 2025
[11]

Stability and uniform approximation of nonlinear filters using the hilbert metric and application to particle filters,

F. Le Gland and N. Oudjane, “Stability and uniform approximation of nonlinear filters using the hilbert metric and application to particle filters,”The Annals of Applied Probability, vol. 14, no. 1, pp. 144–187, 2004

work page 2004
[12]

Forgetting the initial distribution for hidden markov models,

R. Douc, G. Fort, E. Moulines, and P. Priouret, “Forgetting the initial distribution for hidden markov models,”Stochastic processes and their applications, vol. 119, no. 4, pp. 1235–1256, 2009

work page 2009
[13]

Approximate information state for approximate planning and reinforcement learning in partially observed systems,

J. Subramanian, A. Sinha, R. Seraj, and A. Mahajan, “Approximate information state for approximate planning and reinforcement learning in partially observed systems,”Journal of Machine Learning Research, vol. 23, no. 12, pp. 1–83, 2022

work page 2022
[14]

Partially observable multiagent reinforcement learning with information sharing,

X. Liu and K. Zhang, “Partially observable multiagent reinforcement learning with information sharing,”SIAM Journal on Control and Optimization, vol. 64, no. 2, pp. 673–697, 2026

work page 2026
[15]

Information state em- bedding in partially observable cooperative multi-agent reinforcement learning,

W. Mao, K. Zhang, E. Miehling, and T. Bas ¸ar, “Information state em- bedding in partially observable cooperative multi-agent reinforcement learning,” in2020 59th IEEE Conference on Decision and Control (CDC). IEEE, 2020, pp. 6124–6131

work page 2020
[16]

On the role of information structure in reinforcement learning for partially-observable sequential teams and games,

A. Altabaa and Z. Yang, “On the role of information structure in reinforcement learning for partially-observable sequential teams and games,”Advances in Neural Information Processing Systems, vol. 37, pp. 6648–6710, 2024

work page 2024
[17]

Information contraction and decomposition,

A. Makur, “Information contraction and decomposition,” Ph.D. disser- tation, Massachusetts Institute of Technology, 2019

work page 2019

[1] [1]

Partially observable multi-agent rl with (quasi- ) efficiency: The blessing of information sharing,

X. Liu and K. Zhang, “Partially observable multi-agent rl with (quasi- ) efficiency: The blessing of information sharing,” inInternational Conference on Machine Learning. PMLR, 2023, pp. 22 370–22 419

work page 2023

[2] [2]

Common informa- tion based markov perfect equilibria for stochastic games with asym- metric information: Finite games,

A. Nayyar, A. Gupta, C. Langbort, and T. Bas ¸ar, “Common informa- tion based markov perfect equilibria for stochastic games with asym- metric information: Finite games,”IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 555–570, 2013

work page 2013

[3] [3]

Common informa- tion based markov perfect equilibria for linear-gaussian games with asymmetric information,

A. Gupta, A. Nayyar, C. Langbort, and T. Basar, “Common informa- tion based markov perfect equilibria for linear-gaussian games with asymmetric information,”SIAM Journal on Control and Optimization, vol. 52, no. 5, pp. 3228–3260, 2014

work page 2014

[4] [4]

Dynamic games with asymmetric information: Common information based perfect bayesian equilibria and sequential decomposition,

Y . Ouyang, H. Tavafoghi, and D. Teneketzis, “Dynamic games with asymmetric information: Common information based perfect bayesian equilibria and sequential decomposition,”IEEE Transactions on Au- tomatic Control, vol. 62, no. 1, pp. 222–237, 2016

work page 2016

[5] [5]

Planning and learning in partially observable systems via filter stability,

N. Golowich, A. Moitra, and D. Rohatgi, “Planning and learning in partially observable systems via filter stability,” inProceedings of the 55th Annual ACM Symposium on Theory of Computing, 2023, pp. 349–362

work page 2023

[6] [6]

Exponential filter stability via do- brushin’s coefficient,

C. McDonald and S. Y ¨uksel, “Exponential filter stability via do- brushin’s coefficient,” 2020

work page 2020

[7] [7]

Common information based approx- imate state representations in multi-agent reinforcement learning,

H. Kao and V . Subramanian, “Common information based approx- imate state representations in multi-agent reinforcement learning,” inInternational Conference on Artificial Intelligence and Statistics. PMLR, 2022, pp. 6947–6967

work page 2022

[8] [8]

Convergence of finite memory q learning for pomdps and near optimality of learned policies under filter stability,

A. D. Kara and S. Y ¨uksel, “Convergence of finite memory q learning for pomdps and near optimality of learned policies under filter stability,”Mathematics of Operations Research, vol. 48, no. 4, pp. 2066–2093, 2023

work page 2066

[9] [9]

Near optimality of finite memory feedback policies in partially observed markov decision processes,

A. Kara and S. Yuksel, “Near optimality of finite memory feedback policies in partially observed markov decision processes,”Journal of Machine Learning Research, vol. 23, no. 11, pp. 1–46, 2022

work page 2022

[10] [10]

Scalable policy-based rl algorithms for pomdps,

A. Anjarlekar, R. Etesami, and R. Srikant, “Scalable policy-based rl algorithms for pomdps,”arXiv preprint arXiv:2510.06540, 2025

work page arXiv 2025

[11] [11]

Stability and uniform approximation of nonlinear filters using the hilbert metric and application to particle filters,

F. Le Gland and N. Oudjane, “Stability and uniform approximation of nonlinear filters using the hilbert metric and application to particle filters,”The Annals of Applied Probability, vol. 14, no. 1, pp. 144–187, 2004

work page 2004

[12] [12]

Forgetting the initial distribution for hidden markov models,

R. Douc, G. Fort, E. Moulines, and P. Priouret, “Forgetting the initial distribution for hidden markov models,”Stochastic processes and their applications, vol. 119, no. 4, pp. 1235–1256, 2009

work page 2009

[13] [13]

Approximate information state for approximate planning and reinforcement learning in partially observed systems,

J. Subramanian, A. Sinha, R. Seraj, and A. Mahajan, “Approximate information state for approximate planning and reinforcement learning in partially observed systems,”Journal of Machine Learning Research, vol. 23, no. 12, pp. 1–83, 2022

work page 2022

[14] [14]

Partially observable multiagent reinforcement learning with information sharing,

X. Liu and K. Zhang, “Partially observable multiagent reinforcement learning with information sharing,”SIAM Journal on Control and Optimization, vol. 64, no. 2, pp. 673–697, 2026

work page 2026

[15] [15]

Information state em- bedding in partially observable cooperative multi-agent reinforcement learning,

W. Mao, K. Zhang, E. Miehling, and T. Bas ¸ar, “Information state em- bedding in partially observable cooperative multi-agent reinforcement learning,” in2020 59th IEEE Conference on Decision and Control (CDC). IEEE, 2020, pp. 6124–6131

work page 2020

[16] [16]

On the role of information structure in reinforcement learning for partially-observable sequential teams and games,

A. Altabaa and Z. Yang, “On the role of information structure in reinforcement learning for partially-observable sequential teams and games,”Advances in Neural Information Processing Systems, vol. 37, pp. 6648–6710, 2024

work page 2024

[17] [17]

Information contraction and decomposition,

A. Makur, “Information contraction and decomposition,” Ph.D. disser- tation, Massachusetts Institute of Technology, 2019

work page 2019