On Data-based Nash Equilibria in LQ Nonzero-sum Differential Games

Matthias A. Müller; Victor G. Lopez

arxiv: 2601.11320 · v2 · pith:4E7ME22Cnew · submitted 2026-01-16 · 📡 eess.SY · cs.SY

Pith reviewed 2026-05-16 13:23 UTC · model grok-4.3

classification 📡 eess.SY cs.SY

keywords data-based control · Nash equilibrium · nonzero-sum differential games · linear-quadratic games · persistent excitation · stochastic differential games · state observers · multi-agent systems

The pith

Data from persistently excited multiagent trajectories yields the same Nash equilibrium strategies for linear-quadratic nonzero-sum differential games as model-based methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper shows how to solve linear-quadratic nonzero-sum differential games using only data collected from the system rather than an explicit model of the dynamics. In the deterministic case, persistently excited data from all agents directly produces the Nash strategies. For the stochastic case with noisy output measurements, each player designs its own state observer from its local data, and the resulting strategies again match the model-based ones. A reader would care because this removes the requirement for accurate a priori models in game-theoretic control problems, making the approach practical for real multiagent systems where models are uncertain or unavailable. The equivalence is demonstrated analytically and confirmed numerically.

Core claim

The proposed data-based solutions, which process collected data to compute the game strategies, are equivalent to the known model-based procedures for finding Nash equilibria in both the deterministic and stochastic formulations of the linear-quadratic nonzero-sum differential game.

What carries the argument

The data-based computation of Nash strategies from persistently excited input-state or input-output data, shown to be equivalent to solving the coupled Riccati equations of the model-based approach.
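
For orientation, the model-based conditions referred to here are usually written as below for an N-player game with system matrices $A$, $B_j$ and linear feedback $u_i = -K_i x$. This is the textbook form under assumed notation: the symbols $Q_i$, $R_{ij}$ (state and input weights in player $i$'s quadratic cost), $P_i$, and $K_i$ are not taken from the paper and may differ from its conventions.

```latex
\begin{aligned}
K_i &= R_{ii}^{-1} B_i^\top P_i, \qquad A_c = A - \sum_{j=1}^{N} B_j K_j, \\
0 &= A_c^\top P_i + P_i A_c + Q_i + \sum_{j=1}^{N} K_j^\top R_{ij} K_j, \qquad i = 1,\dots,N .
\end{aligned}
```

The equations are coupled because every $P_i$ depends on all gains $K_j$ through $A_c$; a data-based method has to reproduce this same set of conditions without access to $A$ and $B_j$.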

If this is right

Agents compute their Nash strategies solely from measured data without knowing the system matrices.
The approach extends to stochastic games by incorporating local state observers designed from noisy outputs.
Equivalence ensures that the data-based strategies achieve the same equilibrium performance as model-based ones.
Numerical experiments validate that the strategies coincide in practice.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such data-driven methods could allow online updating of strategies as new data arrives in time-varying environments.
The technique may generalize to other classes of differential games beyond the linear-quadratic case.
Implementation in hardware would require verifying persistent excitation in real-time data streams.

Load-bearing premise

The collected data must be persistently excited, and each player in the stochastic case must be able to construct a suitable state observer from its individual noisy measurements.

What would settle it

Run a simulation with data that lacks persistent excitation; the resulting data-based strategies will produce different closed-loop trajectories or costs than the true model-based Nash strategies.
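
A minimal sketch of how such a check might be set up is given below. It assumes the excitation requirement can be tested as a full-row-rank condition on stacked state and input samples, which is a common form in data-driven control but not necessarily the exact condition used in the paper. The system, the forward-Euler simulation, and the helper names `has_full_row_rank` and `simulate` are hypothetical and serve illustration only.

```python
import numpy as np

def has_full_row_rank(X, U, tol=1e-8):
    """Check the (assumed) excitation condition: rank([X; U]) = n + m."""
    D = np.vstack([X, U])
    return np.linalg.matrix_rank(D, tol=tol) == D.shape[0]

def simulate(A, B, inputs, x0, dt):
    """Forward-Euler simulation of x' = A x + B u; returns states as columns."""
    x = np.asarray(x0, dtype=float)
    states = []
    for u in inputs.T:
        states.append(x)
        x = x + dt * (A @ x + B @ u)
    return np.array(states).T

# Hypothetical two-state, one-input system used only for illustration.
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
rng = np.random.default_rng(0)
T, dt = 200, 0.01

U_rich = rng.standard_normal((1, T))   # exciting input
U_zero = np.zeros((1, T))              # no excitation at all

X_rich = simulate(A, B, U_rich, [1.0, 0.0], dt)
X_zero = simulate(A, B, U_zero, [1.0, 0.0], dt)

rich_ok = has_full_row_rank(X_rich, U_rich)
zero_ok = has_full_row_rank(X_zero, U_zero)
print(rich_ok, zero_ok)
```

With the zero input, the input row of the stacked matrix vanishes, so the rank test fails and any data-based equation built from that matrix no longer determines the closed-loop behavior uniquely.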

Original abstract

This paper considers data-based solutions of linear-quadratic nonzero-sum differential games. Two cases are considered. First, the deterministic game is solved and Nash equilibrium strategies are obtained by using persistently excited data from the multiagent system. Then, a stochastic formulation of the game is considered, where each agent measures a different noisy output signal and state observers must be designed for each player. It is shown that the proposed data-based solutions of these games are equivalent to known model-based procedures. The resulting data-based solutions are validated in a numerical experiment.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

This paper shows you can recover the exact Nash strategies for LQ nonzero-sum games from persistently excited data alone, matching the model-based Riccati solution in both deterministic and stochastic settings.

This paper gives a way to find Nash equilibrium strategies for linear-quadratic nonzero-sum differential games directly from measured data, without needing the system matrices, and it proves that these data-based strategies are exactly the same as the ones from the usual model-based Riccati approach.

The authors handle two versions. In the deterministic case, they collect data from the multi-agent system under persistent excitation and form data matrices that replace the model in the coupled algebraic Riccati equations. The resulting equation for the parameters turns out to be identical once the excitation condition holds. For the stochastic version, each player only sees its own noisy output, so each player designs a local observer. The paper shows that the observer errors vanish asymptotically under the same persistent excitation on the augmented state, allowing the data-based solution to still match the model-based one.

This extends prior work on data-driven LQ control to the nonzero-sum game setting, which is new. The equivalence is shown algebraically by substitution, and a numerical example confirms that the feedback gains match within floating-point error. That's useful because in many applications you can run experiments to get data but struggle to identify a full model.

The approach is clean on the math side. The proof relies on the standard persistent excitation rank condition, which is well known but necessary. In the stochastic case, the observer design is per-player and local, which fits the decentralized nature of the game. However, the paper doesn't go deep into how sensitive the method is to the level of noise or to the choice of observer gains, which could be a practical limitation. Overall, the central argument holds without circularity or internal contradictions. The derivation is independent in the sense that it directly equates the two routes.

This is relevant for control engineers and researchers working on multi-agent systems where models are unavailable or uncertain. Someone familiar with data-driven control techniques will find the extension straightforward and valuable. I would bring this to a reading group focused on data-driven methods or game-theoretic control. It deserves serious peer review because the equivalence is established rigorously for the given assumptions and the numerical validation supports it. Referees could help clarify any implementation details for the observers.
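
The substitution step described in the letter can be illustrated with a small sketch. It assumes that sampled states, both players' inputs, and state derivatives are available, and it builds a matrix `Gamma` from data so that the closed-loop matrix is recovered without using the model. All names and the random two-player system are hypothetical; the paper's own data matrices may be constructed differently.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m1, m2, T = 3, 1, 2, 40

# Hypothetical "true" system, used only to generate data and to check the result.
A = rng.standard_normal((n, n))
B1 = rng.standard_normal((n, m1))
B2 = rng.standard_normal((n, m2))

# Data: sampled states, both players' inputs, and (assumed available) state derivatives.
X = rng.standard_normal((n, T))
U1 = rng.standard_normal((m1, T))
U2 = rng.standard_normal((m2, T))
Xdot = A @ X + B1 @ U1 + B2 @ U2

# Arbitrary feedback gains for the two players.
K1 = rng.standard_normal((m1, n))
K2 = rng.standard_normal((m2, n))

# Solve D @ Gamma = [I; -K1; -K2]; this is solvable when D has full row rank.
D = np.vstack([X, U1, U2])
assert np.linalg.matrix_rank(D) == n + m1 + m2
target = np.vstack([np.eye(n), -K1, -K2])
Gamma = np.linalg.pinv(D) @ target

Acl_data = Xdot @ Gamma              # uses data only
Acl_model = A - B1 @ K1 - B2 @ K2    # uses the model
max_gap = np.max(np.abs(Acl_data - Acl_model))
```

Because `Xdot = A @ X + B1 @ U1 + B2 @ U2` holds column by column, multiplying by `Gamma` gives `A - B1 @ K1 - B2 @ K2` exactly, which is why the rank condition is the only ingredient needed for this step.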

Referee Report

1 major / 2 minor

Summary. The manuscript develops data-based methods to compute Nash equilibria for linear-quadratic nonzero-sum differential games. In the deterministic case, persistently excited input-state trajectories are used to construct data matrices that are substituted into the coupled Riccati equations, yielding value-function parameters identical to the model-based solution once the rank condition holds. In the stochastic case, each player designs a local state observer from its own noisy output; the observer error is shown to decay asymptotically under the same persistent-excitation assumption on the augmented state, again producing feedback gains algebraically equivalent to the model-based Riccati solution. A numerical example confirms that the two routes produce identical gains within numerical tolerance.

Significance. If the algebraic equivalence is established without hidden dependence on fitted parameters, the result supplies a model-free route to exact Nash strategies for both deterministic and stochastic LQ games. This is valuable for multi-agent applications in which only trajectory data are available and explicit system matrices cannot be identified reliably. The stochastic extension with per-player observers broadens applicability to partial-observation settings.

major comments (1)

[§4] (Stochastic case) The claim that observer error dynamics vanish asymptotically under the PE condition on the augmented state is load-bearing for the equivalence result, yet the manuscript provides only a sketch; an explicit Lyapunov or rank argument showing that the PE condition implies uniform exponential stability of the error system is required.
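
An argument of the requested kind would typically take the following shape. This is a generic sketch for a noise-free, Luenberger-type error system with a hypothetical observer gain $L_i$ and output matrix $C_i$ for player $i$; it is not the paper's proof, and the link from the PE condition to the Hurwitz property assumed below is exactly the step the referee asks the authors to supply.

```latex
\begin{aligned}
\dot{e}_i &= (A - L_i C_i)\, e_i, \qquad V(e_i) = e_i^\top P e_i, \quad P \succ 0, \\
(A - L_i C_i)^\top P + P (A - L_i C_i) &= -I
\;\Longrightarrow\;
\dot{V} = -\|e_i\|^2 \le -\frac{1}{\lambda_{\max}(P)}\, V, \\
\|e_i(t)\|^2 &\le \frac{\lambda_{\max}(P)}{\lambda_{\min}(P)}\, e^{-t/\lambda_{\max}(P)}\, \|e_i(0)\|^2 .
\end{aligned}
```

A positive definite $P$ solving the Lyapunov equation exists whenever $A - L_i C_i$ is Hurwitz, and the final bound is the exponential decay estimate.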

minor comments (2)

[§3] Notation for the data matrices (e.g., the definition of the stacked regressor in the deterministic case) should be introduced once and used consistently; the current presentation redefines symbols across subsections.
[§5] The numerical example reports only final gain values; adding a table of the condition numbers of the data matrices and the observed convergence rate of the observer errors would strengthen the validation.
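
The two quantities requested in the last comment are cheap to compute. The sketch below uses synthetic placeholder data, not results from the paper, and the helper names `data_condition_number` and `decay_rate` are hypothetical.

```python
import numpy as np

def data_condition_number(X, U):
    """2-norm condition number of the stacked data matrix [X; U]."""
    return np.linalg.cond(np.vstack([X, U]))

def decay_rate(t, err_norms):
    """Least-squares slope of log||e(t)|| versus t (negative means decay)."""
    slope, _ = np.polyfit(t, np.log(err_norms), 1)
    return slope

# Synthetic placeholders, not results from the paper.
rng = np.random.default_rng(2)
X = rng.standard_normal((3, 50))
U = rng.standard_normal((2, 50))
t = np.linspace(0.0, 5.0, 100)
err = 0.7 * np.exp(-2.0 * t)          # error norm decaying at rate 2

kappa = data_condition_number(X, U)
rate = decay_rate(t, err)
```

A large condition number would indicate that the data are only weakly exciting, and the fitted slope gives an empirical estimate of the observer's convergence rate.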

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful reading, positive summary, and constructive comment. We address the single major comment below and will revise the manuscript accordingly.

Referee: [§4] (Stochastic case) The claim that observer error dynamics vanish asymptotically under the PE condition on the augmented state is load-bearing for the equivalence result, yet the manuscript provides only a sketch; an explicit Lyapunov or rank argument showing that the PE condition implies uniform exponential stability of the error system is required.

Authors: We agree that the stability argument in the stochastic case is central to the equivalence claim and that the current sketch can be strengthened. In the revised manuscript we will replace the sketch with a complete proof: we construct a quadratic Lyapunov function for the observer error system, invoke the persistent-excitation rank condition on the augmented regressor to establish a uniform lower bound on the excitation, and show that the derivative of the Lyapunov function is strictly negative definite, thereby proving uniform exponential stability. The added steps rely only on standard PE arguments already used in the deterministic part of the paper and do not alter any of the stated results.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; algebraic equivalence shown directly

The derivation substitutes data matrices collected under persistent excitation directly into the standard model-based coupled Riccati equations for LQ nonzero-sum games, producing an identical linear system for the value-function parameters once the PE rank condition holds. This is a straightforward algebraic identity, not a fit or self-definition. The stochastic extension similarly substitutes per-player observer dynamics and shows asymptotic error vanishing under the same PE assumption. No load-bearing self-citations, ansatz smuggling, or renaming of known results occur; the central claim reduces to an explicit equivalence proof against external model-based Riccati solutions rather than to the paper's own inputs.
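
The algebraic identity can be checked numerically for one player's value matrix under fixed gains. The sketch below is an illustration under assumptions: the two-player system, the gains, the weights, and the helper `lyap` are hypothetical, state derivatives are assumed to be measured, and only the evaluation of a given strategy pair is shown, not the full equilibrium computation.

```python
import numpy as np

def lyap(Ac, Qbar):
    """Solve Ac.T @ P + P @ Ac = -Qbar via Kronecker products (small n only)."""
    n = Ac.shape[0]
    I = np.eye(n)
    M = np.kron(I, Ac.T) + np.kron(Ac.T, I)
    p = np.linalg.solve(M, -Qbar.reshape(-1, order="F"))
    return p.reshape(n, n, order="F")

rng = np.random.default_rng(3)
n, m, T = 2, 1, 30

# Hypothetical system with two scalar-input players and stabilizing gains.
A = np.array([[0.0, 1.0], [-1.0, -2.0]])
B1 = np.array([[0.0], [1.0]])
B2 = np.array([[1.0], [0.0]])
K1 = np.array([[0.5, 0.5]])
K2 = np.array([[0.2, 0.0]])
Q1, R11, R12 = np.eye(n), np.eye(m), 0.1 * np.eye(m)

# Data generated from the system; the model is not used after this point
# except to compute the reference solution P_model.
X = rng.standard_normal((n, T))
U1 = rng.standard_normal((m, T))
U2 = rng.standard_normal((m, T))
Xdot = A @ X + B1 @ U1 + B2 @ U2
D = np.vstack([X, U1, U2])
Gamma = np.linalg.pinv(D) @ np.vstack([np.eye(n), -K1, -K2])

# Player 1's cost weight under the fixed gains of both players.
Qbar = Q1 + K1.T @ R11 @ K1 + K2.T @ R12 @ K2
P_data = lyap(Xdot @ Gamma, Qbar)                  # data-based route
P_model = lyap(A - B1 @ K1 - B2 @ K2, Qbar)        # model-based route
gap = np.max(np.abs(P_data - P_model))
```

Both routes solve the same Lyapunov equation because the data-based closed-loop matrix equals the model-based one whenever the stacked data matrix has full row rank, so the two value matrices agree up to rounding.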

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard linear-quadratic game assumptions and the persistent excitation condition required for data-driven identification; no new entities are introduced.

axioms (2)

domain assumption: The underlying dynamics are linear and the costs are quadratic.
Standard setup for LQ differential games invoked throughout the abstract.
domain assumption: The collected trajectories are persistently excited.
Required for the data-based solution to recover the equilibrium strategies.
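
Written out, the first assumption corresponds to a setup of the following kind. The infinite horizon and the symbols $Q_i$ and $R_{ij}$ are assumptions made for illustration; the abstract does not state the horizon or the weights.

```latex
\dot{x}(t) = A x(t) + \sum_{j=1}^{N} B_j u_j(t), \qquad
J_i = \int_0^\infty \Big( x^\top Q_i x + \sum_{j=1}^{N} u_j^\top R_{ij} u_j \Big)\, dt, \quad i = 1,\dots,N .
```

Each player $i$ chooses $u_i$ to minimize its own cost $J_i$, and the game is nonzero-sum because the costs are not required to add up to a constant.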

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "rank condition (19) allows obtaining the data-based representation... $\tilde{H}_S(x)\,\Gamma = A - \sum_i B_i K_i$"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.