Inverse Linear-Quadratic Gaussian Differential Games

Balint Varga; Felix Thömmes; Karl Handwerker; Lucas Günther; Sören Hohmann

arxiv: 2512.05552 · v2 · submitted 2025-12-05 · 📡 eess.SY · cs.SY

Lucas Günther, Felix Thömmes, Karl Handwerker, Balint Varga, Sören Hohmann

Pith reviewed 2026-05-17 01:32 UTC · model grok-4.3

keywords inverse differential games · LQG games · parameter identification · Riccati equations · stochastic differential games · cost recovery · feedback estimation

The pith

A method recovers cost parameters and noise levels from trajectories in finite-horizon LQG differential games.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops an inverse approach for stochastic differential games in which multiple players interact through linear-quadratic-Gaussian dynamics over a finite horizon. It first estimates the linear feedback strategies that players appear to be using, then recovers the unknown cost-function weights by applying a new algebraic rearrangement to the coupled Riccati differential equations that normally define optimal play, and finally determines the noise-intensity scalars by maximum-likelihood fitting. When the recovered parameters are inserted back into the forward game, the resulting trajectories closely reproduce the observed data in simulation.
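
For orientation, a finite-horizon LQG differential game of this kind is commonly written in the textbook form below. The symbols (A, B_i, Q_i, R_i, Q_{i,T}, P_i, K_i, sigma) and the restriction that each player penalizes only its own control are assumptions made for illustration; the paper's own parameterization may differ.

```latex
\begin{aligned}
\mathrm{d}x(t) &= \Big(A\,x(t) + \sum_{j=1}^{N} B_j\,u_j(t)\Big)\,\mathrm{d}t + \sigma\,\mathrm{d}W(t), \\
J_i &= \mathbb{E}\Big[x(T)^\top Q_{i,T}\,x(T) + \int_0^T \big(x^\top Q_i\,x + u_i^\top R_i\,u_i\big)\,\mathrm{d}t\Big], \\
u_i(t) &= -K_i(t)\,x(t), \qquad K_i(t) = R_i^{-1} B_i^\top P_i(t), \\
-\dot P_i &= A_{\mathrm{cl}}^\top P_i + P_i A_{\mathrm{cl}} + Q_i + K_i^\top R_i K_i, \qquad P_i(T) = Q_{i,T}, \\
A_{\mathrm{cl}}(t) &= A - \sum_{j=1}^{N} B_j K_j(t).
\end{aligned}
```

In this form the forward problem is nonlinear in P_i, but once the gains K_i(t) are known, the gain relation R_i K_i = B_i^T P_i and the Riccati equation are linear in the unknowns Q_i, R_i, and P_i. That is one way to see why estimating the gains first can make cost identification tractable. With additive, state-independent noise the feedback gains do not depend on sigma, which is consistent with fitting the noise scale in a separate step.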

Core claim

The paper claims that the inverse LQG differential-game problem can be solved by the three-step procedure of estimating feedback gains from data, identifying cost parameters through a novel reformulation of the coupled Riccati differential equations, and obtaining noise scaling factors via maximum-likelihood estimation, thereby recovering player cost functions that are consistent with the supplied trajectories.

What carries the argument

A novel reformulation of the coupled Riccati differential equations that converts the inverse problem into a solvable algebraic identification task once feedback strategies have been estimated.
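
A minimal sketch of the idea, assuming a hypothetical scalar two-player game rather than the paper's actual reformulation: gains are generated by integrating the coupled Riccati equations backward, and the state weights q_i are then read off algebraically from those gains. All names (`riccati_rhs`, `forward_gains`, `recover_state_weights`) and numerical values are illustrative. The control weights r_i are treated as known, since scaling a player's whole cost by a positive constant leaves the gains unchanged, and the time derivative is approximated with finite differences.

```python
import numpy as np

# Hypothetical scalar two-player game used only for illustration:
#   dx = (a x + b_1 u_1 + b_2 u_2) dt + noise,   u_i = -k_i(t) x,
#   cost_i = E[ qT_i x(T)^2 + int (q_i x^2 + r_i u_i^2) dt ].
# Coupled Riccati ODEs: -dp_i/dt = 2 a_cl p_i + q_i + r_i k_i^2,
# with k_i = b_i p_i / r_i and a_cl = a - sum_j b_j k_j.

def riccati_rhs(p, a, b, q, r):
    """Return dp/dt for both players at the current values p."""
    k = b * p / r
    a_cl = a - np.sum(b * k)
    return -(2.0 * a_cl * p + q + r * k**2)

def forward_gains(a, b, q, r, q_T, T=1.0, n=2001):
    """Integrate backward from p(T) = q_T with RK4 and return (t, K)."""
    t = np.linspace(0.0, T, n)
    dt = t[1] - t[0]
    p = np.empty((n, b.size))
    p[-1] = q_T
    for j in range(n - 1, 0, -1):
        k1 = riccati_rhs(p[j], a, b, q, r)
        k2 = riccati_rhs(p[j] - 0.5 * dt * k1, a, b, q, r)
        k3 = riccati_rhs(p[j] - 0.5 * dt * k2, a, b, q, r)
        k4 = riccati_rhs(p[j] - dt * k3, a, b, q, r)
        p[j - 1] = p[j] - dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return t, p * b / r

def recover_state_weights(t, K, a, b, r):
    """Recover q_i from gains K[:, i], assuming r_i is fixed (normalization)."""
    p = r * K / b                              # invert k_i = b_i p_i / r_i
    dp = np.gradient(p, t, axis=0)             # finite-difference dp/dt
    a_cl = a - K @ b                           # closed-loop drift over time
    q_t = -dp - 2.0 * a_cl[:, None] * p - r * K**2
    return q_t[1:-1].mean(axis=0)              # drop one-sided edge estimates

a, b, r = -0.5, np.array([1.0, 0.8]), np.array([1.0, 1.0])
q_true, q_T = np.array([2.0, 1.0]), np.array([1.0, 0.5])
t, K = forward_gains(a, b, q_true, r, q_T)
q_hat = recover_state_weights(t, K, a, b, r)
print(q_hat)
```

With exact gains the recovery needs no repeated forward solves, which mirrors the appeal of an algebraic identification step. The sketch says nothing about noisy gains, where the finite-difference derivative becomes the weak point.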

If this is right

Recovered cost and noise parameters generate trajectories that closely match the observed data in numerical tests.
Both deterministic cost weights and stochastic noise intensities can be identified within the same framework.
The approach applies directly to any finite-horizon linear-quadratic-Gaussian differential game whose feedback laws can be estimated from measurements.
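
The noise-identification step can be illustrated with a small sketch, assuming a scalar closed-loop model dx = a_cl x dt + sigma dW with known constant drift and an Euler-Maruyama discretization. Under that assumption the Gaussian maximum-likelihood estimate of sigma has a closed form. The function names and values are hypothetical, and the paper's estimator for its noise scaling parameters may be set up differently.

```python
import numpy as np

def simulate_closed_loop(a_cl, sigma, x0, dt, n_steps, rng):
    """Euler-Maruyama simulation of dx = a_cl x dt + sigma dW (scalar)."""
    x = np.empty(n_steps + 1)
    x[0] = x0
    for k in range(n_steps):
        noise = sigma * np.sqrt(dt) * rng.standard_normal()
        x[k + 1] = x[k] + a_cl * x[k] * dt + noise
    return x

def mle_noise_scale(x, a_cl, dt):
    """Gaussian MLE of sigma for known drift: sqrt(mean(residual^2) / dt)."""
    # Each increment is modelled as N(a_cl * x * dt, sigma^2 * dt).
    residual = np.diff(x) - a_cl * x[:-1] * dt
    return np.sqrt(np.mean(residual**2) / dt)

rng = np.random.default_rng(0)
x = simulate_closed_loop(a_cl=-1.5, sigma=0.3, x0=1.0, dt=1e-3, n_steps=20000, rng=rng)
sigma_hat = mle_noise_scale(x, a_cl=-1.5, dt=1e-3)
print(sigma_hat)
```

The estimate depends on the drift used to form the residuals, so in a full pipeline any error in the estimated gains would carry over into the noise estimate.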

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same three-step structure could be applied to infer hidden objectives in multi-agent robotic or traffic systems once trajectory data are available.
If the horizon is taken to infinity, the coupled Riccati differential equations would reduce to coupled algebraic matrix equations, potentially simplifying the identification step.
Real-world application would require checking sensitivity to model mismatch between the true dynamics and the assumed linear-quadratic-Gaussian form.
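
The infinite-horizon remark can be made concrete in a standard parameterization, assumed here and not taken from the paper: feedback gains K_i = R_i^{-1} B_i^T P_i, closed-loop matrix A_cl = A - sum_j B_j K_j, and each player penalizing only its own control. The Riccati differential equation for player i and its stationary limit then read:

```latex
\begin{aligned}
\text{finite horizon:}\quad & -\dot P_i(t) = A_{\mathrm{cl}}(t)^\top P_i(t) + P_i(t)\,A_{\mathrm{cl}}(t) + Q_i + K_i(t)^\top R_i\,K_i(t), \\
\text{stationary limit:}\quad & 0 = A_{\mathrm{cl}}^\top P_i + P_i\,A_{\mathrm{cl}} + Q_i + K_i^\top R_i\,K_i .
\end{aligned}
```

Setting the time derivative to zero leaves constant gains entering a set of coupled algebraic equations. Whether a stabilizing stationary solution exists for a given game is a separate question that this sketch does not address.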

Load-bearing premise

Observed trajectories are generated exactly by players employing linear feedback strategies within a finite-horizon LQG differential game whose dynamics and cost structure match the assumed model.

What would settle it

Generate trajectories from a game whose cost functions or noise statistics differ from the assumed structure, apply the identification procedure, and check whether the recovered parameters produce trajectories that deviate substantially from the original data.

Figures

Figures reproduced from arXiv: 2512.05552 by Balint Varga, Felix Thömmes, Karl Handwerker, Lucas Günther, Sören Hohmann.

Original abstract

This paper presents a method for solving the Inverse Stochastic Differential Game (ISDG) problem in finite-horizon linear-quadratic Gaussian (LQG) differential games. The objective is to recover cost function parameters of all players, as well as noise scaling parameters of the stochastic system, consistent with observed trajectories. The proposed framework combines (i) estimation of the feedback strategies, (ii) identification of the cost function parameters via a novel reformulation of the coupled Riccati differential equations, and (iii) maximum likelihood estimation of the noise scaling parameters. Simulation results demonstrate that the approach recovers parameters, yielding trajectories that closely match the observed trajectories.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a three-part pipeline for recovering costs and noise in finite-horizon LQG games from trajectories, with a Riccati rearrangement as the main technical step, but leaves error sensitivity unexamined.

The core of this work is a method that first pulls linear feedback gains from observed state-control data in a finite-horizon LQG differential game, then uses a rearranged form of the coupled Riccati equations to back out each player's quadratic cost matrices, and finally runs maximum likelihood on the noise scaling. The simulations show parameter recovery that produces trajectories close to the data used for fitting.

The reformulation of the Riccati equations for the inverse task is the clearest new piece; it avoids repeated forward solves and directly targets cost identification, which is a reasonable engineering step for this setting. Tying the three pieces together in one framework also shows some attention to the full stochastic game structure rather than treating costs in isolation.

The main gap is the lack of any check on how estimation error in the feedback gains propagates through the inverse Riccati step. These mappings are often sensitive, so modest noise or bias in the first stage could distort the recovered Q and R matrices substantially. The paper reports good trajectory matches but gives no condition numbers, Monte Carlo error bars on the recovered parameters, or tests with perturbed gains, so the practical reliability stays hard to judge. The assumption that the data come exactly from linear-feedback LQG play is standard for these problems but narrows the scope.

This is aimed at researchers in multi-agent control and inverse optimal control who need a concrete tool for finite-horizon stochastic games. A reader working on identification from trajectory data would get a usable starting point from the method and the numerical examples. I would send it for peer review; the framework is coherent enough that referees can verify the derivations and push for better robustness checks.
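
The first stage described in this letter, pulling feedback gains from state-control data, could look roughly like the sketch below. It assumes controls of the form u = -K(t) x observed with small measurement noise across M trajectories, and fits K at each time index by least squares. The function name, array layout, and synthetic data (states drawn at random rather than simulated from dynamics) are hypothetical and not taken from the paper.

```python
import numpy as np

def estimate_gains(X, U):
    """Least-squares estimate of K[t] in u = -K[t] x at every time index.

    X: states, shape (M, n_t, n_x); U: controls, shape (M, n_t, n_u).
    Returns K with shape (n_t, n_u, n_x).
    """
    n_t = X.shape[1]
    K = np.empty((n_t, U.shape[2], X.shape[2]))
    for t in range(n_t):
        # Solve X_t @ W = U_t in the least-squares sense; then K_t = -W^T.
        W, *_ = np.linalg.lstsq(X[:, t, :], U[:, t, :], rcond=None)
        K[t] = -W.T
    return K

# Synthetic check with a made-up gain schedule (illustrative values only).
rng = np.random.default_rng(1)
M, n_t, n_x, n_u = 200, 50, 2, 1
K_true = np.stack([np.array([[1.0 - 0.01 * t, 0.5]]) for t in range(n_t)])
X = rng.standard_normal((M, n_t, n_x))
U = -np.einsum("tux,mtx->mtu", K_true, X)
U += 0.01 * rng.standard_normal((M, n_t, n_u))   # measurement noise
K_hat = estimate_gains(X, U)
print(np.max(np.abs(K_hat - K_true)))
```

Fitting each time index separately keeps the sketch simple but ignores smoothness of K(t) over time. Any residual error in `K_hat` is exactly the quantity whose propagation into the recovered costs the letter asks about.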

Referee Report

2 major / 2 minor

Summary. The paper presents a method for solving the Inverse Stochastic Differential Game (ISDG) problem in finite-horizon linear-quadratic Gaussian (LQG) differential games. It recovers cost function parameters of all players and noise scaling parameters from observed trajectories by combining estimation of feedback strategies, identification of cost parameters via a novel reformulation of the coupled Riccati differential equations, and maximum likelihood estimation of noise scaling parameters. Simulation results are reported to demonstrate parameter recovery and close matching between generated and observed trajectories.

Significance. If the central claim holds under realistic estimation noise, the work provides a practical pipeline for inferring player costs in stochastic multi-agent systems, with potential applications in robotics and economics. The reformulation of the Riccati equations is presented as enabling direct identification, and the three-step structure (strategy estimation + inverse Riccati + MLE) is a coherent contribution. However, the reported simulations do not quantify robustness, so the assessed significance remains conditional on addressing error propagation.

major comments (2)

[Cost parameter identification (Section 4)] The identification step via the novel reformulation of the coupled Riccati differential equations does not quantify how finite-sample errors in the estimated feedback gains propagate to the recovered Q_i and R_i matrices. Because the Riccati equations are integrated backward from the terminal condition and the gain-to-cost mapping is typically ill-conditioned, perturbations in the first stage can produce large deviations in the identified costs; no condition numbers, sensitivity bounds, or Monte-Carlo error bars on the recovered parameters are supplied.
[Numerical experiments (Section 5)] The simulation results claim close trajectory matching after parameter recovery, yet the experimental design does not report the magnitude of process noise, the sample size used for strategy estimation, or whether the observed trajectories were generated exactly under the assumed finite-horizon LQG structure with linear feedback. Without these controls, it is unclear whether the reported recovery survives realistic estimation error in the feedback gains.
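
The kind of check requested in the first major comment can be sketched with a Monte Carlo experiment. The example below is a deliberately reduced, hypothetical setting and not the paper's method: a one-player scalar problem (a = 0, b = r = 1, zero terminal weight) whose gain has the closed form k(t) = sqrt(q) tanh(sqrt(q)(T - t)) and satisfies q = -dk/dt + k^2. Gaussian noise is added to the gain, q is recovered pointwise with finite differences, and the RMS error is compared on a coarse and a fine time grid.

```python
import numpy as np

def pointwise_rms_error(n, noise_std, trials, rng, q=2.0, T=1.0):
    """RMS error of q recovered pointwise from noisy gains on an n-point grid.

    One-player scalar special case (a = 0, b = r = 1, zero terminal weight):
    p(t) = sqrt(q) * tanh(sqrt(q) * (T - t)), gain k = p, and q = -dk/dt + k^2.
    """
    t = np.linspace(0.0, T, n)
    dt = t[1] - t[0]
    k_true = np.sqrt(q) * np.tanh(np.sqrt(q) * (T - t))
    k_noisy = k_true + noise_std * rng.standard_normal((trials, n))
    dk = np.gradient(k_noisy, dt, axis=1)      # finite-difference derivative
    q_hat = -dk + k_noisy**2                   # inverse Riccati relation
    # Exclude the two edge points, where np.gradient is one-sided.
    return np.sqrt(np.mean((q_hat[:, 1:-1] - q) ** 2))

rng = np.random.default_rng(2)
coarse = pointwise_rms_error(n=101, noise_std=1e-3, trials=500, rng=rng)
fine = pointwise_rms_error(n=1001, noise_std=1e-3, trials=500, rng=rng)
print(coarse, fine)
```

In this toy setting the pointwise error grows roughly in inverse proportion to the grid spacing, because differentiating the noisy gain amplifies the noise. Whether the paper's reformulation shares this sensitivity is the open question; error bars of this kind on the recovered Q_i and R_i would answer it.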

minor comments (2)

[Problem formulation] Notation for the players' cost matrices (Q_i, R_i) and the noise scaling parameters should be introduced consistently in the problem formulation section to avoid ambiguity when the reformulation is presented.
[Abstract] The abstract states that the approach 'recovers parameters' but does not mention the key modeling assumption that all players employ linear feedback strategies within the exact LQG dynamics; adding this would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We address each major comment below and describe the revisions that will be incorporated to improve the manuscript.

Referee: [Cost parameter identification (Section 4)] The identification step via the novel reformulation of the coupled Riccati differential equations does not quantify how finite-sample errors in the estimated feedback gains propagate to the recovered Q_i and R_i matrices. Because the Riccati equations are integrated backward from the terminal condition and the gain-to-cost mapping is typically ill-conditioned, perturbations in the first stage can produce large deviations in the identified costs; no condition numbers, sensitivity bounds, or Monte-Carlo error bars on the recovered parameters are supplied.

Authors: We agree that quantifying error propagation from estimated feedback gains to the recovered cost matrices is a valuable addition. In the revised manuscript we will include a sensitivity analysis of the inverse Riccati mapping, report condition numbers of the relevant linear operators, and add Monte Carlo experiments that display error bars on the recovered Q_i and R_i under finite-sample perturbations of the gains.
revision: yes

Referee: [Numerical experiments (Section 5)] The simulation results claim close trajectory matching after parameter recovery, yet the experimental design does not report the magnitude of process noise, the sample size used for strategy estimation, or whether the observed trajectories were generated exactly under the assumed finite-horizon LQG structure with linear feedback. Without these controls, it is unclear whether the reported recovery survives realistic estimation error in the feedback gains.

Authors: We acknowledge that the current experimental description omits several implementation details. The trajectories were generated exactly under the finite-horizon LQG dynamics with linear feedback. In the revision we will explicitly state the process-noise magnitudes, the number of samples used for strategy estimation, and add further simulation trials that vary noise intensity to illustrate robustness to realistic estimation errors in the gains.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; inverse identification pipeline is self-contained

The described framework first estimates feedback strategies from observed state-control trajectories, then applies a reformulation of the coupled Riccati differential equations to recover cost parameters Q_i and R_i, and finally performs MLE on noise scaling. This constitutes a standard inverse-optimal-control sequence in which the Riccati relation supplies an independent algebraic mapping from estimated gains to costs rather than a tautological re-expression of the same quantities. No quoted equations or self-citations in the abstract reduce any load-bearing step to its own inputs by construction, and the simulation validation uses external trajectory matching as an independent check. The derivation therefore remains non-circular against the observed data.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Only the abstract is available, so the ledger is necessarily incomplete; the approach rests on standard LQG assumptions whose precise parameterization is not detailed here.

axioms (2)

domain assumption: System dynamics are linear with additive Gaussian noise.
Implicit in the LQG game formulation used for Riccati equations.
domain assumption: Players employ linear feedback strategies.
Required for the coupled Riccati differential equations to apply.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

unclear
Relation between the paper passage and the cited Recognition theorem.

identification of the cost function parameters via a novel reformulation of the coupled Riccati differential equations

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
