Moral Hazard in LTI Dynamics: A Hypothesis Testing Approach
Pith reviewed 2026-05-09 20:11 UTC · model grok-4.3
The pith
In LTI control systems with hidden controller choice, the optimal payment after a fixed horizon is set by a likelihood ratio hypothesis test on the observed trajectory.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For an affine LTI system driven by process noise with known statistics, an agent must choose one of two linear state-feedback controllers; the agent bears the control cost and is risk-averse with respect to the time-discounted payment, while the principal wants the controller that minimizes quadratic state cost plus the expected time-discounted payment. In this setting, the payment disbursed after a fixed but optimizable horizon is optimally determined by a likelihood-ratio hypothesis test: the payment amount is a function of the ratio of the probabilities of the observed state sequence under the two controllers.
What carries the argument
Likelihood-ratio hypothesis test that compares the probability of the observed trajectory under the two candidate controllers and maps the ratio to a payment level.
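For Gaussian process noise this statistic is computable in closed form, because the trajectory's likelihood under each closed-loop gain factors into one-step transition densities. A minimal sketch of the computation (our construction with illustrative shapes; the paper's system matrices are not reproduced here):

```python
import numpy as np

def traj_log_likelihood(X, A, B, K, c, W):
    """Log-likelihood of a state trajectory X (shape (T+1, n)) under the
    closed loop x_{t+1} = (A - B K) x_t + c + w_t with w_t ~ N(0, W)."""
    Acl = A - B @ K
    n = X.shape[1]
    Winv = np.linalg.inv(W)
    _, logdet = np.linalg.slogdet(W)
    ll = 0.0
    for t in range(X.shape[0] - 1):
        r = X[t + 1] - (Acl @ X[t] + c)          # one-step innovation
        ll -= 0.5 * (r @ Winv @ r + logdet + n * np.log(2 * np.pi))
    return ll

def log_likelihood_ratio(X, A, B, K0, K1, c, W):
    """Test statistic log[ p(X | K1) / p(X | K0) ]; under the paper's claim
    the payment is a function of this value alone."""
    return (traj_log_likelihood(X, A, B, K1, c, W)
            - traj_log_likelihood(X, A, B, K0, c, W))
```

Since the two hypotheses share the noise covariance, the ratio reduces to a difference of quadratic forms in the innovations, which is what makes the threshold rule computable.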
If this is right
- The optimal payment threshold can be found by solving a one-dimensional search that equates the agent's expected utility under each controller.
- The scheme applies without change to the load-frequency-control and body-weight examples shown in the paper.
- Optimizing the disbursement horizon is part of the same problem and can be performed numerically once the test is fixed.
- Risk aversion enters only through the expected utility of the discounted payment, so the test statistic itself remains unchanged.
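The first bullet can be prototyped end to end: draw trajectories under each controller, compute their log-likelihood ratios, and scan a scalar threshold until the agent's utility gain from the target controller nets to zero. A minimal Monte Carlo sketch (scalar plant, CARA utility, every parameter value assumed for illustration, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Scalar affine LTI testbed; all numbers are illustrative, not from the paper.
a, b, c, sw = 0.95, 0.5, 0.1, 0.2      # dynamics x' = a x + b u + c + N(0, sw^2)
k0, k1 = 0.2, 0.9                      # low-effort vs target feedback gains
T, N = 30, 4000                        # payment horizon, Monte Carlo runs
p_low, p_high = 0.0, 1.0               # two-level payment
rho = 1.0                              # CARA risk-aversion coefficient (assumed)
delta_effort = 0.05                    # extra control cost of k1 (assumed)

def u(w):
    """CARA utility of a payment w."""
    return -np.exp(-rho * w)

def simulate_llrs(k_true):
    """Log-likelihood ratios log p(X | k1) / p(X | k0) for N trajectories
    generated under the controller u = -k_true * x."""
    llrs = np.empty(N)
    for i in range(N):
        x, llr = 1.0, 0.0
        for _ in range(T):
            xn = (a - b * k_true) * x + c + sw * rng.standard_normal()
            r0 = xn - ((a - b * k0) * x + c)     # innovation under k0
            r1 = xn - ((a - b * k1) * x + c)     # innovation under k1
            llr += (r0 ** 2 - r1 ** 2) / (2 * sw ** 2)
            x = xn
        llrs[i] = llr
    return llrs

llr0, llr1 = simulate_llrs(k0), simulate_llrs(k1)

def utility_gain(tau):
    """Agent's expected-utility gain from choosing k1 over k0 under the
    threshold payment, net of the extra effort cost."""
    eu1 = np.mean(np.where(llr1 >= tau, u(p_high), u(p_low)))
    eu0 = np.mean(np.where(llr0 >= tau, u(p_high), u(p_low)))
    return eu1 - eu0 - delta_effort

# One-dimensional search: the threshold where the gain crosses zero makes
# the incentive constraint bind; a grid scan stands in for a root-finder.
taus = np.linspace(-40.0, 40.0, 801)
tau_star = min(taus, key=lambda t: abs(utility_gain(t)))
```

The grid scan is a stand-in for a proper scalar root-finder; the point is only that, once the LR samples are in hand, tuning the payment reduces to a one-dimensional search as the bullet claims.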
Where Pith is reading between the lines
- If the same likelihood-ratio structure survives when the agent can pick from a continuum of controllers, the method would give a practical way to approximate incentives by discretizing the choice set.
- The result suggests that trajectory monitoring for incentive problems in dynamics is fundamentally a statistical hypothesis test rather than a deterministic function of the final state.
- Extending the horizon-optimization step to stochastic horizons or to multiple payment dates could be tested by replacing the fixed-time likelihood ratio with a sequential probability ratio test.
Load-bearing premise
The agent is restricted to exactly two known linear controllers on an exactly known affine LTI system whose noise statistics are fully observed by the principal.
What would settle it
Simulate or run an experiment in which the agent is allowed a third controller option or the noise covariance is misspecified; check whether the derived payment rule still makes the agent strictly prefer the target controller.
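The third-controller half of this experiment can be prototyped cheaply. The sketch below (scalar plant, all parameter values assumed, not from the paper) fixes an LR-threshold payment designed for the two modeled gains and asks whether an unmodeled intermediate gain beats the target one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Scalar affine LTI testbed; all values are illustrative, not from the paper.
a, b, c, sw = 0.95, 0.5, 0.1, 0.2
k0, k1 = 0.2, 0.9                     # the two controllers the scheme models
k2 = 0.55                             # unmodeled third option for the agent
tau, p_low, p_high = 5.0, 0.0, 1.0    # fixed LR-threshold payment (assumed)
rho, q = 1.0, 0.02                    # CARA coefficient, control-cost weight
T, N = 30, 3000

def agent_value(k_true):
    """Mean CARA utility of the payment minus accumulated control cost when
    the agent applies u = -k_true * x but is scored by the k0-vs-k1 LR test."""
    vals = np.empty(N)
    for i in range(N):
        x, llr, effort = 1.0, 0.0, 0.0
        for _ in range(T):
            u_t = -k_true * x
            effort += q * u_t ** 2
            xn = a * x + b * u_t + c + sw * rng.standard_normal()
            r0 = xn - ((a - b * k0) * x + c)
            r1 = xn - ((a - b * k1) * x + c)
            llr += (r0 ** 2 - r1 ** 2) / (2 * sw ** 2)
            x = xn
        pay = p_high if llr >= tau else p_low
        vals[i] = -np.exp(-rho * pay) - effort
    return vals.mean()

v0, v1, v2 = agent_value(k0), agent_value(k1), agent_value(k2)
# The scheme survives the third option only if v1 > max(v0, v2); if not,
# the two-action restriction is doing real work in the paper's proof.
```

Repeating the same loop with a perturbed noise variance in `simulate`-side draws (but not in the residual scaling) would cover the misspecification half of the experiment.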
Figures
Original abstract
Many incentive design problems must contend with information asymmetries due to non-observation of efficiency (adverse selection) or non-observation of effort (moral hazard). And although a growing body of literature considers incentive design in control systems, the problem of designing incentives for control systems under information asymmetries has been less well-studied. This paper considers a model of moral hazard within control systems. In our model, the control system is described by an (affine) linear time-invariant (LTI) system with process noise. There is an agent who gets to choose (from between two choices) a linear state-feedback controller to apply to the LTI system, with one of the state-feedback controllers having a higher quadratic cost on the control inputs than the other. Our goal is to design a payment scheme that incentivizes the agent to choose the state-feedback controller that minimizes a quadratic cost on system states plus the time-discounted payment amount, subject to the understanding that the agent bears the control cost while being risk-averse with respect to their time-discounted payment. We formulate the problem as a constrained optimization, and prove that for a payment given after a fixed (but optimizable) time horizon the optimal payment scheme chooses the payment amount using a likelihood ratio hypothesis test. We numerically demonstrate our results by applying the derived optimal payment scheme to two examples: load frequency control (LFC) in power systems and wellness interventions for body weight loss.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper models moral hazard in affine LTI systems where an agent selects between two linear state-feedback controllers (differing in quadratic control cost). The principal designs a terminal payment after a fixed but optimizable horizon T to induce the agent to choose the controller minimizing quadratic state cost plus expected discounted payment, with the agent risk-averse to the payment stream and bearing control effort. The central result is a proof that the optimal payment scheme reduces to a likelihood-ratio hypothesis test on the observed trajectory; numerical illustrations are given for load-frequency control and wellness interventions.
Significance. If the characterization holds, the work supplies a clean sufficient-statistic reduction for incentive design in a restricted but practically relevant class of control problems, converting an infinite-dimensional payment-function optimization into a one-dimensional threshold rule. The explicit handling of risk aversion, known noise statistics, and an optimizable horizon distinguishes it from static contract-theory results and could inform applications in power systems and behavioral interventions. The numerical examples provide concrete validation of the derived scheme.
major comments (2)
- [§3] §3 (Agent's Problem): the IR and IC constraints are written under risk aversion to the time-discounted payment, yet the precise utility function (CARA, quadratic, or other) is not stated; without it the first-order conditions used to replace arbitrary payments by LR-based payments cannot be verified to hold.
- [§4] §4 (Main Theorem): the proof that any feasible payment can be replaced by an LR-threshold rule without loss of optimality relies on the two-action, finite-horizon, known-dynamics setting; the argument must explicitly show that the risk-aversion term does not destroy the sufficiency of the likelihood ratio when the horizon T is itself optimized.
minor comments (2)
- [Abstract] The abstract states that numerical examples are provided but does not report the quantitative metrics (e.g., achieved cost reduction, constraint violation rates) or the baseline payment schemes against which the LR test is compared.
- [Numerical Examples] In the LFC example, the state dimension, noise covariance, and exact quadratic cost matrices for the two controllers should be tabulated so that the induced trajectory distributions can be reproduced.
Simulated Author's Rebuttal
We thank the referee for the careful reading, positive assessment of significance, and constructive comments. We address each major comment below and will revise the manuscript accordingly.
Point-by-point responses
-
Referee: [§3] §3 (Agent's Problem): the IR and IC constraints are written under risk aversion to the time-discounted payment, yet the precise utility function (CARA, quadratic, or other) is not stated; without it the first-order conditions used to replace arbitrary payments by LR-based payments cannot be verified to hold.
Authors: We agree that an explicit statement of the agent's utility is needed for verification. The model employs a CARA utility u(w) = -exp(-ρ w) with ρ > 0 applied to the discounted terminal payment. This choice is standard for risk-averse agents facing additive noise and yields the required first-order conditions because the agent's expected utility under each controller is a strictly increasing function of the likelihood ratio; the principal's optimization then reduces to a threshold rule on that ratio. We will add the explicit functional form and a short derivation of the FOCs in §3. revision: yes
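The monotonicity claim in this response can be made concrete with a short derivation sketch (our notation, assuming the CARA form stated above):

```latex
% CARA utility over the discounted terminal payment p(X):
u(w) = -e^{-\rho w}, \qquad \rho > 0.

% Trajectory densities f_0, f_1 induced by the two controllers; the
% likelihood ratio is the candidate sufficient statistic:
\Lambda(x) = \frac{f_1(x)}{f_0(x)}.

% The IR/IC constraints and the principal's cost depend on the payment
% rule p only through integrals of u(p(x)) against f_0 and f_1 = \Lambda f_0:
\mathbb{E}\!\left[ u(p(X)) \mid K_i \right]
  = \int u(p(x))\, f_i(x)\, dx, \qquad i \in \{0, 1\}.

% Hence redistributing payment across trajectories sharing the same value
% of \Lambda changes no constraint value, and an optimal p may be taken to
% be a function of \Lambda alone -- a threshold rule in the two-action case.
```

This is only a sketch of the sufficiency step; the paper's own proof must additionally handle the principal's objective and the optimization over the horizon.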
-
Referee: [§4] §4 (Main Theorem): the proof that any feasible payment can be replaced by an LR-threshold rule without loss of optimality relies on the two-action, finite-horizon, known-dynamics setting; the argument must explicitly show that the risk-aversion term does not destroy the sufficiency of the likelihood ratio when the horizon T is itself optimized.
Authors: The proof in §4 first fixes T and shows that, for any payment function, an LR-threshold payment can be constructed that preserves the agent's expected utility (hence the IR and IC constraints) under both controllers while weakly lowering the principal's expected cost. Because the CARA utility is strictly increasing, the ordering of payments induced by the likelihood ratio remains optimal; the risk-aversion term therefore does not alter the sufficiency of the LR statistic. The outer optimization over T is performed after the payment rule is fixed for each T, so it does not interfere with the inner sufficiency argument. We will insert a clarifying paragraph immediately after the statement of the main theorem to make this separation explicit. revision: yes
Circularity Check
No significant circularity; standard derivation from constrained optimization
Full rationale
The paper sets up the principal-agent problem as a constrained optimization over payment functions of observed trajectories, imposes the agent's IR and IC constraints (with risk aversion and quadratic costs), and shows that any optimal payment can be replaced by one depending only on the likelihood ratio of the two induced trajectory distributions. With exactly two discrete controllers, known affine LTI dynamics, and fixed horizon, the LR is a sufficient statistic, yielding a threshold rule. This is a direct characterization result from first-order conditions and sufficiency arguments; no equation reduces the claim to a fitted parameter, self-citation chain, or definitional tautology. The derivation is self-contained against the stated model assumptions.
Axiom & Free-Parameter Ledger
free parameters (2)
- payment horizon T
- agent risk-aversion coefficient
axioms (3)
- domain assumption: The plant is an affine linear time-invariant system with additive process noise whose statistics are known to the designer.
- domain assumption: The agent must select exactly one of two linear state-feedback controllers, one of which has strictly higher quadratic control cost.
- domain assumption: The agent's utility is quadratic in states plus a concave (risk-averse) function of the time-discounted payment, and the agent bears the control cost.
Reference graph
Works this paper leans on
- [1] W. Tan, "Unified tuning of PID load frequency controller for power systems via IMC," IEEE Transactions on Power Systems, vol. 25, no. 1, pp. 341–350, Feb 2010.
- [2] T. G. Hovgaard, K. Edlund, and J. B. Jørgensen, "The potential of economic MPC for power management," in IEEE CDC, 2010, pp. 7533–7538.
- [3] P.-Y. Su, C. Maheshwari, V. M. Tuck, and S. Sastry, "Incentive-compatible vertiport reservation in advanced air mobility: An auction-based approach," in IEEE CDC, 2024, pp. 7720–7727.
- [4] C. Maheshwari, K. Kulkarni, M. Wu, and S. Sastry, "Adaptive incentive design with learning agents," IEEE Transactions on Automatic Control, pp. 1–16, 2025.
- [5] J.-J. Laffont and D. Martimort, The Theory of Incentives: The Principal-Agent Model. Princeton: Princeton University Press, 2009.
- [6] Y.-C. Ho, P. B. Luh, and G. J. Olsder, "A control-theoretic view on incentives," Automatica, vol. 18, no. 2, pp. 167–179, 1982.
- [7] Y. Mintz, J. A. Cabrera, J. R. Pedrasa, and A. Aswani, "Control synthesis for bilevel linear model predictive control," in American Control Conference (ACC), 2018, pp. 2338–2343.
- [8] Y. Mintz, A. Aswani, P. Kaminsky, E. Flowers, and Y. Fukuoka, "Behavioral analytics for myopic agents," European Journal of Operational Research, vol. 310, no. 2, pp. 793–811, 2023.
- [9] A. Thirugnanam and K. Sreenath, "Dynamic incentive selection for hierarchical convex model predictive control," arXiv preprint arXiv:2502.04642, 2025.
- [10] L. J. Ratliff, R. Dong, S. Sekar, and T. Fiez, "A perspective on incentive design: Challenges and opportunities," Annu. Rev. Control Robot. Auton. Syst., vol. 2, no. 1, pp. 305–338, 2019.
- [11] I. Dogan, Z.-J. M. Shen, and A. Aswani, "Repeated principal-agent games with unobserved agent rewards and perfect-knowledge agents," arXiv preprint arXiv:2304.07407, 2023.
- [12] I. Dogan, Z.-J. M. Shen, and A. Aswani, "Estimating and incentivizing imperfect-knowledge agents with hidden rewards," arXiv preprint arXiv:2308.06717, 2023.
- [13] I. Yang, D. S. Callaway, and C. J. Tomlin, "Dynamic contracts with partial observations: Application to indirect load control," in American Control Conference, 2014, pp. 1224–1230.
- [14] E. Saig, I. Talgam-Cohen, and N. Rosenfeld, "Delegated classification," NeurIPS, vol. 36, pp. 13200–13236, 2023.
- [15] E. L. Lehmann and J. P. Romano, Testing Statistical Hypotheses. Springer, 2022.
- [16] A. Das and W. S. Geisler, "Methods to integrate multinormals and compute classification measures," arXiv preprint arXiv:2012.14331, 2020.
- [17] J. P. Imhof, "Computing the distribution of quadratic forms in normal variables," Biometrika, vol. 48, no. 3/4, pp. 419–426, 1961.
- [18] R. B. Davies, "Algorithm AS 155: The distribution of a linear combination of chi-squared random variables," Journal of the Royal Statistical Society, Series C, vol. 29, no. 3, pp. 323–333, 1980.
- [19] A. Das, "New methods to compute the generalized chi-square distribution," J. Stat. Comput. Simul., vol. 95, pp. 1–36, 2025.
- [20] D. Horta, "chi2comb (version 0.1.0)," 2021. [Online]. Available: https://pypi.org/project/chi2comb/
- [21] M. D. Mifflin, S. T. St Jeor, L. A. Hill, B. J. Scott, S. A. Daugherty, and Y. O. Koh, "A new predictive equation for resting energy expenditure in healthy individuals," The American Journal of Clinical Nutrition, vol. 51, no. 2, pp. 241–247, 1990.
- [22] A. Aswani, P. Kaminsky, Y. Mintz, E. Flowers, and Y. Fukuoka, "Behavioral modeling in weight loss interventions," European Journal of Operational Research, vol. 272, no. 3, pp. 1058–1072, 2019.