
arxiv: 2603.10321 · v2 · submitted 2026-03-11 · 🧮 math.OC · math.AP · math.PR

Equilibrium under Time-Inconsistency: A New Existence Theory by Vanishing Entropy Regularization

Pith reviewed 2026-05-15 14:04 UTC · model grok-4.3

classification 🧮 math.OC math.AP math.PR
keywords: time-inconsistent stochastic control, equilibrium Hamilton-Jacobi-Bellman equation, entropy regularization, vanishing regularization, fixed-point arguments, PDE estimates, relaxed equilibria, existence theory

The pith

Vanishing entropy regularization yields existence of equilibria for time-inconsistent stochastic control: solutions of the entropy-regularized equation converge to a strong solution of the EHJB equation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the open problem of existence for the equilibrium Hamilton-Jacobi-Bellman equation in stochastic control with time-inconsistency, such as non-exponential discounting. It introduces entropy regularization to create an exploratory version of the equation whose classical solutions can be shown to exist through fixed-point arguments and careful PDE estimates on the solution and its derivatives. As the regularization parameter vanishes, these solutions converge in suitable norms to a strong solution of the original equation. The convergence supplies a verification theorem showing that the limiting relaxed control is indeed an equilibrium for the original time-inconsistent problem. This approach establishes well-posedness of the EHJB equation and existence of equilibria in diffusion models without imposing the stringent regularity conditions usually required upfront.
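
To make the source of time-inconsistency concrete, here is a minimal sketch of preference reversal under non-exponential discounting. The hyperbolic form 1 / (1 + k * delay), the rates, and the two rewards are illustrative assumptions, not taken from the paper. The point is only that a non-exponential discount factor can flip the ranking of two fixed rewards as the evaluation time moves, whereas an exponential one cannot.

```python
import math

def hyperbolic(delay, k=1.0):
    """Hyperbolic discount factor 1 / (1 + k * delay) (illustrative choice)."""
    return 1.0 / (1.0 + k * delay)

def exponential(delay, rho=0.1):
    """Exponential discount factor exp(-rho * delay), the time-consistent benchmark."""
    return math.exp(-rho * delay)

def prefers_later(now, discount, early=(10.0, 100.0), late=(11.0, 110.0)):
    """True if, evaluated at time `now`, the later reward has the higher discounted value.

    Each reward is a hypothetical (payment time, amount) pair.
    """
    (t1, r1), (t2, r2) = early, late
    return discount(t2 - now) * r2 > discount(t1 - now) * r1

# A reversal means the ranking at time 0 differs from the ranking at time 10.
hyper_reversal = prefers_later(0.0, hyperbolic) != prefers_later(10.0, hyperbolic)
exp_reversal = prefers_later(0.0, exponential) != prefers_later(10.0, exponential)
print(hyper_reversal, exp_reversal)
```

With the hyperbolic factor, the agent at time 0 ranks the later reward higher but at time 10 ranks the earlier one higher, so a plan that is optimal today is abandoned later. This is why an equilibrium notion, rather than a single optimal control, is the natural solution concept here.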

Core claim

By establishing classical solutions to the exploratory equilibrium Hamilton-Jacobi-Bellman equation via fixed-point methods and delicate PDE estimates, then proving convergence in appropriate norms as the entropy regularization vanishes, the paper obtains a strong solution to the original EHJB equation that verifies the existence of a relaxed equilibrium for the underlying time-inconsistent stochastic control problem.

What carries the argument

The vanishing entropy regularization of the exploratory equilibrium Hamilton-Jacobi-Bellman (EEHJB) equation, which enables fixed-point existence proofs and uniform PDE estimates before passage to the limit.
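
The mechanism is easiest to see at the level of the Hamiltonian. The identity below is the standard Gibbs variational formula from the entropy-regularized control literature, written in illustrative notation that is an assumption of this sketch: $A$ is a compact action set equal to the closure of its interior, $\mathcal{P}(A)$ is the set of probability densities on $A$, $H$ is a continuous function standing in for the Hamiltonian at a fixed time and state, and $\lambda > 0$ is the regularization parameter. The paper's exact EEHJB is not reproduced in this summary, so these symbols are placeholders.

```latex
% Illustrative notation (not the paper's): A is a compact action set equal to the
% closure of its interior, H is continuous on A, and \lambda > 0 is the temperature.
\begin{align*}
\sup_{\pi \in \mathcal{P}(A)} \int_A \bigl( H(a)\,\pi(a) - \lambda\,\pi(a)\ln\pi(a) \bigr)\,da
  &= \lambda \ln \int_A e^{H(a)/\lambda}\,da, \\
\pi^{*}_{\lambda}(a)
  &= \frac{e^{H(a)/\lambda}}{\int_A e^{H(b)/\lambda}\,db}, \\
\lim_{\lambda \downarrow 0}\, \lambda \ln \int_A e^{H(a)/\lambda}\,da
  &= \max_{a \in A} H(a).
\end{align*}
```

For $\lambda > 0$ the maximizer $\pi^{*}_{\lambda}$ is an explicit, smooth function of $H$, in contrast to the possibly discontinuous pointwise maximizer of the unregularized problem. Letting $\lambda$ vanish recovers the unregularized maximum.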

If this is right

  • Equilibria exist for diffusion models with initial-time-dependent preferences such as non-exponential discounting.
  • The limiting control obtained from the regularized problems satisfies the original EHJB equation in the strong sense.
  • A verification argument holds directly for the relaxed equilibrium without additional regularity hypotheses on the EHJB.
  • The framework gives well-posedness of the EHJB equation under model assumptions that avoid the usual stringent smoothness requirements.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same regularization-plus-convergence strategy might extend to jump-diffusion or mean-field time-inconsistent problems where direct PDE analysis is harder.
  • Numerical schemes that solve the regularized exploratory equation for small positive regularization parameters could approximate the limiting equilibria with controllable error.
  • The method separates the existence question from regularity, potentially allowing weaker notions of solution in other classes of time-inconsistent games.
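
The remark on numerical approximation can be illustrated on a finite action set, where the entropy-regularized maximum reduces to the log-sum-exp soft maximum. This is a minimal sketch, assuming a hypothetical list `h` of Hamiltonian values at one fixed time and state (not from the paper). The gap to the true maximum lies between 0 and `lam * log(n)` for `n` actions, so it vanishes at least linearly in `lam`. The sketch covers only the pointwise error at the Hamiltonian level and says nothing about the PDE-level error that the paper analyzes.

```python
import math

def soft_max(values, lam):
    """Entropy-regularized maximum lam * log(sum(exp(v / lam))), computed stably."""
    m = max(values)  # subtract the maximum before exponentiating to avoid overflow
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))

def gibbs_policy(values, lam):
    """Gibbs (softmax) distribution over actions with temperature lam."""
    m = max(values)
    weights = [math.exp((v - m) / lam) for v in values]
    total = sum(weights)
    return [w / total for w in weights]

h = [0.3, 1.2, -0.5, 1.0]  # hypothetical Hamiltonian values over four actions

# Gap between the soft maximum and the true maximum for decreasing temperatures.
gaps = {lam: soft_max(h, lam) - max(h) for lam in (1.0, 0.1, 0.01)}
for lam, gap in gaps.items():
    print(f"lam={lam:<5} gap={gap:.6f} bound={lam * math.log(len(h)):.6f}")

# At small temperature the Gibbs policy concentrates on the maximizing action.
print(gibbs_policy(h, 0.01))
```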

Load-bearing premise

The PDE estimates on the solution and derivatives of the exploratory equation remain uniform enough under the model assumptions to allow convergence in the required norms as the regularization parameter tends to zero.

What would settle it

A concrete counterexample in which the exploratory solutions fail to converge to a strong solution of the original EHJB in the stated norms, or in which the limiting control fails the verification test for being an equilibrium.

Original abstract

This paper develops a framework for establishing the existence of solutions to the equilibrium Hamilton-Jacobi-Bellman (EHJB) equation arising in time-inconsistent stochastic control problems. The time-inconsistency in our setting arises from the initial-time dependence such as the non-exponential discounting. The classical approach typically relates the existence of equilibrium to the classical solution of the EHJB, whose existence is still an open problem under general model assumptions. We resolve this challenge by building on a vanishing entropy regularization approach. Using fixed-point arguments, we first establish the existence of classical solutions to the exploratory equilibrium Hamilton-Jacobi-Bellman Equation (EEHJB) by deriving a series of delicate PDE estimates for the solution and its derivatives. Building on these estimates for the solution of the EEHJB and its derivatives, we then conduct a rigorous convergence analysis under suitable norms as the entropy regularization vanishes. Our main result shows that solutions of the EEHJB converge to a strong solution of the original EHJB, corresponding to the limit of the regularized equilibria. This convergence yields a verification argument ensuring that the limiting relaxed equilibrium indeed constitutes an equilibrium for the original time-inconsistent control problem. We thus establish the well-posedness of the EHJB and the existence of equilibria in diffusion models under time-inconsistency, without resorting to conventional stringent regularity assumptions of the EHJB.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper develops a vanishing entropy regularization framework to establish existence of equilibria in time-inconsistent stochastic control problems with initial-time dependence (e.g., non-exponential discounting). It first proves existence of classical solutions to the regularized exploratory equilibrium HJB equation (EEHJB) via fixed-point arguments and a series of PDE estimates on the solution and derivatives; it then shows that, as the entropy parameter vanishes, these solutions converge in suitable norms to a strong solution of the original EHJB equation, which in turn yields a verification theorem confirming that the limiting relaxed equilibrium is indeed an equilibrium for the original problem.

Significance. If the uniform-in-ε PDE estimates and convergence hold under the stated model assumptions, the work supplies a new existence theory for the EHJB equation that avoids conventional stringent regularity requirements on coefficients, thereby addressing an open problem in the time-inconsistent control literature and providing a verification argument for the limiting equilibria.

major comments (2)
  1. [Convergence analysis (following the fixed-point existence for EEHJB)] The load-bearing step is the derivation of a priori bounds on the EEHJB solution and its derivatives that remain uniform with respect to the entropy parameter ε. The abstract and convergence analysis invoke C^{2,1} or W^{2,p} estimates, but the precise hypotheses on the diffusion coefficient (continuous versus Lipschitz) and the discount function under which these bounds are independent of ε are not stated explicitly enough to verify applicability to the general model class claimed.
  2. [Main existence and verification theorem] The verification argument asserts that the limiting strong solution of the EHJB corresponds to an equilibrium for the original time-inconsistent problem. It is unclear from the main theorem statement whether the limit satisfies the EHJB pointwise (classical sense) or only in a weak/integral sense, and how this distinction affects the verification theorem when the time-inconsistency is merely Lipschitz.
minor comments (2)
  1. [Preliminaries] Notation for the exploratory value function and the entropy-regularized Hamiltonian should be introduced with a dedicated table or list of symbols to improve readability.
  2. [Fixed-point argument for EEHJB] A few typographical inconsistencies appear in the statement of the fixed-point map (e.g., missing subscript on the discount factor in one displayed equation).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We address each major comment point by point below, indicating the revisions we will make to improve clarity and precision.

Point-by-point responses
  1. Referee: [Convergence analysis (following the fixed-point existence for EEHJB)] The load-bearing step is the derivation of a priori bounds on the EEHJB solution and its derivatives that remain uniform with respect to the entropy parameter ε. The abstract and convergence analysis invoke C^{2,1} or W^{2,p} estimates, but the precise hypotheses on the diffusion coefficient (continuous versus Lipschitz) and the discount function under which these bounds are independent of ε are not stated explicitly enough to verify applicability to the general model class claimed.

    Authors: We thank the referee for this observation on the need for explicit hypotheses. The diffusion coefficient is assumed Lipschitz continuous in the state variable (uniformly in time and control), and the discount function is assumed Lipschitz continuous in the time variable; these are the conditions under which the Schauder-type and W^{2,p} estimates for the EEHJB are derived and shown to be independent of ε. In the revised manuscript we will state these assumptions explicitly in the main theorems (Theorems 3.1 and 4.1) and add a dedicated remark in Section 3.2 clarifying that they suffice for uniformity in ε and for the claimed generality of the model class.
    Revision: yes

  2. Referee: [Main existence and verification theorem] The verification argument asserts that the limiting strong solution of the EHJB corresponds to an equilibrium for the original time-inconsistent problem. It is unclear from the main theorem statement whether the limit satisfies the EHJB pointwise (classical sense) or only in a weak/integral sense, and how this distinction affects the verification theorem when the time-inconsistency is merely Lipschitz.

    Authors: We appreciate the referee drawing attention to the precise notion of solution and its implications for verification. The limit is a strong solution in the W^{2,p} sense (p > 1), satisfying the EHJB almost everywhere; it is not necessarily classical (C^{2,1}). Because the time-inconsistency (discount function) is merely Lipschitz, the verification theorem is established by passing to the limit in the regularized verification identity and controlling the remainder via the Lipschitz modulus and the strong convergence of the value functions and controls. In the revised version we will (i) define “strong solution” explicitly in the statement of Theorem 4.2, (ii) add a short paragraph in Section 5 explaining why the a.e. sense is sufficient under the Lipschitz assumption, and (iii) include a brief sketch of the limiting argument in the verification proof.
    Revision: yes

Circularity Check

0 steps flagged

No circularity: existence via fixed-point and uniform PDE estimates on regularized equation, followed by limit passage

Full rationale

The derivation proceeds by applying standard fixed-point theorems to obtain classical solutions of the exploratory EEHJB, deriving a priori C^{2,1} or W^{2,p} bounds that are uniform in the entropy parameter ε from the model coefficients, and passing to the limit in suitable norms to recover a strong solution of the original EHJB together with a verification theorem. None of these steps reduces the target equilibrium existence result to a fitted input, a self-definitional relation, or a load-bearing self-citation; the estimates are obtained directly from the PDE structure under the stated assumptions rather than by construction from the desired limit object.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the existence of classical solutions to the exploratory equation via fixed-point arguments and on convergence in suitable norms; these rest on unstated model assumptions such as regularity of coefficients and well-posedness of the underlying diffusion.

axioms (2)
  • Domain assumption: existence of classical solutions to the EEHJB under suitable model assumptions via fixed-point arguments.
    Invoked to start the regularization analysis before taking the vanishing limit.
  • Domain assumption: convergence of EEHJB solutions and derivatives to a strong solution of the EHJB in appropriate norms.
    Central step that transfers existence from the regularized to the original equation.

pith-pipeline@v0.9.0 · 5550 in / 1328 out tokens · 67857 ms · 2026-05-15T14:04:03.197436+00:00 · methodology

