Equilibrium under Time-Inconsistency: A New Existence Theory by Vanishing Entropy Regularization

Jingjie Zhang; Xiang Yu; Zhenhua Wang; Zhou Zhou

arXiv: 2603.10321 · v2 · submitted 2026-03-11 · math.OC · math.AP · math.PR


Pith reviewed 2026-05-15 14:04 UTC · model grok-4.3


keywords: time-inconsistent stochastic control · equilibrium Hamilton-Jacobi-Bellman equation · entropy regularization · vanishing regularization · fixed-point arguments · PDE estimates · relaxed equilibria · existence theory


The pith

Vanishing entropy regularization proves existence of equilibria for time-inconsistent stochastic control by converging regularized solutions to a strong EHJB solution.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the open problem of existence for the equilibrium Hamilton-Jacobi-Bellman equation in stochastic control with time-inconsistency, such as non-exponential discounting. It introduces entropy regularization to create an exploratory version of the equation whose classical solutions can be shown to exist through fixed-point arguments and careful PDE estimates on the solution and its derivatives. As the regularization parameter vanishes, these solutions converge in suitable norms to a strong solution of the original equation. The convergence supplies a verification theorem showing that the limiting relaxed control is indeed an equilibrium for the original time-inconsistent problem. This approach establishes well-posedness of the EHJB equation and existence of equilibria in diffusion models without imposing the stringent regularity conditions usually required upfront.

Core claim

By establishing classical solutions to the exploratory equilibrium Hamilton-Jacobi-Bellman equation via fixed-point methods and delicate PDE estimates, then proving convergence in appropriate norms as the entropy regularization vanishes, the paper obtains a strong solution to the original EHJB equation that verifies the existence of a relaxed equilibrium for the underlying time-inconsistent stochastic control problem.

What carries the argument

The vanishing entropy regularization of the exploratory equilibrium Hamilton-Jacobi-Bellman (EEHJB) equation, which enables fixed-point existence proofs and uniform PDE estimates before passage to the limit.
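To make the regularization mechanism concrete, here is a minimal numerical sketch. It is not taken from the paper. It assumes a one-dimensional compact action set [-1, 1], a hypothetical integrand `h(a)` standing in for the Hamiltonian, and the standard Gibbs variational identity: the supremum over densities pi of (integral of h·pi) minus lam times (integral of pi·ln pi) equals lam·ln(integral of exp(h/lam)), attained at pi proportional to exp(h/lam). The names `h`, `soft_value`, and `gibbs_policy` are illustrative only.

```python
import numpy as np

def h(a):
    # Hypothetical Hamiltonian integrand on the action set [-1, 1] (an assumption).
    return -(a - 0.3) ** 2

def soft_value(lam, grid):
    # lam * log( integral of exp(h / lam) da ), computed stably by
    # subtracting the maximum of h before exponentiating.
    vals = h(grid)
    m = vals.max()
    da = grid[1] - grid[0]
    return m + lam * np.log(np.sum(np.exp((vals - m) / lam)) * da)

def gibbs_policy(lam, grid):
    # Density proportional to exp(h / lam), normalized on the grid.
    vals = h(grid)
    w = np.exp((vals - vals.max()) / lam)
    da = grid[1] - grid[0]
    return w / (w.sum() * da)

grid = np.linspace(-1.0, 1.0, 20001)
hard_value = h(grid).max()  # unregularized value: sup over actions of h
gaps = [abs(soft_value(lam, grid) - hard_value) for lam in (1.0, 0.1, 0.01, 0.001)]
print(gaps)
```

In this toy setting the gap between the regularized and unregularized values shrinks as `lam` decreases, and the Gibbs density concentrates near the maximizer of `h`. The paper's difficulty lies in obtaining the analogous convergence for the full PDE, including derivatives, which this sketch does not address.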

If this is right

Equilibria exist for diffusion models with initial-time-dependent preferences such as non-exponential discounting.
The limit of the solutions of the regularized equation satisfies the original EHJB equation in the strong sense, and the associated limiting relaxed control is the equilibrium candidate.
A verification argument holds directly for the relaxed equilibrium without additional regularity hypotheses on the EHJB.
The framework gives well-posedness of the EHJB equation under model assumptions that avoid the usual stringent smoothness requirements.
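As a small illustration of why non-exponential discounting produces time-inconsistency, the sketch below compares a hyperbolic and an exponential discount function on a choice between a smaller-sooner and a larger-later reward. The reward sizes, dates, and parameters `k` and `r` are hypothetical and chosen only for the example.

```python
import math

def hyperbolic(s, k=1.0):
    # Discount factor for a payoff s time units ahead (non-exponential).
    return 1.0 / (1.0 + k * s)

def exponential(s, r=0.1):
    # Classical exponential discounting, for comparison.
    return math.exp(-r * s)

def prefers_later(discount, t, small=(5.0, 100.0), large=(6.0, 120.0)):
    # True if, evaluated at time t, the later and larger reward has the
    # higher present value. Each reward is a (payment date, amount) pair.
    (t1, r1), (t2, r2) = small, large
    return r2 * discount(t2 - t) > r1 * discount(t1 - t)

times = (0.0, 2.0, 4.0, 5.0)
hyper_choices = [prefers_later(hyperbolic, t) for t in times]
expo_choices = [prefers_later(exponential, t) for t in times]
print(hyper_choices, expo_choices)
```

Under the hyperbolic discount the ranking of the two rewards changes as the evaluation time moves forward, while under the exponential discount it does not. That dependence on the evaluation time is what forces an equilibrium formulation in place of a single optimal control.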

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same regularization-plus-convergence strategy might extend to jump-diffusion or mean-field time-inconsistent problems where direct PDE analysis is harder.
Numerical schemes that solve the regularized exploratory equation for small positive regularization parameters could approximate the limiting equilibria with controllable error.
The method separates the existence question from regularity, potentially allowing weaker notions of solution in other classes of time-inconsistent games.
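The numerical idea in the second point can be sketched on a toy problem. The example below is not the paper's continuous-time model: it assumes a finite-state, finite-horizon problem with deterministic moves, a hyperbolic discount function, and an entropy bonus weighted by `lam`. Each "self" at time `s` picks a softmax policy against the already-fixed policies of later selves. All names (`discount`, `regularized_equilibrium`, `nxt`) and the reward functions are hypothetical.

```python
import numpy as np

def discount(k, kappa=0.5):
    # Hyperbolic discount over k elapsed steps (non-exponential by design).
    return 1.0 / (1.0 + kappa * k)

def regularized_equilibrium(lam, reward, terminal, nxt, horizon):
    n_states, n_actions = reward.shape
    # J[tau][x]: self tau's valuation of being in state x at the next
    # decision time, given the policies of all later selves.
    J = {tau: discount(horizon - tau) * terminal for tau in range(horizon)}
    policy = np.zeros((horizon, n_states, n_actions))
    for s in reversed(range(horizon)):
        # Self s scores each action using its own discounting clock.
        q = discount(0) * reward + J[s][nxt]
        z = (q - q.max(axis=1, keepdims=True)) / lam
        pi = np.exp(z)
        pi /= pi.sum(axis=1, keepdims=True)  # softmax = entropy-regularized best response
        policy[s] = pi
        entropy = -(pi * np.log(np.clip(pi, 1e-300, None))).sum(axis=1)
        # Update every earlier self's valuation of time s onward.
        for tau in range(s):
            run = discount(s - tau) * ((pi * reward).sum(axis=1) + lam * entropy)
            J[tau] = run + (pi * J[tau][nxt]).sum(axis=1)
    return policy

states, actions, horizon = 5, 3, 4
x = np.arange(states)[:, None]
a = np.arange(actions)[None, :]
reward = np.sin(1.3 * x + 2.1 * a)          # hypothetical running reward
terminal = np.cos(0.7 * np.arange(states))  # hypothetical terminal reward
nxt = np.clip(x + a - 1, 0, states - 1)     # deterministic move: left / stay / right
policies = {lam: regularized_equilibrium(lam, reward, terminal, nxt, horizon)
            for lam in (1e3, 1.0, 1e-3)}
print(policies[1e-3].argmax(axis=2))
```

For large `lam` the policies are close to uniform, and for small `lam` they concentrate on a single action per state. The toy example says nothing about error rates in the diffusion setting, which would depend on the estimates discussed in the paper.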

Load-bearing premise

The PDE estimates on the solution and derivatives of the exploratory equation remain uniform enough under the model assumptions to allow convergence in the required norms as the regularization parameter tends to zero.
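One standard way in which uniform bounds produce a limit is sketched below. The symbols $v^{\varepsilon}$, $Q_R$, $C$, and the exponent $p$ are assumed notation for illustration; whether the paper follows exactly this route is not confirmed by the summary above.

```latex
% Sketch only. The symbols v^{\varepsilon}, Q_R, C and the exponent p are
% assumed notation for illustration, not taken from the paper.
\begin{align*}
&\text{(uniform bound)} &&
\sup_{0<\varepsilon\le 1}\,\|v^{\varepsilon}\|_{W^{2,1}_{p}(Q_R)} \le C,
\qquad Q_R := (0,T)\times B_R,\quad 1<p<\infty, \\
&\text{(weak compactness)} &&
v^{\varepsilon_k} \rightharpoonup v \ \text{ in } W^{2,1}_{p}(Q_R)
\ \text{ along some } \varepsilon_k \downarrow 0, \\
&\text{(if } p>d+2\text{)} &&
v^{\varepsilon_k}\to v,\quad \nabla_x v^{\varepsilon_k}\to \nabla_x v
\ \text{ uniformly on } \overline{Q_R}.
\end{align*}
```

The second line uses reflexivity of $W^{2,1}_{p}$ for $1<p<\infty$. The third uses the parabolic Sobolev embedding into a Hölder space with exponent $\alpha = 1-(d+2)/p$ together with the Arzelà–Ascoli theorem. If the constant $C$ depended on $\varepsilon$, neither step would be available, which is why the premise above is load-bearing.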

What would settle it

A concrete counterexample in which the exploratory solutions fail to converge to a strong solution of the original EHJB in the stated norms, or in which the limiting control fails the verification test for being an equilibrium.

Original abstract

This paper develops a framework for establishing the existence of solutions to the equilibrium Hamilton-Jacobi-Bellman (EHJB) equation arising in time-inconsistent stochastic control problems. The time-inconsistency in our setting arises from the initial-time dependence such as the non-exponential discounting. The classical approach typically relates the existence of equilibrium to the classical solution of the EHJB, whose existence is still an open problem under general model assumptions. We resolve this challenge by building on a vanishing entropy regularization approach. Using fixed-point arguments, we first establish the existence of classical solutions to the exploratory equilibrium Hamilton-Jacobi-Bellman Equation (EEHJB) by deriving a series of delicate PDE estimates for the solution and its derivatives. Building on these estimates for the solution of the EEHJB and its derivatives, we then conduct a rigorous convergence analysis under suitable norms as the entropy regularization vanishes. Our main result shows that solutions of the EEHJB converge to a strong solution of the original EHJB, corresponding to the limit of the regularized equilibria. This convergence yields a verification argument ensuring that the limiting relaxed equilibrium indeed constitutes an equilibrium for the original time-inconsistent control problem. We thus establish the well-posedness of the EHJB and the existence of equilibria in diffusion models under time-inconsistency, without resorting to conventional stringent regularity assumptions of the EHJB.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

A private letter to a colleague.

Vanishing entropy regularization gives a workable existence proof for strong EHJB solutions in time-inconsistent control, but the uniform estimates are the part that still needs checking.


Hi there, the main thing to know is that this paper proves existence of strong solutions to the equilibrium HJB equation for time-inconsistent stochastic control by adding entropy regularization, solving the regularized exploratory equation, and passing to the limit as the parameter vanishes. They first get classical solutions to the regularized version via fixed-point arguments, then derive a priori bounds on the value function and its derivatives that do not depend on the regularization strength, and finally show convergence in suitable norms to a strong solution of the original EHJB together with a verification theorem that the limit equilibrium works for the unregularized problem. This handles general diffusion models with non-exponential discounting without the strong regularity that direct attacks usually require.

The approach is new in its specific combination of vanishing entropy with the fixed-point step and the convergence analysis, and it looks like a clean way around the open existence question. The paper does a solid job laying out the framework and claiming the verification result follows from the limit.

The soft spot is the uniformity of those PDE estimates. The argument needs the C^{2,1} or W^{2,p} bounds to stay controlled even when the time-inconsistency is only Lipschitz and the diffusion coefficient is merely continuous; if the estimates pick up hidden dependence on the regularization parameter or require extra smoothness, the limit may not be strong enough and the verification could fail. The abstract calls the estimates delicate, so the details in the full text matter.

This is for specialists in stochastic control who work on time-inconsistent problems and equilibrium equations. A reader interested in existence techniques for these models would find the method useful. It deserves a serious referee because it targets a real open issue with a concrete new route, even if the estimates might need some tightening on review.

I would send it out for peer review.

Referee Report

2 major / 2 minor

Summary. The paper develops a vanishing entropy regularization framework to establish existence of equilibria in time-inconsistent stochastic control problems with initial-time dependence (e.g., non-exponential discounting). It first proves existence of classical solutions to the regularized exploratory equilibrium HJB equation (EEHJB) via fixed-point arguments and a series of PDE estimates on the solution and derivatives; it then shows that, as the entropy parameter vanishes, these solutions converge in suitable norms to a strong solution of the original EHJB equation, which in turn yields a verification theorem confirming that the limiting relaxed equilibrium is indeed an equilibrium for the original problem.

Significance. If the uniform-in-ε PDE estimates and convergence hold under the stated model assumptions, the work supplies a new existence theory for the EHJB equation that avoids conventional stringent regularity requirements on coefficients, thereby addressing an open problem in the time-inconsistent control literature and providing a verification argument for the limiting equilibria.

major comments (2)

[Convergence analysis (following the fixed-point existence for EEHJB)] The load-bearing step is the derivation of a priori bounds on the EEHJB solution and its derivatives that remain uniform with respect to the entropy parameter ε. The abstract and convergence analysis invoke C^{2,1} or W^{2,p} estimates, but the precise hypotheses on the diffusion coefficient (continuous versus Lipschitz) and the discount function under which these bounds are independent of ε are not stated explicitly enough to verify applicability to the general model class claimed.
[Main existence and verification theorem] The verification argument asserts that the limiting strong solution of the EHJB corresponds to an equilibrium for the original time-inconsistent problem. It is unclear from the main theorem statement whether the limit satisfies the EHJB pointwise (classical sense) or only in a weak/integral sense, and how this distinction affects the verification theorem when the time-inconsistency is merely Lipschitz.

minor comments (2)

[Preliminaries] Notation for the exploratory value function and the entropy-regularized Hamiltonian should be introduced with a dedicated table or list of symbols to improve readability.
[Fixed-point argument for EEHJB] A few typographical inconsistencies appear in the statement of the fixed-point map (e.g., missing subscript on the discount factor in one displayed equation).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We address each major comment point by point below, indicating the revisions we will make to improve clarity and precision.


Referee: [Convergence analysis (following the fixed-point existence for EEHJB)] The load-bearing step is the derivation of a priori bounds on the EEHJB solution and its derivatives that remain uniform with respect to the entropy parameter ε. The abstract and convergence analysis invoke C^{2,1} or W^{2,p} estimates, but the precise hypotheses on the diffusion coefficient (continuous versus Lipschitz) and the discount function under which these bounds are independent of ε are not stated explicitly enough to verify applicability to the general model class claimed.

Authors: We thank the referee for this observation on the need for explicit hypotheses. The diffusion coefficient is assumed Lipschitz continuous in the state variable (uniformly in time and control), and the discount function is assumed Lipschitz continuous in the time variable; these are the conditions under which the Schauder-type and W^{2,p} estimates for the EEHJB are derived and shown to be independent of ε. In the revised manuscript we will state these assumptions explicitly in the main theorems (Theorems 3.1 and 4.1) and add a dedicated remark in Section 3.2 clarifying that they suffice for uniformity in ε and for the claimed generality of the model class.

Revision: yes

Referee: [Main existence and verification theorem] The verification argument asserts that the limiting strong solution of the EHJB corresponds to an equilibrium for the original time-inconsistent problem. It is unclear from the main theorem statement whether the limit satisfies the EHJB pointwise (classical sense) or only in a weak/integral sense, and how this distinction affects the verification theorem when the time-inconsistency is merely Lipschitz.

Authors: We appreciate the referee drawing attention to the precise notion of solution and its implications for verification. The limit is a strong solution in the W^{2,p} sense (p > 1), satisfying the EHJB almost everywhere; it is not necessarily classical (C^{2,1}). Because the time-inconsistency (discount function) is merely Lipschitz, the verification theorem is established by passing to the limit in the regularized verification identity and controlling the remainder via the Lipschitz modulus and the strong convergence of the value functions and controls. In the revised version we will (i) define “strong solution” explicitly in the statement of Theorem 4.2, (ii) add a short paragraph in Section 5 explaining why the a.e. sense is sufficient under the Lipschitz assumption, and (iii) include a brief sketch of the limiting argument in the verification proof.

Revision: yes

Circularity Check

0 steps flagged

No circularity: existence via fixed-point and uniform PDE estimates on regularized equation, followed by limit passage


The derivation proceeds by applying standard fixed-point theorems to obtain classical solutions of the exploratory EEHJB, deriving a priori C^{2,1} or W^{2,p} bounds that are uniform in the entropy parameter ε from the model coefficients, and passing to the limit in suitable norms to recover a strong solution of the original EHJB together with a verification theorem. None of these steps reduces the target equilibrium existence result to a fitted input, a self-definitional relation, or a load-bearing self-citation; the estimates are obtained directly from the PDE structure under the stated assumptions rather than by construction from the desired limit object.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the existence of classical solutions to the exploratory equation via fixed-point arguments and on convergence in suitable norms; these rest on unstated model assumptions such as regularity of coefficients and well-posedness of the underlying diffusion.

axioms (2)

domain assumption: Existence of classical solutions to the EEHJB under suitable model assumptions via fixed-point arguments.
Invoked to start the regularization analysis before taking the vanishing limit.
domain assumption: Convergence of EEHJB solutions and derivatives to a strong solution of the EHJB in appropriate norms.
Central step that transfers existence from the regularized to the original equation.

pith-pipeline@v0.9.0 · 5550 in / 1328 out tokens · 67857 ms · 2026-05-15T14:04:03.197436+00:00 · methodology


