
arXiv: 2509.12018 · v6 · submitted 2025-09-15 · 🧮 math.OC

A Two-fold Randomization Framework for Impulse Control Problems

Pith reviewed 2026-05-18 16:07 UTC · model grok-4.3

classification 🧮 math.OC
keywords impulse control · randomization · HJB equation · reinforcement learning · verification theorem · convergence · fixed point operator · Poisson measure

The pith

Randomized impulse control problems converge to the classical problem as the randomization parameter vanishes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a randomization framework for impulse control problems in which the solution is the fixed point of a compound operator made from regularized nonlocal and stopping operators. This leads to a semi-linear HJB equation and allows a verification theorem using a Poisson compound measure. The authors prove existence through iteration and show that the randomized version approaches the classical impulse control problem as the parameter lambda goes to zero. This convergence, along with local regularity of the value function, supports using the framework to build reinforcement learning algorithms that can approximate the original solutions.

Core claim

By introducing a two-fold randomization scheme, the impulse control problem is reformulated as the fixed point of a compound operator consisting of a regularized nonlocal operator and a regularized stopping operator. This yields a semi-linear Hamilton-Jacobi-Bellman equation. An equivalent scheme using Poisson compound measure establishes a verification theorem for uniqueness, while an iterative approach proves existence. As the randomization parameter lambda tends to zero, the randomized problem converges to its classical counterpart, providing a robust approximation that enables offline reinforcement learning algorithms with geometric convergence.

What carries the argument

The compound operator formed by the regularized nonlocal operator and the regularized stopping operator, whose fixed point characterizes the solution to the randomized problem.
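
For orientation, the classical (unregularized) impulse control problem is usually characterized by a quasi-variational inequality. The display below is the standard textbook form for a maximization problem, not a formula taken from the paper: the generator $\mathcal{L}$, discount rate $r$, running reward $f$, and intervention cost $c$ are assumed notation, and the paper's sign conventions may differ. The randomized framework replaces the nonlocal operator $\mathcal{M}$ and the stopping decision by regularized versions; their exact form is not reproduced here.

```latex
% Classical impulse control QVI, maximization form.
% Notation (L, r, f, c, M) is assumed for illustration only.
\max\bigl\{ \mathcal{L} v(x) - r\, v(x) + f(x),\;
            \mathcal{M} v(x) - v(x) \bigr\} = 0,
\qquad
\mathcal{M} v(x) = \sup_{\xi} \bigl[ v(x + \xi) - c(\xi) \bigr].
```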

If this is right

  • The value function is C^{2,α}_loc: its second derivatives are locally Hölder continuous with exponent α.
  • The offline RL algorithm derived from the iterative proof converges geometrically to the randomized solution.
  • The learned randomized solution approximates the classical impulse control solution with high accuracy.
  • Sensitivity to the volatility parameter σ reveals the exploration-exploitation tradeoff in the algorithm.
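
The geometric convergence mentioned in the list above can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it is a discrete-time, discounted Bellman iteration on a hypothetical reflecting random walk in which the controller may jump from state x to any state y at a fixed-plus-proportional cost. All names (`make_toy_problem`, `expected_next`, `bellman_step`, `iterate`) and all parameter values are assumptions chosen for illustration. Because this operator is a sup-norm contraction with modulus `gamma`, the gap between successive iterates should shrink by at least that factor per sweep.

```python
def make_toy_problem(n=21, fixed_cost=0.5, prop_cost=0.1):
    """Hypothetical toy model on the grid {0, ..., n-1} (not from the paper)."""
    mid = n // 2
    # Running reward, largest at the middle of the grid.
    reward = [-((x - mid) / n) ** 2 for x in range(n)]

    def cost(x, y):
        # Fixed plus proportional intervention cost; staying put is free.
        return 0.0 if x == y else fixed_cost + prop_cost * abs(x - y)

    return n, reward, cost


def expected_next(v, y):
    """E[v(X')] for a symmetric random walk reflected at the boundaries."""
    n = len(v)
    return 0.5 * v[max(y - 1, 0)] + 0.5 * v[min(y + 1, n - 1)]


def bellman_step(v, reward, cost, gamma):
    """One sweep: move x -> y at cost(x, y), collect reward at y, then diffuse."""
    n = len(v)
    cont = [reward[y] + gamma * expected_next(v, y) for y in range(n)]
    return [max(cont[y] - cost(x, y) for y in range(n)) for x in range(n)]


def iterate(gamma=0.9, n_iter=30):
    """Run the fixed-point iteration and record sup-norm gaps between sweeps."""
    n, reward, cost = make_toy_problem()
    v = [0.0] * n
    gaps = []
    for _ in range(n_iter):
        v_new = bellman_step(v, reward, cost, gamma)
        gaps.append(max(abs(a - b) for a, b in zip(v_new, v)))
        v = v_new
    return v, gaps
```

Calling `iterate(gamma=0.9, n_iter=30)` returns the final iterate and the list of sup-norm gaps; each gap should be at most 0.9 times the previous one, which is the discrete analogue of a geometric rate.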

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach may extend to other types of stochastic control problems involving jumps or impulses.
  • Similar randomization could provide numerical methods for problems where direct classical solutions are intractable.
  • Combining with other RL techniques might improve scalability to high-dimensional state spaces.

Load-bearing premise

The compound operator admits a fixed point and the Poisson compound measure scheme correctly supports the verification theorem.

What would settle it

Numerical experiments in which the difference between the value function of the randomized problem and a known classical solution decreases to zero as lambda is reduced to small values.
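
A check of this kind can be prototyped on a much simpler object than the full value function. The sketch below uses log-sum-exp smoothing of a maximum as a stand-in for a regularized operator; this choice is an assumption made for illustration and is not claimed to be the paper's regularization. The helper names `soft_max` and `smoothing_gaps` are hypothetical. For n candidate values, the smoothed maximum exceeds the hard maximum by at most λ·log(n), so the gap vanishes as λ goes to zero.

```python
import math


def soft_max(values, lam):
    """Smoothed maximum lam * log(sum(exp(v / lam))), computed stably.

    Log-sum-exp is an illustrative stand-in, not the paper's operator.
    """
    m = max(values)
    return m + lam * math.log(sum(math.exp((v - m) / lam) for v in values))


def smoothing_gaps(values, lams):
    """Return (lam, soft_max - hard_max) for each smoothing level lam."""
    hard = max(values)
    return [(lam, soft_max(values, lam) - hard) for lam in lams]
```

Evaluating `smoothing_gaps` on a fixed list of candidate values for a decreasing sequence of λ gives a small table of errors that can be compared against the λ·log(n) bound, mirroring the experiment described above.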

Figures

Figures reproduced from arXiv: 2509.12018 by Haoyang Cao, Yuchao Dong, Zhouhao Yang.

Figure 1. Comparison in value functions between randomized and classical impulse control with …
Figure 2. Sensitivity analysis with respect to volatility
Figure 3. Sensitivity analysis with respect to volatility

Original abstract

We propose and analyze a randomization scheme for a general class of impulse control problems. The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator. This approach allows us to derive a semi-linear Hamilton-Jacobi-Bellman (HJB) equation. Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem that implies the uniqueness of the solution. Via an iterative approach, we prove the existence of the solution. The existence-and-uniqueness result ensures the randomized problem is well-defined. We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter $\pmb \lambda$ vanishes. This convergence, combined with the value function's $C^{2,\alpha}_{loc}$ regularity, confirms our framework provides a robust approximation and a foundation for developing learning algorithms. Under this framework, we propose an offline reinforcement learning (RL) algorithm. Its policy improvement step is naturally derived from the iterative approach from the existence proof, which enjoys a geometric convergence rate. We implement a model-free version of the algorithm and numerically demonstrate its effectiveness using a widely-studied example. The results show that our RL algorithm can learn the randomized solution, which accurately approximates its classical counterpart. A sensitivity analysis with respect to the volatility parameter $\sigma$ in the state process effectively demonstrates the exploration-exploitation tradeoff.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes a two-fold randomization framework for general impulse control problems. The randomized problem is characterized as the fixed point of a compound operator formed by a regularized nonlocal operator and a regularized stopping operator, yielding a semi-linear HJB equation. Existence is proved via iteration, while uniqueness follows from a verification theorem obtained through an equivalent Poisson compound measure randomization scheme. The value function of the randomized problem is shown to converge to that of the classical impulse control problem as the randomization parameter λ vanishes; combined with C^{2,α}_loc regularity this is used to justify the approximation. An offline RL algorithm is derived from the iterative existence proof (with geometric convergence) and demonstrated numerically on a standard example, including sensitivity analysis with respect to volatility.

Significance. If the convergence and well-posedness results hold, the framework supplies a theoretically grounded regularization that enables model-free RL for impulse control while recovering the classical solution in the limit. The explicit geometric rate for the policy-improvement iteration and the numerical illustration of the exploration-exploitation trade-off via σ-sensitivity are concrete strengths that could support further algorithmic development in stochastic control.

major comments (2)
  1. [Abstract] (paragraph on characterization and verification): The verification theorem is obtained by passing through an equivalent Poisson-compound-measure randomization. It is not shown that the Poisson intensity or jump measure converges in a manner compatible with the original impulse set, nor are uniform-in-λ estimates provided on the measure-theoretic remainder. Without such control, uniqueness for fixed λ does not automatically transfer to the λ→0 limit even if pointwise convergence of value functions holds.
  2. [Abstract] The convergence statement (final paragraph of abstract) invokes C^{2,α}_loc regularity to pass to the limit, but the argument appears to lack a uniform estimate on the regularized measures or on the nonlocal term that would guarantee the limit satisfies the classical variational inequality. This is load-bearing for the claim that the framework provides a robust approximation.
minor comments (2)
  1. Notation for the compound operator and the two regularization parameters should be introduced with explicit definitions before the fixed-point argument is stated.
  2. The numerical section would benefit from a table comparing the learned value function against a classical benchmark (e.g., finite-difference solution) for several λ values, rather than qualitative plots alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments, which help clarify key aspects of the convergence and uniqueness arguments. We appreciate the positive assessment of the framework's potential for model-free RL in impulse control. Below we respond point by point to the major comments, indicating revisions where appropriate to strengthen the presentation.

  1. Referee: [Abstract] (paragraph on characterization and verification): The verification theorem is obtained by passing through an equivalent Poisson-compound-measure randomization. It is not shown that the Poisson intensity or jump measure converges in a manner compatible with the original impulse set, nor are uniform-in-λ estimates provided on the measure-theoretic remainder. Without such control, uniqueness for fixed λ does not automatically transfer to the λ→0 limit even if pointwise convergence of value functions holds.

    Authors: For each fixed λ the two randomization schemes are equivalent, so the verification theorem yields uniqueness of the randomized value function directly. Convergence of value functions as λ → 0 is established separately by direct comparison and the C^{2,α}_loc regularity. To make the passage to the limit fully rigorous and address the transfer of uniqueness, we will add uniform-in-λ bounds on the Poisson intensity together with weak convergence of the compound jump measures to the admissible impulse measures (in the sense of the original control set). These estimates will be placed in Section 4 and an appendix; they confirm that any limit point satisfies the classical variational inequality and inherits uniqueness from the standard verification theorem for the unregularized problem. revision: yes

  2. Referee: [Abstract] The convergence statement (final paragraph of abstract) invokes C^{2,α}_loc regularity to pass to the limit, but the argument appears to lack a uniform estimate on the regularized measures or on the nonlocal term that would guarantee the limit satisfies the classical variational inequality. This is load-bearing for the claim that the framework provides a robust approximation.

    Authors: The C^{2,α}_loc regularity is obtained from the semi-linear HJB equation satisfied by the randomized value function and holds uniformly on compact sets for λ small enough. Combined with pointwise convergence of the value functions, this already allows passage to the limit inside the equation. Nevertheless, we acknowledge that explicit uniform control on the regularized nonlocal term strengthens the argument. In the revision we will insert a lemma providing such estimates, showing that the difference between the regularized nonlocal operator and the classical impulse operator vanishes uniformly on compact sets as λ → 0. This guarantees that the limit function satisfies the classical variational inequality in the viscosity (and, under the regularity, classical) sense, thereby confirming the robust approximation property. revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained


The paper independently defines the randomized impulse control problem through regularization of the nonlocal and stopping operators, characterizes its solution as the fixed point of the resulting compound operator, establishes uniqueness via an equivalent Poisson compound measure scheme that yields a verification theorem, proves existence by iteration, and separately demonstrates convergence of the value function to the classical impulse control problem as the randomization parameter λ vanishes (combined with C^{2,α}_loc regularity). None of these steps reduces the central claims to a fitted input renamed as prediction, a self-definitional loop, or a load-bearing self-citation whose content is unverified outside the present work. The framework is constructed to provide an approximation whose well-posedness and limiting behavior are established directly from the stated assumptions and iterative arguments.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The framework rests on standard diffusion assumptions for the state process and the well-posedness of the regularized operators; the randomization parameter λ is introduced as a tunable regularizer rather than fitted to data.

free parameters (1)
  • randomization parameter λ
    Controls the strength of regularization in the nonlocal and stopping operators; vanishes to recover the classical problem.
axioms (1)
  • domain assumption: The state process is a diffusion satisfying standard regularity conditions that allow the HJB derivation and C^{2,α}_loc regularity.
    Invoked to justify the semi-linear HJB equation and convergence result.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

    Relation between the paper passage and the cited Recognition theorem.

    The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator... Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem

  • IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean alpha_pin_under_high_calibration unclear

    Relation between the paper passage and the cited Recognition theorem.

    We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter λ vanishes. This convergence, combined with the value function’s C^{2,α}_loc regularity

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
