A Two-fold Randomization Framework for Impulse Control Problems

Haoyang Cao; Yuchao Dong; Zhouhao Yang

arxiv: 2509.12018 · v6 · submitted 2025-09-15 · 🧮 math.OC

Pith reviewed 2026-05-18 16:07 UTC · model grok-4.3

classification 🧮 math.OC

keywords: impulse control · randomization · HJB equation · reinforcement learning · verification theorem · convergence · fixed point operator · Poisson measure

The pith

Randomized impulse control problems converge to the classical problem as the randomization parameter vanishes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a randomization framework for impulse control problems in which the solution is the fixed point of a compound operator made from regularized nonlocal and stopping operators. This leads to a semi-linear HJB equation and allows a verification theorem using a Poisson compound measure. The authors prove existence through iteration and show that the randomized version approaches the classical impulse control problem as the parameter λ goes to zero. This convergence, along with local C^{2,α} regularity of the value function, supports using the framework to build reinforcement learning algorithms that can approximate the original solutions.

Core claim

By introducing a two-fold randomization scheme, the impulse control problem is reformulated as the fixed point of a compound operator consisting of a regularized nonlocal operator and a regularized stopping operator. This yields a semi-linear Hamilton-Jacobi-Bellman equation. An equivalent scheme using a Poisson compound measure establishes a verification theorem for uniqueness, while an iterative approach proves existence. As the randomization parameter λ tends to zero, the randomized problem converges to its classical counterpart, providing a robust approximation that enables offline reinforcement learning algorithms with geometric convergence.

What carries the argument

The compound operator formed by the regularized nonlocal operator and the regularized stopping operator, whose fixed point characterizes the solution to the randomized problem.
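For orientation, the classical (unrandomized) impulse control problem that the randomized one approximates is usually characterized by a quasi-variational inequality (QVI). In the textbook cost-minimization setting of Bensoussan and Lions [7], with state generator $\mathcal{L}$, discount rate $r$, running cost $f$, impulse set $Z$, and intervention cost $K(\xi) \ge c > 0$, it reads as below. The notation is the standard one and is assumed here rather than taken from the paper; as we read the abstract, the paper's randomization regularizes both the nonlocal operator $\mathcal{M}$ and the outer choice between continuing and intervening.

```latex
\max\Big\{\, r\,v(x) - \mathcal{L}v(x) - f(x),\;\; v(x) - \mathcal{M}v(x) \,\Big\} = 0,
\qquad
\mathcal{M}v(x) = \inf_{\xi \in Z}\big\{\, v(x+\xi) + K(\xi) \,\big\}.
```

Both terms are nonpositive everywhere. Where the first vanishes, the process is left to diffuse (continuation region); where the second vanishes, $v = \mathcal{M}v$ and an immediate impulse is optimal. $\mathcal{M}$ is nonlocal because it evaluates $v$ at the post-jump states $x + \xi$ rather than at $x$ alone.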

If this is right

The value function is locally C^{2,α}: its second derivatives are locally Hölder continuous of order α.
The offline RL algorithm derived from the iterative proof converges geometrically to the randomized solution.
The learned randomized solution approximates the classical impulse control solution with high accuracy.
Sensitivity to the volatility parameter σ reveals the exploration-exploitation tradeoff in the algorithm.
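The geometric rate in the second item is what a contraction argument delivers. As a generic sketch, assume the iterated operator $T$ is a contraction in the sup norm with modulus $\gamma < 1$ (for example through discounting; the paper's own modulus and norm may differ), and let $V_{k+1} = T V_k$ with fixed point $V^\ast$:

```latex
\|T u - T w\|_\infty \le \gamma\,\|u - w\|_\infty
\;\Longrightarrow\;
\|V_k - V^\ast\|_\infty \le \gamma^{k}\,\|V_0 - V^\ast\|_\infty
\le \frac{\gamma^{k}}{1-\gamma}\,\|V_1 - V_0\|_\infty .
```

The last bound uses only the first update, so it gives an a priori iteration count for a target accuracy.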

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This approach may extend to other types of stochastic control problems involving jumps or impulses.
Similar randomization could provide numerical methods for problems where direct classical solutions are intractable.
Combining with other RL techniques might improve scalability to high-dimensional state spaces.

Load-bearing premise

The compound operator admits a fixed point and the Poisson compound measure scheme correctly supports the verification theorem.

What would settle it

Numerical experiments in which the gap between the value function of the randomized problem and a known classical solution shrinks to zero as λ is driven toward zero.
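A toy version of this experiment runs in a few seconds. The sketch below is our own discrete-time, finite-state analogue, not the paper's algorithm or example: a reflected random walk on n grid points with impulse cost c + k|y − x|, where both the nonlocal operator (choice of jump target) and the stopping-type choice (continue or intervene) are smoothed by a log-sum-exp soft-min with temperature `lam`. That smoothing stands in for the paper's regularization and is an assumption, as are `make_model`, `apply_T`, and all parameter values. Setting `lam = 0` recovers the hard-min ("classical") recursion, so the sup-norm gap to it can be tracked as λ shrinks.

```python
import math


def soft_min(values, lam):
    """Log-sum-exp soft-min -lam * log(sum(exp(-v / lam))); the hard min when lam == 0.
    Shifting by min(values) keeps exp() from overflowing."""
    m = min(values)
    if lam == 0.0:
        return m
    return m - lam * math.log(sum(math.exp(-(v - m) / lam) for v in values))


def make_model(n=21, gamma=0.9, c=1.0, k=0.1):
    """Hypothetical toy model: reflected random walk on {0, ..., n-1}, quadratic running
    cost around the midpoint, impulse cost c + k * |y - x| for a jump from x to y."""
    mid = (n - 1) / 2
    return {
        "n": n,
        "gamma": gamma,
        "f": [0.05 * (x - mid) ** 2 for x in range(n)],
        "nbrs": [(max(x - 1, 0), min(x + 1, n - 1)) for x in range(n)],
        "K": [[c + k * abs(y - x) for y in range(n)] for x in range(n)],
    }


def apply_T(V, model, lam):
    """One application of the toy compound operator T_lam to the value vector V."""
    n, gamma, f, K = model["n"], model["gamma"], model["f"], model["K"]
    # Continuation: pay the running cost, then take one (discounted) random-walk step.
    C = [f[x] + gamma * (V[a] + V[b]) / 2 for x, (a, b) in enumerate(model["nbrs"])]
    out = []
    for x in range(n):
        # Regularized nonlocal operator: soft-min over jump targets y (jump, then continue).
        M = soft_min([K[x][y] + C[y] for y in range(n)], lam)
        # Regularized stopping operator: soft-min of "continue" versus "intervene".
        out.append(soft_min([C[x], M], lam))
    return out


def iterate(model, lam, tol=1e-10, max_iter=10_000):
    """Fixed-point iteration V <- T_lam(V) from V = 0; also returns each update's sup norm."""
    V, steps = [0.0] * model["n"], []
    for _ in range(max_iter):
        W = apply_T(V, model, lam)
        steps.append(max(abs(a - b) for a, b in zip(W, V)))
        V = W
        if steps[-1] < tol:
            break
    return V, steps


def sup_gap(V, W):
    """Sup-norm distance between two value vectors."""
    return max(abs(a - b) for a, b in zip(V, W))


if __name__ == "__main__":
    model = make_model()
    V0, _ = iterate(model, 0.0)  # lam = 0: the hard-min ("classical") analogue
    for lam in (1.0, 0.1, 0.01, 0.001):
        V, steps = iterate(model, lam)
        print(f"lam={lam:<6} iterations={len(steps):4d} sup|V_lam - V_0|={sup_gap(V, V0):.2e}")
```

In this toy, soft-min is non-expansive in the sup norm and the continuation step contracts by γ, so every run converges geometrically at rate γ, and the gap to the λ = 0 solution is at most λ·log(2n)/(1 − γ). The printed gaps shrink as λ does, within that bound.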

Figures

Figures reproduced from arXiv: 2509.12018 by Haoyang Cao, Yuchao Dong, Zhouhao Yang.

**Figure 2.** Sensitivity analysis with respect to volatility.

**Figure 3.** Sensitivity analysis with respect to volatility.

Original abstract

We propose and analyze a randomization scheme for a general class of impulse control problems. The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator. This approach allows us to derive a semi-linear Hamilton-Jacobi-Bellman (HJB) equation. Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem that implies the uniqueness of the solution. Via an iterative approach, we prove the existence of the solution. The existence-and-uniqueness result ensures the randomized problem is well-defined. We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter $\pmb \lambda$ vanishes. This convergence, combined with the value function's $C^{2,\alpha}_{loc}$ regularity, confirms our framework provides a robust approximation and a foundation for developing learning algorithms. Under this framework, we propose an offline reinforcement learning (RL) algorithm. Its policy improvement step is naturally derived from the iterative approach from the existence proof, which enjoys a geometric convergence rate. We implement a model-free version of the algorithm and numerically demonstrate its effectiveness using a widely-studied example. The results show that our RL algorithm can learn the randomized solution, which accurately approximates its classical counterpart. A sensitivity analysis with respect to the volatility parameter $\sigma$ in the state process effectively demonstrates the exploration-exploitation tradeoff.
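For readers unfamiliar with the term, a Poisson compound (marked Poisson) random measure on [0, T] × Z places a Poisson number of points with mean ρT at i.i.d. uniform times, each carrying an i.i.d. mark in Z. In randomization schemes of this type the points play the role of candidate intervention times and impulse sizes. The sketch below only samples such a measure; the rate `rate`, the uniform mark law on an interval, and the function names are illustrative assumptions, and how the paper links the intensity to λ is not reproduced here.

```python
import random


def sample_poisson(mean, rng):
    """Poisson(mean) draw: count unit-rate exponential arrivals that land before `mean`."""
    count, t = 0, rng.expovariate(1.0)
    while t < mean:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sample_marked_poisson(rate, horizon, mark_low, mark_high, rng):
    """Atoms (t_i, z_i) of a Poisson random measure on [0, horizon] x [mark_low, mark_high]
    with intensity rate * dt * Uniform(dz): a Poisson(rate * horizon) number of points,
    i.i.d. uniform times (returned in increasing order), and i.i.d. uniform marks."""
    n = sample_poisson(rate * horizon, rng)
    times = sorted(rng.uniform(0.0, horizon) for _ in range(n))
    return [(t, rng.uniform(mark_low, mark_high)) for t in times]


rng = random.Random(0)
for t, z in sample_marked_poisson(rate=2.0, horizon=5.0, mark_low=-1.0, mark_high=1.0, rng=rng):
    print(f"candidate impulse at t={t:.3f} with size z={z:+.3f}")
```

Passing an explicit `random.Random` instance keeps runs reproducible and avoids touching the global generator.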

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper gives a randomization framework for impulse control that directly yields an RL algorithm with geometric convergence, though the limit uniqueness step could use tighter estimates.

The paper's core contribution is a two-fold randomization for impulse control problems that yields both a semi-linear HJB equation and a natural offline RL algorithm derived directly from the existence iteration. They set up the randomized problem via a compound operator mixing a regularized nonlocal operator and a regularized stopping operator, which lets them write down the semi-linear HJB equation. They then switch to an equivalent Poisson compound measure scheme to prove a verification theorem, which gives uniqueness. Existence follows from iterating the operator. They show that the value converges to the classical impulse control value as the randomization parameter λ goes to zero, and they use the local C^{2,α} (Hölder) regularity of the value function to justify the limit. From the same iteration they extract a policy improvement step that gives geometric convergence for the RL algorithm. The model-free version is tested on a standard example, with some sensitivity checks on volatility.

The approach looks technically coherent. The convergence argument relies on the regularity result, which is standard in these problems, so that part seems solid enough. The RL part is a nice bonus because the improvement step comes for free from the proof.

One place that could use more work is the justification for passing uniqueness through the limit. The Poisson measure equivalence is used for the verification at fixed λ, and if the measures do not converge uniformly, there might be a gap in showing that the limit satisfies the original variational inequality uniquely. The paper probably addresses this with the regularity, but a referee might ask for an explicit uniform estimate or a direct argument in the limit.

This paper is aimed at researchers in stochastic optimal control who are interested in bringing reinforcement learning into impulse problems. Someone already working on HJB equations for control with jumps or impulses will see the most value. It has enough new technical content and a working algorithm to warrant sending it out for peer review; I would recommend a journal in applied probability or control theory.

Referee Report

2 major / 2 minor

Summary. The paper proposes a two-fold randomization framework for general impulse control problems. The randomized problem is characterized as the fixed point of a compound operator formed by a regularized nonlocal operator and a regularized stopping operator, yielding a semi-linear HJB equation. Existence is proved via iteration, while uniqueness follows from a verification theorem obtained through an equivalent Poisson compound measure randomization scheme. The value function of the randomized problem is shown to converge to that of the classical impulse control problem as the randomization parameter λ vanishes; combined with C^{2,α}_loc regularity this is used to justify the approximation. An offline RL algorithm is derived from the iterative existence proof (with geometric convergence) and demonstrated numerically on a standard example, including sensitivity analysis with respect to volatility.

Significance. If the convergence and well-posedness results hold, the framework supplies a theoretically grounded regularization that enables model-free RL for impulse control while recovering the classical solution in the limit. The explicit geometric rate for the policy-improvement iteration and the numerical illustration of the exploration-exploitation trade-off via σ-sensitivity are concrete strengths that could support further algorithmic development in stochastic control.

major comments (2)

[Abstract, characterization and verification] The verification theorem is obtained by passing through an equivalent Poisson-compound-measure randomization. It is not shown that the Poisson intensity or jump measure converges in a manner compatible with the original impulse set, nor are uniform-in-λ estimates provided on the measure-theoretic remainder. Without such control, uniqueness for fixed λ does not automatically transfer to the λ→0 limit, even if the value functions converge pointwise.
[Abstract, convergence statement] The convergence claim invokes C^{2,α}_loc regularity to pass to the limit, but the argument appears to lack a uniform estimate on the regularized measures or on the nonlocal term that would guarantee the limit satisfies the classical variational inequality. This estimate is load-bearing for the claim that the framework provides a robust approximation.

minor comments (2)

Notation for the compound operator and the two regularization parameters should be introduced with explicit definitions before the fixed-point argument is stated.
The numerical section would benefit from a table comparing the learned value function against a classical benchmark (e.g., finite-difference solution) for several λ values, rather than qualitative plots alone.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading and constructive comments, which help clarify key aspects of the convergence and uniqueness arguments. We appreciate the positive assessment of the framework's potential for model-free RL in impulse control. Below we respond point by point to the major comments, indicating revisions where appropriate to strengthen the presentation.

Referee: [Abstract, characterization and verification] The verification theorem is obtained by passing through an equivalent Poisson-compound-measure randomization. It is not shown that the Poisson intensity or jump measure converges in a manner compatible with the original impulse set, nor are uniform-in-λ estimates provided on the measure-theoretic remainder. Without such control, uniqueness for fixed λ does not automatically transfer to the λ→0 limit, even if the value functions converge pointwise.

Authors: For each fixed λ the two randomization schemes are equivalent, so the verification theorem yields uniqueness of the randomized value function directly. Convergence of value functions as λ → 0 is established separately by direct comparison and the C^{2,α}_loc regularity. To make the passage to the limit fully rigorous and address the transfer of uniqueness, we will add uniform-in-λ bounds on the Poisson intensity together with weak convergence of the compound jump measures to the admissible impulse measures (in the sense of the original control set). These estimates will be placed in Section 4 and an appendix; they confirm that any limit point satisfies the classical variational inequality and inherits uniqueness from the standard verification theorem for the unregularized problem. revision: yes
Referee: [Abstract, convergence statement] The convergence claim invokes C^{2,α}_loc regularity to pass to the limit, but the argument appears to lack a uniform estimate on the regularized measures or on the nonlocal term that would guarantee the limit satisfies the classical variational inequality. This estimate is load-bearing for the claim that the framework provides a robust approximation.

Authors: The C^{2,α}_loc regularity is obtained from the semi-linear HJB equation satisfied by the randomized value function and holds uniformly on compact sets for λ small enough. Combined with pointwise convergence of the value functions, this already allows passage to the limit inside the equation. Nevertheless, we acknowledge that explicit uniform control on the regularized nonlocal term strengthens the argument. In the revision we will insert a lemma providing such estimates, showing that the difference between the regularized nonlocal operator and the classical impulse operator vanishes uniformly on compact sets as λ → 0. This guarantees that the limit function satisfies the classical variational inequality in the viscosity (and, under the regularity, classical) sense, thereby confirming the robust approximation property. revision: yes
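The uniform estimate promised in this response can be illustrated in a discrete setting. Suppose, as an assumption for illustration only, that the impulse set is replaced by a finite grid of N jumps and that the regularized nonlocal operator is a log-sum-exp soft-min with temperature λ (the paper's operator may instead integrate against a reference measure). Then, for every x, M_λ v(x) lies between Mv(x) − λ log N and Mv(x), so the gap vanishes uniformly in x at rate λ log N. The function `v`, the cost, and the grids below are hypothetical.

```python
import math


def soft_min(values, lam):
    """Log-sum-exp soft-min -lam * log(sum(exp(-v / lam))), shifted by min(values) for
    numerical stability; for N values it lies in [min - lam * log(N), min]."""
    m = min(values)
    return m - lam * math.log(sum(math.exp(-(v - m) / lam) for v in values))


def sup_operator_gap(v, cost, xs, jumps, lam):
    """sup over x in xs of |M_lam v(x) - M v(x)|, where M v(x) = min over xi in jumps of
    v(x + xi) + cost(xi), and M_lam replaces that min by soft_min at temperature lam."""
    worst = 0.0
    for x in xs:
        candidates = [v(x + xi) + cost(xi) for xi in jumps]
        worst = max(worst, abs(soft_min(candidates, lam) - min(candidates)))
    return worst


def v(x):
    return x * x  # hypothetical value function


def cost(xi):
    return 1.0 + abs(xi)  # hypothetical impulse cost: fixed part plus proportional part


xs = [-1.0 + 2.0 * i / 200 for i in range(201)]  # compact set [-1, 1]
jumps = [-2.0 + 4.0 * j / 40 for j in range(41)]  # N = 41 admissible impulses

for lam in (0.5, 0.1, 0.01):
    gap = sup_operator_gap(v, cost, xs, jumps, lam)
    print(f"lam={lam:<5} sup gap={gap:.4f}  bound lam*log(N)={lam * math.log(len(jumps)):.4f}")
```

Here the bound λ log N does not depend on x at all; this finite-grid toy does not capture whatever role compactness plays for the paper's continuum operator.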

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

The paper independently defines the randomized impulse control problem through regularization of the nonlocal and stopping operators, characterizes its solution as the fixed point of the resulting compound operator, establishes uniqueness via an equivalent Poisson compound measure scheme that yields a verification theorem, proves existence by iteration, and separately demonstrates convergence of the value function to the classical impulse control problem as the randomization parameter λ vanishes (combined with C^{2,α}_loc regularity). None of these steps reduces the central claims to a fitted input renamed as prediction, a self-definitional loop, or a load-bearing self-citation whose content is unverified outside the present work. The framework is constructed to provide an approximation whose well-posedness and limiting behavior are established directly from the stated assumptions and iterative arguments.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard diffusion assumptions for the state process and the well-posedness of the regularized operators; the randomization parameter λ is introduced as a tunable regularizer rather than fitted to data.

free parameters (1)

randomization parameter λ
Controls the strength of regularization in the nonlocal and stopping operators; vanishes to recover the classical problem.

axioms (1)

Domain assumption: The state process is a diffusion satisfying standard regularity conditions that allow the HJB derivation and C^{2,α}_loc regularity.
Invoked to justify the semi-linear HJB equation and convergence result.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.

The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator... Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem

IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean · alpha_pin_under_high_calibration · unclear
Relation between the paper passage and the cited Recognition theorem.

We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter λ vanishes. This convergence, combined with the value function’s C^{2,α}_loc regularity

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

48 extracted references · 48 canonical work pages

[1] Luis HR Alvarez. A class of solvable impulse control problems. Applied Mathematics and Optimization, 49(3):265–295, 2004.
[2] Parsiad Azimzadeh, Erhan Bayraktar, and George Labahn. Convergence of implicit schemes for Hamilton–Jacobi–Bellman quasi-variational inequalities. SIAM Journal on Control and Optimization, 56(6):3994–4016, 2018.
[3] Parsiad Azimzadeh and Peter A Forsyth. Weakly chained matrices, policy iteration, and impulse control. SIAM Journal on Numerical Analysis, 54(3):1341–1364, 2016.
[4] Matteo Basei. Optimal price management in retail energy markets: an impulse control problem with asymptotic estimates. Mathematical Methods of Operations Research, 89(3):355–383, 2019.
[5] Erhan Bayraktar, Thomas Emmerling, and José-Luis Menaldi. On the impulse control of jump diffusions. SIAM Journal on Control and Optimization, 51(3):2612–2637, 2013.
[6] Alain Bensoussan and Benoît Chevalier-Roignant. Sequential capacity expansion options. Operations Research, 67(1):33–57, 2019.
[7] Alain Bensoussan and J-L Lions. Impulse control and quasi-variational inequalities. Gauthier-Villars, Paris, 1984.
[8] Alain Bensoussan and J-L Lions. Applications of variational inequalities in stochastic control, volume 12. Elsevier, 2011.
[9] Giulio Bertola, Wolfgang J Runggaldier, and Kazuhiro Yasuda. On classical and restricted impulse stochastic control for the exchange rate. Applied Mathematics & Optimization, 74(74), 2016.
[10] Tomasz R Bielecki and Stanley R Pliska. Risk sensitive asset management with transaction costs. Finance and Stochastics, 4(1):1–33, 2000.
[11] Messaoud Boulbrachene. The noncoercive quasi-variational inequalities related to impulse control problems. Computers & Mathematics with Applications, 35(12):101–108, 1998.
[12] Messaoud Boulbrachene. Pointwise error estimates for a class of elliptic quasi-variational inequalities with nonlinear source terms. Applied Mathematics and Computation, 161(1):129–138, 2005.
[13] Abel Cadenillas, Tahir Choulli, Michael Taksar, and Lei Zhang. Classical and impulse stochastic control for the optimization of the dividend and risk policies of an insurance firm. Mathematical Finance, 16(1):181–202, 2006.
[14] George M Constantinides and Scott F Richard. Existence of optimal simple policies for discounted-cost inventory and cash management in continuous time. Operations Research, 26(4):620–636, 1978.
[15] Mark HA Davis, Xin Guo, and Guoliang Wu. Impulse control of multidimensional jump diffusions. SIAM Journal on Control and Optimization, 48(8):5276–5293, 2010.
[16] Robert Denkert, Huyên Pham, and Xavier Warin. Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching. Applied Mathematics & Optimization, 91(1):9, 2025.
[17] Jodi Dianetti, Giorgio Ferrari, and Renyuan Xu. Exploratory optimal stopping: A singular control formulation. arXiv preprint arXiv:2408.09335, 2024.
[18] Yuchao Dong. Randomized optimal stopping problem in continuous time and reinforcement learning algorithm. SIAM Journal on Control and Optimization, 62(3):1590–1614, 2024.
[19] Jerome F Eastham and Kevin J Hastings. Optimal impulse control of portfolios. Mathematics of Operations Research, 13(4):588–605, 1988.
[20] Lawrence C Evans. Partial Differential Equations, volume 19. American Mathematical Society, 1998.
[21] David Gilbarg and Neil S Trudinger. Elliptic Partial Differential Equations of Second Order. Grundlehren der mathematischen Wissenschaften. Springer Berlin Heidelberg, 2013.
[22] Xin Guo and Guoliang Wu. Smooth fit principle for impulse control of multidimensional diffusion processes. SIAM Journal on Control and Optimization, 48(2):594–617, 2009.
[23] J M Harrison, T M Sellke, and A J Taylor. Impulse control of Brownian motion. Mathematics of Operations Research, 8(3):454–466, 1983.
[24] J M Harrison and M I Taksar. Instantaneous control of Brownian motion. Mathematics of Operations Research, 1983.
[25] Ronald A Howard. Dynamic programming and Markov processes. 1960.
[26] Masashi Ieda. An implicit method for the finite time horizon Hamilton–Jacobi–Bellman quasi-variational inequalities. Applied Mathematics and Computation, 265:163–175, 2015.
[27] Monique Jeanblanc, Marc Yor, and Marc Chesney. Mathematical methods for financial markets. Springer Science & Business Media, 2009.
[28] M Jeanblanc-Picqué and A N Shiryaev. Optimization of the flow of dividends. Russian Mathematical Surveys, 50(2):257–277, 1995.
[30] Monique Jeanblanc-Picqué. Impulse control method and exchange rate. Mathematical Finance, 3(2):161–177, 1993.
[31] Yanwei Jia and Xun Yu Zhou. Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach. Journal of Machine Learning Research, 23(154):1–55, 2022.
[32] Yanwei Jia and Xun Yu Zhou. q-learning in continuous time. Journal of Machine Learning Research, 24(161):1–61, 2023.
[33] Ioannis Karatzas and Steven E Shreve. Brownian motion and stochastic calculus. Graduate Texts in Mathematics 113. Springer, New York, second edition, 1998.
[34] Idris Kharroubi, Jin Ma, Huyên Pham, and Jianfeng Zhang. Backward SDEs with constrained jumps and quasi-variational inequalities. The Annals of Probability, 38(2):794–840, 2010.
[35] Ralf Korn. Portfolio optimisation with strictly positive transaction costs and impulse control. Finance and Stochastics, 2:85–114, 1998.
[36] Ralf Korn. Some applications of impulse control in mathematical finance. Mathematical Methods of Operations Research, 50:493–518, 1999.
[37] Vathana Ly Vath, Mohamed Mnif, and Huyên Pham. A model of optimal portfolio selection under liquidity risk and price impact. Finance and Stochastics, 11:51–90, 2007.
[38] D C Mauer and A J Triantis. Interactions of corporate financing and investment decisions: a dynamic framework. The Journal of Finance, 49(4):1253–1277, 1994.
[39] A J Morton and S R Pliska. Optimal portfolio management with fixed transaction costs. Mathematical Finance, 5(4):337–356, 1995.
[40] Gabriela Mundaca and Bernt Oksendal. Optimal stochastic intervention control with application to the exchange rate. Economics, 29:225–243, 1998.
[41] B Øksendal and A Sulem. Optimal consumption and portfolio with both fixed and proportional transaction costs. SIAM Journal on Control and Optimization, 40(6):1765–1790, 2002.
[42] Bernt Øksendal and Agnes Sulem. Applied stochastic control of jump diffusions, volume 3. Springer, 2007.
[43] Huyên Pham. Continuous-time stochastic control and optimization with financial applications, volume 61. Springer Science & Business Media, 2009.
[44] Christoph Reisinger and Yufei Zhang. A penalty scheme for monotone systems with interconnected obstacles: convergence and error estimates. SIAM Journal on Numerical Analysis, 57(4):1625–1648, 2019.
[45] Christoph Reisinger and Yufei Zhang. Error estimates of penalty schemes for quasi-variational inequalities arising from impulse control problems. SIAM Journal on Control and Optimization, 58(1):243–276, 2020.
[46] Agnès Sulem. A solvable one-dimensional model of a diffusion inventory system. Mathematics of Operations Research, 11(1):125–133, 1986.
[47] Wenpin Tang, Yuming Paul Zhang, and Xun Yu Zhou. Exploratory HJB equations and their convergence. SIAM Journal on Control and Optimization, 60(6):3191–3216, 2022.
[48] A J Triantis and J E Hodder. Valuing flexibility as a complex option. The Journal of Finance, 45(2):549–565, 1990.
[49] Haoran Wang, Thaleia Zariphopoulou, and Xun Yu Zhou. Reinforcement learning in continuous time and space: A stochastic control approach. Journal of Machine Learning Research, 21(198):1–34, 2020.


work page 2000

[11] [11]

The noncoercive quasi-variational inequalities related to impulse control problems.Computers & Mathematics with Applications, 35(12):101–108, 1998

Messaoud Boulbrachene. The noncoercive quasi-variational inequalities related to impulse control problems.Computers & Mathematics with Applications, 35(12):101–108, 1998

work page 1998

[12] [12]

Pointwise error estimates for a class of elliptic quasi-variational in- equalities with nonlinear source terms.Applied mathematics and computation, 161(1):129–138, 2005

Messaoud Boulbrachene. Pointwise error estimates for a class of elliptic quasi-variational in- equalities with nonlinear source terms.Applied mathematics and computation, 161(1):129–138, 2005

work page 2005

[13] [13]

Classical and impulse stochastic control for the optimization of the dividend and risk policies of an insurance firm.Mathematical Finance, 16(1):181–202, 2006

Abel Cadenillas, Tahir Choulli, Michael Taksar, and Lei Zhang. Classical and impulse stochastic control for the optimization of the dividend and risk policies of an insurance firm.Mathematical Finance, 16(1):181–202, 2006

work page 2006

[14] [14]

Existence of optimal simple policies for discounted- cost inventory and cash management in continuous time.Operations research, 26(4):620–636, 1978

George M Constantinides and Scott F Richard. Existence of optimal simple policies for discounted- cost inventory and cash management in continuous time.Operations research, 26(4):620–636, 1978

work page 1978

[15] [15]

Impulse control of multidimensional jump diffusions

Mark HA Davis, Xin Guo, and Guoliang Wu. Impulse control of multidimensional jump diffusions. SIAM Journal on Control and Optimization, 48(8):5276–5293, 2010

work page 2010

[16] [16]

Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching.Applied Mathematics & Optimization, 91(1):9, 2025

Robert Denkert, Huy ˆen Pham, and Xavier Warin. Control randomisation approach for policy gradient and application to reinforcement learning in optimal switching.Applied Mathematics & Optimization, 91(1):9, 2025

work page 2025

[17] [17]

Exploratory optimal stopping: A singular control formulation.arXiv preprint arXiv:2408.09335, 2024

Jodi Dianetti, Giorgio Ferrari, and Renyuan Xu. Exploratory optimal stopping: A singular control formulation.arXiv preprint arXiv:2408.09335, 2024

work page arXiv 2024

[18] [18]

Randomized optimal stopping problem in continuous time and reinforcement learning algorithm.SIAM Journal on Control and Optimization, 62(3):1590–1614, 2024

Yuchao Dong. Randomized optimal stopping problem in continuous time and reinforcement learning algorithm.SIAM Journal on Control and Optimization, 62(3):1590–1614, 2024

work page 2024

[19] [19]

Optimal impulse control of portfolios.Mathematics of Operations Research, 13(4):588–605, 1988

Jerome F Eastham and Kevin J Hastings. Optimal impulse control of portfolios.Mathematics of Operations Research, 13(4):588–605, 1988

work page 1988

[20] [20]

American Mathematical Society, 1998

Lawrence C Evans.Partial Differential Equations, volume 19. American Mathematical Society, 1998

work page 1998

[21] [21]

Grundlehren der mathematischen Wissenschaften

David Gilbarg and Neil S Trudinger.Elliptic Partial Differential Equations of Second Order. Grundlehren der mathematischen Wissenschaften. Springer Berlin Heidelberg, 2013. 36

work page 2013

[22] [22]

Smooth fit principle for impulse control of multidimensional diffusion processes.SIAM Journal on Control and Optimization, 48(2):594–617, 2009

Xin Guo and Guoliang Wu. Smooth fit principle for impulse control of multidimensional diffusion processes.SIAM Journal on Control and Optimization, 48(2):594–617, 2009

work page 2009

[23] [23]

Impulse control of Brownian motion.Mathematics of Operations Research, 8(3):454–466, 1983

J M Harrison, T M Sellke, and A J Taylor. Impulse control of Brownian motion.Mathematics of Operations Research, 8(3):454–466, 1983

work page 1983

[24] [24]

Instantaneous control of Brownian motion.Mathematics of Operations Research, 1983

J M Harrison and M I Taksar. Instantaneous control of Brownian motion.Mathematics of Operations Research, 1983

work page 1983

[25] [25]

Dynamic programming and markov processes

Ronald A Howard. Dynamic programming and markov processes. 1960

work page 1960

[26] [26]

An implicit method for the finite time horizon hamilton–jacobi–bellman quasi- variational inequalities.Applied Mathematics and Computation, 265:163–175, 2015

Masashi Ieda. An implicit method for the finite time horizon hamilton–jacobi–bellman quasi- variational inequalities.Applied Mathematics and Computation, 265:163–175, 2015

work page 2015

[27] [27]

Springer Science & Business Media, 2009

Monique Jeanblanc, Marc Yor, and Marc Chesney.Mathematical methods for financial markets. Springer Science & Business Media, 2009

work page 2009

[28] [28]

Optimization of the flow of dividends.Russian Mathematical Surveys, 50(2):257–277, 1995

M Jeanblanc-Picqu´e and A N Shiryaev. Optimization of the flow of dividends.Russian Mathematical Surveys, 50(2):257–277, 1995

work page 1995

[29] [30]

Impulse control method and exchange rate.Mathematical Finance, 3(2):161–177, 1993

Monique Jeanblanc-Picqu´e. Impulse control method and exchange rate.Mathematical Finance, 3(2):161–177, 1993

work page 1993

[30] [31]

Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach.Journal of Machine Learning Research, 23(154):1–55, 2022

Yanwei Jia and Xun Yu Zhou. Policy evaluation and temporal-difference learning in continuous time and space: A martingale approach.Journal of Machine Learning Research, 23(154):1–55, 2022

work page 2022

[31] [32]

q-learning in continuous time.Journal of Machine Learning Research, 24(161):1–61, 2023

Yanwei Jia and Xun Yu Zhou. q-learning in continuous time.Journal of Machine Learning Research, 24(161):1–61, 2023

work page 2023

[32] [33]

Springer, New York, second edition

Ioannis Karatzas and Steven E Shreve.Brownian motion and stochastic calculus / Ioannis Karatzas, Steven Shreve.Graduate Texts in Mathematics ; 113. Springer, New York, second edition. edition, 1998

work page 1998

[33] [34]

Backward SDEs with constrained jumps and quasi-variational inequalities.The Annals of Probability, 38(2):794 – 840, 2010

Idris Kharroubi, Jin Ma, Huy ˆen Pham, and Jianfeng Zhang. Backward SDEs with constrained jumps and quasi-variational inequalities.The Annals of Probability, 38(2):794 – 840, 2010

work page 2010

[34] [35]

Portfolio optimisation with strictly positive transaction costs and impulse control

Ralf Korn. Portfolio optimisation with strictly positive transaction costs and impulse control. Finance and Stochastics, 2:85–114, 1998

work page 1998

[35] [36]

Some applications of impulse control in mathematical finance.Mathematical Methods of Operations Research, 50:493–518, 1999

Ralf Korn. Some applications of impulse control in mathematical finance.Mathematical Methods of Operations Research, 50:493–518, 1999

work page 1999

[36] [37]

A model of optimal portfolio selection under liquidity risk and price impact.Finance and Stochastics, 11:51–90, 2007

Vathana Ly Vath, Mohamed Mnif, and Huyˆen Pham. A model of optimal portfolio selection under liquidity risk and price impact.Finance and Stochastics, 11:51–90, 2007

work page 2007

[37] [38]

Interactions of corporate financing and investment decisions: a dynamic framework.The Journal of Finance, 49(4):1253–1277, 1994

D C Mauer and A J Triantis. Interactions of corporate financing and investment decisions: a dynamic framework.The Journal of Finance, 49(4):1253–1277, 1994

work page 1994

[38] [39]

Optimal portfolio management with fixed transaction costs.Mathemati- cal Finance, 5(4):337–356, 1995

A J Morton and S R Pliska. Optimal portfolio management with fixed transaction costs.Mathemati- cal Finance, 5(4):337–356, 1995

work page 1995

[39] [40]

Optimal stochastic intervention control with application to the exchange rate.Economics, 29:225–243, 1998

Gabriela Mundaca and Bernt Oksendal. Optimal stochastic intervention control with application to the exchange rate.Economics, 29:225–243, 1998. 37

work page 1998

[40] [41]

Optimal consumption and portfolio with both fixed and proportional transaction costs.SIAM Journal on Control and Optimizations, 40(6):1765–1790, 2002

B Øksendal and A Sulem. Optimal consumption and portfolio with both fixed and proportional transaction costs.SIAM Journal on Control and Optimizations, 40(6):1765–1790, 2002

work page 2002

[41] [42]

Springer, 2007

Bernt Øksendal and Agnes Sulem.Applied stochastic control of jump diffusions, volume 3. Springer, 2007

work page 2007

[42] [43]

Springer Science & Business Media, 2009

Huyˆen Pham.Continuous-time stochastic control and optimization with financial applications, volume 61. Springer Science & Business Media, 2009

work page 2009

[43] [44]

A penalty scheme for monotone systems with interconnected obstacles: convergence and error estimates.SIAM Journal on Numerical Analysis, 57(4):1625–1648, 2019

Christoph Reisinger and Yufei Zhang. A penalty scheme for monotone systems with interconnected obstacles: convergence and error estimates.SIAM Journal on Numerical Analysis, 57(4):1625–1648, 2019

work page 2019

[44] [45]

Error estimates of penalty schemes for quasi-variational inequalities arising from impulse control problems.SIAM Journal on Control and Optimization, 58(1):243–276, 2020

Christoph Reisinger and Yufei Zhang. Error estimates of penalty schemes for quasi-variational inequalities arising from impulse control problems.SIAM Journal on Control and Optimization, 58(1):243–276, 2020

work page 2020

[45] [46]

A solvable one-dimensional model of a diffusion inventory system.Mathematics of Operations Research, 11(1):125–133, 1986

Agn`es Sulem. A solvable one-dimensional model of a diffusion inventory system.Mathematics of Operations Research, 11(1):125–133, 1986

work page 1986

[46] [47]

Exploratory hjb equations and their convergence.SIAM Journal on Control and Optimization, 60(6):3191–3216, 2022

Wenpin Tang, Yuming Paul Zhang, and Xun Yu Zhou. Exploratory hjb equations and their convergence.SIAM Journal on Control and Optimization, 60(6):3191–3216, 2022

work page 2022

[47] [48]

Valuing flexibility as a complex option.The Journal of Finance, 45(2):549–565, 1990

A J Triantis and J E Hodder. Valuing flexibility as a complex option.The Journal of Finance, 45(2):549–565, 1990

work page 1990

[48] [49]

Reinforcement learning in continuous time and space: A stochastic control approach.Journal of Machine Learning Research, 21(198):1–34, 2020

Haoran Wang, Thaleia Zariphopoulou, and Xun Yu Zhou. Reinforcement learning in continuous time and space: A stochastic control approach.Journal of Machine Learning Research, 21(198):1–34, 2020. 38

work page 2020