A Two-fold Randomization Framework for Impulse Control Problems
Pith reviewed 2026-05-18 16:07 UTC · model grok-4.3
The pith
Randomized impulse control problems converge to the classical problem as the randomization parameter vanishes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By introducing a two-fold randomization scheme, the paper characterizes the solution of the randomized impulse control problem as the fixed point of a compound operator. This operator consists of a regularized nonlocal operator and a regularized stopping operator, and the characterization yields a semi-linear Hamilton-Jacobi-Bellman (HJB) equation. An equivalent scheme based on a Poisson compound measure establishes a verification theorem and hence uniqueness. An iterative approach proves existence. As the randomization parameter λ tends to zero, the randomized problem converges to its classical counterpart. This gives a robust approximation and enables an offline reinforcement learning algorithm whose policy-improvement step converges geometrically.
What carries the argument
The compound operator formed by the regularized nonlocal operator and the regularized stopping operator, whose fixed point characterizes the solution to the randomized problem.
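The fixed-point structure can be illustrated on a toy problem. The sketch below is a hypothetical discretization and is not the paper's construction. It makes three assumptions:

- A reflecting random walk on a grid replaces the diffusion.
- Both the sup over impulses and the continue-or-intervene choice are replaced by a log-sum-exp smoothing with temperature λ·dt. This is one possible regularization; the paper's may differ.
- The resulting compound operator is iterated from V = 0.

All names (`smooth_max`, `build_toy_model`, `compound_operator`, `picard_iteration`) and all model parameters are illustrative. The continuation step is discounted by β = e^{-r·dt}, and log-sum-exp is monotone and 1-Lipschitz in the sup norm. Together these make the toy operator a β-contraction, so the iteration converges geometrically.

```python
import numpy as np


def smooth_max(values, tau, axis=-1):
    """Log-sum-exp smoothing of max with temperature tau; tau == 0 gives the hard max."""
    values = np.asarray(values, dtype=float)
    if tau == 0.0:
        return values.max(axis=axis)
    m = values.max(axis=axis, keepdims=True)
    lse = m + tau * np.log(np.exp((values - m) / tau).sum(axis=axis, keepdims=True))
    return np.squeeze(lse, axis=axis)


def build_toy_model(n=41, dt=0.01, r=1.0, fixed_cost=0.5, prop_cost=0.1):
    """Hypothetical 1-D discretized model: grid, discount, transitions, reward, impulse cost."""
    x = np.linspace(-2.0, 2.0, n)
    beta = np.exp(-r * dt)                     # one-step discount factor
    P = np.zeros((n, n))                       # reflecting symmetric random walk
    for i in range(n):
        P[i, max(i - 1, 0)] += 0.5
        P[i, min(i + 1, n - 1)] += 0.5
    f = -x**2                                  # running reward: stay near the origin
    K = fixed_cost + prop_cost * np.abs(x[:, None] - x[None, :])  # cost of a jump i -> j
    return x, beta, P, f, K, dt


def compound_operator(V, lam, model):
    """Smoothed choice between continuing and a smoothed best impulse (toy T_lam)."""
    _, beta, P, f, K, dt = model
    tau = lam * dt                             # smoothing acts like a per-step bonus
    cont = f * dt + beta * (P @ V)             # continue for one time step
    # regularized nonlocal operator: smoothed sup over post-impulse states j
    interv = smooth_max(cont[None, :] - K, tau, axis=1)
    # regularized stopping operator: smoothed max of {continue, intervene}
    return smooth_max(np.stack([cont, interv], axis=1), tau, axis=1)


def picard_iteration(lam, model, tol=1e-10, max_iter=100_000):
    """Iterate V <- T_lam V from V = 0; return the fixed point and sup-norm step sizes."""
    V = np.zeros(len(model[0]))
    steps = []
    for _ in range(max_iter):
        V_new = compound_operator(V, lam, model)
        steps.append(float(np.max(np.abs(V_new - V))))
        V = V_new
        if steps[-1] < tol:
            break
    return V, steps


model = build_toy_model()
V, steps = picard_iteration(lam=0.1, model=model)
beta = model[1]
# contraction: successive step sizes shrink by a factor of at most beta (up to rounding)
print(len(steps), steps[-1], beta)
```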
If this is right
- The value function has C^{2,α}_loc regularity, meaning its second derivatives are locally Hölder continuous of order α.
- The offline RL algorithm, whose policy-improvement step is derived from the iterative existence proof, converges geometrically to the randomized solution.
- The learned randomized solution approximates the classical impulse control solution with high accuracy.
- Sensitivity to the volatility parameter σ reveals the exploration-exploitation trade-off in the algorithm.
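Geometric convergence translates directly into an iteration budget. For a β-contraction, the error after n steps is at most β^n times the initial error. Reaching a tolerance tol therefore needs the smallest n with β^n · err_0 ≤ tol.

The helper below, a hypothetical name, computes that n. Here err_0 can be a known bound on ‖V_0 - V*‖, or the a posteriori bound ‖V_1 - V_0‖/(1 - β). The rate β of the paper's algorithm is not stated here. The value used below, β = e^{-r·dt} with r = 1 and dt = 0.01, is only an example.

```python
import math


def iterations_for_tolerance(beta, initial_error, tol):
    """Smallest n >= 0 with beta**n * initial_error <= tol, assuming 0 < beta < 1 and tol > 0."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie in (0, 1)")
    if tol <= 0.0:
        raise ValueError("tol must be positive")
    if initial_error <= tol:
        return 0
    # closed-form estimate, then correct for floating-point rounding
    n = max(0, math.ceil(math.log(tol / initial_error) / math.log(beta)))
    while beta**n * initial_error > tol:
        n += 1
    while n > 0 and beta ** (n - 1) * initial_error <= tol:
        n -= 1
    return n


beta = math.exp(-1.0 * 0.01)   # example: r = 1, dt = 0.01
print(iterations_for_tolerance(beta, initial_error=1.0, tol=1e-8))  # -> 1843
```

Because β is close to 1 when dt is small, the budget grows like log(1/tol)/(r·dt). This is why the discount per step matters as much as the target tolerance.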
Where Pith is reading between the lines
- This approach may extend to other types of stochastic control problems involving jumps or impulses.
- Similar randomization could provide numerical methods for problems where direct classical solutions are intractable.
- Combining with other RL techniques might improve scalability to high-dimensional state spaces.
Load-bearing premise
The compound operator admits a fixed point and the Poisson compound measure scheme correctly supports the verification theorem.
What would settle it
Numerical experiments in which the gap between the value function of the randomized problem and a known classical solution shrinks to zero as λ is decreased.
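An experiment of this kind can be mocked up on a toy model. The sketch below is hypothetical in three respects:

- It discretizes a one-dimensional problem on a grid, with a reflecting random walk standing in for the diffusion.
- It smooths both the impulse sup and the continue-or-intervene choice with log-sum-exp at temperature λ·dt. This is an assumed regularization, not necessarily the paper's.
- The unsmoothed (λ = 0) fixed point plays the role of the "known classical solution", and each λ is compared against it.

For this particular smoothing, 0 ≤ V_λ - V_0 ≤ λ·dt·log(2N)/(1 - β), where N is the number of grid points and β = e^{-r·dt}. The gap is therefore O(λ). The names `smooth_max` and `solve_toy`, and all parameter values, are illustrative.

```python
import numpy as np

N, DT, R = 41, 0.01, 1.0                      # grid size, time step, discount rate (toy values)
BETA = np.exp(-R * DT)


def smooth_max(v, tau, axis):
    """Log-sum-exp with temperature tau; tau == 0 returns the hard max."""
    if tau == 0.0:
        return v.max(axis=axis)
    m = v.max(axis=axis, keepdims=True)
    out = m + tau * np.log(np.exp((v - m) / tau).sum(axis=axis, keepdims=True))
    return np.squeeze(out, axis=axis)


def solve_toy(lam, fixed_cost=0.5, prop_cost=0.1, tol=1e-10):
    """Fixed point of a hypothetical smoothed impulse-control operator on a 1-D grid."""
    x = np.linspace(-2.0, 2.0, N)
    idx = np.arange(N)
    P = np.zeros((N, N))                      # reflecting symmetric random walk
    np.add.at(P, (idx, np.maximum(idx - 1, 0)), 0.5)
    np.add.at(P, (idx, np.minimum(idx + 1, N - 1)), 0.5)
    f = -x**2                                 # running reward
    K = fixed_cost + prop_cost * np.abs(x[:, None] - x[None, :])
    tau = lam * DT
    V = np.zeros(N)
    while True:                               # terminates: the operator is a BETA-contraction
        cont = f * DT + BETA * (P @ V)
        interv = smooth_max(cont[None, :] - K, tau, axis=1)
        V_new = smooth_max(np.stack([cont, interv], axis=1), tau, axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new


V_classical = solve_toy(lam=0.0)
results = []
for lam in [1.0, 0.1, 0.01, 0.001]:
    gap = float(np.max(np.abs(solve_toy(lam) - V_classical)))
    bound = lam * DT * np.log(2 * N) / (1 - BETA)
    results.append((lam, gap, bound))
    print(f"lambda={lam:<6} sup-gap={gap:.2e} bound={bound:.2e}")
```

On a grid the observed gap can sit well below the bound, because smoothing only adds value where several choices are nearly tied. The bound is a worst case, not a prediction.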
Original abstract
We propose and analyze a randomization scheme for a general class of impulse control problems. The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator. This approach allows us to derive a semi-linear Hamilton-Jacobi-Bellman (HJB) equation. Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem that implies the uniqueness of the solution. Via an iterative approach, we prove the existence of the solution. The existence-and-uniqueness result ensures the randomized problem is well-defined. We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter $\lambda$ vanishes. This convergence, combined with the value function's $C^{2,\alpha}_{loc}$ regularity, confirms our framework provides a robust approximation and a foundation for developing learning algorithms. Under this framework, we propose an offline reinforcement learning (RL) algorithm. Its policy improvement step, which enjoys a geometric convergence rate, is naturally derived from the iterative approach used in the existence proof. We implement a model-free version of the algorithm and numerically demonstrate its effectiveness using a widely studied example. The results show that our RL algorithm can learn the randomized solution, which accurately approximates its classical counterpart. A sensitivity analysis with respect to the volatility parameter $\sigma$ in the state process effectively demonstrates the exploration-exploitation tradeoff.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a two-fold randomization framework for general impulse control problems. The randomized problem is characterized as the fixed point of a compound operator formed by a regularized nonlocal operator and a regularized stopping operator, yielding a semi-linear HJB equation. Existence is proved via iteration, while uniqueness follows from a verification theorem obtained through an equivalent Poisson compound measure randomization scheme. The value function of the randomized problem is shown to converge to that of the classical impulse control problem as the randomization parameter λ vanishes; combined with C^{2,α}_loc regularity this is used to justify the approximation. An offline RL algorithm is derived from the iterative existence proof (with geometric convergence) and demonstrated numerically on a standard example, including sensitivity analysis with respect to volatility.
Significance. If the convergence and well-posedness results hold, the framework supplies a theoretically grounded regularization that enables model-free RL for impulse control while recovering the classical solution in the limit. The explicit geometric rate for the policy-improvement iteration and the numerical illustration of the exploration-exploitation trade-off via σ-sensitivity are concrete strengths that could support further algorithmic development in stochastic control.
major comments (2)
- [Abstract] Paragraph on characterization and verification: the verification theorem is obtained by passing through an equivalent Poisson-compound-measure randomization. It is not shown that the Poisson intensity or jump measure converges in a manner compatible with the original impulse set, nor are uniform-in-λ estimates provided on the measure-theoretic remainder. Without such control, uniqueness for fixed λ does not automatically transfer to the λ→0 limit, even if the value functions converge pointwise.
- [Abstract] The convergence statement (final paragraph of abstract) invokes C^{2,α}_loc regularity to pass to the limit, but the argument appears to lack a uniform estimate on the regularized measures or on the nonlocal term that would guarantee the limit satisfies the classical variational inequality. This is load-bearing for the claim that the framework provides a robust approximation.
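The "classical variational inequality" can be written out for reference. Below is one standard form of the quasi-variational inequality (QVI) for a reward-maximizing, one-dimensional impulse control problem. The notation is illustrative and not taken from the paper: generator L with drift μ and volatility σ, discount rate r, running reward f, impulse set Z, and impulse cost K. The paper's randomized problem works with regularized versions of the nonlocal operator M and of the stopping (obstacle) part, in a form not reproduced here.

```latex
\mathcal{L}V(x) = \mu(x)\,V'(x) + \tfrac{1}{2}\,\sigma^{2}(x)\,V''(x), \qquad
\mathcal{M}V(x) = \sup_{\xi \in Z}\bigl[\,V(x+\xi) - K(\xi)\,\bigr],

\max\Bigl\{\, \mathcal{L}V(x) - rV(x) + f(x),\; \mathcal{M}V(x) - V(x) \,\Bigr\} = 0, \qquad x \in \mathbb{R}.
```

In the continuation region the first term vanishes, so the HJB equation holds there. In the intervention region V = MV. A uniform estimate of the kind the referee asks for would show that the regularized operators approach L - r and M closely enough for any limit point to satisfy this system.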
minor comments (2)
- Notation for the compound operator and the two regularization parameters should be introduced with explicit definitions before the fixed-point argument is stated.
- The numerical section would benefit from a table comparing the learned value function against a classical benchmark (e.g., finite-difference solution) for several λ values, rather than qualitative plots alone.
Simulated Author's Rebuttal
We thank the referee for the careful reading and constructive comments, which help clarify key aspects of the convergence and uniqueness arguments. We appreciate the positive assessment of the framework's potential for model-free RL in impulse control. Below we respond point by point to the major comments, indicating revisions where appropriate to strengthen the presentation.
Point-by-point responses
-
Referee: [Abstract] Paragraph on characterization and verification: the verification theorem is obtained by passing through an equivalent Poisson-compound-measure randomization. It is not shown that the Poisson intensity or jump measure converges in a manner compatible with the original impulse set, nor are uniform-in-λ estimates provided on the measure-theoretic remainder. Without such control, uniqueness for fixed λ does not automatically transfer to the λ→0 limit, even if the value functions converge pointwise.
Authors: For each fixed λ the two randomization schemes are equivalent, so the verification theorem yields uniqueness of the randomized value function directly. Convergence of value functions as λ → 0 is established separately by direct comparison and the C^{2,α}_loc regularity. To make the passage to the limit fully rigorous and address the transfer of uniqueness, we will add uniform-in-λ bounds on the Poisson intensity together with weak convergence of the compound jump measures to the admissible impulse measures (in the sense of the original control set). These estimates will be placed in Section 4 and an appendix; they confirm that any limit point satisfies the classical variational inequality and inherits uniqueness from the standard verification theorem for the unregularized problem.
Revision: yes
-
Referee: [Abstract] The convergence statement (final paragraph of abstract) invokes C^{2,α}_loc regularity to pass to the limit, but the argument appears to lack a uniform estimate on the regularized measures or on the nonlocal term that would guarantee the limit satisfies the classical variational inequality. This is load-bearing for the claim that the framework provides a robust approximation.
Authors: The C^{2,α}_loc regularity is obtained from the semi-linear HJB equation satisfied by the randomized value function and holds uniformly on compact sets for λ small enough. Combined with pointwise convergence of the value functions, this already allows passage to the limit inside the equation. Nevertheless, we acknowledge that explicit uniform control on the regularized nonlocal term strengthens the argument. In the revision we will insert a lemma providing such estimates, showing that the difference between the regularized nonlocal operator and the classical impulse operator vanishes uniformly on compact sets as λ → 0. This guarantees that the limit function satisfies the classical variational inequality in the viscosity (and, under the regularity, classical) sense, thereby confirming the robust approximation property.
Revision: yes
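The promised lemma concerns the gap between the regularized and classical nonlocal operators. As an assumption only, suppose the regularization were a log-sum-exp (soft-max) smoothing of the sup over impulses, with a temperature τ playing the role of λ. Then for any bounded V on a grid of n points, 0 ≤ M_τV - MV ≤ τ·log n holds uniformly, so the difference vanishes uniformly as τ → 0.

The bound depends on the grid size, so this is only a discrete illustration of a uniform estimate on a compact set. The sketch checks the bound numerically on an arbitrary test function. `impulse_operator`, the grid, and the cost K are illustrative choices, not the paper's.

```python
import numpy as np


def impulse_operator(V, K, tau):
    """(M_tau V)(i) = smoothed sup_j [V(j) - K(i, j)]; tau == 0 is the classical sup."""
    A = V[None, :] - K                       # row i lists V(j) - K(i, j)
    m = A.max(axis=1)
    if tau == 0.0:
        return m
    return m + tau * np.log(np.exp((A - m[:, None]) / tau).sum(axis=1))


rng = np.random.default_rng(0)
x = np.linspace(-2.0, 2.0, 81)                          # a compact set, discretized
V = np.sin(3 * x) + 0.1 * rng.standard_normal(x.size)   # arbitrary bounded test function
K = 0.5 + 0.1 * np.abs(x[:, None] - x[None, :])         # fixed + proportional impulse cost

M_classical = impulse_operator(V, K, 0.0)
taus = [1.0, 0.1, 0.01, 0.001]
sup_diffs = []
for tau in taus:
    diff = impulse_operator(V, K, tau) - M_classical
    sup_diffs.append(float(np.max(np.abs(diff))))
    print(f"tau={tau:<6} sup-diff={sup_diffs[-1]:.3e} bound={tau * np.log(x.size):.3e}")
```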
Circularity Check
No significant circularity; derivation is self-contained
Full rationale
The paper independently defines the randomized impulse control problem through regularization of the nonlocal and stopping operators, characterizes its solution as the fixed point of the resulting compound operator, establishes uniqueness via an equivalent Poisson compound measure scheme that yields a verification theorem, proves existence by iteration, and separately demonstrates convergence of the value function to the classical impulse control problem as the randomization parameter λ vanishes (combined with C^{2,α}_loc regularity). None of these steps reduces the central claims to a fitted input renamed as prediction, a self-definitional loop, or a load-bearing self-citation whose content is unverified outside the present work. The framework is constructed to provide an approximation whose well-posedness and limiting behavior are established directly from the stated assumptions and iterative arguments.
Axiom & Free-Parameter Ledger
free parameters (1)
- randomization parameter λ
axioms (1)
- Domain assumption: the state process is a diffusion satisfying standard regularity conditions that allow the HJB derivation and C^{2,α}_loc regularity.
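The domain assumption above is stated informally. A common setting it would cover is written below in illustrative notation that may differ from the paper's. A diffusion with drift μ and volatility σ is shifted by impulses ξ_i at stopping times τ_i, with discounted running reward f, discount rate r, and impulse cost K.

"Standard regularity" typically includes two kinds of condition:

- Lipschitz coefficients (hence linear growth), so that the SDE between impulses has a unique strong solution.
- For C^{2,α}_loc estimates, locally Hölder coefficients with non-degenerate σ.

```latex
X_t = x + \int_0^t \mu(X_s)\,ds + \int_0^t \sigma(X_s)\,dW_s + \sum_{i:\,\tau_i \le t} \xi_i,

V(x) = \sup_{(\tau_i,\,\xi_i)_{i \ge 1}} \mathbb{E}_x\Bigl[\int_0^\infty e^{-rt} f(X_t)\,dt \;-\; \sum_{i \ge 1} e^{-r\tau_i}\,K(\xi_i)\Bigr].
```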
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
The solution to this randomized problem is characterized as the fixed point of a compound operator which consists of a regularized nonlocal operator and a regularized stopping operator... Through an equivalent randomization scheme with a Poisson compound measure, we establish a verification theorem
-
IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: alpha_pin_under_high_calibration (unclear)
Unclear: relation between the paper passage and the cited Recognition theorem.
We then demonstrate that our randomized impulse control problem converges to its classical counterpart as the randomization parameter λ vanishes. This convergence, combined with the value function’s C^{2,α}_loc regularity
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.