Accuracy of discretely sampled stochastic policies in continuous-time reinforcement learning
4 Pith papers cite this work.
fields: math.OC (4)
citing papers explorer
- Policy Gradient for Continuous-Time Mean-Field Control
  Derives an explicit Gâteaux policy-gradient formula for entropy-regularized continuous-time mean-field control using the value function and cylindrical representations, then builds a model-based actor-critic scheme with PDE well-posedness analysis.
- An Actor-Critic Framework for Continuous-Time Jump-Diffusion Controls with Normalizing Flows
  An actor-critic framework built on a time-inhomogeneous little q-function and conditional normalizing flows serves as a mesh-free solver for entropy-regularized jump-diffusion control problems and stochastic games.
- Extended mean-field control problems with Poissonian common noise: Stochastic maximum principle and Hamiltonian-Jacobi-Bellman equation
  Develops SMP for non-convex mean-field control with joint law dependence and Poisson common noise via relaxed controls and extension transformation, then derives equivalent HJB on measure space.
- Discretization error from regularized Reinforcement Learning to continuous-time stochastic control
  Derives quantitative convergence rates for the gap between optimal policies from regularized discrete-time Bellman equations and true optimal controls in underlying continuous-time stochastic problems.