PhiBE: A PDE-based Bellman equation for continuous time policy evaluation
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: math.OC (2)
Years: 2026 (2)
Citing papers
- Policy Gradient for Continuous-Time Mean-Field Control
  Derives an explicit Gâteaux policy-gradient formula for entropy-regularized continuous-time mean-field control using the value function and cylindrical representations, then builds a model-based actor-critic scheme with PDE well-posedness analysis.
- Discretization error from regularized Reinforcement Learning to continuous-time stochastic control
  Derives quantitative convergence rates for the gap between optimal policies from regularized discrete-time Bellman equations and true optimal controls in underlying continuous-time stochastic problems.