Accuracy of discretely sampled stochastic policies in continuous-time reinforcement learning

2025 · arXiv 2503.09981

4 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Policy Gradient for Continuous-Time Mean-Field Control

math.OC · 2026-05-20 · conditional · novelty 7.0

Derives an explicit Gâteaux policy-gradient formula for entropy-regularized continuous-time mean-field control using the value function and cylindrical representations, then builds a model-based actor-critic scheme with PDE well-posedness analysis.

An Actor-Critic Framework for Continuous-Time Jump-Diffusion Controls with Normalizing Flows

math.OC · 2026-04-07 · unverdicted · novelty 7.0

An actor-critic framework built on a time-inhomogeneous little q-function and conditional normalizing flows serves as a mesh-free solver for entropy-regularized jump-diffusion control problems and stochastic games.

Extended mean-field control problems with Poissonian common noise: Stochastic maximum principle and Hamiltonian-Jacobi-Bellman equation

math.OC · 2024-07-07 · unverdicted · novelty 7.0

Develops a stochastic maximum principle (SMP) for non-convex mean-field control with joint-law dependence and Poissonian common noise, using relaxed controls and an extension transformation, then derives an equivalent HJB equation on the space of measures.

Discretization error from regularized Reinforcement Learning to continuous-time stochastic control

math.OC · 2026-04-23 · unverdicted · novelty 5.0

Derives quantitative convergence rates for the gap between optimal policies from regularized discrete-time Bellman equations and true optimal controls in underlying continuous-time stochastic problems.

citing papers explorer

Extended mean-field control problems with Poissonian common noise: Stochastic maximum principle and Hamiltonian-Jacobi-Bellman equation

math.OC · 2024-07-07 · unverdicted · none · ref 31

Develops a stochastic maximum principle (SMP) for non-convex mean-field control with joint-law dependence and Poissonian common noise, using relaxed controls and an extension transformation, then derives an equivalent HJB equation on the space of measures.