Single-timescale actor-critic with STORM momentum and a recent-sample buffer achieves optimal O(ε^{-2}) sample complexity for ε-optimal policies in finite discounted MDPs.
A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic.arXiv preprint arXiv:2007.05170, 2020
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 3verdicts
UNVERDICTED 3roles
background 1polarities
background 1representative citing papers
Introduces a modular unified Lyapunov template for continuous-time analysis of minimax, bilevel (via penalty), and min-min-max problems with explicit time-scale thresholds.
CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.
citing papers explorer
-
Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum
Single-timescale actor-critic with STORM momentum and a recent-sample buffer achieves optimal O(ε^{-2}) sample complexity for ε-optimal policies in finite discounted MDPs.