A two-timescale framework for bilevel optimization: Complexity analysis and application to actor-critic
4 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (4)
Verdicts: unverdicted (4)
Citation-role summary: background (1)
Citation-polarity summary: background (1)
citing papers explorer
- Optimal Sample Complexity for Single Time-Scale Actor-Critic with Momentum
  Single-timescale actor-critic with STORM momentum and a recent-sample buffer achieves optimal O(ε^{-2}) sample complexity for ε-optimal policies in finite discounted MDPs.
- Single-loop approaches to nonsmooth bilevel optimisation
  Develops optimistic and pessimistic calculus rules for set-valued bilevel constraints, derives nonsmooth adjoint inclusions, and proposes a convergent single-loop algorithm demonstrated on total variation inverse problems.
- Continuous-Time Analysis for Minimax and Bilevel Problems
  Introduces a modular unified Lyapunov template for continuous-time analysis of minimax, bilevel (via penalty), and min-min-max problems with explicit time-scale thresholds.
- CHAL: Council of Hierarchical Agentic Language
  CHAL is a multi-agent dialectic system that performs structured belief optimization over defeasible domains using Bayesian-inspired graph representations and configurable meta-cognitive value system hyperparameters.