Derives rates of order up to n^{-1/6} log^4(n S A) for the high-dimensional CLT of averaged asynchronous Q-learning iterates, plus a general martingale-difference CLT.
Sutton and Andrew G
7 Pith papers cite this work. Polarity classification is still indexing.
verdicts: UNVERDICTED 7
roles: background 1
polarities: background 1
representative citing papers
Agriculture emerges spontaneously in an RL agent society through planning for delayed rewards, social learning that counters cheaters, and an irreversible lock-in effect.
SRCT models streaming as concurrent reservoir filling with k standby streams, proving harmonic uptime bounds, 3-5x acquisition speedup, monotonic quality convergence, and a prospect-theoretic no-thrash switching rule.
Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.
Task information structure determines ML scaling success, with code's dense verifiable signals enabling predictable progress while sparse-feedback tasks like typical RL do not.
Establishes n^{-1/4} Gaussian approximation in convex distance for averaged entropy-regularized Q-learning with linear function approximation and polynomial stepsizes.
Hybrid-AIRL adds supervised expert guidance and stochastic regularization to AIRL, yielding higher sample efficiency and more stable learning on Gymnasium benchmarks and Heads-Up Limit Hold'em poker.