Sutton and Andrew G
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary: background 1
citing papers explorer
- Gaussian Approximation for Asynchronous Q-learning
  Derived rates of order up to n^{-1/6} log^4(n S A) for the high-dimensional CLT of averaged asynchronous Q-learning iterates, plus a general martingale-difference CLT.
- The Streaming Reservoir Convergence Theorem: A Prospect-Theoretic Framework for Multi-Provider Adaptive Streaming
  SRCT models streaming as concurrent reservoir filling with k standby streams, proving harmonic uptime bounds, 3-5x acquisition speedup, monotonic quality convergence, and a prospect-theoretic no-thrash switching rule.
- Hyperfastrl: Hypernetwork-based reinforcement learning for unified control of parametric chaotic PDEs
  Hypernetworks map a forcing parameter directly to policy weights in an RL framework, enabling unified stabilization of the Kuramoto-Sivashinsky equation across regimes with KAN architectures showing strongest extrapolation.
- Why Code, Why Now: An Information-Theoretic Perspective on the Limits of Machine Learning
  Task information structure determines ML scaling success, with code's dense verifiable signals enabling predictable progress while sparse-feedback tasks like typical RL do not.
- On Gaussian approximation for entropy-regularized Q-learning with function approximation
  Establishes n^{-1/4} Gaussian approximation in convex distance for averaged entropy-regularized Q-learning with linear function approximation and polynomial stepsizes.
- Hybrid-AIRL: Enhancing Inverse Reinforcement Learning with Supervised Expert Guidance
  Hybrid-AIRL adds supervised expert guidance and stochastic regularization to AIRL, yielding higher sample efficiency and more stable learning on Gymnasium benchmarks and Heads-Up Limit Hold'em poker.
- Emergence of agriculture in an artificial society of reinforcement learning agents