TorchBeast: A PyTorch Platform for Distributed RL
3 Pith papers cite this work.
citing papers explorer
- Concentration of General Stochastic Approximation Under Heavy-Tailed Markovian Noise
  Establishes maximal concentration bounds for stochastic approximation under heavy-tailed Markovian noise, with tails ranging from sub-Gaussian to heavier than Weibull depending on step sizes and contractivity properties, plus a truncation argument for unbounded noise.
- Scalable Option Learning in High-Throughput Environments
  SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.
- The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
  More capable RL agents exploit reward misspecifications more often, with phase transitions in behavior, and anomaly detectors can identify misaligned policies.