SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.
Learning and transfer of modulated locomotor controllers
4 Pith papers cite this work. Polarity classification is still indexing.
abstract
We study a novel architecture and training procedure for locomotion tasks. A high-frequency, low-level "spinal" network with access to proprioceptive sensors learns sensorimotor primitives by training on simple tasks. This pre-trained module is fixed and connected to a low-frequency, high-level "cortical" network, with access to all sensors, which drives behavior by modulating the inputs to the spinal network. Where a monolithic end-to-end architecture fails completely, learning with a pre-trained spinal module succeeds at multiple high-level tasks, and enables the effective exploration required to learn from sparse rewards. We test our proposed architecture on three simulated bodies: a 16-dimensional swimming snake, a 20-dimensional quadruped, and a 54-dimensional humanoid. Our results are illustrated in the accompanying video at https://youtu.be/sboPYvhpraQ
verdicts
UNVERDICTED 4representative citing papers
LatentMimic decouples stylistic fidelity from geometric terrain constraints in quadruped locomotion via marginal latent divergence to a mocap prior and a dynamic replay buffer, yielding higher traversal success than motion-tracking baselines while preserving gait style.
Hierarchical RL combines a model-based cube solver with a model-free hand controller to solve Rubik's cubes in simulation, achieving 90.3% success on 1400 random scrambles.
Generative model with normalized pairwise distance constraint discovers output space topologies from sparse data and outperforms GANs and VAEs by avoiding mode collapse.
citing papers explorer
-
Scalable Option Learning in High-Throughput Environments
SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.
-
LatentMimic: Terrain-Adaptive Locomotion via Latent Space Imitation
LatentMimic decouples stylistic fidelity from geometric terrain constraints in quadruped locomotion via marginal latent divergence to a mocap prior and a dynamic replay buffer, yielding higher traversal success than motion-tracking baselines while preserving gait style.
-
Learning to Solve a Rubik's Cube with a Dexterous Hand
Hierarchical RL combines a model-based cube solver with a model-free hand controller to solve Rubik's cubes in simulation, achieving 90.3% success on 1400 random scrambles.
-
Neural Embedding for Physical Manipulations
Generative model with normalized pairwise distance constraint discovers output space topologies from sparse data and outperforms GANs and VAEs by avoiding mode collapse.