pith. sign in

hub Canonical reference

Exploration by Random Network Distillation

Canonical reference. 75% of citing Pith papers cite this work as background.

28 Pith papers citing it
Background 75% of classified citations
abstract

We introduce an exploration bonus for deep reinforcement learning methods that is easy to implement and adds minimal overhead to the computation performed. The bonus is the error of a neural network predicting features of the observations given by a fixed randomly initialized neural network. We also introduce a method to flexibly combine intrinsic and extrinsic rewards. We find that the random network distillation (RND) bonus combined with this increased flexibility enables significant progress on several hard exploration Atari games. In particular we establish state of the art performance on Montezuma's Revenge, a game famously difficult for deep reinforcement learning methods. To the best of our knowledge, this is the first method that achieves better than average human performance on this game without using demonstrations or having access to the underlying state of the game, and occasionally completes the first level.

hub tools

citation-role summary

background 6 method 1 other 1

citation-polarity summary

clear filters

representative citing papers

An Information-Geometric Approach to Artificial Curiosity

cs.LG · 2025-04-08 · unverdicted · novelty 7.0

Information geometry constrains intrinsic rewards to strictly concave functions of reciprocal occupancy, with geodesic interpolation on the occupancy manifold yielding a scalar-parameter family that includes count-based and max-entropy exploration.

Solving Rubik's Cube with a Robot Hand

cs.LG · 2019-10-16 · accept · novelty 7.0

Reinforcement learning models trained only in simulation using automatic domain randomization solve Rubik's cube with a real robot hand.

Goal-Conditioned Agents that Learn Everything All at Once

cs.LG · 2026-05-22 · unverdicted · novelty 6.0

LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.

Beyond Mode Collapse: Distribution Matching for Diverse Reasoning

cs.AI · 2026-05-19 · unverdicted · novelty 6.0

DMPO approximates forward KL minimization in on-policy RL by aligning the policy to a group-level reward-proportional target distribution, yielding 9-12% relative gains over GRPO on NP-Bench and smaller gains on math reasoning.

RoboNet: Large-Scale Multi-Robot Learning

cs.RO · 2019-10-24 · conditional · novelty 6.0

RoboNet is a multi-robot video dataset that enables pre-training of vision-based manipulation models which, after fine-tuning on a new robot, outperform robot-specific training that uses 4-20 times more data.

Shaping Zero-Shot Coordination via State Blocking

cs.LG · 2026-05-12 · unverdicted · novelty 6.0

SBC generates virtual environments via state blocking to expose agents to diverse suboptimal partner policies, yielding superior zero-shot coordination performance including with humans.

citing papers explorer

Showing 13 of 13 citing papers after filters.