Exploration and anti-exploration with distributional random network distillation. arXiv preprint arXiv:2401.09750,
Cited by 1 Pith paper (field: cs.LG; year: 2025; verdict: UNVERDICTED).
Representative citing paper:
Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
LPM uses a dual-network design to compute intrinsic rewards from the change in prediction error across iterations, providing a noise-robust signal that is theoretically linked to information gain.
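
The summary above describes the intrinsic reward as a change in prediction error across iterations. A minimal sketch of that idea follows, under loud assumptions: it is not the paper's architecture. The second network is replaced by a stored copy of the previous parameter, the predictor is a single scalar, and the names `run`, `squared_error`, and `mean` are hypothetical. It only illustrates why an error-difference signal can stay near zero on pure noise while the raw prediction error stays high.

```python
import random


def squared_error(prediction, target):
    return (prediction - target) ** 2


def mean(values):
    return sum(values) / len(values)


def run(sample_target, steps, lr=0.1):
    """Train a one-parameter predictor online.

    Returns a list of (raw_error, progress) pairs, one per step, where
    progress = error of the previous-iteration model minus error of the
    current model, both measured on a fresh sample.
    """
    w = 0.0
    history = []
    for _ in range(steps):
        w_prev = w                           # assumed stand-in for the "previous iteration" network
        w += lr * (sample_target() - w)      # one SGD step on a training sample
        y = sample_target()                  # fresh sample used only for evaluation
        err_prev = squared_error(w_prev, y)
        err_curr = squared_error(w, y)
        history.append((err_curr, err_prev - err_curr))
    return history


rng = random.Random(0)

# Learnable source: the target is always 2.0, so the error shrinks every step.
learnable = run(lambda: 2.0, steps=20)

# Noisy-TV-like source: zero-mean Gaussian noise that no model can predict.
noisy = run(lambda: rng.gauss(0.0, 1.0), steps=5000)

print("learnable: first progress reward =", learnable[0][1])
print("noisy: mean raw error           =", mean([e for e, _ in noisy]))
print("noisy: mean progress reward     =", mean([p for _, p in noisy]))
```

In this toy setting a raw-error reward would keep paying out on the noisy source, whereas the progress reward averages out close to zero there and is positive only while the learnable source is still being learned.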