Never give up: Learning directed exploration strategies. arXiv preprint arXiv:2002.06038, 2020
2 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Baba in Wonderland: Online Self-Supervised Dynamics Discovery for Executable World Models
Alice uses preservation conflicts from failed candidate updates to create class-stratified hypotheses and guide exploration, improving executable world-model learning under prior misalignment.
- Beyond Noisy-TVs: Noise-Robust Exploration Via Learning Progress Monitoring
LPM uses a dual-network design to compute intrinsic rewards from the change in prediction error across iterations, providing a noise-robust signal that is theoretically linked to information gain.