Online intrinsic rewards for decision making agents from large language model feedback

Qinqing Zheng, Mikael Henaff, Amy Zhang, Aditya Grover, Brandon Amos · 2024 · arXiv 2410.23022

2 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Hierarchical Behaviour Spaces

cs.AI · 2026-04-27 · unverdicted · novelty 6.0

Hierarchical Behaviour Spaces uses linear combinations of reward functions to induce expressive behavior spaces in hierarchical RL, yielding strong performance on NetHack primarily through better exploration rather than long-term planning.

Scalable Option Learning in High-Throughput Environments

cs.LG · 2025-08-30 · unverdicted · novelty 6.0

SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.

citing papers explorer


Hierarchical Behaviour Spaces · cs.AI · 2026-04-27 · unverdicted · none · ref 20
Scalable Option Learning in High-Throughput Environments · cs.LG · 2025-08-30 · unverdicted · none · ref 76
