LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
Proceedings of the AAAI conference on artificial intelligence , volume=
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
A hierarchical framework extracts implicit safety criteria from crowd preferences and composes them via high-level policy to reduce safety violations in downstream RL tasks without explicit safety rewards.
citing papers explorer
-
Goal-Conditioned Agents that Learn Everything All at Once
LEO enables efficient all-goals learning in goal-conditioned RL by jointly predicting for all goals in one network pass, yielding >250x speedup over relabelling and better performance on Craftax.
-
Implicit Safety Alignment from Crowd Preferences
A hierarchical framework extracts implicit safety criteria from crowd preferences and composes them via high-level policy to reduce safety violations in downstream RL tasks without explicit safety rewards.