arXiv preprint arXiv:1712.00948
6 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.LG 6
verdicts: UNVERDICTED 6
roles: background 1
polarities: background 1
citing papers explorer
- Delay-Empowered Causal Hierarchical Reinforcement Learning
  DECHRL models causal structures and stochastic delay distributions within hierarchical RL and incorporates them into a delay-aware empowerment objective to improve performance under temporal uncertainty.
- QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL
  QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
- Scalable Option Learning in High-Throughput Environments
  SOL is a new hierarchical RL algorithm that reaches 35x higher throughput and outperforms flat agents when trained on 30 billion frames in NetHack while showing positive scaling.
- Training Language Models to Self-Correct via Reinforcement Learning
  SCoRe uses multi-turn online RL with regularization on self-generated traces to improve LLM self-correction, achieving 15.6% and 9.1% gains on MATH and HumanEval for Gemini models.
- Abstraction for Offline Goal-Conditioned Reinforcement Learning
  Introduces relativised options and hierarchical abstraction to reuse experience across similar contexts in offline GCRL, with two algorithms demonstrating performance gains.
- Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
  SSE improves long-horizon goal-conditioned RL by using failure and partial-success transitions to identify unreliable subgoals, streamline high-level planning, and outperform prior hierarchical methods on benchmarks.