arXiv preprint arXiv:2402.10820 , year=

Alfredo Reichlin, Miguel Vasco, Hang Yin, Danica Kragic · 2024 · arXiv 2402.10820

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning

cs.LG · 2026-05-13 · unverdicted · novelty 7.0

Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL via the FB π-Switch algorithm that extracts subgoal-selection and control policies from forward-backward representations.

QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.

Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning

cs.LG · 2026-04-22 · conditional · novelty 6.0

Occupancy Reward Shaping extracts goal-reaching rewards from world-model occupancy measures using optimal transport, improving offline goal-conditioned RL performance 2.2x on 13 tasks without changing the optimal policy.

citing papers explorer

Showing 3 of 3 citing papers.

Switching Successor Measures for Hierarchical Zero-shot Reinforcement Learning cs.LG · 2026-05-13 · unverdicted · none · ref 48
Switching successor measures extend classical successor measures to enable hierarchical zero-shot RL via the FB π-Switch algorithm that extracts subgoal-selection and control policies from forward-backward representations.
QHyer: Q-conditioned Hybrid Attention-mamba Transformer for Offline Goal-conditioned RL cs.LG · 2026-05-03 · unverdicted · none · ref 157
QHyer replaces return-to-go with a state-conditioned Q-estimator and adds a gated hybrid attention-mamba backbone to achieve state-of-the-art performance in offline goal-conditioned RL on both Markovian and non-Markovian datasets.
Occupancy Reward Shaping: Improving Credit Assignment for Offline Goal-Conditioned Reinforcement Learning cs.LG · 2026-04-22 · conditional · none · ref 18
Occupancy Reward Shaping extracts goal-reaching rewards from world-model occupancy measures using optimal transport, improving offline goal-conditioned RL performance 2.2x on 13 tasks without changing the optimal policy.

arXiv preprint arXiv:2402.10820 , year=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer