EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data

2024 · arXiv 2403.00564

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

cs.AI · 2026-05-08 · unverdicted · novelty 6.0

Frontier LRMs match human game-learning behavior and predict fMRI signals an order of magnitude better than RL or Bayesian agents because of their in-context game-state representations.

Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction

cs.LG · 2026-03-07 · unverdicted · novelty 6.0

Dreamer-CDP achieves reconstruction-free world modeling via a JEPA-style predictor on continuous deterministic representations and matches Dreamer's performance on Crafter.

TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

cs.AI · 2025-09-30 · unverdicted · novelty 6.0

TimeRewarder derives step-wise progress rewards from frame-wise temporal distances in passive videos and uses them to guide RL, achieving high success rates on Meta-World tasks with fewer interactions than prior methods or hand-designed rewards.

Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

cs.RO · 2023-10-04 · unverdicted · novelty 5.0

RLFP and the FAC algorithm combine foundation-model priors for policy, value, and rewards to produce sample-efficient robotic RL that reaches 86% real-robot success after one hour and 100% success on 7/8 Meta-World tasks in under 100k frames.

citing papers explorer

Reason to Play: Behavioral and Brain Alignment Between Frontier LRMs and Human Game Learners

cs.AI · 2026-05-08 · unverdicted · none · ref 8

Dreamer-CDP: Improving Reconstruction-free World Models Via Continuous Deterministic Representation Prediction

cs.LG · 2026-03-07 · unverdicted · none · ref 18

TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

cs.AI · 2025-09-30 · unverdicted · none · ref 15

Reinforcement Learning with Foundation Priors: Let the Embodied Agent Efficiently Learn on Its Own

cs.RO · 2023-10-04 · unverdicted · none · ref 58
