
TimeRewarder: Learning Dense Reward from Passive Videos via Frame-wise Temporal Distance

2 Pith papers cite this work.

abstract

Designing dense rewards is crucial for reinforcement learning (RL), yet in robotics it often demands extensive manual effort and lacks scalability. One promising solution is to view task progress as a dense reward signal, as it quantifies the degree to which actions advance the system toward task completion over time. We present TimeRewarder, a simple yet effective reward learning method that derives progress estimation signals from passive videos, including robot demonstrations and human videos, by modeling temporal distances between frame pairs. We then demonstrate how TimeRewarder can supply step-wise proxy rewards to guide reinforcement learning. In our comprehensive experiments on ten challenging Meta-World tasks, we show that TimeRewarder dramatically improves RL for sparse-reward tasks, achieving nearly perfect success in 9/10 tasks with only 200,000 environment interactions per task. This approach outperforms previous methods and even the manually designed environment dense reward on both the final success rate and sample efficiency. Moreover, we show that TimeRewarder pretraining can exploit real-world human videos, highlighting its potential as a scalable approach to rich reward signals from diverse video sources.
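
The idea in the abstract, deriving step-wise rewards from temporal distances between frame pairs, can be illustrated with a minimal Python sketch. It rests on assumptions the abstract does not state: training targets are taken to be the normalized index gap (j - i) / (T - 1), and the proxy reward is taken to be the predicted distance between consecutive frames. The names `temporal_distance_targets`, `stepwise_rewards`, and `oracle_distance` are hypothetical, and `oracle_distance` stands in for a learned model.

```python
def temporal_distance_targets(num_frames):
    """Build regression targets for every ordered frame pair (i, j), i != j.

    Assumption: the target is the signed index gap normalized by video
    length, so it lies in [-1, 1]. The abstract does not give the exact form.
    """
    if num_frames < 2:
        raise ValueError("need at least two frames to form a pair")
    span = num_frames - 1
    return {
        (i, j): (j - i) / span
        for i in range(num_frames)
        for j in range(num_frames)
        if i != j
    }


def stepwise_rewards(frames, predict_distance):
    """Turn a rollout into per-step proxy rewards.

    Assumption: the reward for step t is the predicted temporal distance
    from frame t to frame t + 1 (positive means progress, negative means
    regression). predict_distance is a hypothetical callable.
    """
    return [
        predict_distance(frames[t], frames[t + 1])
        for t in range(len(frames) - 1)
    ]


def oracle_distance(frame_a, frame_b):
    """Toy stand-in for a learned predictor: each frame is a scalar progress value."""
    return frame_b - frame_a


# Targets for a 5-frame video: first-to-last is +1, last-to-first is -1.
targets = temporal_distance_targets(5)

# A toy rollout that advances twice, then slips back once.
rewards = stepwise_rewards([0.0, 0.25, 0.5, 0.25], oracle_distance)
```

In this toy setup the last step receives a negative reward because the rollout moves away from completion, which is the property that would make such a signal dense rather than sparse.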



representative citing papers

Robots Need More than VLA and World Models

cs.RO · 2026-06-04 · unverdicted · novelty 5.0

The paper identifies four missing interfaces (data autolabelling, embodiment retargeting, physics-grounded world models, and video-based reward inference) as the central bottleneck beyond VLA scaling for robot intelligence.

citing papers explorer


  • STEAM: Self-Supervised Temporal Ensemble Advantage Modeling for Real-World Robot Learning
    cs.RO · 2026-06-29 · unverdicted · none · ref 8 · internal anchor

    STEAM learns advantages from expert trajectories via self-supervised temporal ensemble modeling to improve policy learning on real robot tasks like bimanual folding and pick-and-place.

  • Robots Need More than VLA and World Models
    cs.RO · 2026-06-04 · unverdicted · none · ref 71 · internal anchor

    The paper identifies four missing interfaces (data autolabelling, embodiment retargeting, physics-grounded world models, and video-based reward inference) as the central bottleneck beyond VLA scaling for robot intelligence.