ProcVLM learns procedure-grounded dense progress rewards for robotic manipulation via a reasoning-before-estimation VLM trained on a 60M-frame synthesized corpus from 30 embodied datasets.
Evo-rl: Towards iterative policy improvement in real-world offline rl
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
citation-role summary
method 1
citation-polarity summary
fields
cs.RO 1years
2026 1verdicts
UNVERDICTED 1roles
method 1polarities
use method 1representative citing papers
citing papers explorer
-
ProcVLM: Learning Procedure-Grounded Progress Rewards for Robotic Manipulation
ProcVLM learns procedure-grounded dense progress rewards for robotic manipulation via a reasoning-before-estimation VLM trained on a 60M-frame synthesized corpus from 30 embodied datasets.