RoboFlamingo adapts open-source vision-language models for robot manipulation tasks via single-step comprehension plus an explicit policy head, outperforming prior methods on benchmarks with only light fine-tuning.
We loaded the pre-train 14 Preprint Table 5: Comparison of co-trained models and fine-tuned models on the CALVIN benchmark
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.RO 1years
2023 1verdicts
CONDITIONAL 1representative citing papers
citing papers explorer
-
Vision-Language Foundation Models as Effective Robot Imitators
RoboFlamingo adapts open-source vision-language models for robot manipulation tasks via single-step comprehension plus an explicit policy head, outperforming prior methods on benchmarks with only light fine-tuning.