Language-grounded decoupled action representation for robotic manipulation. arXiv preprint arXiv:2603.12967, 2026.
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.RO (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
DynaFLIP: Rethinking Robotics Perception via Tri-Modal-Dynamics Guided Representation
DynaFLIP pre-trains dynamics-aware image encoders by aligning image, language, and 3D flow modalities through simplex-volume minimization plus regularizers on video triplets, yielding reusable backbones that improve manipulation policies by up to 22.5% in out-of-distribution settings.