ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms without task-specific data.
Keypoint action tokens enable in-context imitation learning in robotics
8 Pith papers cite this work. Polarity classification is still being indexed.
citation-role summary
citation-polarity summary
fields
cs.RO 8
roles
background 2
representative citing papers
StaKe adds lightweight auxiliary heads for manipulation stage identification and next-gripper-transition keyframe prediction to VLA fine-tuning, reporting relative success rate gains of 14% in bimanual simulation and 56% on single-arm real-robot tasks.
SynthICL trains flow-matching transformer policies for in-context imitation learning entirely from synthetic RGB data and reports 79% average success on 16 unseen real manipulation tasks with one test-time demonstration.
Decompose and Recompose decomposes seen robotic demonstrations into skill-action alignments and recomposes them via visual-semantic retrieval and planning to enable zero-shot cross-task generalization.
J-PARSE modifies the Jacobian via aspect-ratio thresholding and directional projection to enable stable first-order inverse kinematic velocity control through kinematic singularities in serial manipulators.
The paper proposes a Red Team-Blue Team adversarial gamification architecture that generates synthetic hazardous scenarios for learning robot safety policies.
citing papers explorer
- ReKep: Spatio-Temporal Reasoning of Relational Keypoint Constraints for Robotic Manipulation
ReKep encodes robotic tasks as optimizable Python functions over 3D keypoints that are generated automatically from language and RGB-D input, enabling real-time hierarchical planning on single- and dual-arm platforms without task-specific data.
- Decompose and Recompose: Reasoning New Skills from Existing Abilities for Cross-Task Robotic Manipulation
Decompose and Recompose decomposes seen robotic demonstrations into skill-action alignments and recomposes them via visual-semantic retrieval and planning to enable zero-shot cross-task generalization.