AffordanceVLA proposes a VLA model with affordance-aware modules (Which2Act, Where2Act, How2Act) in a Mixture-of-Transformer trained in three stages to improve robotic manipulation.
RAM: Retrieval-Based Affordance Transfer for Generalizable Zero-Shot Robotic Manipulation
6 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.RO 6
roles
background 2
polarities
background 2
representative citing papers
A task-conditioned two-stage system decouples grasp localization from interaction trajectory planning using specialized foundation models to improve generalization across heterogeneous object types.
PALM improves long-horizon robotic manipulation success by distilling affordance representations for object interaction and predicting within-subtask progress in a VLA model.
ViTacFormer learns a cross-modal visuo-tactile latent space with autoregressive tactile prediction and an easy-to-hard curriculum, then uses that representation for imitation learning, yielding ~50% higher success and the first reported 11-stage, 2.5-minute autonomous dexterous tasks.
ReGIL retrieves segments from a single demonstration to compute local temporal-alignment rewards and guide policy training, achieving >75% success on three real-robot tasks with <1 hour of online data.