Gesture-informed robot assistance via foundation models
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.RO (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- GesVLA: Gesture-Aware Vision-Language-Action Model Embedded Representations
  GesVLA encodes gesture features directly into the latent space of VLA models using a dual-VLM architecture and a rendering-based data pipeline, yielding improved target grounding in real robotic tasks.
- Will People Enjoy a Robot Trainer? A Case Study with Snoopie the Pacerbot
  A robot quadruped trainer achieved 60.6% better pace adherence and 45.9% higher speed consistency than a wearable device, with participants rating it substantially higher in ease, enjoyment, and helpfulness.