
arxiv: 2601.16046 · v2 · submitted 2026-01-22 · 💻 cs.RO · cs.CV


DextER: Language-driven Dexterous Grasp Generation with Embodied Reasoning

keywords grasp contact dexter embodied generation reasoning dexterous hand

Language-driven dexterous grasp generation requires models to understand task semantics, 3D geometry, and complex hand-object interactions. While vision-language models have been applied to this problem, existing approaches map observations directly to grasp parameters without intermediate reasoning about physical interactions. We present DextER (Dexterous Grasp Generation with Embodied Reasoning), which introduces contact-based embodied reasoning for multi-finger manipulation. Our key insight is that predicting which hand links make contact, and where on the object surface, provides an embodiment-aware intermediate representation that bridges task semantics and physical constraints. DextER autoregressively generates embodied contact tokens specifying which finger links contact which parts of the object surface, followed by grasp tokens encoding the hand configuration. On DexGYS, DextER achieves a 67.14% success rate, outperforming the state of the art by 3.83 percentage points, with a 96.4% improvement in intention alignment. We also demonstrate steerable generation through partial contact specification, providing fine-grained control over grasp synthesis.
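The two-stage decoding order described in the abstract (contact tokens first, then grasp tokens) and the idea of steering by fixing some contacts can be illustrated with a toy sketch. Nothing below comes from the paper: the link names, token strings, bin counts, one-contact-per-link layout, and the random stand-in for the model are all hypothetical, chosen only to show how a partial contact specification could be forced into an autoregressive sequence.

```python
import random
from typing import Callable, Dict, List, Optional

# All constants are illustrative assumptions, not values from the paper.
HAND_LINKS = ["thumb_tip", "index_tip", "middle_tip", "ring_tip", "little_tip"]
NUM_SURFACE_BINS = 8   # assumed discretization of object-surface locations
NUM_GRASP_TOKENS = 4   # assumed number of grasp-configuration tokens
NUM_GRASP_BINS = 16    # assumed discretization of each grasp parameter

# A sampler takes the tokens generated so far plus the allowed next tokens.
Sampler = Callable[[List[str], List[str]], str]


def decode(sample: Sampler,
           fixed_contacts: Optional[Dict[str, int]] = None) -> List[str]:
    """Generate contact tokens, then grasp tokens, forcing any fixed contacts."""
    fixed_contacts = fixed_contacts or {}
    seq: List[str] = []
    # Stage 1: one contact token per hand link (hypothetical layout).
    for link in HAND_LINKS:
        if link in fixed_contacts:
            surface_bin = fixed_contacts[link]
            if not 0 <= surface_bin < NUM_SURFACE_BINS:
                raise ValueError(f"surface bin out of range for {link}")
            token = f"<contact:{link}:{surface_bin}>"  # user-specified, not sampled
        else:
            options = [f"<contact:{link}:{b}>" for b in range(NUM_SURFACE_BINS)]
            token = sample(seq, options)
        seq.append(token)
    # Stage 2: grasp tokens, each conditioned on the full contact prefix.
    for i in range(NUM_GRASP_TOKENS):
        options = [f"<grasp:{i}:{b}>" for b in range(NUM_GRASP_BINS)]
        seq.append(sample(seq, options))
    return seq


def random_sampler(seed: int) -> Sampler:
    """Stand-in for a trained model: picks uniformly among allowed tokens."""
    rng = random.Random(seed)

    def sample(prefix: List[str], options: List[str]) -> str:
        return rng.choice(options)

    return sample


if __name__ == "__main__":
    # Fix only the thumb contact; everything else is left to the sampler.
    print(decode(random_sampler(0), fixed_contacts={"thumb_tip": 3}))
```

Because the fixed contact sits in the prefix, every later token is sampled conditioned on it, which is one plausible reading of how a partial specification could steer the resulting grasp. A real system would replace `random_sampler` with the model's next-token distribution.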

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SECOND-Grasp: Semantic Contact-guided Dexterous Grasping

    cs.RO · 2026-05 · conditional · novelty 6.0

    SECOND-Grasp integrates semantic contact proposals from vision-language reasoning with geometric refinement to achieve 98%+ lifting success and improved intent-aware grasping on seen and unseen objects.