VoxAfford fuses multi-scale voxel features into MLLM output tokens using cross-attention with a learned compatibility gate to achieve SOTA open-vocabulary 3D affordance detection with ~8% mIoU gain and zero-shot robot transfer.
Affordancenet: An end-to-end deep learning approach for object affordance detection
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
CLASP achieves 87% success in open-vocabulary desktop grasping via dual-pathway perception, asynchronous closed-loop evaluation, and automated multimodal data synthesis.
citing papers explorer
-
VoxAfford: Multi-Scale Voxel-Token Fusion for Open-Vocabulary 3D Affordance Detection
VoxAfford fuses multi-scale voxel features into MLLM output tokens using cross-attention with a learned compatibility gate to achieve SOTA open-vocabulary 3D affordance detection with ~8% mIoU gain and zero-shot robot transfer.
-
CLASP: Closed-loop Asynchronous Spatial Perception for Open-vocabulary Desktop Object Grasping
CLASP achieves 87% success in open-vocabulary desktop grasping via dual-pathway perception, asynchronous closed-loop evaluation, and automated multimodal data synthesis.