VoxAfford fuses multi-scale voxel features into MLLM output tokens using cross-attention with a learned compatibility gate to achieve SOTA open-vocabulary 3D affordance detection with ~8% mIoU gain and zero-shot robot transfer.
Affordance-centric policy learning: Sample efficient and generalisable robot policy learning using affordance-centric task frames
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3verdicts
UNVERDICTED 3representative citing papers
PACTS jointly model action trajectories and predicate belief trajectories in a single generative policy, enabling zero-shot skill composition via symbolic planning without retraining.
AffordVLA improves VLA models for robotic manipulation by implicitly injecting affordance perception through feature alignment with a zero-shot teacher, claiming SOTA results in simulation and real-world tests.
citing papers explorer
-
VoxAfford: Multi-Scale Voxel-Token Fusion for Open-Vocabulary 3D Affordance Detection
VoxAfford fuses multi-scale voxel features into MLLM output tokens using cross-attention with a learned compatibility gate to achieve SOTA open-vocabulary 3D affordance detection with ~8% mIoU gain and zero-shot robot transfer.
-
Jointly Learning Predicates and Actions Enables Zero-Shot Skill Composition
PACTS jointly model action trajectories and predicate belief trajectories in a single generative policy, enabling zero-shot skill composition via symbolic planning without retraining.
-
AffordVLA: Injecting Affordance Representations into Vision-Language-Action Models via Implicit Feature Alignment
AffordVLA improves VLA models for robotic manipulation by implicitly injecting affordance perception through feature alignment with a zero-shot teacher, claiming SOTA results in simulation and real-world tests.