TAIHRI is the first task-aware VLM for close-range HRI that localizes metric-scale 3D coordinates of critical keypoints by quantizing space and performing 2D keypoint reasoning via next-token prediction.
arXiv preprint arXiv:2512.06373 (2025)
3 Pith papers cite this work.
fields: cs.CV (3)
years: 2026 (3)
representative citing papers
SAMOSA adapts SAM 2 for complex visual object tracking by integrating explicit nonlinear motion prediction, semantic cues for failure recovery, and geometric constraints for stability, outperforming prior SAM 2-based and supervised methods on benchmarks including anti-UAV datasets.
citing papers explorer
- TAIHRI: Task-Aware 3D Human Keypoints Localization for Close-Range Human-Robot Interaction
  TAIHRI is the first task-aware VLM for close-range HRI that localizes metric-scale 3D coordinates of critical keypoints by quantizing space and performing 2D keypoint reasoning via next-token prediction.
- Segment Anything with Motion, Geometry, and Semantic Adaptation for Complex Nonlinear Visual Object Tracking
  SAMOSA adapts SAM 2 for complex visual object tracking by integrating explicit nonlinear motion prediction, semantic cues for failure recovery, and geometric constraints for stability, outperforming prior SAM 2-based and supervised methods on benchmarks including anti-UAV datasets.
- Vision-OPD: Learning to See Fine Details for Multimodal LLMs via On-Policy Self-Distillation