ClickAIXR combines controller-based object selection in XR with on-device VLM inference to enable private, precise multimodal queries about real objects.
GazePointAR: A context-aware multimodal voice assistant for pronoun disambiguation in wearable augmented reality
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
ClickAIXR: On-Device Multimodal Vision-Language Interaction with Real-World Objects in Extended Reality
ClickAIXR combines controller-based object selection in XR with on-device VLM inference to enable private, precise multimodal queries about real objects.