LoRA: Low-Rank Adaptation of Large Language Models
2 Pith papers cite this work.
Fields: cs.CV (2)
Years: 2025 (2)
Verdicts: UNVERDICTED (2)
Citing papers:
- GuideDog: A Real-World Egocentric Multimodal Dataset for Blind and Low-Vision Accessibility-Aware Guidance
GuideDog supplies 22K egocentric image-description pairs from 46 countries and an 818-sample QA benchmark showing that current multimodal models still struggle with depth perception and BLV-specific guidance rules.
- SkillSight: Efficient First-Person Skill Assessment with Gaze
A gaze-only student model distilled from a joint gaze-video teacher achieves high skill-assessment accuracy using 73x less power than prior methods.