Title resolution pending
2 Pith papers cite this work. Polarity classification is still being indexed.
Fields: cs.CV (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers explorer
-
Zero-Shot Temporal Action Localization Through Textual Guidance
TEGU improves zero-shot temporal action localization by using rich textual information from LLMs and video captions to better distinguish fine-grained actions without any training on labeled data.
-
Multimodal Ambivalence/Hesitancy Recognition in Videos for Personalized Digital Health Interventions
Multimodal deep learning for ambivalence/hesitancy recognition in videos yields limited results on the BAH dataset, highlighting the need for improved spatio-temporal and cross-modal fusion methods.