Ofa: Unifying architectures, tasks, and modalities through a simple sequence-to-sequence learning framework
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.CV (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Training-Free Zero-Shot Temporal Action Detection with Vision-Language Models
FreeZAD applies vision-language models with LogOIC scoring and frequency-based actionness calibration for training-free zero-shot temporal action detection, outperforming unsupervised methods on THUMOS14 and ActivityNet-1.3 while using 1/13 the runtime.