LanguageBind aligns video, infrared, depth, and audio to a frozen language encoder via contrastive learning on the new VIDAL-10M dataset, extending video-language pretraining to N modalities.
Advances in Neural Information Processing Systems
2 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 2
verdicts
UNVERDICTED 2
representative citing papers
FLO-EMD integrates flow-guided attention and EMD on aggregated motion traces to classify light, medium, and heavy congestion at 97.5% accuracy on 1,050 surveillance clips.
citing papers explorer
- LanguageBind: Extending Video-Language Pretraining to N-modality by Language-based Semantic Alignment
  LanguageBind aligns video, infrared, depth, and audio to a frozen language encoder via contrastive learning on the new VIDAL-10M dataset, extending video-language pretraining to N modalities.
- Hybrid Congestion Classification Framework Using Flow-Guided Attention and Empirical Mode Decomposition
  FLO-EMD integrates flow-guided attention and EMD on aggregated motion traces to classify light, medium, and heavy congestion at 97.5% accuracy on 1,050 surveillance clips.