An unsupervised sentence embedding method by mutual information maximization
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2024 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Semantic similarity between texts is measured by the Jeffreys divergence between the image distributions induced by conditioning a diffusion model on each text, computed via Monte-Carlo sampling of the reverse-time SDEs.
citing papers explorer
- Multi-Scale Contrastive Learning for Video Temporal Grounding
  A multi-scale and cross-scale contrastive learning framework uses intra-encoder stage features and a new sampling process to link short-range and long-range video moments for temporal grounding.
- Conjuring Semantic Similarity
  Semantic similarity between texts is measured by the Jeffreys divergence between the image distributions induced by conditioning a diffusion model on each text, computed via Monte-Carlo sampling of the reverse-time SDEs.