Cone: An efficient coarse-to-fine alignment framework for long video temporal grounding
3 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (3)
verdicts: UNVERDICTED (3)
citing papers explorer
- OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding
  OmniVTG creates a new large-scale open-world VTG dataset using iterative concept-gap filling and timestamped captioning, paired with a three-stage self-correction CoT paradigm that yields SOTA zero-shot results on four existing benchmarks.
- UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding
  UniversalVTG is a lightweight foundation model for video temporal grounding that achieves state-of-the-art results across five benchmarks while being over 100 times smaller than recent MLLM-based methods.
- Multi-Scale Contrastive Learning for Video Temporal Grounding
  A multi-scale and cross-scale contrastive learning framework uses intra-encoder stage features and a new sampling process to link short-range and long-range video moments for temporal grounding.