Cone: An efficient coarse-to-fine alignment framework for long video temporal grounding

Hou, Z · 2022 · arXiv 2209.10918

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding

cs.CV · 2026-04-28 · unverdicted · novelty 7.0

OmniVTG creates a new large-scale open-world VTG dataset using iterative concept-gap filling and timestamped captioning, paired with a three-stage self-correction CoT paradigm that yields SOTA zero-shot results on four existing benchmarks.

UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

UniversalVTG is a lightweight foundation model for video temporal grounding that achieves state-of-the-art results across five benchmarks while being over 100 times smaller than recent MLLM-based methods.

Multi-Scale Contrastive Learning for Video Temporal Grounding

cs.CV · 2024-12-10 · unverdicted · novelty 6.0

A multi-scale and cross-scale contrastive learning framework uses intra-encoder stage features and a new sampling process to link short-range and long-range video moments for temporal grounding.

citing papers explorer

OmniVTG: A Large-Scale Dataset and Training Paradigm for Open-World Video Temporal Grounding
cs.CV · 2026-04-28 · unverdicted · none · ref 10
UniversalVTG: A Universal and Lightweight Foundation Model for Video Temporal Grounding
cs.CV · 2026-04-09 · unverdicted · none · ref 19
Multi-Scale Contrastive Learning for Video Temporal Grounding
cs.CV · 2024-12-10 · unverdicted · none · ref 20
