arXiv preprint arXiv:2301.00514

Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding · 2023 · arXiv 2301.00514

3 Pith papers cite this work.

representative citing papers

Temporal-Aware Reasoning Optimization for Video Temporal Grounding

cs.CV · 2026-06-08 · unverdicted · novelty 6.0 · ref 87

TaRO improves video temporal grounding in multimodal large language models (MLLMs) through constructive reasoning exploration built from dense captions and a temporal-sensitivity reward based on logit drops when event boundaries are disrupted. A subsequent curriculum-learning stage yields reported state-of-the-art results.

Rethinking Weakly-supervised Video Temporal Grounding From a Game Perspective

cs.CV · 2026-05-26 · unverdicted · novelty 6.0 · ref 165

Treats video frames and query words as players in a cooperative game to value uncertain vision-language correspondences for proposal-free moment localization, reporting superior results on Charades-STA and ActivityNet Captions.

Multi-Scale Contrastive Learning for Video Temporal Grounding

cs.CV · 2024-12-10 · unverdicted · novelty 6.0 · ref 73

A multi-scale and cross-scale contrastive learning framework uses intra-encoder stage features and a new sampling process to link short-range and long-range video moments for temporal grounding.
