Towards visual-prompt temporal answer grounding in instructional video
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.MM (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
MarkIt: Training-Free Visual Markers for Precise Video Temporal Grounding
MarkIt converts videos into query-conditioned marked versions via a linguistic-parsing and open-vocabulary segmentation bridge that embeds instance masks, semantic markers, and frame indices to improve Vid-LLM temporal grounding.