TimeChat: A time-sensitive multimodal large language model for long video understanding

Shuhuai Ren, Linli Yao, Shicheng Li, Xu Sun, Lu Hou · 2024

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

Detector-Empowered Video Large Language Model for Efficient Spatio-Temporal Grounding

cs.CV · 2025-12-07 · conditional · novelty 6.0 · ref 55

DEViL offloads spatial grounding to a detector via a distilled reference-semantic token and temporal consistency regularization, reaching 43.1% m_vIoU at 14.33 FPS on HC-STVG.
