LongVT adds native video-cropping tool calling to LMMs for interleaved multimodal chain-of-tool-thought reasoning on long videos and releases VideoSIAH data for training and evaluation.
Figure 8 shows the RL prompt template, while Figure 9 presents the evaluation prompts used in LLM-as-a-Judge [55] for measuring answer accuracy during RL
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1). Years: 2025 (1). Verdicts: UNVERDICTED (1).
Representative citing papers
LongVT: Incentivizing "Thinking with Long Videos" via Native Tool Calling