ActivityNet: A large-scale video benchmark for human activity understanding
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Do Thought Streams Matter? Evaluating Reasoning in Gemini Vision-Language Models for Video Scene Understanding
Quality gains from extra thinking in Gemini models for video understanding plateau after the first few hundred tokens, Flash Lite balances quality and cost best, and tight reasoning budgets lead to compression-step hallucination where final outputs include un-reasoned content.