
ActivityNet-QA: a dataset for understanding complex web videos via question answering

3 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary

baseline 1

citation-polarity summary

fields: cs.CV 3

years: 2026 2, 2024 1

verdicts: unverdicted 3

roles: baseline 1

polarities: baseline 1


representative citing papers

AdaCodec: A Predictive Visual Code for Video MLLMs

cs.CV · 2026-06-01 · unverdicted · novelty 6.0

AdaCodec introduces a predictive visual code that cuts visual token use in video MLLMs: it sends full frames only when the predictive cost is high and otherwise encodes inter-frame changes as P-tokens, yielding better benchmark scores at lower token budgets.

TempCompass: Do Video LLMs Really Understand Videos?

cs.CV · 2024-03-01 · unverdicted · novelty 6.0

The TempCompass benchmark reveals that state-of-the-art Video LLMs are poor at perceiving temporal aspects of videos such as speed, direction, and ordering.
