ST-SimDiff is a training-free method using a spatio-temporal graph and dual similarity-difference selection to compress video tokens for MLLMs while retaining static and dynamic content.
Fast unfolding of communities in large networks.Journal of statistical mechanics: theory and experiment, 2008 (10):P10008,
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.AI 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
ST-SimDiff: Balancing Spatiotemporal Similarity and Difference for Efficient Video Understanding with MLLMs
ST-SimDiff is a training-free method using a spatio-temporal graph and dual similarity-difference selection to compress video tokens for MLLMs while retaining static and dynamic content.