Vista: Enhancing long-duration and high-resolution video understanding by video spatiotemporal augmentation, 2024. https://arxiv.org/abs/2412.00927

Weiming Ren, Huan Yang, Jie Min, Cong Wei, Wenhu Chen · 2024 · arXiv 2412.00927

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

SmolVLM: Redefining small and efficient multimodal models

cs.AI · 2025-04-07 · unverdicted · novelty 6.0 · ref 29

SmolVLM-256M outperforms a 300-times larger model using under 1 GB GPU memory, while the 2.2B version matches state-of-the-art VLMs at half the memory cost.