VMamba introduces a state-space vision backbone using 2D selective scanning across four routes to achieve linear complexity and strong performance on image tasks.
Flashattention: Fast and memory- efficient exact attention with io-awareness
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2representative citing papers
InternVid supplies 7M videos and LLM captions to train ViCLIP, which reaches leading zero-shot action recognition and competitive retrieval performance.
citing papers explorer
-
VMamba: Visual State Space Model
VMamba introduces a state-space vision backbone using 2D selective scanning across four routes to achieve linear complexity and strong performance on image tasks.
-
InternVid: A Large-scale Video-Text Dataset for Multimodal Understanding and Generation
InternVid supplies 7M videos and LLM captions to train ViCLIP, which reaches leading zero-shot action recognition and competitive retrieval performance.