Video Swin Transformer

· 2022

3 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · baseline: 1

citation-polarity summary

background: 1 · baseline: 1

representative citing papers

iPay: Integrated Payment Action Recognition via Multimodal Networks and Adaptive Spatial Prior Learning

cs.CV · 2026-05-11 · unverdicted · novelty 5.0

iPay fuses RGB and skeleton expert streams via dual-attention and a prior-driven Spatial Difference Discriminator to reach 83.45% accuracy on 500+ real-world payment clips from onboard transit cameras.

Compressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendation

cs.LG · 2026-05-09 · unverdicted · novelty 5.0

CVA aggregates frozen VFM embeddings via latent reasoning to create compact video embeddings for efficient micro-video recommendation, delivering consistent performance gains and orders-of-magnitude efficiency improvements.

EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges

cs.CV · 2026-04-24 · unverdicted · novelty 4.0

EV-CLIP introduces mask and context visual prompts to adapt CLIP for improved few-shot video action recognition under visual challenges such as low light and egocentric views, outperforming other efficient methods with backbone-scale-independent efficiency.

citing papers explorer

Showing 3 of 3 citing papers.

iPay: Integrated Payment Action Recognition via Multimodal Networks and Adaptive Spatial Prior Learning · cs.CV · 2026-05-11 · unverdicted · none · ref 7
Compressed Video Aggregator: Content-driven Module for Efficient Micro-Video Recommendation · cs.LG · 2026-05-09 · unverdicted · none · ref 19
EV-CLIP: Efficient Visual Prompt Adaptation for CLIP in Few-shot Action Recognition under Visual Challenges · cs.CV · 2026-04-24 · unverdicted · none · ref 15
