Learning Spatiotemporal Features with 3D Convolutional Networks
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.CV (3)
roles: background (1)
polarities: background (1)
representative citing papers
B-MoE framework achieves state-of-the-art performance on micro-action recognition by using region-specific experts and cross-attention routing.
KeyTailor improves video virtual try-on realism by using instruction-guided keyframes to enhance garment details and background integrity in DiT models without major architectural changes.
citing papers explorer
- STRNet: Visual Navigation with Spatio-Temporal Representation through Dynamic Graph Aggregation
  STRNet improves goal-conditioned visual navigation by replacing simplistic encoders and pooling with a spatio-temporal fusion module that performs spatial graph reasoning and hybrid temporal modeling.
- B-MoE: A Body-Part-Aware Mixture-of-Experts "All Parts Matter" Approach to Micro-Action Recognition
  B-MoE framework achieves state-of-the-art performance on micro-action recognition by using region-specific experts and cross-attention routing.
- The devil is in the details: Enhancing Video Virtual Try-On via Keyframe-Driven Details Injection
  KeyTailor improves video virtual try-on realism by using instruction-guided keyframes to enhance garment details and background integrity in DiT models without major architectural changes.