Swin Transformer reaches 87.3% top-1 accuracy on ImageNet-1K and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.
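To make "shifted-window local attention" concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a toy 4 x 4 grid of token ids, a window size of 2, and a shift of half the window size; the helper names `window_partition` and `cyclic_shift` are illustrative. It shows only how tokens are grouped into windows before attention is applied, not the attention computation itself.

```python
def window_partition(grid, m):
    """Split an H x W grid (list of rows) into non-overlapping m x m windows."""
    h, w = len(grid), len(grid[0])
    assert h % m == 0 and w % m == 0, "grid must be divisible by the window size"
    return [
        [grid[r + i][c:c + m] for i in range(m)]
        for r in range(0, h, m)
        for c in range(0, w, m)
    ]


def cyclic_shift(grid, s):
    """Roll the grid up and left by s positions, wrapping around the edges."""
    rolled_rows = grid[s:] + grid[:s]
    return [row[s:] + row[:s] for row in rolled_rows]


# Toy 4 x 4 grid of token ids (assumed sizes, chosen only for illustration).
grid = [[r * 4 + c for c in range(4)] for r in range(4)]
window_size = 2

# Regular layer: attention would be restricted to each of these windows.
regular = window_partition(grid, window_size)

# Shifted layer: roll by half a window, then partition again.
shifted = window_partition(cyclic_shift(grid, window_size // 2), window_size)

print(regular[0])  # [[0, 1], [4, 5]]
print(shifted[0])  # [[5, 6], [9, 10]]
```

In the shifted layout the first window groups tokens 5, 6, 9, and 10, which sit in four different windows of the regular layout, so alternating the two layouts lets information cross window boundaries. Windows that wrap around the grid edge mix tokens that are not spatially adjacent; a real model would need to mask attention between such tokens, which this sketch omits.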
3 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 3
representative citing papers
CrossWeaver introduces MIB and SAF modules to enable flexible, reliability-aware cross-modal interaction and fusion, achieving SOTA multimodal semantic segmentation with minimal parameters and generalization to unseen modality sets.
Proposes the ACE module, built from three deformable convolution blocks, which outperforms PPM and ASPP on the Pascal-Context and ADE20K datasets for semantic segmentation.
citing papers explorer
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- CrossWeaver: Cross-modal Weaving for Arbitrary-Modality Semantic Segmentation
- Adaptive Context Encoding Module for Semantic Segmentation