Pyramid vision transformer: A versatile backbone for dense prediction without convolutions

Wang, W · 2021 · arXiv 2102.12122

4 Pith papers cite this work. Polarity classification is still being indexed.


representative citing papers

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

cs.CV · 2021-03-25 · accept · novelty 8.0

Swin Transformer reaches 87.3% top-1 accuracy on ImageNet-1K and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.

Modality-Agnostic Prompt Learning for Multi-Modal Camouflaged Object Detection

cs.CV · 2026-04-14 · unverdicted · novelty 7.0

A framework uses modality-agnostic prompts to adapt SAM for multi-modal camouflaged object detection, with a mask refine module for better boundaries.

Deep Attention Reweighting: Post-Hoc Attention-Based Feature Aggregation in CNNs for Disentangling Core and Spurious Features under Spurious Correlations

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

DAR replaces global average pooling (GAP) with an attention-based aggregation module that is retrained jointly with the classifier head to disentangle core from spurious features, and it outperforms DFR on multiple datasets.

Accuracy Improvement of Cell Image Segmentation Using Feedback Former

cs.CV · 2024-08-23 · unverdicted · novelty 5.0

Feedback Former improves cell image segmentation accuracy by feeding detailed feature maps back from near the output to lower transformer layers, outperforming non-feedback baselines with lower computational cost on three datasets.

citing papers explorer


Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

cs.CV · 2021-03-25 · accept · none · ref 66

Modality-Agnostic Prompt Learning for Multi-Modal Camouflaged Object Detection

cs.CV · 2026-04-14 · unverdicted · none · ref 23

Deep Attention Reweighting: Post-Hoc Attention-Based Feature Aggregation in CNNs for Disentangling Core and Spurious Features under Spurious Correlations

cs.CV · 2026-05-20 · unverdicted · none · ref 69

Accuracy Improvement of Cell Image Segmentation Using Feedback Former

cs.CV · 2024-08-23 · unverdicted · none · ref 29
