pith:VXBTKPXP
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Swin Transformer uses shifted windows in a hierarchical structure to make vision Transformers efficient backbones with linear complexity.
arxiv:2103.14030 v2 · 2021-03-25 · cs.CV · cs.LG
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{VXBTKPXPWYONVRBHNNR45ZQN47}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Its performance surpasses the previous state-of-the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones.
The paper assumes that the fixed window size and shift pattern chosen for ImageNet transfer without major retuning to the detection and segmentation settings on COCO and ADE20K. It reports strong numbers but does not isolate how much of the gain comes from the backbone and how much from the detection and segmentation heads.
Swin Transformer reaches 87.3% top-1 accuracy on ImageNet-1K and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.
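As a rough illustration of what "shifted-window local attention" means in the claim above, the sketch below partitions a small grid into non-overlapping windows, then repeats the partition after a cyclic shift. It is a minimal pure-Python sketch, not the authors' code. The function names `window_partition` and `cyclic_shift`, the 8x8 grid, the window size of 4, and the shift of half a window are all assumptions chosen for illustration. The attention computation itself, and the masking needed for wrapped-around positions, are left out.

```python
def window_partition(grid, window_size):
    """Split an H x W grid (list of rows) into non-overlapping windows.

    Each window is returned flattened in row-major order. Attention would be
    computed inside each window only, so cost grows linearly with H * W.
    """
    height, width = len(grid), len(grid[0])
    if height % window_size or width % window_size:
        raise ValueError("grid size must be divisible by window_size")
    windows = []
    for top in range(0, height, window_size):
        for left in range(0, width, window_size):
            windows.append([
                grid[r][c]
                for r in range(top, top + window_size)
                for c in range(left, left + window_size)
            ])
    return windows


def cyclic_shift(grid, shift):
    """Roll the grid up and to the left by `shift` cells, wrapping around."""
    rolled_rows = grid[shift:] + grid[:shift]
    return [row[shift:] + row[:shift] for row in rolled_rows]


# Hypothetical 8x8 grid whose cell value encodes its position (row * 8 + col).
grid = [[r * 8 + c for c in range(8)] for r in range(8)]
window_size = 4  # assumed for illustration

regular = window_partition(grid, window_size)
shifted = window_partition(cyclic_shift(grid, window_size // 2), window_size)

# The first shifted window covers rows 2-5 and columns 2-5 of the original
# grid, so it mixes cells from all four regular windows.
print(len(regular), len(regular[0]))
print(sorted(shifted[0])[:4])
```

Alternating the two partitions from one layer to the next is what lets information cross window boundaries while every attention call stays local.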
Receipt and verification
| First computed | 2026-05-17T23:38:50.426932Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
adc3353eefb61cdac4276b63cee60de7fd510f54088b109eb528218d0dc5d1e2
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/VXBTKPXPWYONVRBHNNR45ZQN47 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: adc3353eefb61cdac4276b63cee60de7fd510f54088b109eb528218d0dc5d1e2
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "3863d01106f9d601de72bd6cc5b11369b05e213b307e2cfd244a112f89d7f2ee",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2021-03-25T17:59:31Z",
    "title_canon_sha256": "429e2f6262f5bcd328a152a58f6dfa33854bfdd711788f8099511a4dd20e776f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2103.14030",
    "kind": "arxiv",
    "version": 2
  }
}
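The hashing step of the verification command can also be run offline. The sketch below applies the same serialization as the one-liner (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) to a copy of the record shown above. The function name `canonical_sha256` is hypothetical. The sketch also assumes that the record printed on this page is identical to what the API returns as `canonical_record`, which is not guaranteed, so it prints the comparison result without asserting it.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then SHA-256 it."""
    payload = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# Copy of the canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "3863d01106f9d601de72bd6cc5b11369b05e213b307e2cfd244a112f89d7f2ee",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2021-03-25T17:59:31Z",
        "title_canon_sha256": "429e2f6262f5bcd328a152a58f6dfa33854bfdd711788f8099511a4dd20e776f",
    },
    "schema_version": "1.0",
    "source": {"id": "2103.14030", "kind": "arxiv", "version": 2},
}

# Canonical hash as listed on this page.
expected = "adc3353eefb61cdac4276b63cee60de7fd510f54088b109eb528218d0dc5d1e2"

digest = canonical_sha256(record)
print(digest)
print("matches listed hash:", digest == expected)
```

Because keys are sorted before hashing, the order in which fields appear in the dictionary does not affect the digest. Any change to a value does.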