Pith Number

pith:VXBTKPXP

pith:2021:VXBTKPXPWYONVRBHNNR45ZQN47

not attested · not anchored · not stored · refs resolved

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Baining Guo, Han Hu, Stephen Lin, Yixuan Wei, Yue Cao, Yutong Lin, Ze Liu, Zheng Zhang

Swin Transformer computes self-attention within shifted local windows in a hierarchical architecture, making vision Transformers efficient general-purpose backbones whose computational complexity is linear in image size.

arxiv:2103.14030 v2 · 2021-03-25 · cs.CV · cs.LG


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{VXBTKPXPWYONVRBHNNR45ZQN47}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
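The merge algorithm itself is not described on this page. As a purely hypothetical illustration of what "deterministic" means here, the sketch below deduplicates events by id, applies them in a fixed order, and shows that two mirrors receiving the same events in different orders reach the same state. The event fields (`id`, `ts`, `field`, `value`) and the last-writer-wins rule are assumptions made for the example, and signature checking is omitted.

```python
import hashlib
import json

def merge(record, events):
    # Hypothetical rule: drop duplicate ids, then apply events in (timestamp, id)
    # order so that the order in which a mirror received them does not matter.
    unique = {e["id"]: e for e in events}
    state = dict(record)
    for e in sorted(unique.values(), key=lambda e: (e["ts"], e["id"])):
        state[e["field"]] = e["value"]
    return state

def state_digest(state):
    # Hash a canonical serialization so two mirrors can compare states cheaply.
    blob = json.dumps(state, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(blob).hexdigest()

# Made-up record and events, for illustration only.
record = {"source": "arxiv:2103.14030"}
events = [
    {"id": "e1", "ts": 1, "field": "citations", "value": "open"},
    {"id": "e2", "ts": 2, "field": "citations", "value": "closed"},
    {"id": "e3", "ts": 2, "field": "replications", "value": "open"},
]

mirror_a = merge(record, events)
mirror_b = merge(record, list(reversed(events)) + events[:1])  # reordered, one duplicate
print(state_digest(mirror_a) == state_digest(mirror_b))
```

Sorting on a total order and ignoring duplicates is one common way to make a merge independent of delivery order; the real bundle format may use a different rule.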

Claims

C1 strongest claim

Its performance surpasses the previous state-of-the-art by a large margin of +2.7 box AP and +2.6 mask AP on COCO, and +3.2 mIoU on ADE20K, demonstrating the potential of Transformer-based models as vision backbones.

C2 weakest assumption

The paper assumes that the fixed window size and shift pattern chosen for ImageNet transfer without major retuning to detection on COCO and segmentation on ADE20K. It reports strong numbers but does not isolate how much of the gain comes from the backbone and how much from the detection and segmentation heads.

C3 one line summary

Swin Transformer reaches 87.3% top-1 accuracy on ImageNet-1K and sets new records on COCO detection and ADE20K segmentation by replacing global self-attention with shifted-window local attention inside a hierarchical pyramid.
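The linear-complexity point can be made concrete with the cost formulas the Swin Transformer paper gives for global multi-head self-attention (MSA) and window-based attention (W-MSA) on an h × w patch grid with C channels and window size M. The sketch below is a minimal illustration. The function names are hypothetical, and the example sizes (a 56 × 56 grid, C = 96, M = 7) are assumed to match the first stage of the smallest model at 224 × 224 input.

```python
def msa_flops(h, w, c):
    """Global self-attention: 4hwC^2 + 2(hw)^2 C, quadratic in the patch count hw."""
    return 4 * h * w * c**2 + 2 * (h * w) ** 2 * c

def wmsa_flops(h, w, c, m):
    """Window attention: 4hwC^2 + 2 M^2 hw C, linear in hw for a fixed window size M."""
    return 4 * h * w * c**2 + 2 * m**2 * h * w * c

# Compare the two costs as the patch grid grows (C = 96, M = 7 assumed).
for side in (56, 112, 224):
    global_cost = msa_flops(side, side, 96)
    window_cost = wmsa_flops(side, side, 96, 7)
    print(f"{side}x{side} patches: MSA {global_cost:.3e}, "
          f"W-MSA {window_cost:.3e}, ratio {global_cost / window_cost:.1f}")
```

Doubling the grid side multiplies the window-attention cost by exactly four, while the global-attention cost grows faster because of its (hw)² term.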

References

85 extracted · 85 resolved · 4 Pith anchors

[1] UniLMv2: Pseudo-masked language models for unified language model pre-training 2020

[2] Toward transformer-based object detection 2020

[3] Irwan Bello, Barret Zoph, Ashish Vaswani, Jonathon Shlens, and Quoc V. Le. Attention augmented convolutional networks, 2020.

[4] YOLOv4: Optimal Speed and Accuracy of Object Detection 2020 · arXiv:2004.10934

[5] Navaneeth Bodla, Bharat Singh, Rama Chellappa, and Larry S. Davis. Soft-NMS – improving object detection with one line of code. In Proceedings of the IEEE International Conference on Computer Vision, 2017.

Formal links

1 machine-checked theorem link

Cited by

27 papers in Pith

XAMI -- A Benchmark Dataset for Artefact Detection in XMM-Newton Optical Images

Architecture-Aware Explanation Auditing for Industrial Visual Inspection

MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer

BEVCALIB: LiDAR-Camera Calibration via Geometry-Guided Bird's-Eye View Representations

Digitizing Nepal's Written Heritage: A Comprehensive HTR Pipeline for Old Nepali Manuscripts

Receipt and verification

First computed	2026-05-17T23:38:50.426932Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`) · public key
Schema	pith-number/v1.0

Canonical hash

adc3353eefb61cdac4276b63cee60de7fd510f54088b109eb528218d0dc5d1e2

Aliases

arxiv: 2103.14030 · arxiv_version: 2103.14030v2 · doi: 10.48550/arxiv.2103.14030 · pith_short_12: VXBTKPXPWYON · pith_short_16: VXBTKPXPWYONVRBH · pith_short_8: VXBTKPXP
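The aliases above appear to be related in a simple way. On this page, each `pith_short_N` value is the first N characters of the full identifier, and the full identifier matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. The sketch below checks that relationship for this record. It is an observation drawn from the values shown here, not a documented derivation rule, and the function name is hypothetical.

```python
import base64

CANONICAL_HASH = "adc3353eefb61cdac4276b63cee60de7fd510f54088b109eb528218d0dc5d1e2"
PITH_ID = "VXBTKPXPWYONVRBHNNR45ZQN47"

def pith_id_from_hash(hex_digest, length=26):
    # Assumption: identifier = leading characters of Base32(SHA-256 digest bytes).
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

derived = pith_id_from_hash(CANONICAL_HASH)

# Short aliases as plain prefixes of the derived identifier.
shorts = {n: derived[:n] for n in (8, 12, 16)}

print(derived == PITH_ID)
print(shorts)
```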


Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/VXBTKPXPWYONVRBHNNR45ZQN47 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: adc3353eefb61cdac4276b63cee60de7fd510f54088b109eb528218d0dc5d1e2

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "3863d01106f9d601de72bd6cc5b11369b05e213b307e2cfd244a112f89d7f2ee",
    "cross_cats_sorted": [
      "cs.LG"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2021-03-25T17:59:31Z",
    "title_canon_sha256": "429e2f6262f5bcd328a152a58f6dfa33854bfdd711788f8099511a4dd20e776f"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2103.14030",
    "kind": "arxiv",
    "version": 2
  }
}
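
To check the hash without network access, the canonicalization used in the shell command above (sorted keys, compact separators, UTF-8, SHA-256) can be applied to the record printed here. This sketch assumes the JSON shown on this page is identical to the `canonical_record` served by the resolver. It reports whether the digest matches rather than assuming that it does.

```python
import hashlib
import json

EXPECTED = "adc3353eefb61cdac4276b63cee60de7fd510f54088b109eb528218d0dc5d1e2"

# Canonical record copied from this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "3863d01106f9d601de72bd6cc5b11369b05e213b307e2cfd244a112f89d7f2ee",
        "cross_cats_sorted": ["cs.LG"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2021-03-25T17:59:31Z",
        "title_canon_sha256": "429e2f6262f5bcd328a152a58f6dfa33854bfdd711788f8099511a4dd20e776f",
    },
    "schema_version": "1.0",
    "source": {"id": "2103.14030", "kind": "arxiv", "version": 2},
}

def canonical_bytes(obj):
    # Same serialization as the one-liner: sorted keys, no whitespace, UTF-8.
    return json.dumps(
        obj, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()

digest = hashlib.sha256(canonical_bytes(record)).hexdigest()
print(digest)
print("matches expected hash:", digest == EXPECTED)
```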