Swin Transformer: Hierarchical Vision Transformer using Shifted Windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, Baining Guo · 2021

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second

cs.CV · 2024-10-02 · unverdicted · novelty 6.0

Depth Pro is a fast foundation model for zero-shot metric monocular depth estimation that produces sharp high-resolution depth maps with absolute scale using a multi-scale vision transformer.

PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis

cs.CV · 2023-09-30 · accept · novelty 6.0

PixArt-α matches commercial text-to-image quality with a diffusion transformer trained in 675 A100 GPU days through decomposed training stages, cross-attention text injection, and vision-language model dense captions.

citing papers explorer

Showing 2 of 2 citing papers.

Depth Pro: Sharp Monocular Metric Depth in Less Than a Second
cs.CV · 2024-10-02 · unverdicted · none · ref 235
PixArt-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
cs.CV · 2023-09-30 · accept · none · ref 31
