Rotary position embedding for vision transformer

Byeongho Heo, Song Park, Dongyoon Han, Sangdoo Yun · 2024

2 Pith papers cite this work. Polarity classification is still in progress.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

HyperDiT: Hyper-Connected Transformers for High-Fidelity Pixel-Space Diffusion

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

HyperDiT achieves FID 1.56 on ImageNet 256x256 in pixel space via hyper-connected cross-scale interactions, cross-attention, SA-RoPE, and VFM registers.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · novelty 6.0

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.

citing papers explorer

Showing 2 of 2 citing papers.

HyperDiT: Hyper-Connected Transformers for High-Fidelity Pixel-Space Diffusion

cs.CV · 2026-05-15 · unverdicted · none · ref 25

HyperDiT achieves FID 1.56 on ImageNet 256x256 in pixel space via hyper-connected cross-scale interactions, cross-attention, SA-RoPE, and VFM registers.

Elastic Attention Cores for Scalable Vision Transformers

cs.CV · 2026-05-12 · unverdicted · none · ref 155

VECA learns effective visual representations using core-periphery attention where patches interact exclusively via a resolution-invariant set of learned core embeddings, achieving linear O(N) complexity while maintaining competitive performance.
