Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
arXiv preprint arXiv:2107.00782
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Fields: cs.CV (2)
Years: 2023 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Pith review generated a malformed one-line summary.
citing papers explorer
- Vision Transformers Need Registers
  Adding register tokens to Vision Transformers eliminates high-norm background artifacts and raises state-of-the-art performance on dense visual prediction tasks.
- DINOv2: Learning Robust Visual Features without Supervision
  Pith review generated a malformed one-line summary.