End-to-end object detection with transformers
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Token-Space Mask Prediction for Efficient Vision Transformer Segmentation
TokenMask computes mask logits directly from query-token affinities in token space and interpolates in logit space rather than feature space, yielding efficiency gains for ViT segmentation.