LEASE achieves state-of-the-art unified performance on ImageNet-1K by combining masked token reconstruction and codebook contrast losses in a one-time precomputed discrete token space.
TULIP: Token-length upgraded CLIP.arXiv preprint arXiv:2410.10034, 2024
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CV 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
TuringViT claims a new ViT design with linear attention and curated data that matches SOTA performance using 10% of typical pretraining data while supporting dynamic resolutions and improving VLM integration.
citing papers explorer
-
Learning from Semantic Dictionaries: Discriminative Codebook Contrastive Learning for Unified Visual Representation and Generation
LEASE achieves state-of-the-art unified performance on ImageNet-1K by combining masked token reconstruction and codebook contrast losses in a one-time precomputed discrete token space.
-
TuringViT: Making SOTA Vision Transformers Accessible to All
TuringViT claims a new ViT design with linear attention and curated data that matches SOTA performance using 10% of typical pretraining data while supporting dynamic resolutions and improving VLM integration.