An image is worth 16x16 words: Transformers for image recognition at scale

2021

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

netFound: Principled Design for Network Foundation Models

cs.NI · 2023-10-25 · unverdicted · novelty 6.0

netFound is a pretrained network foundation model using protocol-aware tokenization, context embedding, hierarchical attention, and privacy design that reaches F1 0.95 on exogenous context discrimination versus under 0.62 for prior models.

Caption-Matching: A Multimodal Approach for Cross-Domain Image Retrieval

cs.CV · 2024-03-22 · unverdicted · novelty 5.0

Caption-Matching generates image captions via pre-trained VLMs and matches them across domains to achieve SOTA CDIR performance on Office-Home and DomainNet without labeled data or fine-tuning.

Fast and Efficient Transformer-based Method for Bird's Eye View Instance Prediction

cs.CV · 2024-11-11 · unverdicted · novelty 4.0

An efficient transformer architecture for BEV instance prediction reduces parameter counts and inference times versus SOTA by relying on a simplified paradigm of only instance segmentation and flow prediction.

citing papers explorer

Showing 3 of 3 citing papers.

netFound: Principled Design for Network Foundation Models

cs.NI · 2023-10-25 · unverdicted · none · ref 47

Caption-Matching: A Multimodal Approach for Cross-Domain Image Retrieval

cs.CV · 2024-03-22 · unverdicted · none · ref 33

Fast and Efficient Transformer-based Method for Bird's Eye View Instance Prediction

cs.CV · 2024-11-11 · unverdicted · none · ref 21
