B ANALYSIS OF CLIP ON CROPPED REGIONS: In this section, we analyze some common failure cases of CLIP on cropped regions and discuss possible ways to mitigate these problems.

It seems that the mask prediction is sometimes based on low-level appearance rather than semantics · 2022

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

cs.CV · 2021-04-28 · conditional · novelty 7.0

ViLD distills region and text embeddings from a teacher vision-language model into a student detector, enabling open-vocabulary detection that outperforms supervised baselines on held-out rare classes in LVIS and transfers to COCO, VOC, and Objects365.

citing papers explorer

Showing 1 of 1 citing paper.

Open-vocabulary Object Detection via Vision and Language Knowledge Distillation

cs.CV · 2021-04-28 · conditional · none · ref 9

