Task Bias in Vision-Language Models

Carl Vondrick; Ishaan Preetam Chandratreya; Sachit Menon

arxiv: 2212.04412 · v1 · pith:SYOL5GFZnew · submitted 2022-12-08 · 💻 cs.CV · cs.LG

Sachit Menon, Ishaan Preetam Chandratreya, Carl Vondrick

classification 💻 cs.CV cs.LG

keywords task, visual, towards, representation, bias, biased, representations, tasks

Incidental supervision from language has become a popular approach for learning generic visual representations that can be prompted to perform many recognition tasks in computer vision. We conduct an in-depth exploration of the CLIP model and show that its visual representation is often strongly biased towards solving some tasks more than others. Moreover, which task the representation will be biased towards is unpredictable, with little consistency across images. To resolve this task bias, we show how to learn a visual prompt that guides the representation towards features relevant to the task of interest. Our results show that these visual prompts can be independent of the input image and still effectively provide a conditioning mechanism to steer visual representations towards the desired task.
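
The idea of an input-independent visual prompt can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not say what form the prompt takes, so the sketch assumes one common choice, a single additive perturbation shared by every image. The tiny frozen "encoder" (`W1`, `W2`), the four-pixel images, and the labels are all hypothetical stand-ins for CLIP and real data, and finite-difference gradients replace backpropagation to keep the example dependency-free.

```python
import math

# Hypothetical frozen "encoder": 4-pixel image -> 2 logits through one tanh layer.
# These weights stand in for a pretrained model and are never updated.
W1 = [[1.0, -1.0, 0.5, 0.0],
      [0.0, 0.5, -1.0, 1.0],
      [0.5, 0.5, 0.5, -0.5]]
W2 = [[1.0, -1.0, 0.5],
      [-1.0, 1.0, 0.5]]

# Hypothetical data: four toy images and their labels for the task of interest.
IMAGES = [[1.0, 0.0, 0.0, 0.0],
          [0.0, 1.0, 0.0, 0.0],
          [0.0, 0.0, 1.0, 0.0],
          [0.0, 0.0, 0.0, 1.0]]
LABELS = [0, 1, 1, 0]


def encode(image):
    """Run the frozen encoder on one image and return its two logits."""
    hidden = [math.tanh(sum(w * p for w, p in zip(row, image))) for row in W1]
    return [sum(w * h for w, h in zip(row, hidden)) for row in W2]


def loss(prompt, images, labels):
    """Mean cross-entropy when the same prompt is added to every image."""
    total = 0.0
    for image, label in zip(images, labels):
        logits = encode([p + d for p, d in zip(image, prompt)])
        top = max(logits)
        log_z = top + math.log(sum(math.exp(v - top) for v in logits))
        total += log_z - logits[label]
    return total / len(images)


def numeric_grad(prompt, images, labels, eps=1e-5):
    """Central-difference gradient of the loss with respect to the prompt."""
    grad = []
    for i in range(len(prompt)):
        up, down = prompt[:], prompt[:]
        up[i] += eps
        down[i] -= eps
        grad.append((loss(up, images, labels) - loss(down, images, labels)) / (2 * eps))
    return grad


def learn_prompt(images, labels, steps=200, lr=0.5):
    """Gradient descent on the prompt only, with step halving so the loss never rises."""
    prompt = [0.0] * len(images[0])
    current = loss(prompt, images, labels)
    for _ in range(steps):
        grad = numeric_grad(prompt, images, labels)
        step = lr
        while step > 1e-8:
            candidate = [p - step * g for p, g in zip(prompt, grad)]
            candidate_loss = loss(candidate, images, labels)
            if candidate_loss < current:
                prompt, current = candidate, candidate_loss
                break
            step /= 2
    return prompt


if __name__ == "__main__":
    learned = learn_prompt(IMAGES, LABELS)
    print("loss without prompt:", loss([0.0] * 4, IMAGES, LABELS))
    print("loss with prompt:   ", loss(learned, IMAGES, LABELS))
```

Only the prompt vector is optimized, and the same vector is added to every image, which mirrors the input-independence described in the abstract. How far such a prompt can steer a real model depends on the encoder; this toy only shows the mechanics.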

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Language-Instructed Vision Embeddings for Controllable and Generalizable Perception
cs.CV 2026-06 unverdicted novelty 6.0

LIVE uses language to generate task-centric vision embeddings at inference, reducing hallucinations by 34 points on MMVP, outperforming larger VLMs on VQA, and generalizing to unseen tasks.