Conditional prompt learning for vision-language models

Kaiyang Zhou, Jingkang Yang, Chen Change Loy, Ziwei Liu

6 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Thermal-Det is the first LLM-supervised open-vocabulary thermal object detector, created via synthetic data conversion from GroundingCap-1M and RGB-to-thermal distillation, yielding 2-4% AP gains on benchmarks.

Gate-and-Merge: Zero-shot Compositional Personalization of Vision Language Models

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

Gate-and-Merge enables zero-shot compositional personalization of VLMs by independently learning concept-specific LoRA adapters and merging them in weight space with cue-based gating to suppress interference.

Dual-Modality Anchor-Guided Filtering for Test-time Prompt Tuning

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

Dual-modality anchors from text descriptions and test-time image statistics filter views and ensemble predictions to improve test-time prompt tuning, achieving SOTA on 15 datasets.

TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery

cs.CV · 2026-02-22 · unverdicted · novelty 6.0

TokenTrace watermarks diffusion generations by jointly perturbing prompt embeddings and latent noise, enabling query-driven recovery of multiple independent concepts from one image.

Adapting Foundation Vision-Language Models to Medical Diagnosis via Query-Driven Expert Bridging

cs.CV · 2025-05-27 · unverdicted · novelty 6.0

MedBridge adapts pretrained VLMs to multi-label medical diagnosis via query tokens for non-destructive alignment and expert routing, reporting 6-15% AUC gains on chest radiograph benchmarks across eight models.

A Human-in-the-Loop Framework for Efficient Prompt Selection in Microscopy Vision-Language Models

cs.CV · 2026-05-19 · unverdicted · novelty 5.0

A target-driven active learning approach for building efficient prompt sets in microscopy VLMs reaches 100% test accuracy with an average of 20 expert-verified images, outperforming random selection.

citing papers explorer

Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection
cs.CV · 2026-05-11 · unverdicted · none · ref 46
Gate-and-Merge: Zero-shot Compositional Personalization of Vision Language Models
cs.CV · 2026-05-09 · unverdicted · none · ref 24
Dual-Modality Anchor-Guided Filtering for Test-time Prompt Tuning
cs.CV · 2026-04-14 · unverdicted · none · ref 41
TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery
cs.CV · 2026-02-22 · unverdicted · none · ref 51
Adapting Foundation Vision-Language Models to Medical Diagnosis via Query-Driven Expert Bridging
cs.CV · 2025-05-27 · unverdicted · none · ref 52
A Human-in-the-Loop Framework for Efficient Prompt Selection in Microscopy Vision-Language Models
cs.CV · 2026-05-19 · unverdicted · none · ref 33
