Conditional prompt learning for vision-language models
6 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV 6
verdicts: UNVERDICTED 6
roles: background 2
polarities: background 2
representative citing papers
Gate-and-Merge enables zero-shot compositional personalization of VLMs by independently learning concept-specific LoRA adapters and merging them in weight space with cue-based gating to suppress interference.
Dual-modality anchors from text descriptions and test-time image statistics filter views and ensemble predictions to improve test-time prompt tuning, achieving SOTA on 15 datasets.
TokenTrace watermarks diffusion generations by jointly perturbing prompt embeddings and latent noise, enabling query-driven recovery of multiple independent concepts from one image.
MedBridge adapts pretrained VLMs to multi-label medical diagnosis via query tokens for non-destructive alignment and expert routing, reporting 6-15% AUC gains on chest radiograph benchmarks across eight models.
A target-driven active learning approach for building efficient prompt sets in microscopy VLMs reaches 100% test accuracy with an average of 20 expert-verified images, outperforming random selection.
citing papers explorer
- Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection
  Thermal-Det is the first LLM-supervised open-vocabulary thermal object detector, created via synthetic data conversion from GroundingCap-1M and RGB-to-thermal distillation, yielding 2-4% AP gains on benchmarks.
- Gate-and-Merge: Zero-shot Compositional Personalization of Vision Language Models
  Gate-and-Merge enables zero-shot compositional personalization of VLMs by independently learning concept-specific LoRA adapters and merging them in weight space with cue-based gating to suppress interference.
- Dual-Modality Anchor-Guided Filtering for Test-time Prompt Tuning
  Dual-modality anchors from text descriptions and test-time image statistics filter views and ensemble predictions to improve test-time prompt tuning, achieving SOTA on 15 datasets.
- TokenTrace: Multi-Concept Attribution through Watermarked Token Recovery
  TokenTrace watermarks diffusion generations by jointly perturbing prompt embeddings and latent noise, enabling query-driven recovery of multiple independent concepts from one image.
- Adapting Foundation Vision-Language Models to Medical Diagnosis via Query-Driven Expert Bridging
  MedBridge adapts pretrained VLMs to multi-label medical diagnosis via query tokens for non-destructive alignment and expert routing, reporting 6-15% AUC gains on chest radiograph benchmarks across eight models.
- A Human-in-the-Loop Framework for Efficient Prompt Selection in Microscopy Vision-Language Models
  A target-driven active learning approach for building efficient prompt sets in microscopy VLMs reaches 100% test accuracy with an average of 20 expert-verified images, outperforming random selection.