Visual prompting: Modifying pixel space to adapt pre-trained models
12 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers (12 unverdicted)
BadBone backdoors backbone models with bi-level optimization to make prompt learning on downstream tasks vulnerable while preserving model utility.
Thermal-Det is the first LLM-supervised open-vocabulary thermal object detector, created via synthetic data conversion from GroundingCap-1M and RGB-to-thermal distillation, yielding 2-4% AP gains on benchmarks.
CrysLDNet combines VAE and latent diffusion pretraining on unlabeled crystals to improve graph encoder performance on property prediction by about 4-5% on JARVIS and MP datasets.
TAME uses a Mixture-of-Experts prompt bank with input-dependent routing and three unsupervised objectives to adaptively defend CLIP against adversarial attacks at inference time, achieving at least 49.1% robustness gain on 11 datasets.
CAKI generates class-specific prompts from few-shot samples of the same class, stores them in a knowledge bank, and uses query-key matching to inject relevant class knowledge into test instance predictions for improved VLM performance.
Three frameworks adapt foundation models for generalized category discovery under domain shifts via disentanglement and prompt tuning, showing gains on synthetic and real multi-domain data.
Activation prompts on intermediate layers outperform input-level visual prompting and parameter-efficient fine-tuning in accuracy and efficiency across 29 datasets.
BlackVIP adapts foundation models via a Coordinator for input-dependent visual prompts and SPSA-GC for gradient estimation, enabling robust transfer on 19 datasets with low memory use and a link to randomized smoothing robustness.
SUPT assigns prompt features at the subgraph level to enable universal prompt tuning for any GNN pre-training strategy and outperforms fine-tuning in 42 of 45 full-shot and 41 of 45 few-shot graph experiments with average gains of 2.5% and 6.6%.
SimpleST is a model-agnostic prompt tuning framework that lets pre-trained spatio-temporal GNNs adapt to distribution shifts in traffic data while keeping all original model weights fixed.
A systematic survey categorizes prompt engineering methods for LLMs and VLMs by application area, summarizing methodologies, applications, models, datasets, strengths, and limitations for each technique along with a taxonomy and summary table.
citing papers explorer
- One Scene, Two Depths: Probing Geometric Ambiguity in Monocular Foundation Models
  Introduces MultiDepth-3k benchmark revealing diverse layer preferences across depth models on ambiguous scenes, with Laplacian Visual Prompting altering outputs for some frozen models and best pair reaching 75.5% ML-SRA.
- BadBone: Backdoor Attacks Against Backbone Models in Visual Prompt Learning
  BadBone backdoors backbone models with bi-level optimization to make prompt learning on downstream tasks vulnerable while preserving model utility.
- Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection
  Thermal-Det is the first LLM-supervised open-vocabulary thermal object detector, created via synthetic data conversion from GroundingCap-1M and RGB-to-thermal distillation, yielding 2-4% AP gains on benchmarks.
- Latent Diffusion Pretraining for Crystal Property Prediction
  CrysLDNet combines VAE and latent diffusion pretraining on unlabeled crystals to improve graph encoder performance on property prediction by about 4-5% on JARVIS and MP datasets.
- TAME: Test-Time Adversarial Prompt Tuning via Mixture-of-Experts for Vision-Language Models
  TAME uses a Mixture-of-Experts prompt bank with input-dependent routing and three unsupervised objectives to adaptively defend CLIP against adversarial attacks at inference time, achieving at least 49.1% robustness gain on 11 datasets.
- Plug-and-play Class-aware Knowledge Injection for Prompt Learning with Visual-Language Model
  CAKI generates class-specific prompts from few-shot samples of the same class, stores them in a knowledge bank, and uses query-key matching to inject relevant class knowledge into test instance predictions for improved VLM performance.
- Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models
  Three frameworks adapt foundation models for generalized category discovery under domain shifts via disentanglement and prompt tuning, showing gains on synthetic and real multi-domain data.
- Visual prompting reimagined: The power of the Activation Prompts
  Activation prompts on intermediate layers outperform input-level visual prompting and parameter-efficient fine-tuning in accuracy and efficiency across 29 datasets.
- Efficient Prompt Learning for Traffic Forecasting
  SimpleST is a model-agnostic prompt tuning framework that lets pre-trained spatio-temporal GNNs adapt to distribution shifts in traffic data while keeping all original model weights fixed.