Visual prompting: Modifying pixel space to adapt pre-trained models

Hyojin Bahng, Ali Jahanian, Swami Sankaranarayanan, Phillip Isola · 2022 · arXiv 2203.17274

12 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 3 · baseline: 1

citation-polarity summary

background: 3 · baseline: 1

representative citing papers

One Scene, Two Depths: Probing Geometric Ambiguity in Monocular Foundation Models

cs.CV · 2026-06-28 · unverdicted · novelty 7.0

Introduces MultiDepth-3k benchmark revealing diverse layer preferences across depth models on ambiguous scenes, with Laplacian Visual Prompting altering outputs for some frozen models and best pair reaching 75.5% ML-SRA.

BadBone: Backdoor Attacks Against Backbone Models in Visual Prompt Learning

cs.CR · 2026-05-29 · unverdicted · novelty 7.0

BadBone backdoors backbone models with bi-level optimization to make prompt learning on downstream tasks vulnerable while preserving model utility.

Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection

cs.CV · 2026-05-11 · unverdicted · novelty 7.0

Thermal-Det is the first LLM-supervised open-vocabulary thermal object detector, created via synthetic data conversion from GroundingCap-1M and RGB-to-thermal distillation, yielding 2-4% AP gains on benchmarks.

Latent Diffusion Pretraining for Crystal Property Prediction

cs.LG · 2026-05-30 · unverdicted · novelty 6.0

CrysLDNet combines VAE and latent diffusion pretraining on unlabeled crystals to improve graph encoder performance on property prediction by about 4-5% on JARVIS and MP datasets.

TAME: Test-Time Adversarial Prompt Tuning via Mixture-of-Experts for Vision-Language Models

cs.CV · 2026-05-17 · unverdicted · novelty 6.0

TAME uses a Mixture-of-Experts prompt bank with input-dependent routing and three unsupervised objectives to adaptively defend CLIP against adversarial attacks at inference time, achieving at least 49.1% robustness gain on 11 datasets.

Plug-and-play Class-aware Knowledge Injection for Prompt Learning with Visual-Language Model

cs.CV · 2026-05-07 · unverdicted · novelty 6.0

CAKI generates class-specific prompts from few-shot samples of the same class, stores them in a knowledge bank, and uses query-key matching to inject relevant class knowledge into test instance predictions for improved VLM performance.

Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models

cs.CV · 2026-04-29 · unverdicted · novelty 6.0

Three frameworks adapt foundation models for generalized category discovery under domain shifts via disentanglement and prompt tuning, showing gains on synthetic and real multi-domain data.

Visual prompting reimagined: The power of the Activation Prompts

cs.CV · 2026-04-07 · unverdicted · novelty 6.0

Activation prompts on intermediate layers outperform input-level visual prompting and parameter-efficient fine-tuning in accuracy and efficiency across 29 datasets.

Robust Adaptation of Foundation Models with Black-Box Visual Prompting

cs.CV · 2024-07-04 · unverdicted · novelty 6.0

BlackVIP adapts foundation models via a Coordinator for input-dependent visual prompts and SPSA-GC for gradient estimation, enabling robust transfer on 19 datasets with low memory use and a link to randomized smoothing robustness.

Subgraph-level Universal Prompt Tuning

cs.LG · 2024-02-16 · unverdicted · novelty 6.0

SUPT assigns prompt features at the subgraph level to enable universal prompt tuning for any GNN pre-training strategy and outperforms fine-tuning in 42 of 45 full-shot and 41 of 45 few-shot graph experiments with average gains of 2.5% and 6.6%.

Efficient Prompt Learning for Traffic Forecasting

cs.LG · 2026-05-08 · unverdicted · novelty 5.0

SimpleST is a model-agnostic prompt tuning framework that lets pre-trained spatio-temporal GNNs adapt to distribution shifts in traffic data while keeping all original model weights fixed.

A Systematic Survey of Prompt Engineering in Large Language Models: Techniques and Applications

cs.AI · 2024-02-05 · unverdicted · novelty 3.0

A systematic survey categorizes prompt engineering methods for LLMs and VLMs by application area, summarizing methodologies, applications, models, datasets, strengths, and limitations for each technique along with a taxonomy and summary table.

citing papers explorer

Showing 9 of 9 citing papers after filters.

One Scene, Two Depths: Probing Geometric Ambiguity in Monocular Foundation Models · cs.CV · 2026-06-28 · unverdicted · none · ref 1
BadBone: Backdoor Attacks Against Backbone Models in Visual Prompt Learning · cs.CR · 2026-05-29 · unverdicted · none · ref 2
Thermal-Det: Language-Guided Cross-Modal Distillation for Open-Vocabulary Thermal Object Detection · cs.CV · 2026-05-11 · unverdicted · none · ref 1
Latent Diffusion Pretraining for Crystal Property Prediction · cs.LG · 2026-05-30 · unverdicted · none · ref 44
TAME: Test-Time Adversarial Prompt Tuning via Mixture-of-Experts for Vision-Language Models · cs.CV · 2026-05-17 · unverdicted · none · ref 66
Plug-and-play Class-aware Knowledge Injection for Prompt Learning with Visual-Language Model · cs.CV · 2026-05-07 · unverdicted · none · ref 11
Generalized Category Discovery under Domain Shifts: From Vision to Vision-Language Models · cs.CV · 2026-04-29 · unverdicted · none · ref 69
Visual prompting reimagined: The power of the Activation Prompts · cs.CV · 2026-04-07 · unverdicted · none · ref 50
Efficient Prompt Learning for Traffic Forecasting · cs.LG · 2026-05-08 · unverdicted · none · ref 2
