Neutral-Reference Prompting for Vision-Language Models
Pith reviewed 2026-05-20 19:10 UTC · model grok-4.3
The pith
Neutral-Reference Prompting corrects pretraining biases in vision-language models by flipping predictions between confusable classes using neutral prompts and reference images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
NeRP is a plug-and-play prompting correction strategy that measures class-wise prior preferences from neutral text prompts and reference images to capture pretraining-induced bias geometry, combines them with sample likelihood to produce surrogate scores, and applies targeted local flips between easily confusable class pairs when the prior strongly favors the current prediction but observed evidence is insufficient, thereby correcting mispredictions on unseen classes without degrading known-class accuracy.
What carries the argument
Neutral-Reference Prompting (NeRP), which measures class-wise prior preferences along the pre-trained inter-class geometry using neutral text prompts and reference images, then combines them with sample likelihood for surrogate scoring and performs local flips on confusable pairs to override prior-dominated errors.
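The flip rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the paper's implementation: the exact surrogate-score formula and thresholds are not stated here, so the sketch checks the two stated conditions directly (prior strongly favors the prediction, evidence is weak) instead of forming an explicit surrogate score. The function name `nerp_style_flip`, the `confusable` mapping, and both margin values are assumptions introduced for illustration.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def nerp_style_flip(sample_logits, neutral_logits, confusable,
                    prior_margin=0.2, evidence_margin=0.1):
    """Return a possibly flipped class index for one sample.

    sample_logits:  per-class scores for the test image (the likelihood).
    neutral_logits: per-class scores measured from a neutral prompt or
                    reference image (the class-wise prior).
    confusable:     dict mapping a class index to its confusable partner.
    Both margins are hypothetical placeholders, not values from the paper.
    """
    likelihood = softmax(sample_logits)
    prior = softmax(neutral_logits)
    pred = max(range(len(likelihood)), key=likelihood.__getitem__)
    partner = confusable.get(pred)
    if partner is None:
        return pred  # no confusable partner: leave the prediction alone
    prior_favors_pred = prior[pred] - prior[partner] > prior_margin
    evidence_is_weak = likelihood[pred] - likelihood[partner] < evidence_margin
    return partner if (prior_favors_pred and evidence_is_weak) else pred
```

Restricting the flip to a listed partner keeps the correction local: a class with no confusable partner is never touched, which is one way the known-class predictions could be left intact.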
If this is right
- Accuracy on unseen classes rises substantially across 15 few-shot and cross-domain benchmarks.
- Known-class prediction performance is preserved after the correction.
- The method requires no parameter updates and works as a plug-and-play addition to existing VLMs.
- The same correction applies across multiple different model backbones without retraining.
Where Pith is reading between the lines
- The structured nature of the bias suggests that similar neutral-reference probing could diagnose and correct asymmetries in other multimodal models beyond vision-language tasks.
- Extending the approach to automatically discover confusable pairs from data rather than predefined classes would allow application in open-vocabulary settings without manual class lists.
- The reliance on pre-trained inter-class geometry points to a broader opportunity: analyzing pretraining biases as measurable geometric properties rather than isolated errors.
Load-bearing premise
The method assumes that prior preferences measured from neutral prompts and reference images accurately reflect the pretraining bias geometry and that flipping between confusable pairs will fix errors without creating new mistakes on either known or unseen classes.
What would settle it
Running NeRP on a new set of few-shot or cross-domain benchmarks and finding either no gain on unseen classes or a drop in known-class accuracy would falsify the claim that the correction reliably improves discrimination without introducing trade-offs.
Original abstract
Efficient transfer learning of vision-language models (VLMs) commonly suffers from a Base-New Trade-off (BNT): improving performance on unseen (new) classes often degrades accuracy on known (base) classes. Addressing how to boost recognition of unseen classes without sacrificing known-class performance remains a central challenge. Existing work often simplistically attributes the BNT to overfitting on known classes. We observe an interesting phenomenon: VLMs frequently exhibit asymmetric confusion on certain downstream data, i.e., samples of class A are systematically mispredicted as class B, while the reverse confusion (B to A) rarely occurs. For known classes, this kind of bias can be mitigated by tuning using a cross-entropy loss, but for unseen classes, such pretraining-induced bias persists and harms generalization. Motivated by this, we propose NeRP, a plug-and-play prompting correction strategy that improves discrimination on unseen classes without modifying model parameters. NeRP leverages neutral text prompts and reference images to measure class-wise prior preferences along the pre-trained inter-class geometry, and combines them with the sample likelihood to obtain the model's surrogate score. If, for a given sample, the prior strongly favors the current prediction while the observed evidence is clearly insufficient, we perform a local flip between easily confusable class pairs, thereby correcting prior-dominated mispredictions. Extensive experiments across multiple backbones and 15 few-shot and cross-domain benchmarks show that NeRP substantially improves accuracy on unseen classes while preserving known-class prediction performance.
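The asymmetric confusion the abstract describes can be made concrete with a confusion matrix. The sketch below is an illustration only: the function name `asymmetric_pairs`, the `min_rate` and `ratio` thresholds, and the toy counts are assumptions introduced here, not values or procedures from the paper.

```python
def asymmetric_pairs(confusion, min_rate=0.2, ratio=3.0):
    """Find ordered pairs (a, b) where class a is often predicted as b
    but b is rarely predicted as a.

    confusion[i][j] counts samples of true class i predicted as class j.
    """
    n = len(confusion)
    # Convert each row of counts into per-class prediction rates.
    rates = []
    for row in confusion:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    pairs = []
    for a in range(n):
        for b in range(n):
            if a == b:
                continue
            forward, reverse = rates[a][b], rates[b][a]
            if forward >= min_rate and forward >= ratio * reverse:
                pairs.append((a, b))
    return pairs

# Toy counts, made up for illustration: class 0 leaks into class 1,
# while class 1 is almost never mistaken for class 0.
toy = [
    [60, 38, 2],
    [3, 95, 2],
    [1, 4, 95],
]
print(asymmetric_pairs(toy))
```

On the toy matrix only the ordered pair (0, 1) qualifies, which matches the "A to B but not B to A" pattern in the abstract.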
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes NeRP, a plug-and-play prompting correction for vision-language models addressing the Base-New Trade-off. It identifies asymmetric confusion patterns in VLMs, measures class-wise prior preferences using neutral text prompts and reference images, combines these with sample likelihood into a surrogate score, and applies local flips between easily confusable pairs when the prior dominates but evidence is insufficient. This is claimed to boost unseen-class accuracy while preserving known-class performance, with results reported across multiple backbones and 15 few-shot and cross-domain benchmarks.
Significance. If the mechanism holds, NeRP offers a practical, tuning-free correction for pretraining biases in VLM transfer, a persistent problem. The extensive benchmark coverage and the treatment of asymmetric confusion as a phenomenon distinct from simple overfitting are strengths. The approach could influence prompting strategies if the flip rule proves robust without introducing new errors.
major comments (3)
- [§3 (Method)] The quantitative criterion for 'clearly insufficient' evidence and the exact formula for the surrogate score (prior combined with likelihood) are not provided with explicit thresholds or equations; this is load-bearing because the decision to flip depends on it, and without it the claim that flips correct mispredictions without new base-class errors cannot be verified or reproduced.
- [§4 (Experiments, cross-domain results)] No ablation or analysis shows that confusable pairs identified from neutral prompts remain valid when domain shift alters inter-class geometry; this directly risks violating the preservation of known-class performance, as a flip tuned on source-like priors could increase base-class error rates.
- [Table of results (likely Table 2 or 3)] The manuscript reports broad gains but provides no error bars, standard deviations, or statistical tests across the 15 benchmarks; without these, the 'substantially improves' claim on unseen classes while 'preserving' base performance lacks the rigor needed to support the central no-trade-off assertion.
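The second major comment asks whether confusable pairs survive domain shift. One way such a check might be sketched, assuming the pairs for each domain are available as sets of ordered class-index tuples (the function name `pair_stability` and the choice of Jaccard overlap are made here for illustration and are not the paper's analysis):

```python
def pair_stability(source_pairs, target_pairs):
    """Jaccard overlap between two sets of ordered confusable pairs.

    Returns 1.0 when both domains yield identical pairs and 0.0 when they
    share none. Two empty sets are treated as perfectly stable.
    """
    source, target = set(source_pairs), set(target_pairs)
    union = source | target
    if not union:
        return 1.0
    return len(source & target) / len(union)

# Hypothetical pair sets, for illustration only.
source = {(0, 1), (4, 7), (2, 3)}
target = {(0, 1), (4, 7), (5, 6)}
print(pair_stability(source, target))  # 2 shared pairs out of 4 distinct -> 0.5
```

A low overlap would not by itself show harm, but it would flag domains where flips calibrated on source-like priors deserve a closer look at base-class error rates.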
minor comments (2)
- [Abstract] The phrase '15 few-shot and cross-domain benchmarks' is used without naming the specific datasets or providing a high-level breakdown; a short table or list in the abstract or introduction would improve readability.
- [Notation] 'Surrogate score' is referenced without a numbered equation defining its computation from prior and likelihood; adding Eq. (X) would clarify the combination step.
Simulated Author's Rebuttal
We thank the referee for the constructive review and positive assessment of NeRP's potential impact. We address each major comment point by point below, providing clarifications and committing to targeted revisions that strengthen reproducibility and rigor without altering the core claims or results.
- Referee: [§3 (Method)] The quantitative criterion for 'clearly insufficient' evidence and the exact formula for the surrogate score (prior combined with likelihood) are not provided with explicit thresholds or equations; this is load-bearing because the decision to flip depends on it, and without it the claim that flips correct mispredictions without new base-class errors cannot be verified or reproduced.
  Authors: We agree that explicit equations and thresholds are necessary for full reproducibility and verification of the no-new-errors claim. While §3 describes the surrogate score as the combination of neutral-prompt priors with sample likelihood and the flip condition as occurring when the prior strongly dominates but evidence is insufficient, the precise formulation was presented in prose. In the revision we will add the explicit equations for the surrogate score and the quantitative thresholds used for the 'clearly insufficient' criterion, along with pseudocode for the local flip decision. This will enable direct verification that the mechanism corrects mispredictions on confusable pairs without introducing base-class errors.
  Revision: yes
- Referee: [§4 (Experiments, cross-domain results)] No ablation or analysis shows that confusable pairs identified from neutral prompts remain valid when domain shift alters inter-class geometry; this directly risks violating the preservation of known-class performance, as a flip tuned on source-like priors could increase base-class error rates.
  Authors: We acknowledge the value of an explicit ablation on pair stability under domain shift. Our cross-domain benchmark results already demonstrate that base-class accuracy is preserved after applying NeRP, which provides indirect evidence that the neutral-prompt-derived confusable pairs remain effective. To directly address the concern, we will add a targeted ablation in the revised §4 that examines how the identified pairs and flip decisions behave across domain shifts, confirming that no increase in base-class error occurs.
  Revision: yes
- Referee: [Table of results (likely Table 2 or 3)] The manuscript reports broad gains but provides no error bars, standard deviations, or statistical tests across the 15 benchmarks; without these, the 'substantially improves' claim on unseen classes while 'preserving' base performance lacks the rigor needed to support the central no-trade-off assertion.
  Authors: We agree that variability measures and statistical tests would increase the rigor of the central no-trade-off claim. In the revised manuscript we will augment the result tables with standard deviations computed over multiple runs and include statistical significance tests (e.g., paired t-tests) for the reported gains on unseen classes and the preservation of base-class performance across the 15 benchmarks.
  Revision: yes
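The variability measures and paired test the authors commit to could be computed along these lines. This is a standard-library sketch under stated assumptions: the function name `paired_t_statistic` is hypothetical, and the accuracy values are placeholders invented for illustration, not results from the paper.

```python
import math
import statistics

def paired_t_statistic(before, after):
    """Paired t statistic and degrees of freedom for matched scores.

    before/after: per-benchmark accuracies for the same benchmarks, in the
    same order. A p-value would come from a t distribution with the
    returned degrees of freedom, for example via a statistics package.
    """
    if len(before) != len(after) or len(before) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    diffs = [a - b for a, b in zip(after, before)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return t, len(diffs) - 1

# Placeholder accuracies in percent, invented for illustration.
baseline = [70.0, 65.0, 80.0, 75.0]
corrected = [72.0, 68.0, 81.0, 79.0]
t, dof = paired_t_statistic(baseline, corrected)
print(t, dof)
```

The same function applied to base-class accuracies would test the preservation claim; there, a statistic near zero is the outcome consistent with no trade-off, though failing to reject a difference is weaker evidence than an equivalence test.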
Circularity Check
No significant circularity: NeRP is a heuristic correction derived from external measurements
The paper's derivation begins from an empirical observation of asymmetric confusion on downstream data and defines NeRP as a plug-and-play rule that measures class-wise prior preferences from neutral text prompts plus reference images (independent of target labels), forms a surrogate score by combining those priors with sample likelihood, and applies a local flip only when the prior dominates while evidence is insufficient. No equation reduces the claimed accuracy gain on new classes to a fitted parameter on the evaluation data itself, nor does any load-bearing step collapse to a self-citation or ansatz imported from the authors' prior work. The method remains a post-hoc correction whose correctness is tested on held-out benchmarks rather than being true by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Asymmetric confusion observed on downstream data reflects a persistent pretraining-induced bias that can be measured via neutral prompts.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "NeRP leverages neutral text prompts and reference images to measure class-wise prior preferences along the pre-trained inter-class geometry, and combines them with the sample likelihood to obtain the model's surrogate score. If, for a given sample, the prior strongly favors the current prediction while the observed evidence is clearly insufficient, we perform a local flip between easily confusable class pairs"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem: unclear.
  Passage: "We observe an interesting phenomenon: VLMs frequently exhibit asymmetric confusion on certain downstream data"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.