Near OOD Detection for Vision-Language Prompt Learning with Contrastive Logit Score
Pith reviewed 2026-05-24 01:15 UTC · model grok-4.3
The pith
A contrastive logit score added post-hoc improves near-OOD detection for vision-language prompt models without retraining or architecture changes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that the Contrastive Logit Score serves as an effective post-hoc scoring function for near out-of-distribution detection in vision-language prompt learning models, achieving substantial AUROC improvements without requiring model modifications or additional training.
What carries the argument
The Contrastive Logit Score (CLS), a scoring function that compares logits in a contrastive manner to produce a distribution-shift signal for near-OOD samples.
If this is right
- Prompt learning methods can gain better near-OOD detection simply by switching to the CLS scoring function at inference time.
- No retraining or extra data is needed to obtain the reported AUROC gains.
- The approach applies with minimal added computation across multiple prompt techniques and near-OOD benchmarks.
- CLS can be combined with existing prompt-learning pipelines without altering their training procedures.
Where Pith is reading between the lines
- The same logit-contrast idea could be tested on other vision-language tasks such as retrieval or captioning to check whether the near-OOD signal transfers.
- CLS might serve as a lightweight baseline when developing hybrid OOD detectors that combine prompt-based and non-prompt models.
- Examining performance on datasets with controlled degrees of shift could clarify the boundary between near and far OOD for these scoring functions.
Load-bearing premise
That comparing logits contrastively produces a reliable near-distribution-shift signal that works across prompt-learning methods and near-OOD datasets without method-specific tuning.
What would settle it
Applying CLS to a new prompt learning method on a held-out near-OOD dataset and finding no AUROC improvement or a drop would falsify the generalizability claim.
read the original abstract
Prompt learning has emerged as an efficient and effective method for fine-tuning vision-language models such as CLIP. While many studies have explored generalisation abilities of these models in few-shot classification tasks and a few studies have addressed far out-of-distribution (OOD) of the models, their potential for addressing near OOD detection remains underexplored. Existing methods either require training from scratch, need fine-tuning, or are not designed for vision-language prompt learning. To address this, we introduce the Contrastive Logit Score (CLS), a novel post-hoc, plug-and-play scoring function. CLS significantly improves near OOD detection of pre-trained vision-language prompt learning methods without modifying their model architectures or requiring retraining. Our method achieves up to an 11.67% improvement in AUROC for near OOD detection with minimal computational overhead. Extensive evaluations validate the effectiveness, efficiency, and generalisability of our approach. Our code is available at https://github.com/davidmcjung/near-OOD-prompt-learning.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Contrastive Logit Score (CLS), a post-hoc plug-and-play scoring function for near out-of-distribution (OOD) detection in pre-trained vision-language prompt learning methods (e.g., based on CLIP). It claims that CLS improves AUROC by up to 11.67% over existing approaches without architecture changes or retraining, supported by extensive evaluations on effectiveness, efficiency, and generalizability, with code released.
Significance. If the central claim holds, CLS offers a lightweight, training-free enhancement to near-OOD detection for widely used prompt-tuned VLMs, addressing an underexplored area. The public code release supports reproducibility and allows direct verification of the reported gains.
major comments (3)
- [§3] §3 (Method, CLS definition): The contrastive formulation must explicitly state whether any scalar (temperature, margin, or normalization constant) is fixed globally or selected per prompt-learning method; any per-method choice would undermine the 'zero hyperparameter search' and 'plug-and-play' premise for arbitrary pre-trained prompt learners.
- [§4] §4 (Experiments): The reported AUROC gains (up to 11.67%) are load-bearing for the generalization claim; the tables must include an explicit statement or ablation confirming that CLS hyperparameters were held fixed across all tested prompt-learning methods (CoOp, CoCoOp, etc.) and near-OOD pairs with no post-hoc selection, otherwise the tuning-free advantage is not demonstrated.
- [§4.3] §4.3 (Baselines and near-OOD pairs): The comparison set must be expanded or justified to include strong post-hoc OOD baselines designed for VLMs; if only ID-tuned or far-OOD methods are used, the relative improvement cannot be taken as evidence that CLS is superior for the near-OOD regime.
minor comments (2)
- [Abstract] Abstract: The sentence 'achieves up to an 11.67% improvement in AUROC' should name the exact baseline method and dataset pair for immediate clarity.
- [§3] Notation: Define the image and text logit vectors (I and T) and the contrastive operation in a single equation early in §3 to avoid ambiguity when readers compare CLS to standard softmax or energy scores.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will make revisions to improve clarity and strengthen the claims where appropriate.
read point-by-point responses
-
Referee: [§3] §3 (Method, CLS definition): The contrastive formulation must explicitly state whether any scalar (temperature, margin, or normalization constant) is fixed globally or selected per prompt-learning method; any per-method choice would undermine the 'zero hyperparameter search' and 'plug-and-play' premise for arbitrary pre-trained prompt learners.
Authors: In the CLS definition, all scalars including the temperature parameter are fixed globally to the default values inherited from the pre-trained CLIP model (temperature set to 1, with no margin or additional normalization constants introduced). These values are identical for every prompt-learning method and are never selected or tuned per method. We will add an explicit statement in the revised §3 confirming this global fixation to reinforce the zero-hyperparameter and plug-and-play claims. revision: yes
-
Referee: [§4] §4 (Experiments): The reported AUROC gains (up to 11.67%) are load-bearing for the generalization claim; the tables must include an explicit statement or ablation confirming that CLS hyperparameters were held fixed across all tested prompt-learning methods (CoOp, CoCoOp, etc.) and near-OOD pairs with no post-hoc selection, otherwise the tuning-free advantage is not demonstrated.
Authors: CLS hyperparameters were held strictly fixed across all prompt-learning methods (CoOp, CoCoOp, etc.) and all near-OOD pairs, with no post-hoc selection or per-experiment tuning performed. We will insert an explicit statement in the revised §4 and add a short confirmation note (or small ablation if space allows) documenting that the same fixed settings were used throughout. revision: yes
-
Referee: [§4.3] §4.3 (Baselines and near-OOD pairs): The comparison set must be expanded or justified to include strong post-hoc OOD baselines designed for VLMs; if only ID-tuned or far-OOD methods are used, the relative improvement cannot be taken as evidence that CLS is superior for the near-OOD regime.
Authors: The baselines were chosen as the standard post-hoc scoring functions that can be applied directly to VLMs without retraining. We will expand §4.3 with a justification paragraph explaining their relevance to the near-OOD prompt-learning setting and, where feasible within page limits, include one or two additional VLM-oriented post-hoc baselines to strengthen the comparison. revision: partial
Circularity Check
No circularity: CLS introduced as independent post-hoc score with empirical validation
full rationale
The paper defines CLS explicitly as a novel post-hoc, plug-and-play scoring function applied to pre-trained vision-language prompt learning models without retraining or architecture modification. No equations, parameters, or predictions in the provided text reduce by construction to fitted inputs, self-citations, or ansatzes from the same work. The AUROC gains are reported as results from extensive evaluations across methods and datasets, not as quantities forced by the method's own definition. The derivation chain is self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Logit-based scores can separate near-OOD from in-distribution samples
Reference graph
Works this paper leans on
-
[1]
Springer International Publishing, Cham, pp 446–461 Buse A (1982) The likelihood ratio, wald, and lagrange multiplier tests: An expository note. The American Statistician 36(3a):153–157 Charoenphakdee N, Cui Z, Zhang Y, et al (2021) Classification with rejection based on cost- sensitive classification. In: International Confer- ence on Machine Learning, P...
-
[2]
PMLR, pp 8759–8773 Huang R, Geng A, Li Y (2021) On the importance of gradients for detecting distributional shifts in the wild. In: Beygelzimer A, Dauphin Y, Liang P, et al (eds) Advances in Neural Information Processing Systems Jia C, Yang Y, Xia Y, et al (2021) Scaling up visual and vision-language representation learn- ing with noisy text supervision. ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.