ImageNet: A large-scale hierarchical image database
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: dataset (1)
polarities: use dataset (1)
representative citing papers
Vision foundation models from OpenAI and Meta are non-robust to nine categories of common perturbations, with new metrics linking robustness scores to downstream performance drops and a fine-tuning method proposed to improve stability without losing utility.
The model uses dense visuo-tactile feature interactions and material-diversity pairing on expanded datasets to generate tactile saliency maps for material segmentation, outperforming prior global-alignment methods.
citing papers explorer
- Cluster-Aware Neural Collapse Prompt Tuning for Long-Tailed Generalization of Vision-Language Models
  CPT creates cluster-invariant spaces from pre-trained VLM semantics and applies neural collapse losses to boost long-tail performance and unseen-class generalization in prompt tuning.
- Robustness of Vision Foundation Models to Common Perturbations
  Vision foundation models from OpenAI and Meta are non-robust to nine categories of common perturbations, with new metrics linking robustness scores to downstream performance drops and a fine-tuning method proposed to improve stability without losing utility.
- Seeing Through Touch: Tactile-Driven Visual Localization of Material Regions
  The model uses dense visuo-tactile feature interactions and material-diversity pairing on expanded datasets to generate tactile saliency maps for material segmentation, outperforming prior global-alignment methods.