PercepT discovers perceptual topic clusters from vision-language data via unsupervised training and maps images to them with attention pooling, reporting a silhouette score of 0.97 and an AUC of 0.94 on ArtELingo.
arXiv preprint arXiv:2211.10780
1 Pith paper cites this work (field: cs.CV; year: 2026; verdict: UNVERDICTED). Polarity classification is still indexing.

Representative citing papers:
- Beyond Semantics: Modeling Factual and Affective Perceptual Experiences from Vision-Language Data