FoCo learns composition for zero-shot composed image retrieval (CIR) via text-anchored visual aggregation and context-conditioned semantic completion, trained jointly with a cross-instance contrastive loss, and reports state-of-the-art results on four benchmarks.
arXiv preprint arXiv:2505.19406 (2025)
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.CV (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
- Learning to Compose: Revisiting Proxy Task Design for Zero-Shot Composed Image Retrieval