FoCo learns composition for zero-shot CIR via text-anchored visual aggregation and context-conditioned semantic completion trained jointly with cross-instance contrastive loss, reporting SOTA on four benchmarks.
Title resolution pending
4 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (4)
years: 2026 (4)
verdicts: UNVERDICTED (4)
representative citing papers
ZeroSight supplies a video-derived dataset and evaluation protocol for genuine zero-shot composed image retrieval plus the SC4CIR consistency method, demonstrating that prior benchmarks inflate reported performance across 27 tested approaches.
Composed image retrieval is reframed as calibrated intent resolution under uncertainty via conformal prediction sets and expected-information-gain clarification, with new AmbiCIR benchmark showing matched single-turn SOTA and faster multi-turn resolution with valid coverage.
CLARA achieves turn-valid conformal coverage in ambiguous composed image retrieval by replacing text clarification with user selection among snapped real-image prototypes and reweighting calibration accordingly.
citing papers explorer
- Learning to Compose: Revisiting Proxy Task Design for Zero-Shot Composed Image Retrieval
FoCo learns composition for zero-shot CIR via text-anchored visual aggregation and context-conditioned semantic completion trained jointly with cross-instance contrastive loss, reporting SOTA on four benchmarks.
- Never Seen Before: Benchmarking Genuine Zero-Shot Composed Image Retrieval with Consistent Video-Sourced Datasets
ZeroSight supplies a video-derived dataset and evaluation protocol for genuine zero-shot composed image retrieval plus the SC4CIR consistency method, demonstrating that prior benchmarks inflate reported performance across 27 tested approaches.
- Resolving Ambiguity in Composed Image Retrieval via Calibrated Interaction
Composed image retrieval is reframed as calibrated intent resolution under uncertainty via conformal prediction sets and expected-information-gain clarification, with new AmbiCIR benchmark showing matched single-turn SOTA and faster multi-turn resolution with valid coverage.
- Show, Don't Ask: Generative Visual Disambiguation for Composed Image Retrieval with Turn-Valid Coverage
CLARA achieves turn-valid conformal coverage in ambiguous composed image retrieval by replacing text clarification with user selection among snapped real-image prototypes and reweighting calibration accordingly.