Learning Text-image Joint Embedding for Efficient Cross-modal Retrieval with Deep Feature Engineering. ACM Transactions on Information Systems, 40(4):74:1–74:27

Zhongwei Xie, Ling Liu, Yanzhao Wu, Luo Zhong, Lin Li · 2021

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

SIMMER: Cross-Modal Food Image--Recipe Retrieval via MLLM-Based Embedding

cs.CV · 2026-04-17 · unverdicted · novelty 6.0

SIMMER uses a single multimodal LLM (VLM2Vec) with custom prompts and partial-recipe augmentation to embed food images and recipes, achieving new state-of-the-art retrieval accuracy on Recipe1M.

citing papers explorer

Showing 1 of 1 citing paper.

SIMMER: Cross-Modal Food Image--Recipe Retrieval via MLLM-Based Embedding

cs.CV · 2026-04-17 · unverdicted · none · ref 55
