Finecaption: Compositional image captioning focusing on wherever you want at any granularity

Hang Hua, Qing Liu, Lingzhi Zhang, Jing Shi, Soo Ye Kim, Zhifei Zhang, Yilin Wang, Jianming Zhang, Zhe Lin, Jiebo Luo · 2025

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation

cs.CV · 2026-04-13 · unverdicted · novelty 3.0

This review organizes literature on large multimodal models and object-centric vision into four themes—understanding, referring segmentation, editing, and generation—while summarizing paradigms, strategies, and challenges like instance permanence and consistent interaction.

citing papers explorer

Showing 1 of 1 citing paper.

LMMs Meet Object-Centric Vision: Understanding, Segmentation, Editing and Generation cs.CV · 2026-04-13 · unverdicted · none · ref 63
This review organizes literature on large multimodal models and object-centric vision into four themes—understanding, referring segmentation, editing, and generation—while summarizing paradigms, strategies, and challenges like instance permanence and consistent interaction.

Finecaption: Compositional image captioning focusing on wherever you want at any granularity

fields

years

verdicts

representative citing papers

citing papers explorer