InterCoG performs fine-grained image editing by first reasoning about object positions in text with spatial details, then grounding targets visually via bounding boxes and masks, supported by new training modules and a 45K-sample dataset.
Make the smiling person look older
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- InterCoG: Towards Spatially Precise Image Editing with Interleaved Chain-of-Grounding Reasoning