Video Action Transformer Network
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
One Identity, Many Roles: Multimodal Entity Coreference for Enhanced Video Situation Recognition
CineMEC performs multimodal entity coreference by clustering visual entities and aligning them with text role mentions to boost captioning and grounding performance on an extended VidSitu dataset.