Video Action Transformer Network

Rohit Girdhar, Joao Carreira, Carl Doersch, Andrew Zisserman · 2019

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

One Identity, Many Roles: Multimodal Entity Coreference for Enhanced Video Situation Recognition

cs.CV · 2026-04-25 · unverdicted · novelty 6.0

CineMEC performs multimodal entity coreference by clustering visual entities and aligning them with text role mentions to boost captioning and grounding performance on an extended VidSitu dataset.

citing papers explorer

Showing 1 of 1 citing paper.

One Identity, Many Roles: Multimodal Entity Coreference for Enhanced Video Situation Recognition

cs.CV · 2026-04-25 · unverdicted · none · ref 17

CineMEC performs multimodal entity coreference by clustering visual entities and aligning them with text role mentions to boost captioning and grounding performance on an extended VidSitu dataset.
