A systematic analysis of evaluation practices in multimedia event extraction reveals that minor methodological choices cause large performance swings and overestimation of cross-modal grounding ability.
Proceedings of the 25th ACM International Conference on Multimedia , pages =
4 Pith papers cite this work. Polarity classification is still indexing.
4
Pith papers citing it
years
2026 4verdicts
UNVERDICTED 4representative citing papers
NEST is a new benchmark dataset for narrative event structures in long videos, with baselines reporting ETD below 8%, EL under 6%, EAE below 11%, and ERE at 35-44% F1.
EVENT5Ws is a new large-scale, manually verified open-domain event extraction dataset that benchmarks LLMs and demonstrates cross-context generalization.
MODEE is a multimodal system that integrates graphs with LLM embeddings to outperform prior open-domain event extraction methods on large datasets.
citing papers explorer
-
NEST: Narrative Event Structures in Time for Long Video Understanding
NEST is a new benchmark dataset for narrative event structures in long videos, with baselines reporting ETD below 8%, EL under 6%, EAE below 11%, and ERE at 35-44% F1.