Moviellm: Enhancing long video understanding with ai-generated movies

MovieLLM: Enhancing Long Video Understanding with AI-Generated Movies , author= · 2024 · arXiv 2403.01422

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 3 baseline 1

citation-polarity summary

background 2 baseline 1 unclear 1

representative citing papers

MLVU: Benchmarking Multi-task Long Video Understanding

cs.CV · 2024-06-06 · conditional · novelty 7.0

MLVU is a new benchmark for long video understanding that uses extended videos across diverse genres and multi-task evaluations, revealing that current MLLMs struggle significantly and degrade sharply with longer durations.

HPP: Hierarchical Programmatic Probing for Long Video Understanding by Decoupling Perception and Reasoning

cs.CV · 2026-06-19 · unverdicted · novelty 6.0

HPP decouples perception from reasoning in long-video VLMs by having an LLM run iterative programmatic probes on hierarchically segmented video, reporting gains on LongVideoBench, EgoSchema, VideoMME, and MLVU.

Towards Effective Long-Video Event Prediction via Multi-Level Event Semantics Mining

cs.CV · 2026-05-29 · unverdicted · novelty 5.0

VISTA mines multi-level event semantics via visual prompts, knowledge-enhanced retrieval, and proposal integration to improve long-video event prediction over existing LVLMs.

VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding

cs.CV · 2025-01-22 · unverdicted · novelty 4.0

VideoLLaMA3 uses a vision-centric training paradigm and token-reduction design to reach competitive results on image and video benchmarks.

Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey

cs.CV · 2026-04-13

citing papers explorer

Showing 5 of 5 citing papers.

MLVU: Benchmarking Multi-task Long Video Understanding cs.CV · 2024-06-06 · conditional · none · ref 42
MLVU is a new benchmark for long video understanding that uses extended videos across diverse genres and multi-task evaluations, revealing that current MLLMs struggle significantly and degrade sharply with longer durations.
HPP: Hierarchical Programmatic Probing for Long Video Understanding by Decoupling Perception and Reasoning cs.CV · 2026-06-19 · unverdicted · none · ref 152
HPP decouples perception from reasoning in long-video VLMs by having an LLM run iterative programmatic probes on hierarchically segmented video, reporting gains on LongVideoBench, EgoSchema, VideoMME, and MLVU.
Towards Effective Long-Video Event Prediction via Multi-Level Event Semantics Mining cs.CV · 2026-05-29 · unverdicted · none · ref 29
VISTA mines multi-level event semantics via visual prompts, knowledge-enhanced retrieval, and proposal integration to improve long-video event prediction over existing LVLMs.
VideoLLaMA 3: Frontier Multimodal Foundation Models for Image and Video Understanding cs.CV · 2025-01-22 · unverdicted · none · ref 149
VideoLLaMA3 uses a vision-centric training paradigm and token-reduction design to reach competitive results on image and video benchmarks.
Multimodal Large Language Model-Enabled Video Translation: A Role-Oriented Survey cs.CV · 2026-04-13 · unreviewed · ref 29

Moviellm: Enhancing long video understanding with ai-generated movies

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer