ETVA: Evaluation of text-to-video alignment via fine-grained question generation and answering
3 Pith papers cite this work:
- BRITE: A Benchmark for Reliable and Interpretable T2V Evaluation on Implausible Scenarios
  The BRITE benchmark shows that leading text-to-video (T2V) models handle static object composition well but degrade sharply on object-action binding and audio-visual synchronization when given implausible prompts.
- HuM-Eval: A Coarse-to-Fine Framework for Human-Centric Video Evaluation
  HuM-Eval evaluates human motion videos with a coarse-to-fine approach that combines global checks by a vision-language model (VLM) with 2D pose and 3D motion analysis. It reaches 58.2% average correlation with human judgments and introduces a 1000-prompt benchmark.
- Advances and Challenges in Foundation Agents: From Brain-Inspired Intelligence to Evolutionary, Collaborative, and Safe Systems
  This survey frames foundation agents using brain-inspired modular architectures and reviews challenges in evolution, collaboration, and safety.