YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.
Seeing the Ar- row of Time in Large Multimodal Models
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.CV 4verdicts
UNVERDICTED 4roles
dataset 1polarities
use dataset 1representative citing papers
Text-only contrastive fine-tuning of an MLLM with hard negatives produces embeddings that handle temporal, negation, and multimodal nuances in video retrieval and achieves SOTA performance.
Self-supervised models learn to perceive and manipulate the flow of time in videos, supporting speed detection, large-scale slow-motion data curation, and temporally controllable video synthesis.
OmniJigsaw is a self-supervised proxy task that reconstructs shuffled audio-visual clips via joint integration, sample-level selection, and clip-level masking strategies, yielding gains on 15 video, audio, and reasoning benchmarks.
citing papers explorer
-
YoCausal: How Far is Video Generation from World Model? A Causality Perspective
YoCausal benchmark shows video diffusion models detect the arrow of time but lack genuine causal understanding relative to humans.
-
Seeing Fast and Slow: Learning the Flow of Time in Videos
Self-supervised models learn to perceive and manipulate the flow of time in videos, supporting speed detection, large-scale slow-motion data curation, and temporally controllable video synthesis.
-
OmniJigsaw: Enhancing Omni-Modal Reasoning via Modality-Orchestrated Reordering
OmniJigsaw is a self-supervised proxy task that reconstructs shuffled audio-visual clips via joint integration, sample-level selection, and clip-level masking strategies, yielding gains on 15 video, audio, and reasoning benchmarks.