Koala-36m: A large-scale video dataset improving consistency between fine-grained conditions and video content
3 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: UNVERDICTED 3
Citation roles: dataset 1
Citation polarities: use dataset 1
citing papers explorer
- SyncDPO: Enhancing Temporal Synchronization in Video-Audio Joint Generation via Preference Learning
  SyncDPO improves temporal synchronization in video-audio joint generation using DPO with efficient on-the-fly negative sample construction and curriculum learning.
- Streaming Video Instruction Tuning
  Streamo is a streaming video LLM trained end-to-end on the new Streamo-Instruct-465K dataset that unifies multiple real-time video tasks with claimed strong temporal reasoning and generalization.
- VDCook: DIY video data cook your MLLMs
  VDCook is an automated, self-evolving platform for generating in-domain video datasets for MLLMs via natural language queries, retrieval-synthesis, and multi-dimensional metadata.