AniMatrix generates anime videos by structuring artistic production rules into a controllable taxonomy and training the model to prioritize those rules over physical realism, achieving top scores from professional animators on prompt understanding and artistic motion.
Transnet v2: An effective deep network architecture for fast shot transition detection
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
LPM 1.0 generates infinite-length, identity-stable, real-time audio-visual conversational performances for single characters using a distilled causal diffusion transformer and a new benchmark.
Unlabeled web videos processed by designed data engines generate effective training data that yields strong zero-shot and finetuned performance on 3D detection, segmentation, VQA, and navigation.
SkyReels-V2 produces infinite-length film videos via MLLM-based captioning, progressive pretraining, motion RL, and diffusion forcing with non-decreasing noise schedules.
HunyuanVideo presents a 13B-parameter open-source video generative model with integrated data, architecture, training, and inference systems whose professional evaluations show it outperforming prior SOTA models including Runway Gen-3 and Luma 1.6.
U-CESE integrates three CESE modules into a unified clip-based pipeline with DAKE keyframe extraction and ReCap captioning to support consistent multimodal event retrieval across video sources.
MERVIN is a multimodal retrieval system for Vietnamese news videos that integrates visual and textual features with LLM-enhanced transcripts and reports strong results on a 2025 AI challenge.
citing papers explorer
-
AniMatrix: An Anime Video Generation Model that Thinks in Art, Not Physics
AniMatrix generates anime videos by structuring artistic production rules into a controllable taxonomy and training the model to prioritize those rules over physical realism, achieving top scores from professional animators on prompt understanding and artistic motion.
-
LPM 1.0: Video-based Character Performance Model
LPM 1.0 generates infinite-length, identity-stable, real-time audio-visual conversational performances for single characters using a distilled causal diffusion transformer and a new benchmark.
-
Lifting Unlabeled Internet-level Data for 3D Scene Understanding
Unlabeled web videos processed by designed data engines generate effective training data that yields strong zero-shot and finetuned performance on 3D detection, segmentation, VQA, and navigation.
-
SkyReels-V2: Infinite-length Film Generative Model
SkyReels-V2 produces infinite-length film videos via MLLM-based captioning, progressive pretraining, motion RL, and diffusion forcing with non-decreasing noise schedules.
-
HunyuanVideo: A Systematic Framework For Large Video Generative Models
HunyuanVideo presents a 13B-parameter open-source video generative model with integrated data, architecture, training, and inference systems whose professional evaluations show it outperforming prior SOTA models including Runway Gen-3 and Luma 1.6.
-
U-CESE: Unified Clip-based Event Search Engine for AI Challenge HCMC 2025
U-CESE integrates three CESE modules into a unified clip-based pipeline with DAKE keyframe extraction and ReCap captioning to support consistent multimodal event retrieval across video sources.
-
MERVIN: A Unified Framework for Multimodal Event Retrieval in Vietnamese News Videos
MERVIN is a multimodal retrieval system for Vietnamese news videos that integrates visual and textual features with LLM-enhanced transcripts and reports strong results on a 2025 AI challenge.
- MSAVBench: Towards Comprehensive and Reliable Evaluation of Multi-Shot Audio-Video Generation