Concept-master: Multi-concept video customiza- tion on diffusion transformer models without test-time tun- ing

Yuzhou Huang, Ziyang Yuan, Quande Liu, Qiulin Wang, Xintao Wang, Ruimao Zhang, Pengfei Wan, Di Zhang, Kun Gai · 2025 · arXiv 2501.04698

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Generate Your Talking Avatar from Video Reference

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

TAVR generates high-fidelity talking avatars from cross-scene video references via token selection and three-stage training (same-scene pretraining, cross-scene fine-tuning, identity RL), outperforming baselines on a new 158-pair benchmark.

MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

MMControl adds multi-modal controls for identity, timbre, pose, and layout to unified audio-video diffusion models via dual-stream injection and adjustable guidance scaling.

Learning Zero-Shot Subject-Driven Video Generation Using 1% Compute

cs.CV · 2025-04-23 · unverdicted · novelty 6.0

A zero-shot subject-driven video generation framework that decomposes the task into identity injection from 200K subject-image pairs and motion preservation from 4K arbitrary videos, trained in 288 A100 GPU hours on CogVideoX-5B to match prior performance at 1% compute.

Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation

cs.CV · 2026-05-17 · unverdicted · novelty 5.0

Omni-Customizer proposes an end-to-end framework using Omni-Context Fusion, Masked TTS Cross-Attention, Semantic-Anchored Multimodal RoPE, and specialized training curricula to achieve precise multimodal identity binding in joint audio-video generation.

Evolution of Video Generative Foundations

cs.CV · 2026-04-07 · unverdicted · novelty 2.0

This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

citing papers explorer

Showing 5 of 5 citing papers.

Generate Your Talking Avatar from Video Reference cs.CV · 2026-04-30 · unverdicted · none · ref 22
TAVR generates high-fidelity talking avatars from cross-scene video references via token selection and three-stage training (same-scene pretraining, cross-scene fine-tuning, identity RL), outperforming baselines on a new 158-pair benchmark.
MMControl: Unified Multi-Modal Control for Joint Audio-Video Generation cs.CV · 2026-04-21 · unverdicted · none · ref 9
MMControl adds multi-modal controls for identity, timbre, pose, and layout to unified audio-video diffusion models via dual-stream injection and adjustable guidance scaling.
Learning Zero-Shot Subject-Driven Video Generation Using 1% Compute cs.CV · 2025-04-23 · unverdicted · none · ref 24
A zero-shot subject-driven video generation framework that decomposes the task into identity injection from 200K subject-image pairs and motion preservation from 4K arbitrary videos, trained in 288 A100 GPU hours on CogVideoX-5B to match prior performance at 1% compute.
Omni-Customizer: End-to-End MultiModal Customization for Joint Audio-Video Generation cs.CV · 2026-05-17 · unverdicted · none · ref 28
Omni-Customizer proposes an end-to-end framework using Omni-Context Fusion, Masked TTS Cross-Attention, Semantic-Anchored Multimodal RoPE, and specialized training curricula to achieve precise multimodal identity binding in joint audio-video generation.
Evolution of Video Generative Foundations cs.CV · 2026-04-07 · unverdicted · none · ref 200
This survey traces video generation technology from GANs to diffusion models and then to autoregressive and multimodal approaches while analyzing principles, strengths, and future trends.

Concept-master: Multi-concept video customiza- tion on diffusion transformer models without test-time tun- ing

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer