RealGeneral: Unifying Visual Generation via Temporal In-Context Learning with Video Models
2 Pith papers cite this work. Polarity classification is still indexing.
Citation-role summary: background (3)
Fields: cs.CV (2)
Years: 2026 (2)
Roles: background (2)
Polarities: background (2)

Representative citing papers:
- Lance: Unified Multimodal Modeling by Multi-Task Synergy
  Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training. It outperforms prior open-source unified models on image and video generation while maintaining strong understanding performance.
- Beyond Text Prompts: Visual-to-Visual Generation as A Unified Paradigm