Unigen-1.5: Enhancing image generation and editing through reward unification in reinforcement learning
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.CV (2)
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: background (1)
polarities: background (1)
representative citing papers
Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.
citing papers explorer
- Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification
  UniAR uses a shared context-visual tokenizer with bitwise quantization and parallel prediction in an autoregressive framework to unify visual understanding and generation, claiming SOTA on generation and editing tasks.
- Lance: Unified Multimodal Modeling by Multi-Task Synergy
  Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.