Uni-Edit introduces a data synthesis pipeline turning VQA data into reasoning-intensive editing instructions, enabling single-task tuning that boosts all three capabilities in models like BAGEL and Janus-Pro.
Onecat: Decoder-only auto-regressive model for unified understanding and generation
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 9representative citing papers
Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.
PlanViz is a new benchmark with three sub-tasks and PlanScore metric to evaluate planning-oriented image generation and editing by unified multimodal models for computer-use tasks.
AIA loss teaches unified multimodal models task-specific cross-modal attention patterns to reduce conflicts between image understanding and generation without architecture decoupling.
Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.
STARFlow2 presents an autoregressive flow-based architecture for unified multimodal text-image generation by interleaving a VLM stream with a TarFlow stream via residual skips and a unified latent space.
PAR is a multi-scale autoregressive transformer framework for protein backbone generation that uses coarse-to-fine prediction, noisy context learning, and flow-based decoding to achieve high-quality unconditional and zero-shot conditional outputs.
SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.
Using understanding tasks as direct supervision during post-training improves image generation and editing in unified multimodal models.
citing papers explorer
-
Uni-Edit: Intelligent Editing Is A General Task For Unified Model Tuning
Uni-Edit introduces a data synthesis pipeline turning VQA data into reasoning-intensive editing instructions, enabling single-task tuning that boosts all three capabilities in models like BAGEL and Janus-Pro.
-
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.
-
PlanViz: Evaluating Planning-Oriented Image Generation and Editing for Computer-Use Tasks
PlanViz is a new benchmark with three sub-tasks and PlanScore metric to evaluate planning-oriented image generation and editing by unified multimodal models for computer-use tasks.
-
AIA: Rethinking Architecture Decoupling Strategy In Unified Multimodal Model
AIA loss teaches unified multimodal models task-specific cross-modal attention patterns to reduce conflicts between image understanding and generation without architecture decoupling.
-
Lance: Unified Multimodal Modeling by Multi-Task Synergy
Lance presents a dual-stream mixture-of-experts model with modality-aware positional encoding and staged multi-task training that outperforms prior open-source unified models on image and video generation while keeping strong understanding performance.
-
STARFlow2: Bridging Language Models and Normalizing Flows for Unified Multimodal Generation
STARFlow2 presents an autoregressive flow-based architecture for unified multimodal text-image generation by interleaving a VLM stream with a TarFlow stream via residual skips and a unified latent space.
-
Protein Autoregressive Modeling via Multiscale Structure Generation
PAR is a multi-scale autoregressive transformer framework for protein backbone generation that uses coarse-to-fine prediction, noisy context learning, and flow-based decoding to achieve high-quality unconditional and zero-shot conditional outputs.
-
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.
-
Steering Visual Generation in Unified Multimodal Models with Understanding Supervision
Using understanding tasks as direct supervision during post-training improves image generation and editing in unified multimodal models.