StableMTL repurposes latent diffusion models for multi-task learning from partially annotated synthetic data via unified latent loss, task encoding, and a multi-stream task-attention architecture, reporting outperformance on 7 tasks across 8 benchmarks.
In: CVPR (2022)
5 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.CV (5)
verdicts: UNVERDICTED (5)
roles: background (1)
polarities: background (1)
representative citing papers
Refinement via Regeneration (RvR) reformulates image refinement in unified multimodal models as conditional regeneration using prompt and semantic tokens from the initial image, yielding higher alignment scores than editing-based methods.
HO-Flow synthesizes realistic hand-object motions from text and canonical 3D objects via an interaction-aware VAE and masked flow matching, reporting SOTA physical plausibility and diversity on GRAB, OakInk, and DexYCB.
MIRAGE introduces a benchmark for multi-instance image editing and a training-free framework that uses vision-language parsing and parallel regional denoising to achieve precise edits without altering backgrounds.
SpatialEdit provides a benchmark, a large synthetic dataset, and a baseline model for precise object and camera spatial manipulations in images, with the model outperforming prior methods on spatial editing.
citing papers explorer
- StableMTL: Repurposing Latent Diffusion Models for Multi-Task Learning from Partially Annotated Synthetic Datasets
- Refinement via Regeneration: Enlarging Modification Space Boosts Image Refinement in Unified Multimodal Models
- HO-Flow: Generalizable Hand-Object Interaction Generation with Latent Flow Matching
- MIRAGE: Benchmarking and Aligning Multi-Instance Image Editing
- SpatialEdit: Benchmarking Fine-Grained Image Spatial Editing