Blended diffusion for text-driven editing of natural images

Omri Avrahami, Dani Lischinski, Ohad Fried · 2022 · arXiv 2111.14818

4 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1 · baseline 1

citation-polarity summary

background 1 · baseline 1

representative citing papers

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

cs.CV · 2021-12-20 · accept · novelty 7.0

A 3.5-billion-parameter diffusion model with classifier-free guidance generates images preferred over DALL-E by human raters and can be fine-tuned for text-guided inpainting.

PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

PhysEdit introduces adaptive reasoning depth and spatial masking to make image editing faster and more instruction-aligned without retraining the base model.

MuPPet: Multi-person 2D-to-3D Pose Lifting

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

MuPPet introduces person encoding, permutation augmentation, and dynamic multi-person attention to outperform prior single- and multi-person 2D-to-3D pose lifting methods on group interaction datasets while improving occlusion robustness.

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

cs.CV · 2023-08-16 · unverdicted · novelty 6.0

DragNUWA integrates text, image, and trajectory controls into a diffusion video model using a Trajectory Sampler, Multiscale Fusion, and Adaptive Training to enable fine-grained open-domain video generation.

citing papers explorer

Showing 4 of 4 citing papers.

GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

cs.CV · 2021-12-20 · accept · none · ref 1

PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

cs.CV · 2026-05-01 · unverdicted · none · ref 1

MuPPet: Multi-person 2D-to-3D Pose Lifting

cs.CV · 2026-04-08 · unverdicted · none · ref 3

DragNUWA: Fine-grained Control in Video Generation by Integrating Text, Image, and Trajectory

cs.CV · 2023-08-16 · unverdicted · none · ref 69
