hub Canonical reference

Diffedit: Diffusion-based semantic image editing with mask guidance

Guillaume Couairon, Jakob Verbeek, Holger Schwenk, Matthieu Cord · 2022 · arXiv 2210.11427

Canonical reference. 100% of citing Pith papers cite this work as background.

28 Pith papers citing it

Background 100% of classified citations

read on arXiv browse 28 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 6

citation-polarity summary

background 6

representative citing papers

neuralCAD-Edit: An Expert Benchmark for Multimodal-Instructed 3D CAD Model Editing

cs.CV · 2026-04-17 · unverdicted · novelty 8.0

neuralCAD-Edit benchmark shows even the best foundation model (GPT 5.2) scores 53% lower than human CAD experts in acceptance trials for multimodal-instructed 3D model edits.

Where to Refine, When to Stop: Rethinking Redundancy via Latent Discrepancy for Efficient Visual Autoregressive Generation

cs.CV · 2026-05-29 · unverdicted · novelty 7.0

LD-Pruning applies latent discrepancy to prune tokens and adaptively skip unconditional branches in VAR models for up to 2.35x faster inference with preserved quality.

Functionalization via Structure Completion and Motion Rectification

cs.CV · 2026-05-18 · unverdicted · novelty 7.0

Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.

GeoEdit: Local Frames for Fast, Training-Free On-Manifold Editing in Diffusion Models

cs.LG · 2026-04-27 · unverdicted · novelty 7.0

GeoEdit constructs local tangent frames from small perturbations to initial noise, enabling Jacobian-free on-manifold edits in diffusion models via alternating tangent steps and diffusion projections.

UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.

DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model

cs.CV · 2026-02-27 · unverdicted · novelty 7.0

DLEBench is the first benchmark for small-scale object editing in instruction-based image editing models, using 1889 samples, seven instruction types, and a dual-mode evaluation protocol to reveal performance gaps in 10 tested models.

RDSplat: Robust Watermarking for 3D Gaussian Splatting Against 2D and 3D Diffusion Editing

cs.CV · 2025-12-07 · conditional · novelty 7.0

RDSplat is the first 3D Gaussian Splatting watermarking method that maintains 0.701 bit accuracy against both 2D and 3D diffusion editing by embedding only in low-frequency primitives selected via FAPS.

Factored Classifier-Free Guidance

cs.CV · 2025-06-17 · unverdicted · novelty 7.0

Factored Classifier-Free Guidance enables per-attribute control in classifier-free guidance for diffusion models to produce more sound counterfactuals.

UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models

cs.CV · 2025-04-17 · unverdicted · novelty 7.0

UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.

GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

GeoEdit introduces a Lift-Manipulate-Render-Denoise pipeline with dual-branch denoising and variance-homogeneous injection for 3D-consistent object editing in single photos.

MirrorPPR: Exemplar-Based Portrait Photo Retouching

cs.CV · 2026-06-28 · unverdicted · novelty 6.0

MirrorPPR extracts retouching operations from exemplar pairs via a dedicated extractor and transfers them to query images through a LoRA-adapted Diffusion Transformer, enabled by a new 47-million-pair dataset and self-augmentation for alignment.

SpatialFlow-GRPO: Where Spatial Credit Drives Image Editing

cs.CV · 2026-06-25 · unverdicted · novelty 6.0

SpatialFlow-GRPO adds region-level reward feedback and spatial alignment to Flow-GRPO-style RL for image editing, reporting gains on GEdit-Bench, ImgEdit-Bench, and a new MultiEditBench.

A Systematic Study of Behavioral Cloning for Scientific Data Annotation

cs.HC · 2026-05-26 · unverdicted · novelty 6.0

Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.

Rethinking Scribble-Guided Image Editing: Generalization, Instruction Adherence, and Multi-Tasking

cs.CV · 2026-05-25 · unverdicted · novelty 6.0

Empirical studies reveal instruction-level generalization as the main bottleneck in scribble-guided editing; three strategies (curriculum, multi-task mosaicking, edit-focused loss) achieve SOTA on VIBE benchmark.

ACE-LoRA: Adaptive Orthogonal Decoupling for Continual Image Editing

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

ACE-LoRA introduces adaptive orthogonal decoupling and rank-invariant compression for continual image editing in diffusion models, plus the CIE-Bench benchmark.

PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning

cs.CV · 2026-05-01 · unverdicted · novelty 6.0

PhysEdit introduces adaptive reasoning depth and spatial masking to make image editing faster and more instruction-aligned without retraining the base model.

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.

Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation

cs.CV · 2025-11-21 · unverdicted · novelty 6.0

Fine-tuning text-to-video models on sparse low-quality synthetic data for physical camera controls outperforms fine-tuning on photorealistic data.

FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

cs.CV · 2025-09-26 · conditional · novelty 6.0

FlashEdit delivers real-time localized text-guided image editing under 0.2 seconds via cycle-consistent one-step inversion, background shield, and sparsified spatial cross-attention, achieving over 150x speedup on PIE-Bench.

Semantics Disentanglement and Composition for Universal Image Coding with Efficiently LLM Reasoning and Generative Diffusion

cs.CV · 2024-12-24 · unverdicted · novelty 6.0

UniCodec uses LLM-driven semantic disentanglement at the encoder and diffusion-based compositional generation at the decoder to enable one codec for both human perception and machine vision tasks without task-specific retraining.

eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers

cs.CV · 2022-11-02 · unverdicted · novelty 6.0

An ensemble of stage-specialized text-to-image diffusion models improves prompt alignment over single shared-parameter models while preserving visual quality and inference speed.

IdentiFace: Multi-Modal Iterative Diffusion Framework for Identifiable Suspect Face Generation in Crime Investigations

cs.CV · 2026-05-01 · unverdicted · novelty 5.0

IdentiFace is a multi-modal iterative diffusion framework that generates identifiable suspect faces with improved identity retrieval for law enforcement applications.

DiffMagicFace: Identity Consistent Facial Editing of Real Videos

cs.CV · 2026-04-15 · unverdicted · novelty 5.0

DiffMagicFace uses concurrent fine-tuned text and image diffusion models plus a rendered multi-view dataset to achieve identity-consistent text-conditioned editing of real facial videos.

Landscape-Awareness for Geometric View Diffusion Model

cs.CV · 2026-05-19 · unverdicted · novelty 4.0

A score-based method is introduced to guide optimization in geometric view diffusion models toward correct viewpoints, improving convergence and sample efficiency over naive multistart strategies.

citing papers explorer

Showing 28 of 28 citing papers.

neuralCAD-Edit: An Expert Benchmark for Multimodal-Instructed 3D CAD Model Editing cs.CV · 2026-04-17 · unverdicted · none · ref 12
neuralCAD-Edit benchmark shows even the best foundation model (GPT 5.2) scores 53% lower than human CAD experts in acceptance trials for multimodal-instructed 3D model edits.
Where to Refine, When to Stop: Rethinking Redundancy via Latent Discrepancy for Efficient Visual Autoregressive Generation cs.CV · 2026-05-29 · unverdicted · none · ref 1
LD-Pruning applies latent discrepancy to prune tokens and adaptively skip unconditional branches in VAR models for up to 2.35x faster inference with preserved quality.
Functionalization via Structure Completion and Motion Rectification cs.CV · 2026-05-18 · unverdicted · none · ref 106
Object functionalization is cast as neural graph completion over a functional graph of parts, contacts, and motions, followed by geometry realization that also rectifies erroneous motions, demonstrated on furniture with a new paired dataset.
GeoEdit: Local Frames for Fast, Training-Free On-Manifold Editing in Diffusion Models cs.LG · 2026-04-27 · unverdicted · none · ref 5
GeoEdit constructs local tangent frames from small perturbations to initial noise, enabling Jacobian-free on-manifold edits in diffusion models via alternating tangent steps and diffusion projections.
UniEditBench: A Unified and Cost-Effective Benchmark for Image and Video Editing via Distilled MLLMs cs.CV · 2026-04-17 · unverdicted · none · ref 10
UniEditBench unifies image and video editing evaluation with a nine-plus-eight operation taxonomy and cost-effective 4B/8B distilled MLLM evaluators that align with human judgments.
DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model cs.CV · 2026-02-27 · unverdicted · none · ref 2
DLEBench is the first benchmark for small-scale object editing in instruction-based image editing models, using 1889 samples, seven instruction types, and a dual-mode evaluation protocol to reveal performance gaps in 10 tested models.
RDSplat: Robust Watermarking for 3D Gaussian Splatting Against 2D and 3D Diffusion Editing cs.CV · 2025-12-07 · conditional · none · ref 10
RDSplat is the first 3D Gaussian Splatting watermarking method that maintains 0.701 bit accuracy against both 2D and 3D diffusion editing by embedding only in low-frequency primitives selected via FAPS.
Factored Classifier-Free Guidance cs.CV · 2025-06-17 · unverdicted · none · ref 7
Factored Classifier-Free Guidance enables per-attribute control in classifier-free guidance for diffusion models to produce more sound counterfactuals.
UniEdit-Flow: Unleashing Inversion and Editing in the Era of Flow Models cs.CV · 2025-04-17 · unverdicted · none · ref 14
UniEdit-Flow presents tuning-free Uni-Inv and Uni-Edit methods for inversion and editing in flow models that achieve accurate reconstruction and robust region-preserving edits across generative models.
GeoEdit: Geometry-Aware Object Editing via Dual-Branch Denoising cs.CV · 2026-06-29 · unverdicted · none · ref 10
GeoEdit introduces a Lift-Manipulate-Render-Denoise pipeline with dual-branch denoising and variance-homogeneous injection for 3D-consistent object editing in single photos.
MirrorPPR: Exemplar-Based Portrait Photo Retouching cs.CV · 2026-06-28 · unverdicted · none · ref 13
MirrorPPR extracts retouching operations from exemplar pairs via a dedicated extractor and transfers them to query images through a LoRA-adapted Diffusion Transformer, enabled by a new 47-million-pair dataset and self-augmentation for alignment.
SpatialFlow-GRPO: Where Spatial Credit Drives Image Editing cs.CV · 2026-06-25 · unverdicted · none · ref 5
SpatialFlow-GRPO adds region-level reward feedback and spatial alignment to Flow-GRPO-style RL for image editing, reporting gains on GEdit-Bench, ImgEdit-Bench, and a new MultiEditBench.
A Systematic Study of Behavioral Cloning for Scientific Data Annotation cs.HC · 2026-05-26 · unverdicted · none · ref 253
Introduces 9 synthetic annotation tasks and benchmarks for behavioral cloning, finding hierarchical skill learning, scaling benefits, effective multi-task pretraining, and shared internal representations of task phases and mistakes.
Rethinking Scribble-Guided Image Editing: Generalization, Instruction Adherence, and Multi-Tasking cs.CV · 2026-05-25 · unverdicted · none · ref 10
Empirical studies reveal instruction-level generalization as the main bottleneck in scribble-guided editing; three strategies (curriculum, multi-task mosaicking, edit-focused loss) achieve SOTA on VIBE benchmark.
ACE-LoRA: Adaptive Orthogonal Decoupling for Continual Image Editing cs.CV · 2026-05-14 · unverdicted · none · ref 6
ACE-LoRA introduces adaptive orthogonal decoupling and rank-invariant compression for continual image editing in diffusion models, plus the CIE-Bench benchmark.
PhysEdit: Physically-Consistent Region-Aware Image Editing via Adaptive Spatio-Temporal Reasoning cs.CV · 2026-05-01 · unverdicted · none · ref 6
PhysEdit introduces adaptive reasoning depth and spatial masking to make image editing faster and more instruction-aligned without retraining the base model.
Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing cs.CV · 2026-04-22 · unverdicted · none · ref 8
Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.
Less is More: Data-Efficient Adaptation for Controllable Text-to-Video Generation cs.CV · 2025-11-21 · unverdicted · none · ref 4
Fine-tuning text-to-video models on sparse low-quality synthetic data for physical camera controls outperforms fine-tuning on photorealistic data.
FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing cs.CV · 2025-09-26 · conditional · none · ref 4
FlashEdit delivers real-time localized text-guided image editing under 0.2 seconds via cycle-consistent one-step inversion, background shield, and sparsified spatial cross-attention, achieving over 150x speedup on PIE-Bench.
Semantics Disentanglement and Composition for Universal Image Coding with Efficiently LLM Reasoning and Generative Diffusion cs.CV · 2024-12-24 · unverdicted · none · ref 56
UniCodec uses LLM-driven semantic disentanglement at the encoder and diffusion-based compositional generation at the decoder to enable one codec for both human perception and machine vision tasks without task-specific retraining.
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers cs.CV · 2022-11-02 · unverdicted · none · ref 12
An ensemble of stage-specialized text-to-image diffusion models improves prompt alignment over single shared-parameter models while preserving visual quality and inference speed.
IdentiFace: Multi-Modal Iterative Diffusion Framework for Identifiable Suspect Face Generation in Crime Investigations cs.CV · 2026-05-01 · unverdicted · none · ref 8
IdentiFace is a multi-modal iterative diffusion framework that generates identifiable suspect faces with improved identity retrieval for law enforcement applications.
DiffMagicFace: Identity Consistent Facial Editing of Real Videos cs.CV · 2026-04-15 · unverdicted · none · ref 32
DiffMagicFace uses concurrent fine-tuned text and image diffusion models plus a rendered multi-view dataset to achieve identity-consistent text-conditioned editing of real facial videos.
Landscape-Awareness for Geometric View Diffusion Model cs.CV · 2026-05-19 · unverdicted · none · ref 7
A score-based method is introduced to guide optimization in geometric view diffusion models toward correct viewpoints, improving convergence and sample efficiency over naive multistart strategies.
Towards Robust Sequential Decomposition for Complex Image Editing cs.CV · 2026-05-10 · unverdicted · none · ref 9 · 2 links
Develops a synthetic data pipeline for training sequential decomposition in generative image editing, showing robust gains with complexity and sim-to-real transfer via co-training.
UniGeo: Unifying Geometric Guidance for Camera-Controllable Image Editing via Video Models cs.CV · 2026-04-19 · unreviewed · ref 20 · 2 links
Making Image Editing Easier via Adaptive Task Reformulation with Agentic Executions cs.CV · 2026-04-17 · unreviewed · ref 6
CAMEO: A Conditional and Quality-Aware Multi-Agent Image Editing Orchestrator cs.CV · 2026-04-03 · unreviewed · ref 7

Diffedit: Diffusion-based semantic image editing with mask guidance

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer