hub

Direct inversion: Boosting diffusion-based edit- ing with 3 lines of code.arXiv preprint arXiv:2310.01506

Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, Qiang Xu · 2023 · arXiv 2310.01506

21 Pith papers cite this work. Polarity classification is still indexing.

21 Pith papers citing it

read on arXiv browse 21 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 1 dataset 1 method 1

citation-polarity summary

background 1 use dataset 1 use method 1

representative citing papers

Imagine Before You Draw: Visual Prompt Engineering for Image Generation

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

VPE inserts an internal autoregressive visual semantic token generation step to guide image token production in unified models, reporting faster convergence, higher quality, and superior editing preservation (PSNR 26.76 vs 19.92) versus external alternatives.

ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent

cs.CV · 2026-04-28 · unverdicted · novelty 7.0

ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.

$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models

cs.CV · 2026-04-26 · unverdicted · novelty 7.0

Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.

Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

Masked Logit Nudging aligns visual autoregressive model logits with source token maps under target prompts inside cross-attention masks, delivering top image editing results on PIE benchmarks and strong reconstructions on COCO and OpenImages while running faster than diffusion approaches.

RewardFlow: Generate Images by Optimizing What You Reward

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

RewardFlow unifies differentiable rewards including a new VQA-based one and uses a prompt-aware adaptive policy with Langevin dynamics to achieve state-of-the-art image editing and compositional generation.

LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction

cs.CV · 2026-03-22 · conditional · novelty 7.0

LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.

Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality

cs.CV · 2025-12-08 · unverdicted · novelty 7.0

LivingSwap is the first video reference-guided face swapping model that uses keyframe conditioning and temporal stitching to preserve source video realism with high fidelity across long sequences.

Delta Rectified Flow Sampling for Text-to-Image Editing

cs.CV · 2025-09-01 · unverdicted · novelty 7.0

DRFS is a new inversion-free editing technique for rectified flow models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.

In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer

cs.CV · 2025-04-29 · unverdicted · novelty 7.0

ICEdit achieves state-of-the-art instructional image editing in Diffusion Transformers via in-context generation, requiring only 0.1% of prior training data and 1% trainable parameters.

Training-free image inversion for one-step diffusion models

cs.CV · 2026-05-31 · unverdicted · novelty 6.0

TFinv proposes iterative noise alignment and suffix learning to enable training-free inversion and editing for one-step diffusion models, achieving SOTA performance and higher efficiency than multistep methods.

VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation

cs.CV · 2026-05-15 · accept · novelty 6.0

VAGS adapts the CFG scale at each ODE step using velocity alignment signals to raise structural fidelity in editing and sample quality in generation over fixed-scale baselines.

LimeCross: Context-Conditioned Layered Image Editing with Structural Consistency

cs.CV · 2026-05-11 · unverdicted · novelty 6.0

LimeCross enables text-guided editing of individual layers in composite images by conditioning on cross-layer context via bi-stream attention while preserving layer integrity and introducing the LayerEditBench benchmark.

LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

LIVEditor-14B applies a new sparse attention method (ISA) that prunes context and uses query-sharpness routing to cut attention latency ~60% with no loss in editing quality on standard benchmarks.

Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing

cs.CV · 2026-04-22 · unverdicted · novelty 6.0

Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.

FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing

cs.CV · 2025-09-26 · conditional · novelty 6.0

FlashEdit delivers real-time localized text-guided image editing under 0.2 seconds via cycle-consistent one-step inversion, background shield, and sparsified spatial cross-attention, achieving over 150x speedup on PIE-Bench.

EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning

cs.CV · 2025-09-24 · unverdicted · novelty 6.0

EditVerse unifies image and video editing and generation in one transformer model via unified token sequences and in-context learning, trained jointly on curated video editing data plus image/video corpora and evaluated on a new instruction-based benchmark.

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

cs.CV · 2024-01-25 · unverdicted · novelty 6.0

Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.

Stable and Near-Reversible Diffusion ODE Solvers for Image Editing

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

Near-reversible Runge-Kutta ODE solvers combined with vector-field smoothing deliver more stable and higher-fidelity text-guided edits in diffusion models than exactly reversible schemes.

DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

DirectEdit eliminates reconstruction error in flow-based image editing by aligning forward paths and applying attention feature injection with mask-guided noise blending.

Semantic Granularity Navigation in Image Editing

cs.CV · 2026-05-20 · unverdicted · novelty 4.0 · 2 refs

NaviEdit is a training-free inference-time controller that decouples edit progress from model scale traversal in diffusion-based image editing via self-consistency, reporting average gains across editors and backbones.

CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing

cs.CV · 2025-06-23

citing papers explorer

Showing 21 of 21 citing papers.

Imagine Before You Draw: Visual Prompt Engineering for Image Generation cs.CV · 2026-06-03 · unverdicted · none · ref 36
VPE inserts an internal autoregressive visual semantic token generation step to guide image token production in unified models, reporting faster convergence, higher quality, and superior editing preservation (PSNR 26.76 vs 19.92) versus external alternatives.
ResetEdit: Precise Text-guided Editing of Generated Image via Resettable Starting Latent cs.CV · 2026-04-28 · unverdicted · none · ref 10
ResetEdit embeds a recoverable discrepancy signal during image generation in diffusion models to reconstruct an approximate original latent for high-fidelity text-guided editing.
$Z^2$-Sampling: Zero-Cost Zigzag Trajectories for Semantic Alignment in Diffusion Models cs.CV · 2026-04-26 · unverdicted · none · ref 14
Z²-Sampling implicitly realizes zero-cost zigzag trajectories for curvature-aware semantic alignment in diffusion models by reducing multi-step paths via operator dualities and temporal caching while synthesizing a directional derivative penalty.
Prompt-Guided Image Editing with Masked Logit Nudging in Visual Autoregressive Models cs.CV · 2026-04-16 · unverdicted · none · ref 22
Masked Logit Nudging aligns visual autoregressive model logits with source token maps under target prompts inside cross-attention masks, delivering top image editing results on PIE benchmarks and strong reconstructions on COCO and OpenImages while running faster than diffusion approaches.
RewardFlow: Generate Images by Optimizing What You Reward cs.CV · 2026-04-09 · unverdicted · none · ref 19
RewardFlow unifies differentiable rewards including a new VQA-based one and uses a prompt-aware adaptive policy with Langevin dynamics to achieve state-of-the-art image editing and compositional generation.
LPNSR: Optimal Noise-Guided Diffusion Image Super-Resolution Via Learnable Noise Prediction cs.CV · 2026-03-22 · conditional · none · ref 40
LPNSR derives optimal intermediate noise for diffusion SR via MLE and implements it with an LR-guided noise predictor, reaching SOTA perceptual quality in 4 steps without text priors.
Preserving Source Video Realism: High-Fidelity Face Swapping for Cinematic Quality cs.CV · 2025-12-08 · unverdicted · none · ref 20
LivingSwap is the first video reference-guided face swapping model that uses keyframe conditioning and temporal stitching to preserve source video realism with high fidelity across long sequences.
Delta Rectified Flow Sampling for Text-to-Image Editing cs.CV · 2025-09-01 · unverdicted · none · ref 13
DRFS is a new inversion-free editing technique for rectified flow models that models source-target velocity discrepancies and applies a time-dependent shift to improve fidelity and unify prior methods like DDS and FlowEdit.
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer cs.CV · 2025-04-29 · unverdicted · none · ref 16
ICEdit achieves state-of-the-art instructional image editing in Diffusion Transformers via in-context generation, requiring only 0.1% of prior training data and 1% trainable parameters.
Training-free image inversion for one-step diffusion models cs.CV · 2026-05-31 · unverdicted · none · ref 42
TFinv proposes iterative noise alignment and suffix learning to enable training-free inversion and editing for one-step diffusion models, achieving SOTA performance and higher efficiency than multistep methods.
VAGS: Velocity Adaptive Guidance Scale for Image Editing and Generation cs.CV · 2026-05-15 · accept · none · ref 30
VAGS adapts the CFG scale at each ODE step using velocity alignment signals to raise structural fidelity in editing and sample quality in generation over fixed-scale baselines.
LimeCross: Context-Conditioned Layered Image Editing with Structural Consistency cs.CV · 2026-05-11 · unverdicted · none · ref 21
LimeCross enables text-guided editing of individual layers in composite images by conditioning on cross-layer context via bi-stream attention while preserving layer integrity and introducing the LayerEditBench benchmark.
LIVEditor-14B: Lightning Unified Video Editing via In-Context Sparse Attention cs.CV · 2026-05-06 · unverdicted · none · ref 14
LIVEditor-14B applies a new sparse attention method (ISA) that prunes context and uses query-sharpness routing to cut attention latency ~60% with no loss in editing quality on standard benchmarks.
Rethinking Where to Edit: Task-Aware Localization for Instruction-Based Image Editing cs.CV · 2026-04-22 · unverdicted · none · ref 17
Task-aware localization via attention cues and feature centroids from source/target streams in IIE models improves non-edit consistency while preserving instruction following.
FlashEdit: Decoupling Speed, Structure, and Semantics for Precise Image Editing cs.CV · 2025-09-26 · conditional · none · ref 13
FlashEdit delivers real-time localized text-guided image editing under 0.2 seconds via cycle-consistent one-step inversion, background shield, and sparsified spatial cross-attention, achieving over 150x speedup on PIE-Bench.
EditVerse: Unifying Image and Video Editing and Generation with In-Context Learning cs.CV · 2025-09-24 · unverdicted · none · ref 11
EditVerse unifies image and video editing and generation in one transformer model via unified token sequences and in-context learning, trained jointly on curated video editing data plus image/video corpora and evaluated on a new instruction-based benchmark.
Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks cs.CV · 2024-01-25 · unverdicted · none · ref 21
Grounded SAM integrates Grounding DINO and SAM to support text-prompted open-world detection and segmentation, achieving 48.7 mean AP on SegInW zero-shot with the base detector and huge segmenter.
Stable and Near-Reversible Diffusion ODE Solvers for Image Editing cs.CV · 2026-05-12 · unverdicted · none · ref 21
Near-reversible Runge-Kutta ODE solvers combined with vector-field smoothing deliver more stable and higher-fidelity text-guided edits in diffusion models than exactly reversible schemes.
DirectEdit: Step-Level Accurate Inversion for Flow-Based Image Editing cs.CV · 2026-05-04 · unverdicted · none · ref 25
DirectEdit eliminates reconstruction error in flow-based image editing by aligning forward paths and applying attention feature injection with mask-guided noise blending.
Semantic Granularity Navigation in Image Editing cs.CV · 2026-05-20 · unverdicted · none · ref 7 · 2 links
NaviEdit is a training-free inference-time controller that decouples edit progress from model scale traversal in diffusion-based image editing via self-consistency, reporting average gains across editors and backbones.
CPAM: Context-Preserving Adaptive Manipulation for Zero-Shot Real Image Editing cs.CV · 2025-06-23 · unreviewed · ref 21

Direct inversion: Boosting diffusion-based edit- ing with 3 lines of code.arXiv preprint arXiv:2310.01506

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer