Delta Rectified Flow Sampling for Text-to-Image Editing

Gaspard Beaudouin; Jaeyeon Kim; Mengyu Wang; Minghan Li; Sung-Hoon Yoon

arxiv: 2509.05342 · v3 · submitted 2025-09-01 · 💻 cs.CV · cs.LG

Delta Rectified Flow Sampling for Text-to-Image Editing

Gaspard Beaudouin , Minghan Li , Jaeyeon Kim , Sung-Hoon Yoon , Mengyu Wang This is my paper

Pith reviewed 2026-05-18 18:58 UTC · model grok-4.3

classification 💻 cs.CV cs.LG

keywords text-to-image editingrectified flowimage editingsampling methodsdistillationvelocity fieldsdiffusion models

0 comments

The pith

Delta Rectified Flow Sampling improves text-to-image edits by modeling velocity discrepancies and adding a time-dependent shift to reduce over-smoothing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes Delta Rectified Flow Sampling as a new framework for editing images in rectified flow models using text instructions. It works by calculating the difference between the velocity fields of the original and desired images, then applies a shifting adjustment that moves the noisy starting point closer to the target path. This targets the blurring and loss of detail that often occurs when distillation methods try to skip full inversion steps. If the approach holds, it would let users make precise changes to generated images while keeping the original structure intact, all without retraining the base model or adding inversion steps. The method also positions itself as a bridge that recovers earlier techniques like Delta Denoising Score and FlowEdit as special cases under certain choices of the shift.

Core claim

DRFS explicitly models the discrepancy between source and target velocity fields in rectified flow dynamics and introduces a time-dependent shift term to align noisy latents with the target trajectory, yielding higher editing quality, fidelity, and controllability on the PIE Benchmark with no architectural changes to the underlying model.

What carries the argument

Velocity discrepancy modeling paired with a time-dependent shift term that pushes latents toward the target distribution during sampling.

If this is right

Edits preserve source image details more accurately while following the target text prompt.
The same framework recovers DDS when the shift term is removed and recovers FlowEdit under a linear shift schedule.
No inversion step or model architecture changes are required for the editing process.
The shift schedule can be analyzed and tuned to balance edit strength against fidelity.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The velocity discrepancy approach could transfer to editing tasks in video or 3D generation models that rely on flow dynamics.
Further refinement of the shift term schedule might yield performance gains when applied to larger or differently trained rectified flow models.
Testing the method on out-of-distribution prompts or non-PIE datasets would help identify where the discrepancy modeling provides the largest benefit.

Load-bearing premise

That explicitly modeling the source-target velocity discrepancy together with a time-dependent shift will reliably reduce over-smoothing artifacts during practical editing.

What would settle it

Evaluating DRFS on the PIE Benchmark and observing that it produces equal or greater over-smoothing, lower fidelity scores, or reduced controllability compared with prior methods such as DDS or FlowEdit.

Figures

Figures reproduced from arXiv: 2509.05342 by Gaspard Beaudouin, Jaeyeon Kim, Mengyu Wang, Minghan Li, Sung-Hoon Yoon.

**Figure 1.** Figure 1: Comparison between RFDS and DVRF (ours). Source prompt: Brown horse walking in a grassy meadow with an autumn forest backdrop and target prompt: Zebra walking in a grassy meadow with an autumn forest backdrop. As shown in (b) and (c), RFDS results in over-smoothing and detail loss. In contrast, DVRF (d) preserves textures. 3 Delta Velocity Rectified Flow (DVRF) We introduce Delta Velocity Rectified Flow (D… view at source ↗

**Figure 2.** Figure 2: Visual comparison of the sampling strategies for editing [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Effect of the offset coefficient ct. Subfigure (a) shows that larger η yields straighter trajectories; (b) shows it also increases update magnitude via amplified ∥vθ(ˆx tgt t ) − vθ(x src t )∥ 2 . 7 [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: Qualitative comparisons on images from our additional dataset and PIE benchmark. [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Qualitative edits produced by our DVRF. Each pair indicates the source image (left) and edited result (right). induce larger gradient updates and produce straighter latent trajectories. Geometrically, as illustrated in [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: Qualitative edits produced by our DVRF with different schedulers. For each triplet: left = source, center = random scheduler, right = descending scheduler. 6 Related Work Text-guided Inversion and Editing. Image editing methods can be broadly categorized into training-based and training-free approaches. Training-based methods fine-tune generative 10 [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

read the original abstract

We propose Delta Rectified Flow Sampling (DRFS), a novel inversion-free, path-aware editing framework within rectified flow models for text-to-image editing. DRFS is a distillation-based method that explicitly models the discrepancy between the source and target velocity fields in order to mitigate over-smoothing artifacts rampant in prior distillation sampling approaches. We further introduce a time-dependent shift term to push noisy latents closer to the target trajectory, enhancing the alignment with the target distribution. We theoretically demonstrate that disabling this shift recovers Delta Denoising Score (DDS), bridging score-based diffusion optimization and velocity-based rectified-flow optimization. Moreover, under rectified-flow dynamics, a linear shift schedule recovers the inversion-free method FlowEdit as a strict special case, yielding a unifying view of optimization and ODE editing. We conduct an analysis to guide the design of our shift term, and experimental results on the widely used PIE Benchmark indicate that DRFS achieves superior editing quality, fidelity, and controllability while requiring no architectural modifications. Code is available at https://github.com/Harvard-AI-and-Robotics-Lab/DeltaRectifiedFlowSampling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

DRFS adds a velocity discrepancy term and time-dependent shift to rectified flow editing, recovering DDS and FlowEdit as cases, with claimed gains on PIE but thin experimental detail so far.

read the letter

DRFS models the gap between source and target velocity fields in rectified flows and introduces a time-dependent shift to reduce over-smoothing during editing. It frames DDS as the zero-shift case and FlowEdit as the linear-shift case, giving a single view that ties score-based and velocity-based approaches together under the same dynamics. That unification is the clearest new piece here and sits on top of existing rectified flow work without forcing new architecture changes. The code release helps, and the motivation around distillation artifacts is straightforward to follow. Experiments on the PIE benchmark report better quality, fidelity, and controllability than the baselines they compare against. Those results are the practical hook for anyone doing text-to-image editing. The soft spots are mostly in the supporting evidence. The abstract states the theoretical recovery but does not walk through the derivations, so it is hard to judge how tight the special-case arguments actually are without the full sections. The benchmark claims lack visible error bars or detailed ablations on the shift schedule, which makes it difficult to tell how robust the gains are across prompts or models. The core assumption that explicit discrepancy modeling plus the shift will consistently cut artifacts looks plausible on paper, but it could still be sensitive to hyperparameter choices or the base model. This paper is aimed at people already working on flow-based or diffusion editing pipelines who want a sampling tweak rather than a new model. Readers who care about practical controllability improvements and code they can try will get the most out of it. It has enough grounding in prior methods and a clear experimental target to merit a serious referee, even if the experiments will probably need tightening. I would send it to review.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes Delta Rectified Flow Sampling (DRFS), an inversion-free, path-aware framework for text-to-image editing in rectified flow models. DRFS explicitly models the discrepancy between source and target velocity fields to reduce over-smoothing and introduces a time-dependent shift term to align noisy latents with the target trajectory. It theoretically shows that setting the shift to zero recovers Delta Denoising Score (DDS) and that a linear shift schedule recovers FlowEdit as a special case, providing a unifying view of optimization and ODE-based editing. Experiments on the PIE Benchmark report superior editing quality, fidelity, and controllability without architectural modifications to the underlying model.

Significance. If the central claims hold, this work supplies a theoretically grounded bridge between score-based diffusion optimization and velocity-based rectified-flow methods while delivering practical gains in editing controllability. The explicit recovery of DDS and FlowEdit as special cases, combined with the public release of code at https://github.com/Harvard-AI-and-Robotics-Lab/DeltaRectifiedFlowSampling, strengthens reproducibility and situates the contribution within existing literature.

major comments (2)

[§3] §3 (Theoretical Analysis): the claim that DRFS recovers DDS by disabling the shift term and recovers FlowEdit under a linear shift schedule is load-bearing for the unifying-view contribution, yet the manuscript provides only high-level statements without the intermediate algebraic steps that connect the velocity-discrepancy term to the prior methods; explicit expansion of the ODE under each choice of shift would confirm the reductions.
[§4.3] §4.3 (PIE Benchmark Experiments): the reported superiority in quality, fidelity, and controllability is asserted without error bars, statistical significance tests, or ablations isolating the contribution of the time-dependent shift term versus the velocity-discrepancy modeling alone; these controls are necessary to substantiate that the observed gains are not attributable to hyper-parameter tuning or benchmark-specific artifacts.

minor comments (2)

[§3] Notation for the shift schedule β(t) is introduced in the abstract and §3 but never given an explicit functional form or range; adding a short paragraph or equation defining the schedule used in all experiments would improve clarity.
[Figure 3] Figure 3 (qualitative results) would benefit from side-by-side comparison with the exact DDS and FlowEdit baselines under identical random seeds to make the visual improvement attributable to DRFS immediately verifiable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments and the recommendation for minor revision. We are pleased that the significance of the work is recognized. Below we provide point-by-point responses to the major comments.

read point-by-point responses

Referee: §3 (Theoretical Analysis): the claim that DRFS recovers DDS by disabling the shift term and recovers FlowEdit under a linear shift schedule is load-bearing for the unifying-view contribution, yet the manuscript provides only high-level statements without the intermediate algebraic steps that connect the velocity-discrepancy term to the prior methods; explicit expansion of the ODE under each choice of shift would confirm the reductions.

Authors: We agree that including the explicit algebraic steps would enhance the clarity of the theoretical analysis. In the revised manuscript, we will expand §3 to provide the intermediate derivations. Specifically, we will show the ODE expansion when the shift term is set to zero, demonstrating recovery of the DDS formulation, and detail the linear shift schedule that reduces to FlowEdit. This will include all connecting steps from the velocity-discrepancy term. revision: yes
Referee: §4.3 (PIE Benchmark Experiments): the reported superiority in quality, fidelity, and controllability is asserted without error bars, statistical significance tests, or ablations isolating the contribution of the time-dependent shift term versus the velocity-discrepancy modeling alone; these controls are necessary to substantiate that the observed gains are not attributable to hyper-parameter tuning or benchmark-specific artifacts.

Authors: We recognize the importance of these additional controls for robustness. We will revise §4.3 to include error bars from multiple independent runs, perform statistical significance tests on the key metrics, and add ablations that separately evaluate the velocity-discrepancy modeling and the time-dependent shift term. These additions will help confirm that the improvements are attributable to the proposed components rather than other factors. revision: yes

Circularity Check

0 steps flagged

No significant circularity detected in DRFS derivation

full rationale

The paper introduces DRFS by explicitly modeling source-target velocity discrepancy plus a time-dependent shift term within rectified-flow dynamics. It derives special-case recoveries (shift=0 yields DDS; linear shift yields FlowEdit) as theoretical connections rather than tautological reductions. The PIE Benchmark results are presented as independent empirical validation of quality/fidelity gains without architectural changes. No self-definitional loops, fitted inputs relabeled as predictions, load-bearing self-citations, or ansatz smuggling appear in the derivation chain; the construction remains externally grounded and non-circular.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the shift term is introduced conceptually but its functional form and any associated constants are not specified.

pith-pipeline@v0.9.0 · 5735 in / 1122 out tokens · 45085 ms · 2026-05-18T18:58:41.297074+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We introduce Delta Velocity Rectified Flow (DVRF) ... explicitly minimizes the discrepancy between source and target velocity ... time-dependent shift term ct(xtgt0 − xsrc0)
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat.induction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We prove that EDVRF reduces to EDDS when ct=0 ... FlowEdit ... when ct=t under rectified-flow dynamics

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 3 internal anchors

[1]

Ntire 2017 challenge on single image super-resolution: Dataset and study

Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. In2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1122–1131, July 2017. doi: 10.1109/CVPRW.2017.150

work page doi:10.1109/cvprw.2017.150 2017
[2]

Albergo, Nicholas M

Michael S. Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions, 2023. URLhttps://arxiv.org/abs/2303. 08797

work page 2023
[3]

Ledits++: Limitless image editing using text-to- image models, 2024

Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, and Apolinário Passos. Ledits++: Limitless image editing using text-to- image models, 2024. URLhttps://arxiv.org/abs/2311.16711

work page arXiv 2024
[4]

Tim Brooks, Aleksander Holynski, and Alexei A. Efros. Instructpix2pix: Learning to follow image editing instructions, 2023. URLhttps://arxiv.org/abs/2211.09800

work page arXiv 2023
[5]

Fireflow: Fast inversionofrectifiedflowforimagesemanticediting

Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, and Fan Tang. Fireflow: Fast inversionofrectifiedflowforimagesemanticediting. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=JFafMSAjUm. 11

work page 2025
[6]

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis, 2024. URLhttps://arxiv. org/abs/2403.03206

work page internal anchor Pith review Pith/arXiv arXiv 2024
[7]

Next-gpt: Any- to-any multimodal LLM.CoRR, abs/2309.05519, 2023

Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, and Baining Guo. Instructdiffusion: A generalist modeling interface for vision tasks.CoRR, abs/2309.03895, 2023. doi: 10.48550/arXiv.2309. 03895. URL https://doi.org/10.48550/arXiv.2309.03895

work page doi:10.48550/arxiv.2309 2023
[8]

Prompt-to-Prompt Image Editing with Cross Attention Control

Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen- Or. Prompt-to-prompt image editing with cross attention control, 2022. URL https: //arxiv.org/abs/2208.01626

work page internal anchor Pith review Pith/arXiv arXiv 2022
[9]

Delta denoising score

Amir Hertz, Kfir Aberman, and Daniel Cohen-Or. Delta denoising score. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2328–2337, October 2023

work page 2023
[10]

Classifier-free diffusion guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. URL https: //openreview.net/forum?id=qw8AKxfYbI

work page 2021
[11]

Dacapo: Score distillation as stacked bridge for fast and high-quality 3d editing

Yufei Huang, Bangyan Liao, Yuqi Hu, Haitao Lin, Lirong Wu, Siyuan Li, Cheng Tan, Zicheng Liu, Yunfan Liu, Zelin Zang, Chang Yu, and Zhen Lei. Dacapo: Score distillation as stacked bridge for fast and high-quality 3d editing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16304–16313, June 2025

work page 2025
[12]

An edit friendly ddpm noise space: Inversion and manipulations, 2024

Inbar Huberman-Spiegelglas, Vladimir Kulikov, and Tomer Michaeli. An edit friendly ddpm noise space: Inversion and manipulations, 2024. URLhttps://arxiv.org/abs/2304.06140

work page arXiv 2024
[13]

arXiv preprint arXiv:2310.01506 , year=

Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Direct inversion: Boosting diffusion-based editing with 3 lines of code, 2023. URLhttps://arxiv.org/abs/ 2310.01506

work page arXiv 2023
[14]

Noise-free score distil- lation

Oren Katzir, Or Patashnik, Daniel Cohen-Or, and Dani Lischinski. Noise-free score distil- lation. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=dlIMcmlAdk

work page 2024
[15]

Imagic: Text-based real image editing with diffusion models,

Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models,

work page
[16]

URL https://arxiv.org/abs/2210.09276

work page arXiv
[17]

Flexiedit: Frequency- aware latent refinement for enhanced non-rigid editing

Gwanhyeong Koo, Sunjae Yoon, {Ji Woo} Hong, and {Chang D.} Yoo. Flexiedit: Frequency- aware latent refinement for enhanced non-rigid editing. In Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, and Gül Varol, editors,Computer Vi- sion – ECCV 2024 - 18th European Conference, Proceedings, Lecture Notes in Computer Science (includ...

work page 2024
[18]

doi: 10.1007/978-3-031-73036-8\_21

ISBN 9783031730351. doi: 10.1007/978-3-031-73036-8\_21. Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.; 18th European Conference on Computer Vision, ECCV 2024 ; Conference date: 29-09-2024 Through 04-10-2024. 12

work page doi:10.1007/978-3-031-73036-8 2025
[19]

Posterior distillation sampling

Juil Koo, Chanho Park, and Minhyuk Sung. Posterior distillation sampling. InCVPR, 2024

work page 2024
[20]

FlowEdit: Inversion-free text-based editing using pre-trained flow models.arXiv preprint arXiv:2412.08629, 2024

Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, and Tomer Michaeli. Flowedit: Inversion-free text-based editing using pre-trained flow models, 2024. URL https://arxiv.org/abs/2412.08629

work page arXiv 2024
[21]

Flux.https://github.com/black-forest-labs/flux, 2024

Black Forest Labs. Flux.https://github.com/black-forest-labs/flux, 2024

work page 2024
[22]

Stylediffusion: Prompt-embedding inversion for text-based editing

Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, and Jian Yang. Stylediffusion: Prompt-embedding inversion for text-based editing. arXiv preprint arXiv:2303.15649, 2023

work page arXiv 2023
[23]

Schedule your edit: A simple yet effective diffusion noise schedule for image editing, 2024

Haonan Lin, Mengmeng Wang, Jiahao Wang, Wenbin An, Yan Chen, Yong Liu, Feng Tian, Guang Dai, Jingdong Wang, and Qianying Wang. Schedule your edit: A simple yet effective diffusion noise schedule for image editing, 2024. URLhttps://arxiv.org/abs/2410.18756

work page arXiv 2024
[24]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=PqvMRDCJT9t

work page 2023
[25]

Flow straight and fast: Learning to generate and transfer data with rectified flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. InICLR, 2023. URL https://openreview.net/ forum?id=XVjTT1nw5z

work page 2023
[26]

Jacobs, Alexei A Efros, Aleksander Holynski, and Angjoo Kanazawa

David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A Efros, Aleksander Holynski, and Angjoo Kanazawa. Rethinking score distillation as a bridge between image distributions. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=I8PkICj9kM

work page 2024
[27]

Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models, 2024

Daiki Miyake, Akihiro Iohara, Yu Saito, and Toshiyuki Tanaka. Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models, 2024. URLhttps: //arxiv.org/abs/2305.16807

work page arXiv 2024
[28]

arXiv preprint arXiv:2211.09794 , year=

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models, 2022. URLhttps://arxiv. org/abs/2211.09794

work page arXiv 2022
[29]

Contrastive denoising score for text-guided latent diffusion image editing

Hyelin Nam, Gihyun Kwon, Geon Yeong Park, and Jong Chul Ye. Contrastive denoising score for text-guided latent diffusion image editing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9192–9201, June 2024

work page 2024
[30]

Metaxas, and Yezhou Yang

Maitreya Patel, Song Wen, Dimitris N. Metaxas, and Yezhou Yang. Steering rectified flow models in the vector field for controlled image generation, 2024. URLhttps://arxiv.org/ abs/2412.00100

work page arXiv 2024
[31]

Pixabay. Pixabay. https://pixabay.com/. License: CC0; accessed 15 May 2025

work page 2025
[32]

Barron, and Ben Mildenhall

Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. InThe Eleventh International Conference on Learning Representations,

work page
[33]

URL https://openreview.net/forum?id=FjNys5c7VyY

work page
[34]

Fatezero: Fusing attentions for zero-shot text-based video editing, 2023

Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, and Qifeng Chen. Fatezero: Fusing attentions for zero-shot text-based video editing, 2023. URL https://arxiv.org/abs/2303.09535. 13

work page arXiv 2023
[35]

High-resolution image synthesis with latent diffusion models, June 2022

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, June 2022

work page 2022
[36]

Semantic image inversion and editing using rectified stochastic differential equations

Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. Semantic image inversion and editing using rectified stochastic differential equations. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=Hu0FSOSEyS

work page 2025
[37]

Fast high-resolution image synthesis with latent adversarial diffusion distillation

Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, and Robin Rombach. Fast high-resolution image synthesis with latent adversarial diffusion distillation. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024

work page 2024
[38]

Denoising diffusion implicit models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021. URL https://openreview. net/forum?id=St1giarCHLP

work page 2021
[39]

Dual diffusion implicit bridges for image-to-image translation

Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image translation. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=5HLoTvVGDe

work page 2023
[40]

arXiv preprint arXiv:2211.12572 , year =

Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. Plug-and-play diffusion features for text-driven image-to-image translation, 2022. URLhttps://arxiv.org/abs/ 2211.12572

work page arXiv 2022
[41]

Edict: Exact diffusion inversion via coupled transformations, 2022

Bram Wallace, Akash Gokul, and Nikhil Naik. Edict: Exact diffusion inversion via coupled transformations, 2022. URL https://arxiv.org/abs/2211.12446

work page arXiv 2022
[42]

Taming rectified flow for inversion and editing

Jiangshan Wang, Junfu Pu, Zhongang Qi, Jiayi Guo, Yue Ma, Nisha Huang, Yuxin Chen, Xiu Li, and Ying Shan. Taming rectified flow for inversion and editing. InForty-second International Conference on Machine Learning, 2025. URL https://openreview.net/ forum?id=uDreZphNky

work page 2025
[43]

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, and Junyang Lin. Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution, 2024. URL https://arxiv.org/abs/2409.12191

work page internal anchor Pith review Pith/arXiv arXiv 2024
[44]

Wang, A.C

Z. Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: From error visibility to structural similarity.IEEE Transactions on Image Processing, page 600–612, Apr

work page
[45]

Wang, A.C

doi: 10.1109/tip.2003.819861. URL http://dx.doi.org/10.1109/tip.2003.819861

work page doi:10.1109/tip.2003.819861 2003
[46]

Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation

Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=ppJuFSOAnM

work page 2023
[47]

Godiva: Generating open-domain videos from natural descriptions

Chengdong Wu, Ling-Qiao Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, and Nan Duan. Godiva: Generating open-domain videos from natural descriptions. Apr 2021

work page 2021
[48]

Unveil inversion and invariance in flow transformer for versatile image editing.arXiv preprint arXiv:2411.15843, 2025

Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Charles Ling, and Boyu Wang. Unveil inversion and invariance in flow transformer for versatile image editing, 2025. URLhttps://arxiv.org/ abs/2411.15843. 14

work page arXiv 2025
[49]

Inversion-free image editing with natural language.arXiv preprint arXiv:2312.04965, 2023

Sihan Xu, Yidong Huang, Jiayi Pan, Ziqiao Ma, and Joyce Chai. Inversion-free image editing with natural language, 2023. URLhttps://arxiv.org/abs/2312.04965

work page arXiv 2023
[50]

Text-to-image rectified flow as plug-and-play priors

Xiaofeng Yang, Chen Cheng, Xulei Yang, Fayao Liu, and Guosheng Lin. Text-to-image rectified flow as plug-and-play priors. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=SzPZK856iI

work page 2025
[51]

Magicbrush: A manually annotated dataset for instruction-guided image editing

Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, and Yu Su. Magicbrush: A manually annotated dataset for instruction-guided image editing. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023. URL https://openreview.net/forum?id=ZsDB2GzsqG

work page 2023
[52]

Quantization and training of neural networks for efficient integer-arithmetic-only inference

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2018. doi: 10.1109/cvpr.2018. 00068. URL http://dx.doi.org/10.1109/cvpr.2018.00068

work page doi:10.1109/cvpr.2018 2018
[53]

Zheng, Q., Le, M., Shaul, N., Lipman, Y ., Grover, A., and Chen, R

Qinqing Zheng, Matt Le, Neta Shaul, Yaron Lipman, Aditya Grover, and Ricky T. Q. Chen. Guided flows for generative modeling and decision making, 2023. URL https: //arxiv.org/abs/2311.13443. 15 A Diffusion Models A.1 Diffusion models background Diffusion models define a forward process that gradually adds Gaussian noise to a clean image (or its latent)x0 a...

work page arXiv 2023
[54]

road closed

uses the time stept to represent the intermediate state of the editing flow. In contrast, in our setup, t refers to the time-step of a given rectified flow model. Additionally, FlowEdit introduces an extra parameternavg, which corresponds to our batch sizeB. Both parameters are used to estimate the expectation in(8). The implicit learning rate used corres...

work page

[1] [1]

Ntire 2017 challenge on single image super-resolution: Dataset and study

Eirikur Agustsson and Radu Timofte. Ntire 2017 challenge on single image super-resolution: Dataset and study. In2017 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pages 1122–1131, July 2017. doi: 10.1109/CVPRW.2017.150

work page doi:10.1109/cvprw.2017.150 2017

[2] [2]

Albergo, Nicholas M

Michael S. Albergo, Nicholas M. Boffi, and Eric Vanden-Eijnden. Stochastic interpolants: A unifying framework for flows and diffusions, 2023. URLhttps://arxiv.org/abs/2303. 08797

work page 2023

[3] [3]

Ledits++: Limitless image editing using text-to- image models, 2024

Manuel Brack, Felix Friedrich, Katharina Kornmeier, Linoy Tsaban, Patrick Schramowski, Kristian Kersting, and Apolinário Passos. Ledits++: Limitless image editing using text-to- image models, 2024. URLhttps://arxiv.org/abs/2311.16711

work page arXiv 2024

[4] [4]

Tim Brooks, Aleksander Holynski, and Alexei A. Efros. Instructpix2pix: Learning to follow image editing instructions, 2023. URLhttps://arxiv.org/abs/2211.09800

work page arXiv 2023

[5] [5]

Fireflow: Fast inversionofrectifiedflowforimagesemanticediting

Yingying Deng, Xiangyu He, Changwang Mei, Peisong Wang, and Fan Tang. Fireflow: Fast inversionofrectifiedflowforimagesemanticediting. In Forty-second International Conference on Machine Learning, 2025. URL https://openreview.net/forum?id=JFafMSAjUm. 11

work page 2025

[6] [6]

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, Dustin Podell, Tim Dockhorn, Zion English, Kyle Lacey, Alex Goodwin, Yannik Marek, and Robin Rombach. Scaling rectified flow transformers for high-resolution image synthesis, 2024. URLhttps://arxiv. org/abs/2403.03206

work page internal anchor Pith review Pith/arXiv arXiv 2024

[7] [7]

Next-gpt: Any- to-any multimodal LLM.CoRR, abs/2309.05519, 2023

Zigang Geng, Binxin Yang, Tiankai Hang, Chen Li, Shuyang Gu, Ting Zhang, Jianmin Bao, Zheng Zhang, Han Hu, Dong Chen, and Baining Guo. Instructdiffusion: A generalist modeling interface for vision tasks.CoRR, abs/2309.03895, 2023. doi: 10.48550/arXiv.2309. 03895. URL https://doi.org/10.48550/arXiv.2309.03895

work page doi:10.48550/arxiv.2309 2023

[8] [8]

Prompt-to-Prompt Image Editing with Cross Attention Control

Amir Hertz, Ron Mokady, Jay Tenenbaum, Kfir Aberman, Yael Pritch, and Daniel Cohen- Or. Prompt-to-prompt image editing with cross attention control, 2022. URL https: //arxiv.org/abs/2208.01626

work page internal anchor Pith review Pith/arXiv arXiv 2022

[9] [9]

Delta denoising score

Amir Hertz, Kfir Aberman, and Daniel Cohen-Or. Delta denoising score. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 2328–2337, October 2023

work page 2023

[10] [10]

Classifier-free diffusion guidance

Jonathan Ho and Tim Salimans. Classifier-free diffusion guidance. In NeurIPS 2021 Workshop on Deep Generative Models and Downstream Applications, 2021. URL https: //openreview.net/forum?id=qw8AKxfYbI

work page 2021

[11] [11]

Dacapo: Score distillation as stacked bridge for fast and high-quality 3d editing

Yufei Huang, Bangyan Liao, Yuqi Hu, Haitao Lin, Lirong Wu, Siyuan Li, Cheng Tan, Zicheng Liu, Yunfan Liu, Zelin Zang, Chang Yu, and Zhen Lei. Dacapo: Score distillation as stacked bridge for fast and high-quality 3d editing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 16304–16313, June 2025

work page 2025

[12] [12]

An edit friendly ddpm noise space: Inversion and manipulations, 2024

Inbar Huberman-Spiegelglas, Vladimir Kulikov, and Tomer Michaeli. An edit friendly ddpm noise space: Inversion and manipulations, 2024. URLhttps://arxiv.org/abs/2304.06140

work page arXiv 2024

[13] [13]

arXiv preprint arXiv:2310.01506 , year=

Xuan Ju, Ailing Zeng, Yuxuan Bian, Shaoteng Liu, and Qiang Xu. Direct inversion: Boosting diffusion-based editing with 3 lines of code, 2023. URLhttps://arxiv.org/abs/ 2310.01506

work page arXiv 2023

[14] [14]

Noise-free score distil- lation

Oren Katzir, Or Patashnik, Daniel Cohen-Or, and Dani Lischinski. Noise-free score distil- lation. In The Twelfth International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=dlIMcmlAdk

work page 2024

[15] [15]

Imagic: Text-based real image editing with diffusion models,

Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models,

work page

[16] [16]

URL https://arxiv.org/abs/2210.09276

work page arXiv

[17] [17]

Flexiedit: Frequency- aware latent refinement for enhanced non-rigid editing

Gwanhyeong Koo, Sunjae Yoon, {Ji Woo} Hong, and {Chang D.} Yoo. Flexiedit: Frequency- aware latent refinement for enhanced non-rigid editing. In Aleš Leonardis, Elisa Ricci, Stefan Roth, Olga Russakovsky, Torsten Sattler, and Gül Varol, editors,Computer Vi- sion – ECCV 2024 - 18th European Conference, Proceedings, Lecture Notes in Computer Science (includ...

work page 2024

[18] [18]

doi: 10.1007/978-3-031-73036-8\_21

ISBN 9783031730351. doi: 10.1007/978-3-031-73036-8\_21. Publisher Copyright: © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.; 18th European Conference on Computer Vision, ECCV 2024 ; Conference date: 29-09-2024 Through 04-10-2024. 12

work page doi:10.1007/978-3-031-73036-8 2025

[19] [19]

Posterior distillation sampling

Juil Koo, Chanho Park, and Minhyuk Sung. Posterior distillation sampling. InCVPR, 2024

work page 2024

[20] [20]

FlowEdit: Inversion-free text-based editing using pre-trained flow models.arXiv preprint arXiv:2412.08629, 2024

Vladimir Kulikov, Matan Kleiner, Inbar Huberman-Spiegelglas, and Tomer Michaeli. Flowedit: Inversion-free text-based editing using pre-trained flow models, 2024. URL https://arxiv.org/abs/2412.08629

work page arXiv 2024

[21] [21]

Flux.https://github.com/black-forest-labs/flux, 2024

Black Forest Labs. Flux.https://github.com/black-forest-labs/flux, 2024

work page 2024

[22] [22]

Stylediffusion: Prompt-embedding inversion for text-based editing

Senmao Li, Joost van de Weijer, Taihang Hu, Fahad Shahbaz Khan, Qibin Hou, Yaxing Wang, and Jian Yang. Stylediffusion: Prompt-embedding inversion for text-based editing. arXiv preprint arXiv:2303.15649, 2023

work page arXiv 2023

[23] [23]

Schedule your edit: A simple yet effective diffusion noise schedule for image editing, 2024

Haonan Lin, Mengmeng Wang, Jiahao Wang, Wenbin An, Yan Chen, Yong Liu, Feng Tian, Guang Dai, Jingdong Wang, and Qianying Wang. Schedule your edit: A simple yet effective diffusion noise schedule for image editing, 2024. URLhttps://arxiv.org/abs/2410.18756

work page arXiv 2024

[24] [24]

Yaron Lipman, Ricky T. Q. Chen, Heli Ben-Hamu, Maximilian Nickel, and Matthew Le. Flow matching for generative modeling. InThe Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=PqvMRDCJT9t

work page 2023

[25] [25]

Flow straight and fast: Learning to generate and transfer data with rectified flow

Xingchao Liu, Chengyue Gong, and Qiang Liu. Flow straight and fast: Learning to generate and transfer data with rectified flow. InICLR, 2023. URL https://openreview.net/ forum?id=XVjTT1nw5z

work page 2023

[26] [26]

Jacobs, Alexei A Efros, Aleksander Holynski, and Angjoo Kanazawa

David McAllister, Songwei Ge, Jia-Bin Huang, David W. Jacobs, Alexei A Efros, Aleksander Holynski, and Angjoo Kanazawa. Rethinking score distillation as a bridge between image distributions. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024. URL https://openreview.net/forum?id=I8PkICj9kM

work page 2024

[27] [27]

Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models, 2024

Daiki Miyake, Akihiro Iohara, Yu Saito, and Toshiyuki Tanaka. Negative-prompt inversion: Fast image inversion for editing with text-guided diffusion models, 2024. URLhttps: //arxiv.org/abs/2305.16807

work page arXiv 2024

[28] [28]

arXiv preprint arXiv:2211.09794 , year=

Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models, 2022. URLhttps://arxiv. org/abs/2211.09794

work page arXiv 2022

[29] [29]

Contrastive denoising score for text-guided latent diffusion image editing

Hyelin Nam, Gihyun Kwon, Geon Yeong Park, and Jong Chul Ye. Contrastive denoising score for text-guided latent diffusion image editing. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9192–9201, June 2024

work page 2024

[30] [30]

Metaxas, and Yezhou Yang

Maitreya Patel, Song Wen, Dimitris N. Metaxas, and Yezhou Yang. Steering rectified flow models in the vector field for controlled image generation, 2024. URLhttps://arxiv.org/ abs/2412.00100

work page arXiv 2024

[31] [31]

Pixabay. Pixabay. https://pixabay.com/. License: CC0; accessed 15 May 2025

work page 2025

[32] [32]

Barron, and Ben Mildenhall

Ben Poole, Ajay Jain, Jonathan T. Barron, and Ben Mildenhall. Dreamfusion: Text-to-3d using 2d diffusion. InThe Eleventh International Conference on Learning Representations,

work page

[33] [33]

URL https://openreview.net/forum?id=FjNys5c7VyY

work page

[34] [34]

Fatezero: Fusing attentions for zero-shot text-based video editing, 2023

Chenyang Qi, Xiaodong Cun, Yong Zhang, Chenyang Lei, Xintao Wang, Ying Shan, and Qifeng Chen. Fatezero: Fusing attentions for zero-shot text-based video editing, 2023. URL https://arxiv.org/abs/2303.09535. 13

work page arXiv 2023

[35] [35]

High-resolution image synthesis with latent diffusion models, June 2022

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models, June 2022

work page 2022

[36] [36]

Semantic image inversion and editing using rectified stochastic differential equations

Litu Rout, Yujia Chen, Nataniel Ruiz, Constantine Caramanis, Sanjay Shakkottai, and Wen-Sheng Chu. Semantic image inversion and editing using rectified stochastic differential equations. In The Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=Hu0FSOSEyS

work page 2025

[37] [37]

Fast high-resolution image synthesis with latent adversarial diffusion distillation

Axel Sauer, Frederic Boesel, Tim Dockhorn, Andreas Blattmann, Patrick Esser, and Robin Rombach. Fast high-resolution image synthesis with latent adversarial diffusion distillation. In SIGGRAPH Asia 2024 Conference Papers, pages 1–11, 2024

work page 2024

[38] [38]

Denoising diffusion implicit models

Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations, 2021. URL https://openreview. net/forum?id=St1giarCHLP

work page 2021

[39] [39]

Dual diffusion implicit bridges for image-to-image translation

Xuan Su, Jiaming Song, Chenlin Meng, and Stefano Ermon. Dual diffusion implicit bridges for image-to-image translation. In The Eleventh International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=5HLoTvVGDe

work page 2023

[40] [40]

arXiv preprint arXiv:2211.12572 , year =

Narek Tumanyan, Michal Geyer, Shai Bagon, and Tali Dekel. Plug-and-play diffusion features for text-driven image-to-image translation, 2022. URLhttps://arxiv.org/abs/ 2211.12572

work page arXiv 2022

[41] [41]

Edict: Exact diffusion inversion via coupled transformations, 2022

Bram Wallace, Akash Gokul, and Nikhil Naik. Edict: Exact diffusion inversion via coupled transformations, 2022. URL https://arxiv.org/abs/2211.12446

work page arXiv 2022

[42] [42]

Taming rectified flow for inversion and editing

Jiangshan Wang, Junfu Pu, Zhongang Qi, Jiayi Guo, Yue Ma, Nisha Huang, Yuxin Chen, Xiu Li, and Ying Shan. Taming rectified flow for inversion and editing. InForty-second International Conference on Machine Learning, 2025. URL https://openreview.net/ forum?id=uDreZphNky

work page 2025

[43] [43]

Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution

Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing Liu, Jialin Wang, Wenbin Ge, Yang Fan, Kai Dang, Mengfei Du, Xuancheng Ren, Rui Men, Dayiheng Liu, Chang Zhou, Jingren Zhou, and Junyang Lin. Qwen2-vl: Enhancing vision-language model’s perception of the world at any resolution, 2024. URL https://arxiv.org/abs/2409.12191

work page internal anchor Pith review Pith/arXiv arXiv 2024

[44] [44]

Wang, A.C

Z. Wang, A.C. Bovik, H.R. Sheikh, and E.P. Simoncelli. Image quality assessment: From error visibility to structural similarity.IEEE Transactions on Image Processing, page 600–612, Apr

work page

[45] [45]

Wang, A.C

doi: 10.1109/tip.2003.819861. URL http://dx.doi.org/10.1109/tip.2003.819861

work page doi:10.1109/tip.2003.819861 2003

[46] [46]

Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation

Zhengyi Wang, Cheng Lu, Yikai Wang, Fan Bao, Chongxuan Li, Hang Su, and Jun Zhu. Prolificdreamer: High-fidelity and diverse text-to-3d generation with variational score distillation. In Thirty-seventh Conference on Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=ppJuFSOAnM

work page 2023

[47] [47]

Godiva: Generating open-domain videos from natural descriptions

Chengdong Wu, Ling-Qiao Huang, Qianxi Zhang, Binyang Li, Lei Ji, Fan Yang, Guillermo Sapiro, and Nan Duan. Godiva: Generating open-domain videos from natural descriptions. Apr 2021

work page 2021

[48] [48]

Unveil inversion and invariance in flow transformer for versatile image editing.arXiv preprint arXiv:2411.15843, 2025

Pengcheng Xu, Boyuan Jiang, Xiaobin Hu, Donghao Luo, Qingdong He, Jiangning Zhang, Chengjie Wang, Yunsheng Wu, Charles Ling, and Boyu Wang. Unveil inversion and invariance in flow transformer for versatile image editing, 2025. URLhttps://arxiv.org/ abs/2411.15843. 14

work page arXiv 2025

[49] [49]

Inversion-free image editing with natural language.arXiv preprint arXiv:2312.04965, 2023

Sihan Xu, Yidong Huang, Jiayi Pan, Ziqiao Ma, and Joyce Chai. Inversion-free image editing with natural language, 2023. URLhttps://arxiv.org/abs/2312.04965

work page arXiv 2023

[50] [50]

Text-to-image rectified flow as plug-and-play priors

Xiaofeng Yang, Chen Cheng, Xulei Yang, Fayao Liu, and Guosheng Lin. Text-to-image rectified flow as plug-and-play priors. InThe Thirteenth International Conference on Learning Representations, 2025. URL https://openreview.net/forum?id=SzPZK856iI

work page 2025

[51] [51]

Magicbrush: A manually annotated dataset for instruction-guided image editing

Kai Zhang, Lingbo Mo, Wenhu Chen, Huan Sun, and Yu Su. Magicbrush: A manually annotated dataset for instruction-guided image editing. In Thirty-seventh Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2023. URL https://openreview.net/forum?id=ZsDB2GzsqG

work page 2023

[52] [52]

Quantization and training of neural networks for efficient integer-arithmetic-only inference

Richard Zhang, Phillip Isola, Alexei A. Efros, Eli Shechtman, and Oliver Wang. The unreasonable effectiveness of deep features as a perceptual metric. In2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Jun 2018. doi: 10.1109/cvpr.2018. 00068. URL http://dx.doi.org/10.1109/cvpr.2018.00068

work page doi:10.1109/cvpr.2018 2018

[53] [53]

Zheng, Q., Le, M., Shaul, N., Lipman, Y ., Grover, A., and Chen, R

Qinqing Zheng, Matt Le, Neta Shaul, Yaron Lipman, Aditya Grover, and Ricky T. Q. Chen. Guided flows for generative modeling and decision making, 2023. URL https: //arxiv.org/abs/2311.13443. 15 A Diffusion Models A.1 Diffusion models background Diffusion models define a forward process that gradually adds Gaussian noise to a clean image (or its latent)x0 a...

work page arXiv 2023

[54] [54]

road closed

uses the time stept to represent the intermediate state of the editing flow. In contrast, in our setup, t refers to the time-step of a given rectified flow model. Additionally, FlowEdit introduces an extra parameternavg, which corresponds to our batch sizeB. Both parameters are used to estimate the expectation in(8). The implicit learning rate used corres...

work page