From Competition to Coopetition: Coopetitive Training-Free Image Editing Based on Text Guidance
Pith reviewed 2026-05-10 08:08 UTC · model grok-4.3
The pith
CoEdit replaces competitive attention control with coopetitive negotiation between editing and reconstruction branches to reduce semantic conflicts in text-guided image editing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By shifting from independent competitive optimization of editing and reconstruction objectives to a coopetitive framework that negotiates attention through measured entropic interactions, CoEdit produces more harmonious edits across both space and the denoising trajectory while preserving source structure.
What carries the argument
Dual-Entropy Attention Manipulation, which quantifies directional entropic interactions between the editing and reconstruction branches to recast attention control as a harmony-maximization problem.
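The summary does not give the paper's exact formulas, so the following is a minimal sketch of the two quantities such a mechanism would need: per-query Shannon entropy of a cross-attention map, and an asymmetric (hence "directional") branch-to-branch interaction, here illustrated with a per-query KL divergence. The function names `attention_entropy` and `directional_interaction`, and the choice of KL as the directional measure, are assumptions for illustration, not the paper's definitions.

```python
import numpy as np

def attention_entropy(attn, eps=1e-12):
    # Shannon entropy of each query's attention distribution.
    # attn: (num_queries, num_keys), rows sum to 1.
    # Low entropy = focused attention; high entropy = diffuse attention.
    return -np.sum(attn * np.log(attn + eps), axis=-1)

def directional_interaction(attn_src, attn_dst, eps=1e-12):
    # One plausible "directional" entropic quantity: per-query KL
    # divergence of the source branch's attention from the destination
    # branch's. It is asymmetric, so edit->recon differs from
    # recon->edit, which is what makes the interaction directional.
    return np.sum(attn_src * (np.log(attn_src + eps)
                              - np.log(attn_dst + eps)), axis=-1)

# Toy example: two queries over four keys in each branch.
attn_edit = np.array([[0.97, 0.01, 0.01, 0.01],   # focused query
                      [0.25, 0.25, 0.25, 0.25]])  # diffuse query
attn_recon = np.array([[0.25, 0.25, 0.25, 0.25],
                       [0.25, 0.25, 0.25, 0.25]])

h = attention_entropy(attn_edit)                    # low for row 0, maximal for row 1
d = directional_interaction(attn_edit, attn_recon)  # positive for row 0, ~0 for row 1
```

A harmony-maximization objective could then, for instance, keep both directional divergences small in regions meant to be preserved while allowing them to diverge in editable regions; the paper's actual objective may differ.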
If this is right
- Editable and preservable regions become more accurately localized because attention is negotiated rather than fought over.
- Semantic drift is reduced across the full denoising sequence because latent states are refined at every step using the same entropic harmony signal.
- A single composite metric now jointly scores how well the edit succeeds and how faithfully the background is retained.
- The method remains fully training-free and zero-shot, inheriting the practical advantages of prior diffusion-based editors while addressing their coordination failure.
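The per-step refinement described in the second bullet can be sketched concretely. This is only a plausible illustration, assuming the refinement blends the two branches' latents with an entropy-derived confidence weight; the function `refine_latent` and the specific weighting `w = 1 - H/H_max` are assumptions, not the paper's formula.

```python
import numpy as np

def refine_latent(z_edit, z_recon, attn_edit, eps=1e-12):
    # Hypothetical per-step refinement: where the editing branch attends
    # with low entropy (confident, well-localized edits), keep the edited
    # latent; where attention is diffuse, fall back to the reconstruction
    # latent to avoid accumulating semantic drift across denoising steps.
    # z_edit, z_recon: (num_tokens, dim); attn_edit: (num_tokens, num_keys).
    h = -np.sum(attn_edit * np.log(attn_edit + eps), axis=-1)
    h_max = np.log(attn_edit.shape[-1])
    w = 1.0 - h / h_max                     # confidence in [0, 1]
    return w[:, None] * z_edit + (1.0 - w[:, None]) * z_recon

# Toy example: one fully confident token and one maximally diffuse token.
z_edit = np.array([[1.0, 1.0], [1.0, 1.0]])
z_recon = np.array([[0.0, 0.0], [0.0, 0.0]])
attn = np.array([[1.0, 0.0, 0.0, 0.0],       # one-hot: fully confident
                 [0.25, 0.25, 0.25, 0.25]])  # uniform: no confidence
z = refine_latent(z_edit, z_recon, attn)
```

Because the weight is computed from the attention maps themselves, a scheme like this needs no per-prompt hyperparameters, which is the property the training-free claim depends on.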
Where Pith is reading between the lines
- The same negotiation logic could be tested on video or 3D diffusion models where temporal consistency is even harder to maintain under competing objectives.
- Other attention-heavy generative tasks, such as layout-conditioned synthesis or prompt interpolation, might benefit from recasting their internal objectives as explicit harmony problems.
- If the entropic quantification proves stable across different diffusion backbones, the approach offers a lightweight plug-in module rather than a full architectural overhaul.
Load-bearing premise
Directional entropic interactions between the two branches can be quantified in a way that reliably converts attention control into harmony maximization without creating fresh semantic conflicts or needing per-image tuning.
What would settle it
Applying CoEdit to the same editing benchmarks: if its output images score lower than strong competitive baselines on both semantic alignment with the target prompt and structural similarity to the source image, the core claim fails; scoring higher on both would confirm it.
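The joint scoring implied here (edit success plus background fidelity) can be sketched, though the summary does not specify how the Fidelity-Constrained Editing Score combines its two terms. The sketch below assumes a harmonic mean of a semantic-alignment score (e.g., CLIP similarity to the target prompt) and a background-fidelity score (e.g., masked structural similarity to the source), both pre-normalized to [0, 1]; the function name and the harmonic-mean choice are hypothetical.

```python
def fidelity_constrained_score(edit_alignment, background_fidelity, eps=1e-8):
    # Hypothetical composite in the spirit of the paper's
    # Fidelity-Constrained Editing Score. Both inputs in [0, 1].
    # Harmonic mean: high only when BOTH terms are high, so a strong
    # edit cannot compensate for a destroyed background, or vice versa.
    return (2.0 * edit_alignment * background_fidelity
            / (edit_alignment + background_fidelity + eps))

balanced = fidelity_constrained_score(0.8, 0.8)   # strong on both terms
lopsided = fidelity_constrained_score(0.95, 0.2)  # edit succeeds, background ruined
```

An arithmetic mean would rate the lopsided case nearly as high as the balanced one; the harmonic mean penalizes it sharply, which matches the "fidelity-constrained" framing.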
Original abstract
Text-guided image editing, a pivotal task in modern multimedia content creation, has seen remarkable progress with training-free methods that eliminate the need for additional optimization. Despite recent progress, existing methods are typically constrained by a competitive paradigm in which the editing and reconstruction branches are independently driven by their respective objectives to maximize alignment with target and source prompts. The adversarial strategy causes semantic conflicts and unpredictable outcomes due to the lack of coordination between branches. To overcome these issues, we propose Coopetitive Training-Free Image Editing (CoEdit), a novel zero-shot framework that transforms attention control from competition to coopetitive negotiation, achieving editing harmony across spatial and temporal dimensions. Spatially, CoEdit introduces Dual-Entropy Attention Manipulation, which quantifies directional entropic interactions between branches to reformulate attention control as a harmony-maximization problem, eventually improving the localization of editable and preservable regions. Temporally, we present Entropic Latent Refinement mechanism to dynamically adjust latent representations over time, minimizing accumulated editing errors and ensuring consistent semantic transitions throughout the denoising trajectory. Additionally, we propose the Fidelity-Constrained Editing Score, a composite metric that jointly evaluates semantic editing and background fidelity. Extensive experiments on standard benchmarks demonstrate that CoEdit achieves superior performance in both editing quality and structural preservation, enhancing multimedia information utilization by enabling more effective interaction between visual and textual modalities. The code will be available at https://github.com/JinhaoShen/CoEdit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes CoEdit, a zero-shot training-free framework for text-guided image editing that reframes attention control as coopetitive negotiation rather than competition between editing and reconstruction branches. It introduces Dual-Entropy Attention Manipulation to quantify directional entropic interactions for spatial harmony maximization and improved localization, Entropic Latent Refinement to adjust latents temporally for consistent denoising trajectories, and the Fidelity-Constrained Editing Score as a composite metric for semantic editing and background fidelity. The authors claim superior editing quality and structural preservation on standard benchmarks.
Significance. If the entropy-based mechanisms are shown to deliver the claimed harmony without new conflicts or tuning, the work could meaningfully advance training-free diffusion editing by reducing adversarial branch interactions, with potential benefits for multimedia applications requiring precise yet faithful edits. The directional entropy quantification and the joint fidelity metric are distinctive contributions if empirically grounded.
Major comments (3)
- §3.2 (Dual-Entropy Attention Manipulation): the reformulation of attention control as a harmony-maximization problem via directional entropic interactions is presented without a derivation or analysis showing that the resulting weights avoid prompt-dependent scales or new semantic conflicts; this is load-bearing for the central coopetitive claim.
- §4 (Experiments): superiority in editing quality and structural preservation is asserted, yet no quantitative tables, ablation results on the entropy terms, or direct comparisons to prior attention-control baselines are referenced in sufficient detail to evaluate the benchmark gains.
- §3.3 (Entropic Latent Refinement): the mechanism for dynamically adjusting latents to minimize accumulated errors is described at a high level; it is unclear whether the entropy weighting is parameter-free or requires implicit per-prompt calibration, undermining the training-free guarantee.
Minor comments (2)
- Notation for the entropy terms (e.g., directional interaction definitions) should be explicitly tied to the attention maps for reproducibility.
- The abstract states code will be released, but the manuscript should include a reproducibility statement or pseudocode for the two proposed mechanisms.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments. We address each major point below and outline revisions that will strengthen the manuscript's rigor and clarity.
Point-by-point responses
Referee: §3.2 (Dual-Entropy Attention Manipulation): the reformulation of attention control as a harmony-maximization problem via directional entropic interactions is presented without a derivation or analysis showing that the resulting weights avoid prompt-dependent scales or new semantic conflicts; this is load-bearing for the central coopetitive claim.
Authors: We agree that an explicit derivation would better substantiate the coopetitive claim. The Dual-Entropy Attention Manipulation is motivated by quantifying directional entropic interactions to achieve spatial harmony, with empirical results across diverse prompts supporting stability. In the revised manuscript, we will add a dedicated analysis subsection deriving the weight normalization properties, proving scale-invariance, and demonstrating that the formulation avoids introducing new semantic conflicts through bounded entropy terms. revision: yes
Referee: §4 (Experiments): superiority in editing quality and structural preservation is asserted, yet no quantitative tables, ablation results on the entropy terms, or direct comparisons to prior attention-control baselines are referenced in sufficient detail to evaluate the benchmark gains.
Authors: We acknowledge that the experimental section would benefit from greater detail and explicit referencing. While the manuscript reports quantitative evaluations on standard benchmarks and includes initial ablations, we will expand §4 with full quantitative tables, comprehensive ablation studies isolating the entropy terms, and direct side-by-side comparisons to prior attention-control baselines (e.g., Prompt-to-Prompt, Attend-and-Excite) to clearly demonstrate the benchmark gains. revision: yes
Referee: §3.3 (Entropic Latent Refinement): the mechanism for dynamically adjusting latents to minimize accumulated errors is described at a high level; it is unclear whether the entropy weighting is parameter-free or requires implicit per-prompt calibration, undermining the training-free guarantee.
Authors: The Entropic Latent Refinement is designed to be fully parameter-free: entropy weights are computed dynamically from latent statistics and attention maps at each timestep with no per-prompt calibration or tunable hyperparameters. We will revise §3.3 to include explicit pseudocode and a step-by-step explanation confirming the absence of any calibration, thereby reinforcing the training-free guarantee. revision: yes
Circularity Check
No significant circularity; new entropy-based mechanisms are introduced independently of fitted inputs or self-referential definitions.
Full rationale
The abstract and described framework propose Dual-Entropy Attention Manipulation and Entropic Latent Refinement as novel reformulations that quantify directional interactions to achieve harmony maximization. No load-bearing equations, self-citations, or reductions to prior fitted parameters are evident in the provided text. The central claims add independent controls for spatial-temporal coordination rather than deriving predictions from the same inputs by construction. This matches the reader's assessment and qualifies as a normal non-finding under the guidelines (score 0-2).