pith. machine review for the scientific record.

arxiv: 2605.12939 · v1 · submitted 2026-05-13 · 💻 cs.CV

Recognition: unknown

DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:56 UTC · model grok-4.3

classification 💻 cs.CV
keywords virtual try-on · one-step sampling · conditional transport · diffusion models · image generation · garment preservation · efficient inference

The pith

Virtual try-on can reach state-of-the-art quality in one sampling step by straightening the conditional transport path.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that virtual try-on generation differs from general image synthesis because the output is tightly constrained by the input person and garment images. This constraint allows the sampling trajectory to be made much straighter than usual. By introducing pure conditional transport, a garment preservation loss, and a self-consistency loss, followed by one-step distillation, the method trains a model that produces high-quality try-on results directly in a single step. This avoids the high cost of multi-step sampling in existing diffusion and flow-based approaches while matching or exceeding their performance.

Core claim

The central discovery is that the deviation from straight paths in try-on comes from the mismatch with pretrained models rather than the task itself, so targeted modifications—pure conditional transport, garment preservation loss, and self-consistency loss—combined with one-step distillation enable accurate one-step virtual try-on.

What carries the argument

Straightened conditional transport achieved through pure conditional transport, garment preservation loss, self-consistency loss, and one-step distillation.
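As an editorial sketch only: the straight-path property the argument rests on reduces to a toy rectified-flow identity. The scalar values and the exact velocity below are invented for illustration; they are not the paper's model, conditioning, or losses.

```python
import random

def straight_path_point(x0, x1, t):
    # Linear (rectified-flow) interpolation between noise x0 and data x1.
    return (1.0 - t) * x0 + t * x1

def velocity_target(x0, x1):
    # Along a straight path, the ground-truth velocity is constant in t.
    return x1 - x0

def one_step_sample(x0, velocity):
    # A single Euler step over t in [0, 1]; it lands exactly on x1
    # when the learned velocity matches the straight-path target.
    return x0 + 1.0 * velocity

random.seed(0)
x1 = 0.7                      # stand-in for the conditioned try-on output
x0 = random.gauss(0.0, 1.0)   # Gaussian noise initialization
v = velocity_target(x0, x1)
out = one_step_sample(x0, v)
print(abs(out - x1) < 1e-12)  # True: a straight path makes one step exact
```

The point of the sketch is the identity x0 + (x1 - x0) = x1: if fine-tuning really straightens the conditional trajectory, the only remaining error in one-step sampling is the velocity model's approximation error, not the integration scheme.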

If this is right

  • High-quality virtual try-on becomes feasible at real-time speeds.
  • Existing pretrained generative models can be adapted for efficient conditional tasks without full retraining.
  • Sampling efficiency improves without sacrificing output fidelity in constrained generation settings.
  • Virtual try-on systems can be deployed on devices with limited compute.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar trajectory straightening may apply to other image-to-image translation tasks with strong conditional constraints.
  • Future work could explore whether this approach reduces the need for large pretrained models in specific domains.
  • Testing on diverse body types and garment styles would reveal the limits of the straight-path assumption.

Load-bearing premise

The outputs in virtual try-on are sufficiently constrained by the input conditions that a straight sampling path suffices for high quality.
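One way to make this premise testable (an editorial sketch, not a metric the paper reports) is to score a sampling trajectory by its maximum deviation from the straight chord between its endpoints; a score of zero means the one-step Euler shortcut is lossless.

```python
def chord_deviation(traj):
    # Maximum distance of intermediate points from the straight chord
    # between the trajectory's endpoints; 0.0 means a perfectly straight path.
    x0, x1 = traj[0], traj[-1]
    n = len(traj) - 1
    dev = 0.0
    for i, x in enumerate(traj):
        t = i / n
        chord = (1 - t) * x0 + t * x1
        dev = max(dev, abs(x - chord))
    return dev

straight = [0.0, 0.25, 0.5, 0.75, 1.0]
curved   = [0.0, 0.4, 0.7, 0.9, 1.0]
print(chord_deviation(straight))  # 0.0
print(chord_deviation(curved))    # ~0.2 (largest bow, at t = 0.5)
```

In practice the trajectory points would be latent states recorded during multi-step sampling; comparing this score before and after the proposed fine-tuning would directly probe the load-bearing premise.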

What would settle it

A direct comparison showing that the one-step outputs are visibly inferior to multi-step outputs from the same model in terms of garment alignment or realism would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.12939 by Jiahui Zhan, Jianfu Zhang, Liqing Zhang, Xianbing Sun.

Figure 1. Comparison between general image generation and virtual try-on from the perspective …
Figure 2. Overview of our framework. Stage 1 trains a teacher model to straighten the conditional …
Figure 3. FID curves on VITON-HD under the unpaired evaluation setting. All models are trained in …
Figure 4. Visualization of garment reconstruction during 30-step inference. The leftmost image is the …
Figure 5. Qualitative comparisons on the VITON-HD [Choi et al., 2021] dataset.
Figure 6. Qualitative and quantitative ablation results. From left to right, each variant progressively …
Figure 7. Results of our distilled one-step student model using five different Gaussian noise initializations …
Figure 8. Qualitative comparison on the DressCode dataset.
Original abstract

Recent diffusion- and flow-based VTON methods achieve strong results with pretrained generative models, but their reliance on multi-step sampling incurs high inference cost, while existing acceleration methods largely overlook the intrinsic structure of the try-on task. In this paper, we highlight a key observation: VTON outputs are highly constrained by the conditional inputs, suggesting that the conditional sampling trajectory can be much straighter than that in general image generation, making one-step generation a natural solution. However, limited task-specific data makes training from scratch impractical, forcing existing methods to fine-tune pretrained models whose objectives do not encourage such straight conditional trajectories. Thus, the deviation from an ideal straight path mainly comes from the mismatch between pretrained base models and the conditional nature of try-on generation, rather than from the task itself. Motivated by this insight, we encourage straighter VTON sampling trajectories through three targeted modifications: pure conditional transport, a garment preservation loss, and a self consistency loss. We further introduce a one-step distillation stage. Extensive experiments show that our method achieves state-of-the-art performance with one-step sampling, establishing a new standard for efficient and high-quality VTON.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces DirectTryOn for one-step virtual try-on (VTON) by straightening conditional transport trajectories in pretrained diffusion/flow models. It claims that VTON's heavy conditioning makes trajectories inherently straighter than in unconditional generation, so the main obstacle is pretrained-model mismatch rather than the task; three modifications (pure conditional transport, garment preservation loss, self-consistency loss) plus one-step distillation are proposed to correct this and achieve SOTA one-step performance.

Significance. If the central claim holds, the work would be significant for efficient VTON by exploiting task-specific trajectory properties to reduce inference from multi-step to single-step sampling while preserving quality, with practical value for real-time applications such as e-commerce.

major comments (1)
  1. [Abstract] The load-bearing premise, that "the deviation from an ideal straight path mainly comes from the mismatch between pretrained base models and the conditional nature of try-on generation, rather than from the task itself", is never isolated experimentally. No from-scratch baseline (holding architecture and data fixed) is reported, and the manuscript itself concedes that limited task-specific data makes such training impractical. Without this control, the observed gains cannot be attributed specifically to revealing an intrinsically straighter conditional manifold rather than to the regularizing effect of the auxiliary losses.
minor comments (1)
  1. The abstract and introduction would benefit from explicit quantitative statements of the step reduction (e.g., from N to 1) and the exact metrics where SOTA is claimed.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback. We address the major comment below.

Point-by-point responses
  1. Referee: [Abstract] The load-bearing premise, that "the deviation from an ideal straight path mainly comes from the mismatch between pretrained base models and the conditional nature of try-on generation, rather than from the task itself", is never isolated experimentally. No from-scratch baseline (holding architecture and data fixed) is reported, and the manuscript itself concedes that limited task-specific data makes such training impractical. Without this control, the observed gains cannot be attributed specifically to revealing an intrinsically straighter conditional manifold rather than to the regularizing effect of the auxiliary losses.

    Authors: We agree that a from-scratch baseline holding architecture and data fixed would provide the cleanest isolation of whether the conditional manifold is intrinsically straighter. As the manuscript already states, however, the scarcity of high-quality paired garment-person data renders training from scratch impractical both in data volume and compute. This constraint is why virtually all recent VTON methods, including strong baselines, start from the same class of pretrained models. Our ablations (Section 4.3) isolate the contribution of each component: ablating pure conditional transport, garment preservation loss, or self-consistency loss individually increases trajectory curvature and degrades one-step FID/LPIPS, while the full combination yields the reported gains. These components are not generic regularizers; they explicitly target the pretrained-conditional mismatch. We also outperform other methods that fine-tune the identical pretrained backbones without straightening. In revision we will expand the abstract and Section 3 to explicitly discuss this limitation and the supporting ablation evidence. [revision: partial]

standing simulated objections not resolved
  • A from-scratch baseline is not feasible due to limited task-specific paired data, as already noted in the manuscript.
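The step-count trade-off at issue can be illustrated with a toy ODE (an editorial sketch; the curved velocity field is invented, not taken from the paper): a bent transport path accumulates Euler error in one step, while extra steps shrink it, which is exactly the inference cost that trajectory straightening is meant to remove.

```python
import math

def euler_integrate(velocity, x0, steps):
    # Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += dt * velocity(x, i * dt)
    return x

# A deliberately curved field with a known exact endpoint:
# dx/dt = (pi/2) * cos(pi/2 * t) transports x0 = 0 to x1 = 1 along a bent path.
curved = lambda x, t: (math.pi / 2) * math.cos(math.pi / 2 * t)
err_1  = abs(euler_integrate(curved, 0.0, 1)  - 1.0)   # ~0.57
err_30 = abs(euler_integrate(curved, 0.0, 30) - 1.0)   # ~0.03
print(err_1 > err_30)  # True: a curved path needs many steps to land on x1
```

A straight field (constant velocity) would make `err_1` exactly zero, which is the regime the distilled one-step student is claimed to operate in.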

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper presents a key observation about VTON conditional constraints leading to straighter trajectories as empirical motivation, then introduces three modifications (pure conditional transport, garment preservation loss, self-consistency loss) plus distillation. These are evaluated via experiments on performance metrics without any quoted equations or steps that reduce by construction to inputs, self-citations, or fitted parameters renamed as predictions. No self-definitional loops, uniqueness theorems from authors, or ansatz smuggling appear in the abstract or described chain. The central premise remains an independent claim supported by external benchmarks rather than internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on the assumption that pretrained generative models can be adapted via fine-tuning for straight conditional paths, with no new entities introduced and limited free parameters visible in the abstract.

axioms (1)
  • domain assumption Pretrained diffusion or flow models can be fine-tuned to produce straighter conditional trajectories for VTON despite their original training objectives.
    The abstract states that limited task-specific data forces fine-tuning of pretrained models whose objectives do not encourage straight trajectories.

pith-pipeline@v0.9.0 · 5506 in / 1196 out tokens · 51034 ms · 2026-05-14T19:56:43.605891+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 6 internal anchors

  1. Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, Hongxia Yang. Single stage virtual try-on via deformable attention flows. ECCV, 2022.
  2. Seunghwan Choi, Sunghyun Park, Minsoo Lee, Jaegul Choo. VITON-HD: High-resolution virtual try-on via misalignment-aware normalization. CVPR, 2021.
  3. Davide Morelli, Michele Fincato, Marcella Cornia, Federico Landi, Federico Cesari, Rita Cucchiara. CVPR.
  4. Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, Larry S. Davis. CVPR.
  5. Ziqian Zhou, Shichao Liu, Xiangyu Han, Hao Liu, Kwan-Yee Ng, Ting Xie, Shimin He. CVPR.
  6. Yang Song, Stefano Ermon. NeurIPS.
  7. Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bochao Wang, Hanjiang Lai, Jia Zhu, Zhiting Hu, Jian Yin. Towards multi-pose guided virtual try-on network. ICCV, 2019.
  8. Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo. Parser-free virtual try-on via distilling appearance flows. CVPR, 2021.
  9. Junhong Gou, Siyu Sun, Jianfu Zhang, Jianlou Si, Chen Qian, Liqing Zhang. Taming the power of diffusion models for high-quality virtual try-on with appearance flow. ACM MM, 2023.
  10. Hongwen Yang, Rongyao Zhang, Xiaonan Guo, Wei Liu, Wangmeng Zuo, Ping Luo. CVPR.
  11. Jean Duchon. Constructive Theory of Functions of Several Variables (Oberwolfach, 1976), 1977.
  12. Sen He, Yi-Zhe Song, Tao Xiang. Style-based global appearance flow for virtual try-on. CVPR, 2022.
  13. Sangyun Lee, Gyojung Gu, Sunghyun Park, Seunghwan Choi, Jaegul Choo. High-resolution virtual try-on with misalignment and occlusion-handled conditions. ECCV, 2022.
  14. Davide Morelli, Matteo Fincato, Marcella Cornia, Federico Landi, Fabio Cesari, Rita Cucchiara. Dress Code: High-resolution multi-category virtual try-on. CVPR, 2022.
  15. Youngjin Choi, Seunghyun Kwak, Kyungjune Lee, Hyojin Choi, Jinwoo Shin. ECCV.
  16. Davide Morelli, Alberto Baldrati, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara. LaDI-VTON: Latent diffusion textual-inversion enhanced virtual try-on. ACM MM, 2023.
  17. Zhen Xu, Jing Zhang, Jun Hao Liew, Hongdong Yan, Jianwen Liu, Chunyan Zhang, Jiashi Feng, Mike Zheng Shou. CVPR.
  18. Yuhao Xu, Tao Gu, Weifeng Chen, Aoxue Chen. AAAI.
  19. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 2020.
  20. Jonathan Ho, Ajay Jain, Pieter Abbeel. Denoising diffusion probabilistic models. NeurIPS, 2020.
  21. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer. High-resolution image synthesis with latent diffusion models. CVPR, 2022.
  22. Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo. Disentangled cycle consistency for highly-realistic virtual try-on. CVPR, 2021.
  23. Zhenyu Xie, Zaiyu Huang, Xin Dong, Fuwei Zhao, Haoye Dong, Xijin Zhang, Feida Zhu, Xiaodan Liang. GP-VTON: Towards general purpose virtual try-on via collaborative local-flow global-parsing learning. CVPR, 2023.
  24. Jeongho Kim, Guojung Gu, Minho Park, Sunghyun Park, Jaegul Choo. StableVITON: Learning semantic correspondence with latent diffusion model for virtual try-on. CVPR, 2024.
  25. Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman. TryOnDiffusion: A tale of two UNets. CVPR, 2023.
  26. Bin Jiang, Xiaoxiao Hu, Dongdong Luo, Qian He, Chen Xu, Jing Peng, Yanwei Fu. arXiv:2411.10499.
  27. Zheng Chong, Xiao Dong, Haoxiang Li, Shiyue Zhang, Wenqing Zhang, Xujie Zhang, Hanqing Zhao, Xiaodan Liang. ICLR.
  28. K. Sun, J. Cao, Q. Wang, L. Tian, X. Zhang, L. Zhuo, D. Gao. arXiv:2407.16224.
  29. William Peebles, Saining Xie. ICCV.
  30. Patrick Esser, Shubham Kulal, Andreas Blattmann, Reza Entezari, et al. Scaling rectified flow transformers for high-resolution image synthesis.
  31. Daniel Podell, Zana English, Kenneth Lacey, Andreas Blattmann, Tobias Dockhorn, et al. SDXL: Improving latent diffusion models for high-resolution image synthesis.
  32. Yexin Li, Haoyu Zhou, Weichen Shang, Runyu Lin, Xinyu Chen, Bingbing Ni. NeurIPS.
  33. Self-correction for human parsing. IEEE TPAMI, 2020.
  34. TryOffDiff: Virtual try-off via high-fidelity garment reconstruction using diffusion models. arXiv:2411.18350.
  35. TryOffAnyone: Tiled cloth generation from a dressed person. arXiv:2412.08573, 2024.
  36. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. CVPR.
  37. Deep unsupervised learning using nonequilibrium thermodynamics. ICML, 2015.
  38. Denoising diffusion implicit models. arXiv:2010.02502.
  39. Auto-encoding variational Bayes. arXiv:1312.6114.
  40. Paint by Example: Exemplar-based image editing with diffusion models. CVPR.
  41. Learning transferable visual models from natural language supervision. ICML, 2021.
  42. Hailin Ye, Jing Zhang, Shichao Liu, Xiangyu Han, Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv:2308.06721.
  43. U-Net: Convolutional networks for biomedical image segmentation. MICCAI, 2015.
  44. Towards accurate multi-person pose estimation in the wild. CVPR.
  45. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS.
  46. Demystifying MMD GANs. arXiv:1801.01401.
  47. On aliased resizing and surprising subtleties in GAN evaluation. CVPR.
  48. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 2004.
  49. The unreasonable effectiveness of deep features as a perceptual metric. CVPR.
  50. Image quality assessment: Unifying structure and texture similarity. IEEE TPAMI, 2020.
  51. Decoupled weight decay regularization. arXiv:1711.05101.
  52. Animate Anyone: Consistent and controllable image-to-video synthesis for character animation. CVPR.
  53. Flow matching for generative modeling. ICLR.
  54. FLUX.1 [dev]. 2024.
  55. MC-VTON: Minimal control virtual try-on diffusion transformer. arXiv:2501.03630.
  56. CAT-DM: Controllable accelerated virtual try-on with diffusion model. CVPR.
  57. Fast high-resolution image synthesis with latent adversarial diffusion distillation. SIGGRAPH Asia 2024.
  58. Learning to generate and transfer data with rectified flow. ICLR.
  59. Consistency flow matching: Defining straight flows with velocity consistency. arXiv:2407.02398.
  60. InstaFlow: One step is enough for high-quality diffusion-based text-to-image generation. ICLR.
  61. Optimal flow matching: Learning straight trajectories in just one step. NeurIPS.
  62. Blockwise flow matching: Improving flow matching models for efficient high-quality generation. NeurIPS.
  63. Attention is all you need. NeurIPS.
  64. Texture-preserving diffusion models for high-fidelity virtual try-on. CVPR.
  65. JCo-MVTON: Jointly controllable multi-modal diffusion transformer for mask-free virtual try-on. arXiv:2508.17614.
  66. DS-VTON: High-quality virtual try-on via disentangled dual-scale generation. arXiv e-prints.
  67. PeRFlow: Piecewise rectified flow as universal plug-and-play accelerator. NeurIPS.
  68. Classifier-free diffusion guidance. arXiv:2207.12598.
  69. Consistency models.
  70. Any2AnyTryon: Leveraging adaptive position embeddings for versatile virtual clothing tasks. ICCV.
  71. DensePose: Dense human pose estimation in the wild. CVPR.
  72. Instance-level human parsing via part grouping network. ECCV.
  73. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE TPAMI, 2019.