pith. machine review for the scientific record.

arxiv: 2605.12939 · v1 · submitted 2026-05-13 · 💻 cs.CV

Recognition: unknown

DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport

Authors on Pith: no claims yet

Pith reviewed 2026-05-14 19:56 UTC · model grok-4.3

classification 💻 cs.CV
keywords virtual try-on · one-step sampling · conditional transport · diffusion models · image generation · garment preservation · efficient inference

The pith

Virtual try-on can reach state-of-the-art quality in one sampling step by straightening the conditional transport path.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that virtual try-on generation differs from general image synthesis because the output is tightly constrained by the input person and garment images. This constraint allows the sampling trajectory to be made much straighter than usual. By introducing pure conditional transport, a garment preservation loss, and a self-consistency loss, followed by one-step distillation, the method trains a model that produces high-quality try-on results directly in a single step. This avoids the high cost of multi-step sampling in existing diffusion and flow-based approaches while matching or exceeding their performance.

Core claim

The central discovery is that the deviation from straight paths in try-on comes from the mismatch with pretrained models rather than the task itself, so targeted modifications—pure conditional transport, garment preservation loss, and self-consistency loss—combined with one-step distillation enable accurate one-step virtual try-on.

What carries the argument

Straightened conditional transport achieved through pure conditional transport, garment preservation loss, self-consistency loss, and one-step distillation.
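As an editorial sketch only: the straight-path property the argument rests on reduces to a toy rectified-flow identity. The scalar values and the exact velocity below are invented for illustration; they are not the paper's model, conditioning, or losses.

```python
import random

def straight_path_point(x0, x1, t):
    # Linear (rectified-flow) interpolation between noise x0 and data x1.
    return (1.0 - t) * x0 + t * x1

def velocity_target(x0, x1):
    # Along a straight path, the ground-truth velocity is constant in t.
    return x1 - x0

def one_step_sample(x0, velocity):
    # A single Euler step over t in [0, 1]; it lands exactly on x1
    # when the learned velocity matches the straight-path target.
    return x0 + 1.0 * velocity

random.seed(0)
x1 = 0.7                      # stand-in for the conditioned try-on output
x0 = random.gauss(0.0, 1.0)   # Gaussian noise initialization
v = velocity_target(x0, x1)
out = one_step_sample(x0, v)
print(abs(out - x1) < 1e-12)  # True: a straight path makes one step exact
```

The point of the sketch is the identity x0 + (x1 - x0) = x1: if fine-tuning really straightens the conditional trajectory, the only remaining error in one-step sampling is the velocity model's approximation error, not the integration scheme.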

If this is right

  • High-quality virtual try-on becomes feasible at real-time speeds.
  • Existing pretrained generative models can be adapted for efficient conditional tasks without full retraining.
  • Sampling efficiency improves without sacrificing output fidelity in constrained generation settings.
  • Virtual try-on systems can be deployed on devices with limited compute.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar trajectory straightening may apply to other image-to-image translation tasks with strong conditional constraints.
  • Future work could explore whether this approach reduces the need for large pretrained models in specific domains.
  • Testing on diverse body types and garment styles would reveal the limits of the straight-path assumption.

Load-bearing premise

The outputs in virtual try-on are sufficiently constrained by the input conditions that a straight sampling path suffices for high quality.
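One way to make this premise testable (an editorial sketch, not a metric the paper reports) is to score a sampling trajectory by its maximum deviation from the straight chord between its endpoints; a score of zero means the one-step Euler shortcut is lossless.

```python
def chord_deviation(traj):
    # Maximum distance of intermediate points from the straight chord
    # between the trajectory's endpoints; 0.0 means a perfectly straight path.
    x0, x1 = traj[0], traj[-1]
    n = len(traj) - 1
    dev = 0.0
    for i, x in enumerate(traj):
        t = i / n
        chord = (1 - t) * x0 + t * x1
        dev = max(dev, abs(x - chord))
    return dev

straight = [0.0, 0.25, 0.5, 0.75, 1.0]
curved   = [0.0, 0.4, 0.7, 0.9, 1.0]
print(chord_deviation(straight))  # 0.0
print(chord_deviation(curved))    # ~0.2 (largest bow, at t = 0.5)
```

In practice the trajectory points would be latent states recorded during multi-step sampling; comparing this score before and after the proposed fine-tuning would directly probe the load-bearing premise.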

What would settle it

A direct comparison showing that the one-step outputs are visibly inferior to multi-step outputs from the same model in terms of garment alignment or realism would falsify the claim.

Figures

Figures reproduced from arXiv: 2605.12939 by Jiahui Zhan, Jianfu Zhang, Liqing Zhang, Xianbing Sun.

Figure 1. Comparison between general image generation and virtual try-on from the perspective …
Figure 2. Overview of our framework. Stage 1 trains a teacher model to straighten the conditional …
Figure 3. FID curves on VITON-HD under the unpaired evaluation setting. All models are trained in …
Figure 4. Visualization of garment reconstruction during 30-step inference. The leftmost image is the …
Figure 5. Qualitative comparisons on the VITON-HD [Choi et al., 2021] dataset.
Figure 6. Qualitative and quantitative ablation results. From left to right, each variant progressively …
Figure 7. Results of our distilled one-step student model using five different Gaussian noise initializations …
Figure 8. Qualitative comparison on the DressCode dataset.
Original abstract

Recent diffusion- and flow-based VTON methods achieve strong results with pretrained generative models, but their reliance on multi-step sampling incurs high inference cost, while existing acceleration methods largely overlook the intrinsic structure of the try-on task. In this paper, we highlight a key observation: VTON outputs are highly constrained by the conditional inputs, suggesting that the conditional sampling trajectory can be much straighter than that in general image generation, making one-step generation a natural solution. However, limited task-specific data makes training from scratch impractical, forcing existing methods to fine-tune pretrained models whose objectives do not encourage such straight conditional trajectories. Thus, the deviation from an ideal straight path mainly comes from the mismatch between pretrained base models and the conditional nature of try-on generation, rather than from the task itself. Motivated by this insight, we encourage straighter VTON sampling trajectories through three targeted modifications: pure conditional transport, a garment preservation loss, and a self consistency loss. We further introduce a one-step distillation stage. Extensive experiments show that our method achieves state-of-the-art performance with one-step sampling, establishing a new standard for efficient and high-quality VTON.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript introduces DirectTryOn for one-step virtual try-on (VTON) by straightening conditional transport trajectories in pretrained diffusion/flow models. It claims that VTON's heavy conditioning makes trajectories inherently straighter than in unconditional generation, so the main obstacle is pretrained-model mismatch rather than the task; three modifications (pure conditional transport, garment preservation loss, self-consistency loss) plus one-step distillation are proposed to correct this and achieve SOTA one-step performance.

Significance. If the central claim holds, the work would be significant for efficient VTON by exploiting task-specific trajectory properties to reduce inference from multi-step to single-step sampling while preserving quality, with practical value for real-time applications such as e-commerce.

major comments (1)
  1. [Abstract] The load-bearing premise, that "the deviation from an ideal straight path mainly comes from the mismatch between pretrained base models and the conditional nature of try-on generation, rather than from the task itself", is never isolated experimentally. No from-scratch baseline (holding architecture and data fixed) is reported, and the manuscript itself concedes that limited task-specific data makes such training impractical. Without this control, the observed gains cannot be attributed specifically to revealing an intrinsically straighter conditional manifold rather than to the regularizing effect of the auxiliary losses.
minor comments (1)
  1. The abstract and introduction would benefit from explicit quantitative statements of the step reduction (e.g., from N to 1) and the exact metrics where SOTA is claimed.

Simulated Author's Rebuttal

1 response · 1 unresolved

We thank the referee for the constructive feedback. We address the major comment below.

Point-by-point responses
  1. Referee: [Abstract] The load-bearing premise, that "the deviation from an ideal straight path mainly comes from the mismatch between pretrained base models and the conditional nature of try-on generation, rather than from the task itself", is never isolated experimentally. No from-scratch baseline (holding architecture and data fixed) is reported, and the manuscript itself concedes that limited task-specific data makes such training impractical. Without this control, the observed gains cannot be attributed specifically to revealing an intrinsically straighter conditional manifold rather than to the regularizing effect of the auxiliary losses.

    Authors: We agree that a from-scratch baseline holding architecture and data fixed would provide the cleanest isolation of whether the conditional manifold is intrinsically straighter. As the manuscript already states, however, the scarcity of high-quality paired garment-person data renders training from scratch impractical both in data volume and compute. This constraint is why virtually all recent VTON methods, including strong baselines, start from the same class of pretrained models. Our ablations (Section 4.3) isolate the contribution of each component: ablating pure conditional transport, garment preservation loss, or self-consistency loss individually increases trajectory curvature and degrades one-step FID/LPIPS, while the full combination yields the reported gains. These components are not generic regularizers; they explicitly target the pretrained-conditional mismatch. We also outperform other methods that fine-tune the identical pretrained backbones without straightening. In revision we will expand the abstract and Section 3 to explicitly discuss this limitation and the supporting ablation evidence. [revision: partial]

standing simulated objections not resolved
  • A from-scratch baseline is not feasible due to limited task-specific paired data, as already noted in the manuscript.
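The step-count trade-off at issue can be illustrated with a toy ODE (an editorial sketch; the curved velocity field is invented, not taken from the paper): a bent transport path accumulates Euler error in one step, while extra steps shrink it, which is exactly the inference cost that trajectory straightening is meant to remove.

```python
import math

def euler_integrate(velocity, x0, steps):
    # Integrate dx/dt = velocity(x, t) from t=0 to t=1 with fixed-step Euler.
    x, dt = x0, 1.0 / steps
    for i in range(steps):
        x += dt * velocity(x, i * dt)
    return x

# A deliberately curved field with a known exact endpoint:
# dx/dt = (pi/2) * cos(pi/2 * t) transports x0 = 0 to x1 = 1 along a bent path.
curved = lambda x, t: (math.pi / 2) * math.cos(math.pi / 2 * t)
err_1  = abs(euler_integrate(curved, 0.0, 1)  - 1.0)   # ~0.57
err_30 = abs(euler_integrate(curved, 0.0, 30) - 1.0)   # ~0.03
print(err_1 > err_30)  # True: a curved path needs many steps to land on x1
```

A straight field (constant velocity) would make `err_1` exactly zero, which is the regime the distilled one-step student is claimed to operate in.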

Circularity Check

0 steps flagged

No significant circularity detected in derivation chain

full rationale

The paper presents a key observation about VTON conditional constraints leading to straighter trajectories as empirical motivation, then introduces three modifications (pure conditional transport, garment preservation loss, self-consistency loss) plus distillation. These are evaluated via experiments on performance metrics without any quoted equations or steps that reduce by construction to inputs, self-citations, or fitted parameters renamed as predictions. No self-definitional loops, uniqueness theorems from authors, or ansatz smuggling appear in the abstract or described chain. The central premise remains an independent claim supported by external benchmarks rather than internal redefinition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on the assumption that pretrained generative models can be adapted via fine-tuning for straight conditional paths, with no new entities introduced and limited free parameters visible in the abstract.

axioms (1)
  • domain assumption Pretrained diffusion or flow models can be fine-tuned to produce straighter conditional trajectories for VTON despite their original training objectives.
    The abstract states that limited task-specific data forces fine-tuning of pretrained models whose objectives do not encourage straight trajectories.

pith-pipeline@v0.9.0 · 5506 in / 1196 out tokens · 51034 ms · 2026-05-14T19:56:43.605891+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

73 extracted references · 73 canonical work pages · 6 internal anchors

  1. Shuai Bai, Huiling Zhou, Zhikang Li, Chang Zhou, Hongxia Yang. Single stage virtual try-on via deformable attention flows. ECCV, 2022.
  2. Seunghwan Choi, Sunghyun Park, Minsoo Lee, Jaegul Choo. VITON-HD: High-resolution virtual try-on via misalignment-aware normalization. CVPR, 2021.
  3. Davide Morelli, Michele Fincato, Marcella Cornia, Federico Landi, Federico Cesari, Rita Cucchiara. CVPR.
  4. Xintong Han, Zuxuan Wu, Zhe Wu, Ruichi Yu, Larry S. Davis. CVPR.
  5. Ziqian Zhou, Shichao Liu, Xiangyu Han, Hao Liu, Kwan-Yee Ng, Ting Xie, Shimin He. CVPR.
  6. Yang Song, Stefano Ermon. NeurIPS.
  7. Haoye Dong, Xiaodan Liang, Xiaohui Shen, Bochao Wang, Hanjiang Lai, Jia Zhu, Zhiting Hu, Jian Yin. Towards multi-pose guided virtual try-on network. ICCV, 2019.
  8. Yuying Ge, Yibing Song, Ruimao Zhang, Chongjian Ge, Wei Liu, Ping Luo. Parser-free virtual try-on via distilling appearance flows. CVPR, 2021.
  9. Junhong Gou, Siyu Sun, Jianfu Zhang, Jianlou Si, Chen Qian, Liqing Zhang. Taming the power of diffusion models for high-quality virtual try-on with appearance flow. ACM MM, 2023.
  10. Hongwen Yang, Rongyao Zhang, Xiaonan Guo, Wei Liu, Wangmeng Zuo, Ping Luo. CVPR.
  11. Jean Duchon. Constructive Theory of Functions of Several Variables (Oberwolfach, 1976), 1977.
  12. Sen He, Yi-Zhe Song, Tao Xiang. Style-based global appearance flow for virtual try-on. CVPR, 2022.
  13. Sangyun Lee, Gyojung Gu, Sunghyun Park, Seunghwan Choi, Jaegul Choo. High-resolution virtual try-on with misalignment and occlusion-handled conditions. ECCV, 2022.
  14. Davide Morelli, Matteo Fincato, Marcella Cornia, Federico Landi, Fabio Cesari, Rita Cucchiara. Dress Code: High-resolution multi-category virtual try-on. CVPR, 2022.
  15. Youngjin Choi, Seunghyun Kwak, Kyungjune Lee, Hyojin Choi, Jinwoo Shin. ECCV.
  16. Davide Morelli, Alberto Baldrati, Giuseppe Cartella, Marcella Cornia, Marco Bertini, Rita Cucchiara. LaDI-VTON: Latent diffusion textual-inversion enhanced virtual try-on. ACM MM, 2023.
  17. Zhen Xu, Jing Zhang, Jun Hao Liew, Hongdong Yan, Jianwen Liu, Chunyan Zhang, Jiashi Feng, Mike Zheng Shou. CVPR.
  18. Yuhao Xu, Tao Gu, Weifeng Chen, Aoxue Chen. AAAI.
  19. Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio. Generative adversarial networks. Communications of the ACM, 2020.
  20. Jonathan Ho, Ajay Jain, Pieter Abbeel. Denoising diffusion probabilistic models. NeurIPS, 2020.
  21. Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, Björn Ommer. High-resolution image synthesis with latent diffusion models. CVPR, 2022.
  22. Chongjian Ge, Yibing Song, Yuying Ge, Han Yang, Wei Liu, Ping Luo. Disentangled cycle consistency for highly-realistic virtual try-on. CVPR, 2021.
  23. Zhenyu Xie, Zaiyu Huang, Xin Dong, Fuwei Zhao, Haoye Dong, Xijin Zhang, Feida Zhu, Xiaodan Liang. GP-VTON: Towards general purpose virtual try-on via collaborative local-flow global-parsing learning. CVPR, 2023.
  24. Jeongho Kim, Guojung Gu, Minho Park, Sunghyun Park, Jaegul Choo. StableVITON: Learning semantic correspondence with latent diffusion model for virtual try-on. CVPR, 2024.
  25. Luyang Zhu, Dawei Yang, Tyler Zhu, Fitsum Reda, William Chan, Chitwan Saharia, Mohammad Norouzi, Ira Kemelmacher-Shlizerman. TryOnDiffusion: A tale of two UNets. CVPR, 2023.
  26. Bin Jiang, Xiaoxiao Hu, Dongdong Luo, Qian He, Chen Xu, Jing Peng, Yanwei Fu. arXiv:2411.10499.
  27. Zheng Chong, Xiao Dong, Haoxiang Li, Shiyue Zhang, Wenqing Zhang, Xujie Zhang, Hanqing Zhao, Xiaodan Liang. ICLR.
  28. K. Sun, J. Cao, Q. Wang, L. Tian, X. Zhang, L. Zhuo, D. Gao. arXiv:2407.16224.
  29. William Peebles, Saining Xie. ICCV.
  30. Patrick Esser, Shubham Kulal, Andreas Blattmann, Reza Entezari, et al. Scaling rectified flow transformers for high-resolution image synthesis.
  31. Daniel Podell, Zana English, Kenneth Lacey, Andreas Blattmann, Tobias Dockhorn, et al. SDXL: Improving latent diffusion models for high-resolution image synthesis.
  32. Yexin Li, Haoyu Zhou, Weichen Shang, Runyu Lin, Xinyu Chen, Bingbing Ni. NeurIPS.
  33. Self-correction for human parsing. IEEE TPAMI, 2020.
  34. TryOffDiff: Virtual try-off via high-fidelity garment reconstruction using diffusion models. arXiv:2411.18350.
  35. TryOffAnyone: Tiled cloth generation from a dressed person. arXiv:2412.08573, 2024.
  36. DeepFashion: Powering robust clothes recognition and retrieval with rich annotations. CVPR.
  37. Deep unsupervised learning using nonequilibrium thermodynamics. ICML, 2015.
  38. Denoising diffusion implicit models. arXiv:2010.02502.
  39. Auto-encoding variational Bayes. arXiv:1312.6114.
  40. Paint by Example: Exemplar-based image editing with diffusion models. CVPR.
  41. Learning transferable visual models from natural language supervision. ICML, 2021.
  42. Hailin Ye, Jing Zhang, Shichao Liu, Xiangyu Han, Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv:2308.06721.
  43. U-Net: Convolutional networks for biomedical image segmentation. MICCAI, 2015.
  44. Towards accurate multi-person pose estimation in the wild. CVPR.
  45. GANs trained by a two time-scale update rule converge to a local Nash equilibrium. NeurIPS.
  46. Demystifying MMD GANs. arXiv:1801.01401.
  47. On aliased resizing and surprising subtleties in GAN evaluation. CVPR.
  48. Image quality assessment: From error visibility to structural similarity. IEEE Transactions on Image Processing, 2004.
  49. The unreasonable effectiveness of deep features as a perceptual metric. CVPR.
  50. Image quality assessment: Unifying structure and texture similarity. IEEE TPAMI, 2020.
  51. Decoupled weight decay regularization. arXiv:1711.05101.
  52. Animate Anyone: Consistent and controllable image-to-video synthesis for character animation. CVPR.
  53. Flow matching for generative modeling. ICLR.
  54. FLUX.1 [dev]. 2024.
  55. MC-VTON: Minimal control virtual try-on diffusion transformer. arXiv:2501.03630.
  56. CAT-DM: Controllable accelerated virtual try-on with diffusion model. CVPR.
  57. Fast high-resolution image synthesis with latent adversarial diffusion distillation. SIGGRAPH Asia 2024.
  58. Learning to generate and transfer data with rectified flow. ICLR.
  59. Consistency flow matching: Defining straight flows with velocity consistency. arXiv:2407.02398.
  60. InstaFlow: One step is enough for high-quality diffusion-based text-to-image generation. ICLR.
  61. Optimal flow matching: Learning straight trajectories in just one step. NeurIPS.
  62. Blockwise flow matching: Improving flow matching models for efficient high-quality generation. NeurIPS.
  63. Attention is all you need. NeurIPS.
  64. Texture-preserving diffusion models for high-fidelity virtual try-on. CVPR.
  65. JCo-MVTON: Jointly controllable multi-modal diffusion transformer for mask-free virtual try-on. arXiv:2508.17614.
  66. DS-VTON: High-quality virtual try-on via disentangled dual-scale generation. arXiv e-prints.
  67. PeRFlow: Piecewise rectified flow as universal plug-and-play accelerator. NeurIPS.
  68. Classifier-free diffusion guidance. arXiv:2207.12598.
  69. Consistency models.
  70. Any2AnyTryon: Leveraging adaptive position embeddings for versatile virtual clothing tasks. ICCV.
  71. DensePose: Dense human pose estimation in the wild. CVPR.
  72. Instance-level human parsing via part grouping network. ECCV.
  73. OpenPose: Realtime multi-person 2D pose estimation using part affinity fields. IEEE TPAMI, 2019.