{"paper":{"title":"DirectTryOn: One-Step Virtual Try-On via Straightened Conditional Transport","license":"http://creativecommons.org/licenses/by-nc-sa/4.0/","headline":"Virtual try-on can reach state-of-the-art quality in one sampling step by straightening the conditional transport path.","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Jiahui Zhan, Jianfu Zhang, Liqing Zhang, Xianbing Sun","submitted_at":"2026-05-13T03:18:43Z","abstract_excerpt":"Recent diffusion- and flow-based VTON methods achieve strong results with pretrained generative models, but their reliance on multi-step sampling incurs high inference cost, while existing acceleration methods largely overlook the intrinsic structure of the try-on task. In this paper, we highlight a key observation: VTON outputs are highly constrained by the conditional inputs, suggesting that the conditional sampling trajectory can be much straighter than that in general image generation, making one-step generation a natural solution. However, limited task-specific data makes training from sc"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"our method achieves state-of-the-art performance with one-step sampling, establishing a new standard for efficient and high-quality VTON.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"VTON outputs are highly constrained by the conditional inputs, suggesting that the conditional sampling trajectory can be much straighter than that in general image generation, making one-step generation a natural solution.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generative models.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Virtual try-on can reach state-of-the-art quality in one sampling step by straightening the conditional transport path.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"8d6a9670973b82e170bb21fcc6ad5e0c3ad6062eb999659b69b463e157e9f17e"},"source":{"id":"2605.12939","kind":"arxiv","version":1},"verdict":{"id":"56e33b74-77b7-4f0b-887b-5fc23a7002a4","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-14T19:56:43.605891Z","strongest_claim":"our method achieves state-of-the-art performance with one-step sampling, establishing a new standard for efficient and high-quality VTON.","one_line_summary":"DirectTryOn achieves state-of-the-art one-step virtual try-on performance by applying pure conditional transport, garment preservation loss, and self-consistency loss to straighten trajectories in pretrained generative models.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"VTON outputs are highly constrained by the conditional inputs, suggesting that the conditional sampling trajectory can be much straighter than that in general image generation, making one-step generation a natural solution.","pith_extraction_headline":"Virtual try-on can reach state-of-the-art quality in one sampling step by straightening the conditional transport path."},"references":{"count":73,"sample":[{"doi":"","year":2022,"title":"Single stage virtual try-on via deformable attention flows","work_id":"8a80dbef-8be7-4310-94a9-a0c1e00d4e86","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2021,"title":"Viton-hd: High-resolution virtual try-on via misalignment-aware normalization","work_id":"4f6de502-a51a-4c3e-a9b1-7f24167de8af","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , pages =","work_id":"3476b6d8-3820-4629-871b-328501fc4d7d","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Han, Xintong and Wu, Zuxuan and Wu, Zhe and Yu, Ruichi and Davis, Larry S. , title =. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR) , pages =","work_id":"6b932331-1d2e-4391-a11e-e724b5ed46ba","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) , year =","work_id":"2b7467c1-2b38-48de-9fc5-0170cffdd4b9","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":73,"snapshot_sha256":"20446352adac33fd006b2591ee4218309e904aa28f4be94aef8ce261fa23266f","internal_anchors":6},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}