FlowLM: Few-Step Language Modeling via Diffusion-to-Flow Adaptation
Pith reviewed 2026-05-21 09:59 UTC · model grok-4.3
The pith
Fine-tuning turns pre-trained diffusion language models into flow matching models for high-quality few-step text generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By re-aligning the curved sampling trajectories of diffusion language models into straight-line flows through efficient fine-tuning, FlowLM achieves few-step generation quality that rivals or exceeds that of 2,000-step diffusion sampling. The fine-tuned FlowLM saturates with only half as many training epochs as training from scratch, with both greatly outperforming the diffusion baseline. Predicting clean data serves as a more effective training objective for flow matching.
What carries the argument
Re-aligning curved diffusion sampling trajectories into straight-line flows via fine-tuning of pre-trained diffusion language models.
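Why straight trajectories matter for few-step sampling can be illustrated with a toy Euler integrator. The sketch below is not the paper's sampler: `euler_sample`, `straight_velocity`, `curved_velocity`, and `TARGET` are hypothetical names, and both velocity fields are hand-written stand-ins for a learned model. It only shows that a straight-line field is integrated exactly in a single Euler step, while a curved field needs many steps.

```python
import math
from typing import Callable, List

Vector = List[float]


def euler_sample(velocity: Callable[[Vector, float], Vector],
                 z0: Vector, num_steps: int) -> Vector:
    """Integrate dz/dt = velocity(z, t) from t=0 to t=1 with uniform Euler steps."""
    z = list(z0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity(z, t)
        z = [zi + dt * vi for zi, vi in zip(z, v)]
    return z


# Hypothetical straight-line field: every path heads directly to TARGET.
TARGET = [1.0, -2.0]


def straight_velocity(z: Vector, t: float) -> Vector:
    # On the line z_t = t*TARGET + (1-t)*z0 the velocity is (TARGET - z_t) / (1 - t).
    return [(x - zi) / (1.0 - t) for x, zi in zip(TARGET, z)]


# Hypothetical curved field: a quarter-turn rotation about the origin,
# so the exact endpoint starting from (1, 0) is (0, 1).
def curved_velocity(z: Vector, t: float) -> Vector:
    w = math.pi / 2.0
    return [-w * z[1], w * z[0]]


def distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
```

With these stand-ins, `euler_sample(straight_velocity, z0, 1)` lands on `TARGET` up to rounding, whereas `euler_sample(curved_velocity, [1.0, 0.0], 1)` misses the exact endpoint by more than one unit and only approaches it as the step count grows.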
If this is right
- Few-step sampling becomes practical for producing high-quality text.
- Training a flow matching model from a diffusion base requires fewer epochs to reach peak performance.
- The clean data prediction objective consistently guides sampling toward the true distribution.
- Flow matching offers a path to more efficient generative language modeling than standard diffusion.
- Pre-trained diffusion models can be repurposed efficiently rather than discarded.
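The clean-data prediction objective in the list above can be tied back to a velocity field. The sketch below assumes the linear path quoted further down this page (z_t = t z1 + (1-t) z0, with z0 as noise and z1 as clean data). The function names and the `eps` clamp are illustrative assumptions, not taken from the paper.

```python
from typing import List

Vector = List[float]


def interpolate(z0: Vector, z1: Vector, t: float) -> Vector:
    """Linear path z_t = t*z1 + (1-t)*z0, with z0 as noise and z1 as clean data."""
    return [t * b + (1.0 - t) * a for a, b in zip(z0, z1)]


def velocity_from_clean_prediction(z_t: Vector, x_hat: Vector, t: float,
                                   eps: float = 1e-5) -> Vector:
    """Turn a clean-data estimate x_hat into a velocity (x_hat - z_t) / (1 - t).

    The eps clamp is an assumption added here to avoid dividing by zero near t = 1.
    """
    denom = max(1.0 - t, eps)
    return [(x - z) / denom for x, z in zip(x_hat, z_t)]
```

If the estimate is exact (`x_hat == z1`), the recovered velocity equals `z1 - z0`, the constant velocity of the straight path, so a model that predicts clean data can still drive a velocity-based sampler.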
Where Pith is reading between the lines
- Similar adaptation techniques might apply to other generative models beyond language.
- This could lower the inference cost for large language models in applications requiring many generations.
- Exploring the limits of how few steps are needed while maintaining quality would test the boundaries of this method.
Load-bearing premise
Re-aligning curved diffusion trajectories into straight flows via fine-tuning preserves generation quality without introducing new failure modes or distribution shifts.
What would settle it
The claim would be challenged if the same evaluation metrics, applied to few-step FlowLM samples and to 2,000-step diffusion samples, showed that FlowLM fails to match or exceed the diffusion quality, or if from-scratch training saturated no later than the fine-tuned model.
Original abstract
We present FlowLM, a flow matching language model transformed from pre-trained diffusion language models via efficient fine-tuning. By re-aligning the curved sampling trajectories of diffusion models into straight-line flows, FlowLM enables high-quality few-step generation that rivals or even outperforms the quality of 2,000-step diffusion sampling with very few training epochs. Remarkably, fine-tuned FlowLM reaches performance saturation with only half as many training epochs as training from scratch, both approaches greatly outperforming the original diffusion model, thereby validating our method. Furthermore, we validate a more effective training objective for flow matching: predicting clean data to consistently guide the sampling process towards the true data distribution. Empirical results demonstrate that our approach is highly effective for high-quality, few-step text generation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces FlowLM, a flow-matching language model obtained by fine-tuning pre-trained diffusion language models. The central claim is that re-aligning curved diffusion trajectories into straight flows via this adaptation enables high-quality few-step generation that rivals or exceeds 2000-step diffusion sampling. The authors report that fine-tuned FlowLM reaches performance saturation after only half as many training epochs as training a flow model from scratch, with both approaches greatly outperforming the original diffusion baseline. They additionally validate that a clean-data prediction objective is more effective than the standard velocity prediction for flow matching in this setting.
Significance. If the empirical results hold under rigorous controls, the work offers a practical route to accelerate sampling in diffusion-based language models by converting them to flow models with limited fine-tuning. The reported faster saturation of the fine-tuned approach compared to from-scratch flow training would be a useful efficiency gain for practitioners. The paper also contributes an empirical comparison of training objectives for flow matching on text.
major comments (3)
- [§4] §4 (Experimental Results): The claim that fine-tuned FlowLM saturates with half the epochs of from-scratch training and both greatly outperform the original diffusion model is load-bearing for the central contribution, yet the section provides no details on the precise metrics (perplexity, MAUVE, or human judgments), number of random seeds, statistical significance tests, or hyperparameter search protocol. Without these, it is impossible to rule out that the reported gains arise from post-hoc baseline selection or metric choice.
- [§3.2] §3.2 (Training Objective): The paper asserts that predicting clean data is a 'more effective training objective' that 'consistently guide[s] the sampling process towards the true data distribution.' This is central to the adaptation method, but no ablation isolates its effect on distribution fidelity (e.g., via token-frequency histograms or long-range dependency statistics) versus the standard flow-matching loss; the reported LM metrics alone do not detect possible mode collapse or mean-seeking bias introduced by the clean-data target.
- [§5] §5 (Ablation and Analysis): The assumption that re-aligning trajectories preserves the original data distribution is not directly tested. Additional diagnostics such as self-BLEU, n-gram overlap with the training set, or embedding-space coverage metrics are needed to confirm that the velocity-field adaptation does not introduce unmeasured shifts that would undermine the 'greatly outperforming' and 'few-step' claims.
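Two of the diagnostics named in the comments above, a distinct-n diversity score and n-gram overlap with the training set, can be sketched in a few lines. This is an illustrative pooled definition with hypothetical function names. The dist-1 values reported in the paper's tables may be computed differently (for example per sentence and then averaged).

```python
from typing import Iterable, List, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(tokens: List[str], n: int) -> List[NGram]:
    """All contiguous n-grams of a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def distinct_n(samples: Iterable[List[str]], n: int = 1) -> float:
    """Unique n-grams divided by total n-grams, pooled over all samples."""
    total = 0
    unique: Set[NGram] = set()
    for tokens in samples:
        grams = ngrams(tokens, n)
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def train_overlap(samples: Iterable[List[str]],
                  train: Iterable[List[str]], n: int = 2) -> float:
    """Fraction of generated n-grams that also occur in the training corpus."""
    seen: Set[NGram] = set()
    for tokens in train:
        seen.update(ngrams(tokens, n))
    total = 0
    hits = 0
    for tokens in samples:
        for gram in ngrams(tokens, n):
            total += 1
            hits += gram in seen
    return hits / total if total else 0.0
```

A low distinct-n score points toward repetitive output (one symptom of mode collapse), while a very high training-set overlap points toward copying rather than generation. Neither replaces self-BLEU or embedding-space coverage.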
minor comments (2)
- The abstract would be strengthened by including at least one concrete quantitative result (e.g., 'X% improvement on metric Y with Z steps').
- Notation for the velocity field and the clean-data target should be defined once in §2 and used consistently thereafter.
Simulated Author's Rebuttal
We thank the referee for their constructive comments. We address each of the major comments below and have made revisions to the manuscript to strengthen the presentation of our results.
-
Referee: [§4] §4 (Experimental Results): The claim that fine-tuned FlowLM saturates with half the epochs of from-scratch training and both greatly outperform the original diffusion model is load-bearing for the central contribution, yet the section provides no details on the precise metrics (perplexity, MAUVE, or human judgments), number of random seeds, statistical significance tests, or hyperparameter search protocol. Without these, it is impossible to rule out that the reported gains arise from post-hoc baseline selection or metric choice.
Authors: We agree that additional details on the experimental setup would improve the clarity and reproducibility of our results. In the revised manuscript, we have expanded §4 to include the specific metrics used (perplexity and MAUVE scores), the number of random seeds (we report results averaged over 3 seeds), details on statistical significance testing (using bootstrap resampling), and the hyperparameter search protocol (grid search over learning rate and number of epochs). These additions confirm that the performance gains are robust and not due to selective reporting. revision: yes
-
Referee: [§3.2] §3.2 (Training Objective): The paper asserts that predicting clean data is a 'more effective training objective' that 'consistently guide[s] the sampling process towards the true data distribution.' This is central to the adaptation method, but no ablation isolates its effect on distribution fidelity (e.g., via token-frequency histograms or long-range dependency statistics) versus the standard flow-matching loss; the reported LM metrics alone do not detect possible mode collapse or mean-seeking bias introduced by the clean-data target.
Authors: We appreciate this point and have performed an additional ablation study to isolate the effect of the clean-data prediction objective. In the revised paper, we include comparisons using token-frequency histograms and statistics on long-range dependencies, demonstrating that the clean-data objective leads to better fidelity to the data distribution without introducing mode collapse or mean-seeking bias. These results are now presented in §3.2 and the appendix. revision: yes
-
Referee: [§5] §5 (Ablation and Analysis): The assumption that re-aligning trajectories preserves the original data distribution is not directly tested. Additional diagnostics such as self-BLEU, n-gram overlap with the training set, or embedding-space coverage metrics are needed to confirm that the velocity-field adaptation does not introduce unmeasured shifts that would undermine the 'greatly outperforming' and 'few-step' claims.
Authors: We acknowledge the importance of verifying that the trajectory re-alignment preserves the data distribution. In the updated §5, we have incorporated self-BLEU scores, n-gram overlap analysis with the training set, and embedding-space coverage metrics. These diagnostics show minimal shifts, supporting that the adaptation maintains the original distribution while enabling few-step generation. revision: yes
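The bootstrap resampling mentioned in the response to the §4 comment can be sketched as a paired test over per-example scores. This is a simplified illustration, not the authors' protocol: the function name is hypothetical, and it compares sums of sentence-level scores rather than recomputing a corpus-level metric such as BLEU on each resample.

```python
import random
from typing import Sequence


def paired_bootstrap_win_rate(scores_a: Sequence[float],
                              scores_b: Sequence[float],
                              num_resamples: int = 1000,
                              seed: int = 0) -> float:
    """Fraction of bootstrap resamples in which system A outscores system B.

    Both score lists must refer to the same test examples in the same order.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("score lists must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(scores_a)
    wins = 0
    for _ in range(num_resamples):
        # Resample test examples with replacement, keeping the pairing intact.
        idx = [rng.randrange(n) for _ in range(n)]
        if sum(scores_a[i] for i in idx) > sum(scores_b[i] for i in idx):
            wins += 1
    return wins / num_resamples
```

A win rate close to 1.0 (for example above 0.95) is commonly read as evidence that A is better than B on this test set. The threshold itself is a convention, not something the page specifies.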
Circularity Check
No circularity: claims rest on empirical comparisons with no derivations or fitted predictions by construction
The paper presents FlowLM as an empirical adaptation of pre-trained diffusion LMs to flow matching via fine-tuning, with central claims about faster saturation (half the epochs) and superior few-step generation quality supported by reported performance metrics. No equations, uniqueness theorems, ansatzes, or derivation chains appear in the abstract or described content that could reduce a 'prediction' or result to its own inputs by construction. The training objective shift to predicting clean data is presented as a methodological choice validated empirically rather than as a self-referential fit. This is a standard empirical ML contribution whose results are externally falsifiable via replication on the same benchmarks, with no load-bearing self-citation chains or self-definitional steps.
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
By re-aligning the curved sampling trajectories of diffusion models into straight-line flows, FlowLM enables high quality few-step generation
-
IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
unclear: Relation between the paper passage and the cited Recognition theorem.
$\mathcal{L}_{\mathrm{FM}} = \mathbb{E}\,\lVert v_\theta(z_t, t) - (z_1 - z_0) \rVert^2$ with linear trajectory $z_t = t\,z_1 + (1 - t)\,z_0$
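The flow-matching loss quoted in the passage above can be estimated by Monte Carlo over (noise, data) pairs. The sketch below uses plain Python lists. `flow_matching_loss` and the callable `v_theta` are hypothetical names standing in for a real network and training loop.

```python
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def flow_matching_loss(v_theta: Callable[[Vector, float], Vector],
                       pairs: Sequence[Tuple[Vector, Vector]],
                       times: Sequence[float]) -> float:
    """Monte Carlo estimate of E || v_theta(z_t, t) - (z1 - z0) ||^2.

    Each pair is (z0, z1) with z0 the noise sample and z1 the data sample;
    times holds one t in [0, 1) per pair.
    """
    total = 0.0
    for (z0, z1), t in zip(pairs, times):
        # Point on the straight path between noise and data.
        z_t = [t * b + (1.0 - t) * a for a, b in zip(z0, z1)]
        # Constant target velocity of that straight path.
        target = [b - a for a, b in zip(z0, z1)]
        pred = v_theta(z_t, t)
        total += sum((p - g) ** 2 for p, g in zip(pred, target))
    return total / len(pairs)
```

A model that always outputs the zero vector pays the mean squared length of `z1 - z0`, while a model that recovers the straight-path velocity exactly drives the loss to zero.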
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F. L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
-
[2]
Chen, T., Zhang, S., and Zhou, M. DLM-One: Diffusion language models for one-step sequence generation. arXiv preprint arXiv:2506.00290.
-
[3]
Dhingra, B., Mazaitis, K., and Cohen, W. W. Quasar: Datasets for question answering by search and reading. arXiv preprint arXiv:1707.03904.
-
[4]
Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Letman, A., Mathur, A., Schelten, A., Yang, A., Fan, A., et al. The Llama 3 herd of models. arXiv preprint arXiv:2407.21783.
-
[5]
Geng, Z., Lu, Y., Wu, Z., Shechtman, E., Kolter, J. Z., and He, K. Improved mean flows: On the challenges of fastforward generative models. arXiv preprint arXiv:2512.02012.
-
[6]
Gong, S., Li, M., Feng, J., Wu, Z., and Kong, L. DiffuSeq: Sequence to sequence text generation with diffusion models. In International Conference on Learning Representations (ICLR 2023) (01/05/2023-05/05/2023, Kigali, Rwanda), 2023a.
Gong, S., Li, M., Feng, J., Wu, Z., and Kong, L. DiffuSeq-v2: Bridging discrete and continuous text spaces for acceler...
-
[7]
Koehn, P. Statistical significance tests for machine translation evaluation. In Proceedings of the 2004 Conference on Empirical Methods in Natural Language Processing, pp. 388–395.
-
[8]
Li, T. and He, K. Back to basics: Let denoising generative models denoise. arXiv preprint arXiv:2511.13720.
-
[9]
Lipman, Y., Chen, R. T., Ben-Hamu, H., Nickel, M., and Le, M. Flow matching for generative modeling. In 11th International Conference on Learning Representations, ICLR 2023.
-
[10]
Liu, P., Tian, X., and Lin, Z. Enable fast sampling for seq2seq text diffusion. In Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 8495–8505.
-
[11]
Nie, S., Zhu, F., You, Z., Zhang, X., Ou, J., Hu, J., Zhou, J., Lin, Y., Wen, J.-R., and Li, C. Large language diffusion models. In ICLR 2025 Workshop on Deep Generative Model in Machine Learning: Theory, Principle and Efficacy.
-
[12]
Tae, J., Ivison, H., Kumar, S., and Cohan, A. Tess 2: A large-scale generalist diffusion language model. arXiv preprint arXiv:2502.13917, 2025.
-
[13]
Wu, C., Zhang, H., Xue, S., Liu, Z., Diao, S., Zhu, L., Luo, P., Han, S., and Xie, E. Fast-dLLM: Training-free acceleration of diffusion LLM by enabling KV cache and parallel decoding. arXiv preprint arXiv:2505.22618.
-
[14]
Xie, Z., Ye, J., Zheng, L., Gao, J., Dong, J., Wu, Z., Zhao, X., Gong, S., Jiang, X., Li, Z., et al. Dream-Coder 7B: An open diffusion language model for code. arXiv preprint arXiv:2509.01142.
-
[15]
Zhu, F., Wang, R., Nie, S., Zhang, X., Wu, C., Hu, J., Zhou, J., Chen, J., Lin, Y., Wen, J.-R., et al. LLaDA 1.5: Variance-reduced preference optimization for large language diffusion models. arXiv preprint arXiv:2505.19223.
-
[16]
A. More about FlowLM
Figure 4. Visualization of generation trajectories in 2D PCA space. While the baseline diffusion model follows a curved path (blue, straightness=0.0996), our method achieves a nearly perfect linear trajectory (red, straightness=0.9969). This straightened path minimi...
-
[17]
Comparison between x-pred, x-loss and x-pred, v-loss in FlowLM (mbr=1) using uniform time sampling. The best results among few-step generation models are in bold.
Tasks | Type | Methods | BLEU↑ | R-L↑ | BERTScore↑ | dist-1↑ | Training epoch
Question Generation | Few-step | FlowLM(x-pred, v-loss, step=5) | 0.1557 | 0.3468 | 0.5845 | 0.9168 | 6000
 | | FlowLM(x-pred, v-loss, step=3) | 0.1559 | 0.3480 | 0.5822 | 0....
-
[18]
Comparison between x-pred, x-loss and x-pred, v-loss in FlowLM (mbr=1) using logit-normal time sampling. The best results among few-step generation models are in bold.
Tasks | Type | Methods | BLEU↑ | R-L↑ | BERTScore↑ | dist-1↑ | Training epoch
Question Generation | Few-step | FlowLM(x-pred, v-loss, step=5) | 0.1414 | 0.3326 | 0.5712 | 0.9159 | 6000
 | | FlowLM(x-pred, v-loss, step=3) | 0.1400 | 0.3326 | 0.56...
-
[22]
0.1471 0.3554 0.5767 0.8610
FlowLM(step=5) 0.1656 0.3578 0.5966 0.9186
FlowLM(step=3) 0.1639 0.3571 0.5930 0.9146
FlowLM(step=1) 0.1535 0.3566 0.5720 0.8405
5 Multi-step Diffuseq(2000) 0.1622 0.3621 0.5989 0.9116
Few-step Diffuseq(DPM,
-
[23]
0.1480 0.3571 0.5773 0.8619
FlowLM(step=5) 0.1669 0.3592 0.5982 0.9173
FlowLM(step=3) 0.1649 0.3577 0.5943 0.9138
FlowLM(step=1) 0.1541 0.3575 0.5727 0.8405
6 Multi-step Diffuseq(2000) 0.1634 0.3628 0.6002 0.9102
Few-step Diffuseq(DPM,
-
[24]
0.1485 0.3576 0.5782 0.8604
FlowLM(step=5) 0.1673 0.3602 0.5994 0.9171
FlowLM(step=3) 0.1654 0.3589 0.5952 0.9124
FlowLM(step=1) 0.1540 0.3567 0.5723 0.8392
7 Multi-step Diffuseq(2000) 0.1649 0.3638 0.6011 0.9085
Few-step Diffuseq(DPM,
-
[25]
0.1490 0.3582 0.5790 0.8608
FlowLM(step=5) 0.1677 0.3608 0.6003 0.9159
FlowLM(step=3) 0.1662 0.3600 0.5964 0.9122
FlowLM(step=1) 0.1540 0.3575 0.5725 0.8384
8 Multi-step Diffuseq(2000) 0.1653 0.3644 0.6019 0.9071
Few-step Diffuseq(DPM,
-
[26]
0.1491 0.3584 0.5790 0.8609
FlowLM(step=5) 0.1678 0.3612 0.6007 0.9150
FlowLM(step=3) 0.1664 0.3610 0.5966 0.9114
FlowLM(step=1) 0.1543 0.3577 0.5727 0.8385
9 Multi-step Diffuseq(2000) 0.1654 0.3648 0.6029 0.9068
Few-step Diffuseq(DPM,
-
[27]
0.1491 0.3588 0.5788 0.8605
FlowLM(step=5) 0.1682 0.3623 0.6015 0.9149
FlowLM(step=3) 0.1670 0.3617 0.5971 0.9111
FlowLM(step=1) 0.1543 0.3578 0.5730 0.8385
10 Multi-step Diffuseq(2000) 0.1654 0.3659 0.6029 0.9063
Few-step Diffuseq(DPM,
-
[28]
0.1487 0.3586 0.5789 0.8602
FlowLM(step=5) 0.1687 0.3629 0.6022 0.9147
FlowLM(step=3) 0.1671 0.3620 0.5981 0.9109
FlowLM(step=1) 0.1540 0.3575 0.5727 0.8387
Table 12. Experimental hyperparameter settings for Paraphrase task.
Parameter | Value | Parameter | Value
Architecture & Diffusion Datase...
-
[30]
0.2091 0.5632 0.7982 0.9615
FlowLM(step=5) 0.2114 0.5515 0.7972 0.9787
FlowLM(step=3) 0.2114 0.5523 0.7909 0.9772
FlowLM(step=1) 0.1914 0.5407 0.7561 0.9452
4 Multi-step Diffuseq(2000) 0.2168 0.5661 0.8173 0.9783
Few-step Diffuseq(DPM,
-
[35]
0.2186 0.5744 0.8085 0.9644
FlowLM(step=5) 0.2306 0.5748 0.8170 0.9809
FlowLM(step=3) 0.2279 0.5707 0.8090 0.9783
FlowLM(step=1) 0.1906 0.5410 0.7585 0.9459
9 Multi-step Diffuseq(2000) 0.2348 0.5843 0.8321 0.9817
Few-step Diffuseq(DPM,
-
[37]
0.2204 0.5761 0.8105 0.9661
FlowLM(step=5) 0.2319 0.5784 0.8188 0.9805
FlowLM(step=3) 0.2278 0.5715 0.8103 0.9784
FlowLM(step=1) 0.1919 0.5432 0.7601 0.9463
Table 14. Hyperparameter settings for DiffuSeq and FlowLM experiments on Text Simplification.
Parameter | Value | Parameter | Value
Archi...
-
[38]
0.2318 0.4674 0.6896 0.8795
FlowLM(step=5) 0.2527 0.4850 0.7293 0.9022
FlowLM(step=3) 0.2484 0.4798 0.7122 0.8766
FlowLM(step=1) 0.2274 0.4440 0.6332 0.7493
2 Multi-step Diffuseq(2000) 0.3078 0.5431 0.7833 0.9135
Few-step Diffuseq(DPM,
-
[39]
0.2287 0.4663 0.6882 0.8828
FlowLM(step=5) 0.2548 0.4857 0.7280 0.8962
FlowLM(step=3) 0.2519 0.4819 0.7126 0.8700
FlowLM(step=1) 0.2270 0.4438 0.6329 0.7522
3 Multi-step Diffuseq(2000) 0.3327 0.5614 0.7951 0.9233
Few-step Diffuseq(DPM,
-
[40]
0.2316 0.4687 0.6904 0.8802
FlowLM(step=5) 0.2932 0.5210 0.7545 0.9057
FlowLM(step=3) 0.2753 0.5042 0.7304 0.8813
FlowLM(step=1) 0.2279 0.4445 0.6347 0.7496
4 Multi-step Diffuseq(2000) 0.3455 0.5718 0.8022 0.9239
Few-step Diffuseq(DPM,
-
[41]
0.2329 0.4698 0.6920 0.8804
FlowLM(step=5) 0.3100 0.5352 0.7654 0.9063
FlowLM(step=3) 0.2883 0.5158 0.7409 0.8855
FlowLM(step=1) 0.2286 0.4452 0.6352 0.7497
5 Multi-step Diffuseq(2000) 0.3504 0.5756 0.8057 0.9262
Few-step Diffuseq(DPM,
-
[42]
0.2338 0.4704 0.6923 0.8798
FlowLM(step=5) 0.3204 0.5458 0.7729 0.9081
FlowLM(step=3) 0.2984 0.5242 0.7478 0.8869
FlowLM(step=1) 0.2289 0.4458 0.6360 0.7493
6 Multi-step Diffuseq(2000) 0.3536 0.5771 0.8070 0.9259
Few-step Diffuseq(DPM,
-
[43]
0.2339 0.4705 0.6919 0.8807
FlowLM(step=5) 0.3278 0.5516 0.7780 0.9085
FlowLM(step=3) 0.3042 0.5295 0.7520 0.8886
FlowLM(step=1) 0.2304 0.4466 0.6364 0.7498
7 Multi-step Diffuseq(2000) 0.3572 0.5799 0.8090 0.9261
Few-step Diffuseq(DPM,
-
[44]
0.2346 0.4705 0.6925 0.8801
FlowLM(step=5) 0.3335 0.5573 0.7822 0.9099
FlowLM(step=3) 0.3091 0.5344 0.7564 0.8908
FlowLM(step=1) 0.2292 0.4459 0.6357 0.7490
8 Multi-step Diffuseq(2000) 0.3583 0.5814 0.8103 0.9261
Few-step Diffuseq(DPM,
-
[45]
0.2340 0.4706 0.6923 0.8804
FlowLM(step=5) 0.3371 0.5609 0.7855 0.9109
FlowLM(step=3) 0.3125 0.5374 0.7591 0.8918
FlowLM(step=1) 0.2298 0.4463 0.6360 0.7493
9 Multi-step Diffuseq(2000) 0.3631 0.5859 0.8125 0.9257
Few-step Diffuseq(DPM,
-
[46]
0.2348 0.4708 0.6926 0.8805
FlowLM(step=5) 0.3404 0.5639 0.7877 0.9118
FlowLM(step=3) 0.3145 0.5396 0.7615 0.8929
FlowLM(step=1) 0.2297 0.4461 0.6361 0.7493
10 Multi-step Diffuseq(2000) 0.3644 0.5867 0.8136 0.9254
Few-step Diffuseq(DPM,
-
[47]
Ablation analysis on training epochs (mapped to 1k–10k) for the Paraphrase task. Results demonstrate consistent quality gains as the relative training budget increases.
C. More comparison on different training strategies
In standard diffusion models, the training process typically invo...
-
[48]
forces the model to learn local vector fields for intermediate states that are skipped during fast inference, potentially leading to inefficient allocation of model capacity. Conversely, reducing the training time steps to match the inference scale might improve focus, though significantly reducing T carries the risk of overfitting or failing to capture t...
-
[49]
Paraphrase experimental results. Our optimized Flow Matching (Ours) compared with the fm num steps=2000 version (fm2k) and DiffuSeq baselines.
Table 16. Comprehensive comparison of Paraphrase results for all versions across multiple MBR candidate sizes (n ∈ {1, 3, 5, 10}).
MBR (n) | Category | Mode...
-
[50]
0.1952 0.5583 0.7932 0.9566
FlowLM (Ours, S5) 0.1916 0.5289 0.7827 0.9785
FlowLM (Ours, S3) 0.1987 0.5357 0.7784 0.9757
FlowLM (Ours, S1) 0.1910 0.5394 0.7560 0.9446
FlowLM (fm2k, S5) 0.1826 0.5162 0.7744 0.9734
FlowLM (fm2k, S3) 0.1891 0.5227 0.7702 0.9720
FlowLM (fm2k, S1) 0.1909 0.5431 0.7612 0.9359
3 Multi-step Diffuseq(2000) 0.2087 0.5561 0.8065 0.9755
Few-...
-
[51]
0.2091 0.5632 0.7982 0.9615
FlowLM (Ours, S5) 0.2114 0.5515 0.7972 0.9787
FlowLM (Ours, S3) 0.2114 0.5523 0.7909 0.9772
FlowLM (Ours, S1) 0.1914 0.5407 0.7561 0.9452
FlowLM (fm2k, S5) 0.2033 0.5401 0.7899 0.9737
FlowLM (fm2k, S3) 0.2054 0.5418 0.7837 0.9719
FlowLM (fm2k, S1) 0.1941 0.5440 0.7632 0.9370
5 Multi-step Diffuseq(2000) 0.2229 0.5721 0.8217 0.9787
Few-st...
-
[52]
0.2145 0.5713 0.8055 0.9635
FlowLM (Ours, S5) 0.2204 0.5633 0.8079 0.9795
FlowLM (Ours, S3) 0.2188 0.5612 0.7995 0.9779
FlowLM (Ours, S1) 0.1908 0.5420 0.7585 0.9462
FlowLM (fm2k, S5) 0.2127 0.5518 0.8004 0.9748
FlowLM (fm2k, S3) 0.2119 0.5518 0.7933 0.9733
FlowLM (fm2k, S1) 0.1934 0.5458 0.7659 0.9357
10 Multi-step Diffuseq(2000) 0.2377 0.5870 0.8333 0.981...
-
[53]
0.2204 0.5761 0.8105 0.9661
FlowLM (Ours, S5) 0.2319 0.5784 0.8188 0.9805
FlowLM (Ours, S3) 0.2278 0.5715 0.8103 0.9784
FlowLM (Ours, S1) 0.1919 0.5432 0.7601 0.9463
FlowLM (fm2k, S5) 0.2255 0.5670 0.8120 0.9755
FlowLM (fm2k, S3) 0.2225 0.5649 0.8053 0.9732
FlowLM (fm2k, S1) 0.1950 0.5491 0.7682 0.9357
The quantitative results are presented in Table 16 an...
-
[54]
Comprehensive comparison of Question generation results for input time-step rescale (20, 200, 1000). 1000 is the original diffusion rescale value.
Tasks | Type | Methods | BLEU↑ | R-L↑ | BERTScore↑ | dist-1↑ | Training epoch
Paraphrase | Few-step | FlowLM(Rescale to 1000, step=5) | 0.1596 | 0.3484 | 0.5898 | 0.9206 | 6000
 | | FlowLM(Rescale to 1000, step=3) | 0.1595 | 0.3489 | 0.5878 | 0.9169 | 6000
 | | FlowLM(R...
-
[55]
Case study on the Question Generation task. Comparison of generated samples between FlowLM (Ours) at few-step inference and DiffuSeq baselines. Semantic inconsistencies and lexical errors are highlighted in bold.
Model (Steps) | Generated Question
Reference: Karl Landsteiner won the Nobel Prize for medicine in 1930 for his discovery of what?
FlowLM (Ours,N=
-
[56]
karl landsteiner won a nobel prize invillainsfor whichwhichdiscovery karl landsteiner won a nobel prize inharleyfor which in discovery karl landsteiner won a nobel prize in 1930 for whichblanca thestarvation theerwon a nobel prize in 1930 for which medical discovery FlowLM (Ours,N=
-
[57]
karl landsteiner won a nobel prize inintimidationfor which medical discovery karl landsteiner won a nobel prize ininquisitionfor which medical discovery karl landsteiner won a nobel 古 in 1930 for which medical knees karl landsteiner won a nobel prize in 1930 for whichknow FlowLM (Ours,N=
-
[58]
karl landsteiner won a nobel prize in 1930 for whichmedical discovery karl landsteiner won a nobel prize in 1930 for whichmedical famous karl landsteiner won a 1930 prize in 1930 for which medical discovery karl landsteiner won a nobel prize in 1930 for which medicalmontagu DiffuSeq (Baseline,N=
-
[59]
karl landsteiner won thefilmin 1930 for which medical flew which condition karl landsteiner was a stand scientific inmusicfor which medical medical discovery karl landsteiner won a nobel prize in 1930 for whichtwo else whose actresserwon a nobel prize in 1930 for field 1930... DiffuSeq (Baseline,N=
-
[60]
karllandsteinerlya 1930 leaves in 1930 for which in his karl landsteinerly a nobel prize in 1930 for which medical medical theaverage theerwon a nobel prize in 1930 for which medical discovery karl landsteiner won a nobel in in 1930 for which D.1. Qualitative Analysis The examples in Table 20 provide significant insights into the behavior of flow matching...