Recognition: 2 theorem links · Lean Theorem
ELF: Embedded Language Flows
Pith reviewed 2026-05-12 03:22 UTC · model grok-4.3
The pith
Continuous embedding flows generate higher-quality language with fewer sampling steps than discrete diffusion models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ELF demonstrates that continuous-time Flow Matching in embedding space, with a final shared-weight mapping to discrete tokens, substantially outperforms existing discrete and continuous diffusion language models in generation quality while requiring fewer sampling steps.
What carries the argument
Embedded Language Flows (ELF): continuous diffusion models based on Flow Matching that remain in embedding space until the last time step, where a shared-weight network produces discrete tokens.
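To make the mechanism concrete, here is a minimal, hypothetical sketch of what flow-matching training in embedding space could look like; the backbone, dimensions, and class name (ELFSketch) are illustrative assumptions, not the paper's implementation.

```python
# Hypothetical sketch of embedding-space flow matching for language (not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ELFSketch(nn.Module):
    def __init__(self, vocab_size=1000, dim=64):
        super().__init__()
        # Shared-weight table: embeds tokens here and, at the very end of sampling,
        # is reused to map continuous states back to discrete tokens.
        self.embed = nn.Embedding(vocab_size, dim)
        # Stand-in for the real sequence backbone (the paper would use a Transformer).
        self.velocity = nn.Sequential(nn.Linear(dim + 1, 256), nn.GELU(), nn.Linear(256, dim))

    def predicted_velocity(self, x_t, t):
        # Condition on time by concatenating t to every position's embedding.
        t_feat = t.expand(-1, x_t.shape[1], 1)
        return self.velocity(torch.cat([x_t, t_feat], dim=-1))

    def flow_matching_loss(self, tokens):
        x1 = self.embed(tokens)                # clean token embeddings: the data endpoint
        x0 = torch.randn_like(x1)              # Gaussian noise: the prior endpoint
        t = torch.rand(tokens.shape[0], 1, 1)  # one random time per sequence
        x_t = (1 - t) * x0 + t * x1            # linear interpolant between noise and data
        target_v = x1 - x0                     # constant velocity of the straight path
        return F.mse_loss(self.predicted_velocity(x_t, t), target_v)

model = ELFSketch()
tokens = torch.randint(0, 1000, (8, 16))       # a toy batch: 8 sequences of length 16
loss = model.flow_matching_loss(tokens)
loss.backward()
```

Note that nothing in this training step is token-specific beyond the embedding lookup, which is the property the paper leans on.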
If this is right
- ELF achieves better generation quality than leading discrete and continuous DLMs.
- It requires fewer sampling steps to reach that quality.
- Techniques such as classifier-free guidance from continuous domains apply straightforwardly to language (a standard form of the guidance rule is sketched after this list).
- This points to continuous DLMs as a viable direction for language generation.
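As background for the guidance bullet above, classifier-free guidance (Ho & Salimans) is conventionally a linear combination of conditional and unconditional predictions. Applying it to the velocity field, as written below, is our assumed reading of how the technique would transfer, not a formula quoted from the paper.

```latex
\begin{equation}
  \hat{v}_\theta(x_t, t \mid c)
    = v_\theta(x_t, t \mid \varnothing)
    + w \,\bigl( v_\theta(x_t, t \mid c) - v_\theta(x_t, t \mid \varnothing) \bigr)
\end{equation}
```

Here $c$ is the conditioning signal, $\varnothing$ the null condition, and $w$ the guidance weight; $w = 1$ recovers the plain conditional model. Because the quantity being guided is already continuous, no discrete-specific machinery is needed, which is what makes the transfer straightforward.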
Where Pith is reading between the lines
- Similar embedding-space strategies might improve diffusion models for other discrete sequences like code or music.
- The reduced step count could lower inference costs in practical language applications.
Load-bearing premise
The assumption that keeping the model mostly in continuous embedding space with only a final mapping to tokens is sufficient for effective discrete language modeling.
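For concreteness, a minimal, assumed sketch of what that premise means operationally: integrate the learned velocity field for a few Euler steps, then perform a single readout against the shared embedding weights. Function names, the step count, and the Euler scheme are illustrative choices, not details taken from the paper.

```python
# Hypothetical few-step sampler for an embedding-space flow model (illustrative only).
# Assumes a model exposing predicted_velocity(x, t) and a tied embedding table `embed`,
# as in the training sketch above.
import torch

@torch.no_grad()
def sample_tokens(model, batch, length, dim, steps=8):
    x = torch.randn(batch, length, dim)              # start at the Gaussian prior (t = 0)
    ts = torch.linspace(0.0, 1.0, steps + 1)
    for i in range(steps):
        t = ts[i].view(1, 1, 1).expand(batch, 1, 1)
        dt = (ts[i + 1] - ts[i]).item()
        x = x + dt * model.predicted_velocity(x, t)  # Euler step along the learned flow
    # Discretization happens only here, at t = 1: score the final continuous state
    # against the shared embedding weights and pick the best token per position.
    logits = x @ model.embed.weight.T
    return logits.argmax(dim=-1)

# Usage with the ELFSketch model from the earlier sketch:
# generated = sample_tokens(model, batch=4, length=16, dim=64, steps=8)
```

If a single end-of-trajectory readout loses too much information about the discrete target, the premise fails in exactly the way the next item describes.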
What would settle it
A benchmark experiment on language generation tasks where ELF fails to match or exceed the quality of top discrete DLMs or requires more sampling steps.
original abstract
Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success has attracted growing interest in applying them to language modeling. Unlike their image-domain counterparts, today's leading diffusion language models (DLMs) primarily operate over discrete tokens. In this paper, we show that continuous DLMs can be made effective with minimal adaptation to the discrete domain. We propose Embedded Language Flows (ELF), a class of diffusion models in continuous embedding space based on continuous-time Flow Matching. Unlike existing DLMs, ELF predominantly stays within the continuous embedding space until the final time step, where it maps to discrete tokens using a shared-weight network. This formulation makes it straightforward to adapt established techniques from image-domain diffusion models, e.g., classifier-free guidance (CFG). Experiments show that ELF substantially outperforms leading discrete and continuous DLMs, achieving better generation quality with fewer sampling steps. These results suggest that ELF offers a promising path toward effective continuous DLMs.
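For readers who want the Flow Matching machinery the abstract invokes spelled out, the standard linear-path formulation (Lipman et al.; Liu et al.) is reproduced below as background; treating the clean token embeddings as the data endpoint is our reading of the abstract, not an equation quoted from the paper.

```latex
\begin{align}
  x_t &= (1 - t)\, x_0 + t\, x_1,
    & x_0 &\sim \mathcal{N}(0, I), \quad x_1 = \mathrm{Emb}(\mathrm{tokens}), \quad t \sim \mathcal{U}[0, 1], \\
  \mathcal{L}_{\mathrm{FM}}(\theta) &= \mathbb{E}_{t,\, x_0,\, x_1}
    \bigl\lVert\, v_\theta(x_t, t) - (x_1 - x_0) \,\bigr\rVert^2 .
\end{align}
```

Generation then integrates the learned ODE $dx/dt = v_\theta(x, t)$ from $t = 0$ to $t = 1$ and maps only the endpoint to discrete tokens.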
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Embedded Language Flows (ELF), a class of continuous-time Flow Matching models for language that operate primarily in continuous embedding space and only discretize to tokens at the final timestep via a shared-weight network. It claims this formulation enables straightforward adaptation of image-domain techniques such as classifier-free guidance and that experiments demonstrate ELF substantially outperforms leading discrete and continuous diffusion language models in generation quality while requiring fewer sampling steps.
Significance. If the experimental claims are substantiated, the work could provide a meaningful path for transferring continuous diffusion and flow-matching advances from images to discrete language modeling, potentially improving sampling efficiency and generation quality without heavy domain-specific redesign.
major comments (1)
- Abstract: The central claim that 'ELF substantially outperforms leading discrete and continuous DLMs, achieving better generation quality with fewer sampling steps' is presented without any quantitative metrics, baselines, datasets, sampling procedures, or statistical controls. This absence is load-bearing because the abstract supplies no evidence against which the superiority assertion can be evaluated.
Simulated Author's Rebuttal
We thank the referee for their review and for highlighting an important point about the abstract. We address the major comment below.
point-by-point responses
- Referee: Abstract: The central claim that 'ELF substantially outperforms leading discrete and continuous DLMs, achieving better generation quality with fewer sampling steps' is presented without any quantitative metrics, baselines, datasets, sampling procedures, or statistical controls. This absence is load-bearing because the abstract supplies no evidence against which the superiority assertion can be evaluated.
Authors: We agree that the abstract, in its current form, presents the performance claim at a high level without supporting numbers. The full manuscript contains the requested details in the Experiments section, including quantitative comparisons against leading discrete and continuous DLMs on standard language modeling benchmarks, specific sampling step counts, and the evaluation protocol. To make the abstract more informative and self-contained, we will revise it to incorporate key quantitative highlights (e.g., relative improvements in generation quality metrics and the reduction in sampling steps) while retaining its concise nature. This change directly addresses the concern without misrepresenting the results.
revision: yes
Circularity Check
No significant circularity; abstract contains no derivations
full rationale
The provided document is limited to the abstract, which introduces ELF as a continuous embedding-space model based on Flow Matching and reports empirical outperformance without any equations, parameter-fitting steps, self-citations, or derivation chains. No load-bearing claim reduces by construction to its inputs, and none of the enumerated circularity patterns (self-definitional, fitted-input prediction, self-citation load-bearing, etc.) can be instantiated because no technical derivations are present. The headline claim therefore stands or falls on external benchmarks rather than on its own constructions.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Continuous-time flow matching can be applied effectively to language data represented in continuous embedding space.
invented entities (1)
- ELF: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  tag: unclear (the relation between the paper passage and the cited Recognition theorem)
  Paper passage: "ELF predominantly stays within the continuous embedding space until the final time step, where it maps to discrete tokens using a shared-weight network... based on continuous-time Flow Matching"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
  tag: unclear (the relation between the paper passage and the cited Recognition theorem)
  Paper passage: "We propose Embedded Language Flows (ELF), a class of diffusion models in continuous embedding space"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.