Self-conditioned flow language models solve fixed-point iterations, enabling fixed-point flow maps that distill into FMLM* which outperforms SOTA in few-step generation on OpenWebText.
ELF: Embedded Language Flows
9 Pith papers cite this work. Polarity classification is still indexing.
abstract
Diffusion and flow-based models have become the de facto approaches for generating continuous data, e.g., in domains such as images and videos. Their success has attracted growing interest in applying them to language modeling. Unlike their image-domain counterparts, today's leading diffusion language models (DLMs) primarily operate over discrete tokens. In this paper, we show that continuous DLMs can be made effective with minimal adaptation to the discrete domain. We propose Embedded Language Flows (ELF), a class of diffusion models in continuous embedding space based on continuous-time Flow Matching. Unlike existing DLMs, ELF predominantly stays within the continuous embedding space until the final time step, where it maps to discrete tokens using a shared-weight network. This formulation makes it straightforward to adapt established techniques from image-domain diffusion models, e.g., classifier-free guidance (CFG). Experiments show that ELF substantially outperforms leading discrete and continuous DLMs, achieving better generation quality with fewer sampling steps. These results suggest that ELF offers a promising path toward effective continuous DLMs.
years
2026 9representative citing papers
Low Gen-PPL in continuous diffusion LMs results from repetition caused by a 1D contractive attractor in self-conditioning feedback; ACE subtracts the direction to reduce repetition to human levels while preserving quality.
Flow models reach 99.2% Sudoku accuracy in 7 passes and 96.1% on out-of-distribution Sudoku-Extreme by selecting dynamically stable candidates and training with self-conditioning plus DPO to avoid failed outputs.
MLFMs combine masking with continuous flows to scale flow-based language models to reasoning and instruction-following tasks on GSM8K and MT-Bench.
Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.
VoidPadding decouples padding from termination in MDLMs via a new [VOID] token, delivering +17.84 average benchmark points and 55.7% fewer decoding steps on Dream-7B-Instruct.
MoGeFlow learns text-conditioned flows over PartVQ group-specific code embeddings to generate motions, achieving SOTA R-Precision on HumanML3D and KIT-ML while preserving discrete token validity.
ELF-S2T applies audio-conditioned flow-matching on continuous text latents from pre-trained ELF to achieve competitive ASR and S2TT results, with analysis showing shared close-distance confusion in latent space.
Few-step deterministic maps on continuous text latents fail because they cannot resolve discrete branch choices before sharp categorical readouts, with failure governed by decoder sharpness rather than transport accuracy.
citing papers explorer
-
Self-conditioned Flow Map Language Models via Fixed-point Flows
Self-conditioned flow language models solve fixed-point iterations, enabling fixed-point flow maps that distill into FMLM* which outperforms SOTA in few-step generation on OpenWebText.
-
Low Perplexity is Repetition: A One-Dimensional Self-Conditioning Attractor in Continuous Diffusion LMs
Low Gen-PPL in continuous diffusion LMs results from repetition caused by a 1D contractive attractor in self-conditioning feedback; ACE subtracts the direction to reduce repetition to human levels while preserving quality.
-
Masked Language Flow Models
MLFMs combine masking with continuous flows to scale flow-based language models to reasoning and instruction-following tasks on GSM8K and MT-Bench.
-
Continuous Language Diffusion as a Decoder-Interface Problem
Continuous language diffusion works by entering high-margin decoder basins where frozen T5 embeddings recover 93-96% of native decisions and linear readouts reach 97.9% agreement, implying models should be evaluated as representation-decoder systems.
-
VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination
VoidPadding decouples padding from termination in MDLMs via a new [VOID] token, delivering +17.84 average benchmark points and 55.7% fewer decoding steps on Dream-7B-Instruct.