
arXiv: 2505.17384 · v2 · submitted 2025-05-23 · cs.LG · cs.CV · stat.ML


Variational Autoencoding Discrete Diffusion with Enhanced Dimensional Correlations Modeling

Authors on Pith: no claims yet
keywords: diffusion, discrete, modeling, denoising, generation, MDMs, quality, steps
Original abstract

Discrete diffusion models have recently shown great promise for modeling complex discrete data, with masked diffusion models (MDMs) offering a compelling trade-off between quality and generation speed. MDMs denoise by progressively unmasking multiple dimensions from an all-masked input, but their performance can degrade when using few denoising steps due to limited modeling of inter-dimensional dependencies. In this paper, we propose Variational Autoencoding Discrete Diffusion (VADD), a novel framework that enhances discrete diffusion with latent variable modeling to implicitly capture correlations among dimensions. By introducing an auxiliary recognition model, VADD enables stable training via variational lower bounds maximization and amortized inference over the training set. Our approach retains the efficiency of traditional MDMs while significantly improving sample quality, especially when the number of denoising steps is small. Empirical results on 2D toy data, pixel-level image generation, and text generation demonstrate that VADD consistently outperforms MDM baselines in sample quality with few denoising steps.
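
For orientation, the objective described here presumably takes the standard variational autoencoding form. The bound below is a sketch under that assumption; the paper's exact objective, including any diffusion-specific weighting, may differ:

    \log p_\theta(x_0) \;\ge\; \mathbb{E}_{q_\phi(z \mid x_0)}\big[\log p_\theta(x_0 \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x_0) \,\|\, p(z)\big)

Here q_\phi is the auxiliary recognition model, p(z) is the latent prior, and \log p_\theta(x_0 \mid z) would itself be lower-bounded by the z-conditioned masked-diffusion denoising loss. "Amortized inference over the training set" means one q_\phi serves every training example, rather than a separate posterior being optimized per example.

The sampling-time mechanism can likewise be sketched in a few lines. Everything in the sketch below is an assumption for illustration (the denoiser callable, the standard-normal prior, and the uniform unmasking schedule are not the paper's specification); it shows how a single latent z, sampled once and fed to every denoising step, can correlate the positions that are unmasked together:

    import torch

    def sample(denoiser, seq_len, mask_id, latent_dim, num_steps=4):
        """Few-step sampling from a latent-conditioned masked diffusion model.
        `denoiser(x, z)` is assumed to return (seq_len, vocab_size) logits;
        a real model would exclude the mask token from its output vocabulary."""
        z = torch.randn(latent_dim)                      # z ~ p(z), standard normal assumed
        x = torch.full((seq_len,), mask_id)              # start from the all-masked input
        masked = torch.ones(seq_len, dtype=torch.bool)   # positions still masked
        for step in range(num_steps):
            probs = torch.softmax(denoiser(x, z), dim=-1)
            proposal = torch.multinomial(probs, 1).squeeze(-1)
            idx = masked.nonzero().squeeze(-1)
            k = max(1, idx.numel() // (num_steps - step))  # unmask an equal share per step
            chosen = idx[torch.randperm(idx.numel())[:k]]
            x[chosen] = proposal[chosen]                   # every position this step sees the same z
            masked[chosen] = False
        return x

    # Usage with a dummy denoiser that ignores its inputs:
    dummy = lambda x, z: torch.randn(8, 16)
    print(sample(dummy, seq_len=8, mask_id=0, latent_dim=32))

In a plain MDM, the positions unmasked within a single step are drawn from a factorized (conditionally independent) distribution, which is exactly the dependence that few-step sampling loses; conditioning each step on a shared z reintroduces a global source of correlation.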

This paper has not been read by Pith yet.


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Infinite Mask Diffusion for Few-Step Distillation

cs.CL · 2026-05 · unverdicted · novelty 7.0

    Infinite Mask Diffusion Models use stochastic infinite-state masks to overcome the factorization error lower bound in standard masked diffusion, achieving superior few-step performance on language tasks via distillation.