Diffuse and Disperse: Image Generation with Representation Regularization. arXiv preprint arXiv:2506.09027
5 Pith papers cite this work; polarity classification is still being indexed.
Years: 2026 (5 papers, all unverdicted). 5 representative citing papers:
Citing papers explorer
- Stage-adaptive audio diffusion modeling: A semantic progress signal from the slope of an SSL discrepancy curve enables three stage-aware mechanisms that improve training efficiency and performance in audio diffusion models over static baselines.
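The summary does not specify how the slope signal is computed; a minimal sketch of one plausible reading, fitting a least-squares slope to a recent window of an SSL-feature discrepancy curve (function name and window size are hypothetical):

```python
import numpy as np

def progress_slope(discrepancy, window=5):
    """Least-squares slope over the most recent `window` points of a
    discrepancy curve (hypothetical reading of the paper's signal)."""
    y = np.asarray(discrepancy[-window:], dtype=float)
    t = np.arange(len(y), dtype=float)
    # np.polyfit returns [slope, intercept] for a degree-1 fit
    return np.polyfit(t, y, 1)[0]
```

A steep negative slope would mark early training, while a flattening slope could trigger the later-stage mechanisms.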
- Continuous Adversarial Flow Models: Replaces the MSE objective in flow matching with adversarial training via a discriminator, improving guidance-free FID on ImageNet from 8.26 to 3.63 for SiT, with similar gains for JiT and on text-to-image benchmarks.
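For context, a sketch of the standard MSE objective that the adversarial loss replaces, assuming a rectified-flow-style linear path (the paper's exact parameterization is not given in the summary):

```python
import numpy as np

def flow_matching_mse(x0, x1, t, model_velocity):
    """Standard flow-matching loss on the linear path x_t = (1-t)*x0 + t*x1.
    The cited work swaps this MSE term for a discriminator-based loss."""
    xt = (1.0 - t)[:, None] * x0 + t[:, None] * x1
    target_v = x1 - x0  # constant velocity of the linear path
    pred_v = model_velocity(xt, t)
    return ((pred_v - target_v) ** 2).mean()
```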
- MPDiT: Multi-Patch Global-to-Local Transformer Architecture For Efficient Flow Matching and Diffusion Model: Uses a hierarchical multi-patch transformer design to lower computation in diffusion models by handling coarse global features first, then fine local details, together with faster-converging embeddings.
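A minimal sketch of the two token granularities such a multi-patch design trades off (patch sizes here are arbitrary; MPDiT's actual configuration is not stated in the summary):

```python
import numpy as np

def patchify(x, p):
    """Split an (H, W, C) array into non-overlapping p x p patches,
    returning (num_patches, p*p*C) tokens."""
    H, W, C = x.shape
    assert H % p == 0 and W % p == 0
    x = x.reshape(H // p, p, W // p, p, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, p * p * C)
```

Large patches yield few coarse global tokens (cheap attention); small patches yield many fine local tokens, so processing the coarse level first amortizes the expensive fine stage.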
- Premier: Personalized Preference Modulation with Learnable User Embedding in Text-to-Image Generation: Learns user-specific embeddings to modulate text-to-image generation, outperforming prior methods on preference alignment, text consistency, and expert ratings even with limited user history.
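The summary does not say how the user embedding enters the generator; one common mechanism it could resemble is FiLM-style feature modulation (all names, shapes, and the residual scale/shift form are assumptions for illustration):

```python
import numpy as np

def user_modulate(h, user_emb, w_gamma, w_beta):
    """Scale-and-shift features h with per-channel parameters derived
    from a learned user embedding (FiLM-style sketch)."""
    gamma = user_emb @ w_gamma  # (D,) per-channel scale offset
    beta = user_emb @ w_beta    # (D,) per-channel shift
    return (1.0 + gamma) * h + beta
```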
- Med-DisSeg: Dispersion-Driven Representation Learning for Fine-Grained Medical Image Segmentation: Uses a dispersive loss on batch representations plus adaptive multi-scale decoding to achieve state-of-the-art fine-grained segmentation on five medical imaging datasets.
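The dispersive loss here builds on the cited Diffuse-and-Disperse regularizer; a minimal sketch of its InfoNCE-style variant (repulsion over all batch pairs, no positive pairs), with the temperature `tau` assumed:

```python
import numpy as np

def dispersive_loss(z, tau=1.0):
    """log-mean of exp(-||z_i - z_j||^2 / tau) over all pairs in the
    batch; minimizing it pushes representations apart."""
    d2 = ((z[:, None, :] - z[None, :, :]) ** 2).sum(-1)
    return float(np.log(np.exp(-d2 / tau).mean()))
```

A fully collapsed batch gives the maximum value of 0, while any spread-out batch scores lower, so gradient descent disperses the representations.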