STRIDE boosts diversity in one-step diffusion models by injecting PCA-aligned pink noise into transformer features while preserving text alignment and quality.
Mixed citations
No other representation component is needed: Diffusion transformers can provide representation guidance by themselves
Mixed citation behavior. Most common role is background (40%).
citation-role summary
citation-polarity summary
representative citing papers
Layer-wise representation alignment lets diffusion language models reuse semantic structures from frozen autoregressive models, accelerating training by up to 4x without architectural changes beyond the attention mask.
RAE v2 reaches gFID 1.06 on ImageNet-256 in 80 epochs by combining multi-layer encoder sums, complementary REPA targets, and free guidance via output reparameterization.
A semantic progress signal from SSL discrepancy slope enables three stage-aware mechanisms that improve training efficiency and performance in audio diffusion models over static baselines.
Realiz3D decouples visual domain from 3D controls in diffusion models via domain-aware residual adapters to enable photorealistic controllable generation.
Aligning the DDIM forward diffusion process with flow-matching manifold evolution enables high-quality generation without time conditioning, and class-conditional synthesis is possible with an unconditional denoiser by using separate time spaces per class.
FRAMER improves real-world super-resolution by decomposing features into low- and high-frequency bands via FFT, applying intra- and inter-contrastive losses with adaptive modulators, and using the final layer as teacher for intermediate layers during diffusion denoising.
Diffusion models suffer representation degradation at high noise due to recoverability mismatch; ERD mitigates this by dynamic optimization reallocation, accelerating convergence across backbones.
citing papers explorer
-
STRIDE: Training-Free Diversity Guidance via PCA-Directed Feature Perturbation in Single-Step Diffusion Models
STRIDE boosts diversity in one-step diffusion models by injecting PCA-aligned pink noise into transformer features while preserving text alignment and quality.
-
Don't Retrain, Align: Adapting Autoregressive LMs to Diffusion LMs via Representation Alignment
Layer-wise representation alignment lets diffusion language models reuse semantic structures from frozen autoregressive models, accelerating training by up to 4x without architectural changes beyond the attention mask.
-
Improved Baselines with Representation Autoencoders
RAE v2 reaches gFID 1.06 on ImageNet-256 in 80 epochs by combining multi-layer encoder sums, complementary REPA targets, and free guidance via output reparameterization.
-
Stage-adaptive audio diffusion modeling
A semantic progress signal from SSL discrepancy slope enables three stage-aware mechanisms that improve training efficiency and performance in audio diffusion models over static baselines.
-
Realiz3D: 3D Generation Made Photorealistic via Domain-Aware Learning
Realiz3D decouples visual domain from 3D controls in diffusion models via domain-aware residual adapters to enable photorealistic controllable generation.
-
Exploring Time Conditioning in Diffusion Generative Models from Disjoint Noisy Data Manifolds
Aligning the DDIM forward diffusion process with flow-matching manifold evolution enables high-quality generation without time conditioning, and class-conditional synthesis is possible with an unconditional denoiser by using separate time spaces per class.
-
FRAMER: Frequency-Aligned Self-Distillation with Adaptive Modulation Leveraging Diffusion Priors for Real-World Image Super-Resolution
FRAMER improves real-world super-resolution by decomposing features into low- and high-frequency bands via FFT, applying intra- and inter-contrastive losses with adaptive modulators, and using the final layer as teacher for intermediate layers during diffusion denoising.
-
Elucidating Representation Degradation Problem in Diffusion Model Training
Diffusion models suffer representation degradation at high noise due to recoverability mismatch; ERD mitigates this by dynamic optimization reallocation, accelerating convergence across backbones.
- D-OPSD: On-Policy Self-Distillation for Continuously Tuning Step-Distilled Diffusion Models