Semantic Generative Tuning uses image segmentation as a generative proxy to align misaligned representation spaces in unified multimodal models and improve both perception and generative layout fidelity.
arXiv preprint arXiv:2407.00783 , year =
4 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 4representative citing papers
Sparse internal snapshots at canonical low-noise levels from frozen diffusion backbones suffice for competitive out-of-distribution detection without full trajectories or large heads.
Aligning noisy hidden states in diffusion transformers to clean features from pretrained visual encoders speeds up training over 17x and reaches FID 1.42.
A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.
citing papers explorer
-
Semantic Generative Tuning for Unified Multimodal Models
Semantic Generative Tuning uses image segmentation as a generative proxy to align misaligned representation spaces in unified multimodal models and improve both perception and generative layout fidelity.
-
Backbone-Equated Diffusion OOD via Sparse Internal Snapshots
Sparse internal snapshots at canonical low-noise levels from frozen diffusion backbones suffice for competitive out-of-distribution detection without full trajectories or large heads.
-
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Aligning noisy hidden states in diffusion transformers to clean features from pretrained visual encoders speeds up training over 17x and reaches FID 1.42.
-
Teacher-Feature Drifting: One-Step Diffusion Distillation with Pretrained Diffusion Representations
A simplified one-step diffusion distillation uses pretrained teacher features directly for drifting loss plus a mode coverage term, achieving FID 1.58 on ImageNet-64 and 18.4 on SDXL.