FeRA: Frequency-Energy Constrained Routing for Effective Diffusion Adaptation Fine-Tuning
Diffusion models have achieved remarkable success in generative modeling, yet how to effectively adapt large pretrained models to new tasks remains challenging. We revisit the reconstruction behavior of diffusion models during denoising to unveil the underlying frequency-energy mechanism governing this process. Building upon this observation, we propose FeRA, a frequency-driven fine-tuning framework that aligns parameter updates with the intrinsic frequency-energy progression of diffusion. FeRA establishes a comprehensive frequency-energy framework for effective diffusion adaptation fine-tuning, comprising three synergistic components: (i) a compact frequency-energy indicator that characterizes the latent band-wise energy distribution, (ii) a soft frequency router that adaptively fuses multiple frequency-specific adapter experts, and (iii) a frequency-energy consistency regularization that stabilizes diffusion optimization and ensures coherent adaptation across bands. Routing operates in both training and inference, with inference-time routing dynamically determined by the latent frequency energy. It integrates seamlessly with adapter-based tuning schemes and generalizes well across diffusion backbones and resolutions. By aligning adaptation with the frequency-energy mechanism, FeRA provides a simple, stable, and compatible paradigm for effective and robust diffusion model adaptation.
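To make components (i) and (ii) concrete, the following is a minimal sketch, not the paper's implementation. The abstract does not say how bands are defined, how the router is parameterized, or what form the experts take, so everything here is an assumption made for illustration: bands are equal-width radial rings in the 2D FFT of the latent, the router is a single linear map followed by a softmax, and the experts are LoRA-style low-rank pairs. The names `band_energy_indicator`, `soft_route`, and `fused_adapter_delta` are hypothetical.

```python
import numpy as np


def band_energy_indicator(latent, num_bands=3):
    """Fraction of spectral energy per radial frequency band (assumed definition).

    latent: array of shape (C, H, W). Returns a vector of length num_bands
    that sums to (approximately) 1.
    """
    _, h, w = latent.shape
    power = np.abs(np.fft.fft2(latent, axes=(-2, -1))) ** 2
    # Radial frequency of every FFT bin, in cycles per sample.
    fy = np.fft.fftfreq(h)[:, None]
    fx = np.fft.fftfreq(w)[None, :]
    radius = np.sqrt(fy ** 2 + fx ** 2)
    # Equal-width rings; the small offset keeps the largest radius in the last band.
    edges = np.linspace(0.0, radius.max() + 1e-8, num_bands + 1)
    energy = np.empty(num_bands)
    for b in range(num_bands):
        mask = (radius >= edges[b]) & (radius < edges[b + 1])
        energy[b] = power[:, mask].sum()
    return energy / (energy.sum() + 1e-12)


def soft_route(indicator, router_weight, temperature=1.0):
    """Softmax gates over experts from the band-energy indicator (assumed linear router).

    router_weight: array of shape (num_experts, num_bands).
    """
    logits = router_weight @ indicator / temperature
    logits = logits - logits.max()  # numerical stability
    gates = np.exp(logits)
    return gates / gates.sum()


def fused_adapter_delta(x, experts, gates):
    """Gate-weighted sum of low-rank expert updates: sum_k g_k * B_k (A_k x).

    experts: list of (A, B) pairs with A of shape (r, d_in), B of shape (d_out, r).
    """
    return sum(g * (B @ (A @ x)) for g, (A, B) in zip(gates, experts))


# Usage with random placeholders (shapes and values are illustrative only).
rng = np.random.default_rng(0)
latent = rng.standard_normal((4, 16, 16))
indicator = band_energy_indicator(latent, num_bands=3)
gates = soft_route(indicator, rng.standard_normal((3, 3)))
experts = [(rng.standard_normal((2, 8)), rng.standard_normal((8, 2))) for _ in range(3)]
delta = fused_adapter_delta(rng.standard_normal(8), experts, gates)
```

Because the gates depend only on the latent's current energy distribution, the same routing function can be evaluated at inference time without extra inputs, which matches the abstract's statement that inference-time routing is determined by the latent frequency energy.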
Forward citations
Cited by 2 Pith papers
- SPOT-E: Test-Time Entropy Shaping with Visual Spotlights for Frozen VLMs
  SPOT-E uses entropy shaping on answer predictions with low-entropy anchors to optimize visual spotlights at test time via GRPO for better VLM performance on evidence-intensive tasks.
- UniCSG: Unified High-Fidelity Content-Constrained Style-Driven Generation via Staged Semantic and Frequency Disentanglement
  UniCSG adds staged semantic disentanglement and frequency-aware reconstruction to DiT diffusion models to improve content preservation and style fidelity in both text- and reference-guided generation.