Gradient Preconditioning for Efficient and Reliable Reward-Guided Generation
We propose a gradient preconditioning method that makes reward-guided generation with one-step generative models both efficient and reliable. Test-time noise optimization can unlock substantially better reward-guided generations from pretrained generative models, but it is prone to reward hacking that degrades quality and is often too slow for practical use. We precondition reward gradients by projecting them onto a carefully designed white Gaussian noise feasible set, a compact spectral set with blockwise norm constraints that tightly captures the statistics and spatial uncorrelatedness of white Gaussian noise. This preconditioning reshapes each gradient update into a noise-aligned direction, driving faster and more effective reward ascent while preventing reward hacking. The projection is closed-form and matches the $O(N \log N)$ complexity of FFT, adding negligible overhead in practice. In experiments on FLUX with four reward models, our approach reaches a comparable Aesthetic Score using only 30% of the wall-clock time required by the state-of-the-art regularization-based method.
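The abstract does not spell out the projection itself, so the following is only a rough illustration of the general idea, not the paper's method. It is a minimal NumPy sketch that assumes a simplified stand-in for the feasible set: a flat FFT magnitude spectrum (a property white Gaussian noise has in expectation) followed by a fixed norm on each spatial block. The function names, the block size of 8, the step size, and the two-stage ordering are all hypothetical choices made for this example.

```python
import numpy as np

def flatten_spectrum(x, eps=1e-12):
    """Set every FFT coefficient of a real 2-D array to unit magnitude, keeping its phase.

    Assumption: a flat magnitude spectrum is used here as a simple proxy for
    "spatially uncorrelated"; the paper's spectral set may be defined differently.
    """
    X = np.fft.fft2(x, norm="ortho")
    X = X / np.maximum(np.abs(X), eps)  # unit magnitude, original phase
    return np.fft.ifft2(X, norm="ortho").real

def rescale_blocks(x, block=8, eps=1e-12):
    """Rescale each (block x block) tile so its L2 norm equals `block`.

    A tile of block*block unit-variance samples has expected squared norm
    block*block, hence the target norm `block` (an illustrative choice).
    """
    h, w = x.shape
    tiles = x.reshape(h // block, block, w // block, block)
    norms = np.sqrt((tiles ** 2).sum(axis=(1, 3), keepdims=True))
    tiles = tiles * (block / np.maximum(norms, eps))
    return tiles.reshape(h, w)

def projected_ascent_step(z, grad, lr=0.1, block=8):
    """One reward-ascent step on the noise z, pulled back toward noise-like statistics."""
    return rescale_blocks(flatten_spectrum(z + lr * grad), block)

# Usage with a random stand-in for the reward gradient (hypothetical data).
rng = np.random.default_rng(0)
z = rng.standard_normal((64, 64))
grad = rng.standard_normal((64, 64))
z_next = projected_ascent_step(z, grad)
```

Both stages cost at most one FFT pair plus elementwise work, which is consistent with the $O(N \log N)$ complexity the abstract cites, although applying the two stages in sequence is not guaranteed to satisfy both constraints at once the way a true closed-form projection would.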
Forward citations
Cited by 2 Pith papers
- NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment
  NTRK uses a whitening operator to tilt the noise term in diffusion reverse kernels for reward guidance, outperforming baselines with 20x fewer steps on aesthetic tasks.
- NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment
  NTRK is a reward-guided diffusion sampler that uses a whitening operator to bias the noise term toward high-reward outcomes, outperforming baselines with up to 20x fewer sampling steps on aesthetic tasks.