arxiv: 2602.08646 · v3 · pith:TQQNH5I5new · submitted 2026-02-09 · 💻 cs.LG

Gradient Preconditioning for Efficient and Reliable Reward-Guided Generation

classification 💻 cs.LG
keywords reward, gradient, models, noise, preconditioning, reward-guided, efficient, gaussian

We propose a gradient preconditioning method that makes reward-guided generation with one-step generative models both efficient and reliable. Test-time noise optimization can unlock substantially better reward-guided generations from pretrained generative models, but it is prone to reward hacking that degrades quality and is often too slow for practical use. We precondition reward gradients by projecting them onto a carefully designed white Gaussian noise feasible set, a compact spectral set with blockwise norm constraints that tightly captures the statistics and spatial uncorrelatedness of white Gaussian noise. This preconditioning reshapes each gradient update into a noise-aligned direction, driving faster and more effective reward ascent while preventing reward hacking. The projection is closed-form and matches the $O(N \log N)$ complexity of FFT, adding negligible overhead in practice. In experiments on FLUX with four reward models, our approach reaches a comparable Aesthetic Score using only 30% of the wall-clock time required by the state-of-the-art regularization-based method.
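The abstract does not spell out the feasible set, so the following is only a rough sketch of the general idea under stated assumptions, not the paper's projection. It assumes a simplified stand-in set: signals whose unitary-FFT spectrum has, in each of a few frequency bands, the norm that white Gaussian noise with unit variance would have in expectation. The function name `project_to_white_noise_set`, the `num_bands` parameter, and the band layout are all hypothetical. For this stand-in set the Euclidean projection is a per-band rescaling, so it is closed-form and costs one FFT and one inverse FFT, i.e. $O(N \log N)$.

```python
import numpy as np


def project_to_white_noise_set(g, num_bands=8, eps=1e-12):
    """Rescale each frequency band of `g` to the norm expected of white noise.

    Illustrative assumption only: the set used here (blockwise norm equality
    in the spectrum) is a simplified stand-in, not the set from the paper.
    The input is flattened and treated as a 1-D signal for simplicity.
    """
    g = np.asarray(g, dtype=np.float64)
    n = g.size

    # Unitary FFT: for x ~ N(0, I), every coefficient has E|X_k|^2 = 1.
    spectrum = np.fft.fft(g.ravel(), norm="ortho")

    # Fold frequencies so that k and n - k land in the same band. Equal
    # scaling of conjugate pairs keeps the inverse transform real-valued.
    k = np.arange(n)
    folded = np.minimum(k, n - k)
    band = folded * num_bands // (n // 2 + 1)

    out = spectrum.copy()
    for b in range(num_bands):
        mask = band == b
        count = int(mask.sum())
        if count == 0:
            continue
        norm = np.linalg.norm(spectrum[mask])
        target = np.sqrt(count)  # expected band norm for unit-variance white noise
        # A band with zero energy has no unique projection; it is left at zero.
        out[mask] = spectrum[mask] * (target / max(norm, eps))

    return np.fft.ifft(out, norm="ortho").real.reshape(g.shape)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    reward_grad = rng.normal(size=(16, 16)) * 3.0  # placeholder for a real reward gradient
    direction = project_to_white_noise_set(reward_grad)
    print(direction.shape, float(np.mean(direction**2)))
```

In a noise-optimization loop, a direction like this would replace the raw reward gradient in the update step. How the paper defines its blocks, and whether it uses norm bounds rather than the equalities assumed here, would need to be checked against the full text.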

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

    cs.LG 2026-06 unverdicted novelty 6.0

    NTRK uses a whitening operator to tilt the noise term in diffusion reverse kernels for reward guidance, outperforming baselines with 20x fewer steps on aesthetic tasks.

  2. NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

    cs.LG 2026-06 unverdicted novelty 6.0

    NTRK is a reward-guided diffusion sampler that uses a whitening operator to bias the noise term toward high-reward outcomes, outperforming baselines with up to 20x fewer sampling steps on aesthetic tasks.