Improved techniques for fine-tuning flow models via adjoint matching: a deterministic control pipeline

1 Pith paper cites this work. Polarity classification is still indexing.

abstract

We propose a deterministic adjoint matching framework that formulates human preference alignment for flow-based generative models as an optimal control problem over velocity fields. One can directly regress the control toward a value-gradient-induced target under the current policy, leading to a simple and stable training objective. Building on this perspective, we introduce a truncated adjoint scheme that focuses computation on the terminal portion of the trajectory, where reward-relevant signals concentrate, which yields substantial computational savings while preserving alignment quality. We further generalize the framework beyond standard KL-based regularization, allowing more flexible trade-offs between alignment strength and distributional preservation. Experiments on SiT-XL/2 and FLUX.2-Klein-4B demonstrate consistent gains across multiple alignment metrics, along with substantially improved diversity and mode preservation.
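
The truncated adjoint idea mentioned in the abstract can be illustrated with a toy sketch. This is not the paper's implementation: it assumes a linear velocity field v(x) = A x, a quadratic terminal reward, and explicit Euler steps, and the names `rollout`, `reward_grad`, and `truncated_adjoint` are hypothetical. The adjoint is integrated backward from the terminal reward gradient, but only over the final part of the trajectory (t >= t_start).

```python
import numpy as np

def rollout(x0, A, n_steps):
    """Euler-integrate dx/dt = A x from t=0 to t=1 (toy stand-in for a flow model)."""
    dt = 1.0 / n_steps
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(n_steps):
        xs.append(xs[-1] + dt * (A @ xs[-1]))
    return np.stack(xs)

def reward_grad(x, target):
    """Gradient of the assumed reward r(x) = -0.5 * ||x - target||^2."""
    return -(x - target)

def truncated_adjoint(xs, A, target, t_start):
    """Backward adjoint pass restricted to time steps with t >= t_start.

    Returns a dict {step index k: adjoint a_k}, where a_k is the gradient of
    the terminal reward with respect to the state at step k of the Euler rollout.
    """
    n_steps = len(xs) - 1
    dt = 1.0 / n_steps
    k_start = int(np.ceil(t_start * n_steps))
    # Terminal condition: the adjoint at t=1 is the reward gradient.
    adj = {n_steps: reward_grad(xs[-1], target)}
    for k in range(n_steps - 1, k_start - 1, -1):
        # Discrete adjoint of x_{k+1} = x_k + dt * A x_k (the Jacobian of A x is A).
        adj[k] = adj[k + 1] + dt * (A.T @ adj[k + 1])
    return adj

if __name__ == "__main__":
    A = 0.5 * np.array([[0.0, 1.0], [-1.0, 0.0]])
    target = np.array([0.0, 1.0])
    xs = rollout([1.0, 0.0], A, n_steps=8)
    adj = truncated_adjoint(xs, A, target, t_start=0.75)
    print(sorted(adj))  # step indices covered by the truncated backward pass
```

In this sketch the backward pass costs only as many steps as the truncated window contains, which is the kind of saving the abstract attributes to focusing on the terminal portion of the trajectory. How the resulting adjoints enter the regression target for the control is specific to the paper and is not reproduced here.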

representative citing papers

  • OPD+: Rethinking the Advantage Design for On-Policy Distillation · cs.LG · 2026-05-31 · unverdicted · none · ref 4 · internal anchor

    OPD+ removes the bias from stop-gradient in on-policy distillation by deriving correct gradients for f-divergences, outperforming standard KL-based methods on math reasoning and tool-use tasks.