Follow the Mean: Reference-Guided Flow Matching

Floor Eijkelboom; Jan-Willem van de Meent; Maksim Zhdanov; Pedro M. P. Curvo

arXiv: 2605.10302 · v3 · pith:RYP5JXVM · submitted 2026-05-11 · 💻 cs.LG

Pedro M. P. Curvo, Maksim Zhdanov, Floor Eijkelboom, Jan-Willem van de Meent

Pith reviewed 2026-05-13 06:10 UTC · model grok-4.3

classification 💻 cs.LG

keywords: mean, flow matching, reference, control, controllable generation, guidance

The pith

Flow matching admits controllable generation by shifting the conditional endpoint mean computed from a reference set, enabling training-free guidance on frozen pretrained models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Flow-matching models turn noise into images by following a smooth path defined by a velocity field. The paper observes that when the interpolating path is deterministic, the velocity at any point depends only on the average of the possible endpoints given that point. By swapping the set of reference images that this average is taken over, you change where the flow heads without touching the model weights. The authors demonstrate this with a training-free correction applied to a 4-billion-parameter FLUX model (FLUX.2-klein) to control color, identity, style, and structure while the prompt stays fixed. A second version adds a learned residual refiner around an explicit mean anchor, so the reference set can be swapped at test time while still matching the quality of an unconditional DiT-B/4 baseline on the AFHQv2 animal-face dataset.

Core claim

For deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean; shifting this mean shifts the flow itself.
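The core claim rests on a short calculation. For the linear interpolant x_t = (1 − t)x_0 + t x_1 with t < 1, taking the marginal velocity to be the conditional expectation of the per-pair velocity x_1 − x_0 (the standard flow-matching construction) and writing μ_t(x) := E[x_1 | x_t = x], the derivation runs:

```latex
\begin{aligned}
v_t(x) &= \mathbb{E}\left[\,x_1 - x_0 \mid x_t = x\,\right],
\qquad x_0 = \frac{x - t\,x_1}{1-t} \ \text{on the path},\\
v_t(x) &= \mathbb{E}\left[\,x_1 - \frac{x - t\,x_1}{1-t} \,\middle|\, x_t = x\right]
        = \frac{\mathbb{E}[\,x_1 \mid x_t = x\,] - x}{1-t}
        = \frac{\mu_t(x) - x}{1-t}.
\end{aligned}
```

All dependence on the endpoint distribution enters through μ_t(x), so replacing the set that this mean is taken over (for example, a reference set) changes v_t(x) directly.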

Load-bearing premise

That the interpolants remain deterministic and that the endpoint mean fully determines the velocity field without additional dependencies on the reference distribution or noise schedule.
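The schedule part of this premise can be probed with a general deterministic interpolant x_t = a_t x_0 + b_t x_1, where a_t > 0 and b_t are generic schedule coefficients introduced here only for illustration. Eliminating x_0 in the same way as for the linear case gives:

```latex
\begin{aligned}
\dot x_t &= \dot a_t\,x_0 + \dot b_t\,x_1,
\qquad x_0 = \frac{x_t - b_t\,x_1}{a_t},\\
v_t(x) &= \mathbb{E}\left[\,\dot a_t\,x_0 + \dot b_t\,x_1 \mid x_t = x\,\right]
        = \frac{\dot a_t}{a_t}\,x + \left(\dot b_t - \frac{\dot a_t\,b_t}{a_t}\right)\mu_t(x).
\end{aligned}
```

Setting a_t = 1 − t and b_t = t recovers (μ_t(x) − x)/(1 − t). The schedule does enter the velocity, but only through known scalar coefficients; the only distribution-dependent quantity is μ_t(x).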

Figures

Figures reproduced from arXiv: 2605.10302 by Floor Eijkelboom, Jan-Willem van de Meent, Maksim Zhdanov, Pedro M. P. Curvo.

**Figure 2.** Reference-mean guidance on the two-moons distribution. The model and all other settings…

**Figure 3.** Reference-set swaps on frozen FLUX.2-klein. Prompt and noise seed are fixed within each column. The generated output shifts systematically in color, object identity, and style as the reference set changes.

**Figure 4.** Qualitative evidence of structural control on frozen…

**Figure 5.** SPG preserves unconditional generation quality while enabling inference-time control…

**Figure 6.** SPG preserves generation quality, avoids memorization, and enables inference-time control…

**Figure 7.** Examples from the GenEval protocol. Each column shows a representative reference-bank…

**Figure 8.** Two-moons control. Top: t changes with the reference set fixed. Bottom: the reference composition changes with the model fixed.

MNIST. We repeat the analysis on MNIST digits (0 and 1), where the reference set now operates on image-space representations…
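To make the principle concrete in the spirit of the two-moons experiment, here is a minimal NumPy sketch of the closed-form flow for a finite reference set. It assumes a standard Gaussian source x_0 ~ N(0, I), the linear interpolant, and x_1 uniform over the reference points; under these assumptions μ_t(x) is a softmax-weighted average of the references. This illustrates the endpoint-mean principle only; it is not the paper's Reference-Mean Guidance, which corrects a frozen network rather than replacing it, and the function names are hypothetical.

```python
import numpy as np

def endpoint_mean(x, t, refs):
    """Closed-form mu_t(x) = E[x_1 | x_t = x] for a finite reference set.

    Assumes x_0 ~ N(0, I), x_1 uniform over the rows of `refs`, and the linear
    interpolant x_t = (1 - t) x_0 + t x_1, so x_t | x_1 = r ~ N(t r, (1 - t)^2 I).
    x: (n, d) current points; refs: (m, d) reference set; requires t < 1.
    """
    sq_dist = ((x[:, None, :] - t * refs[None, :, :]) ** 2).sum(-1)  # (n, m)
    logits = -sq_dist / (2.0 * (1.0 - t) ** 2)
    logits -= logits.max(axis=1, keepdims=True)  # stabilize the softmax
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)            # posterior over references
    return w @ refs                               # (n, d)

def velocity(x, t, refs):
    # v_t(x) = (mu_t(x) - x) / (1 - t)
    return (endpoint_mean(x, t, refs) - x) / (1.0 - t)

def sample(refs, n=256, steps=100, seed=0):
    """Euler-integrate the reference-set flow from noise at t = 0 to t = 1."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n, refs.shape[1]))   # x_0 ~ N(0, I)
    ts = np.linspace(0.0, 1.0, steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):           # velocity is never evaluated at t = 1
        x = x + (t1 - t0) * velocity(x, t0, refs)
    return x

# Same noise seed, two reference sets: only the set being averaged changes.
left = sample(np.array([[-2.0, 0.0], [-2.0, 0.5]]))
split = sample(np.array([[-2.0, 0.0], [2.0, 0.0]]))
```

Changing `refs` is the only knob here: with every reference at x = −2 all samples land there, and with references at ±2 the samples split between the two. Because this flow follows the empirical mean exactly, it reproduces reference points rather than generating new ones.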

**Figure 9.** Inference-time condition changes; dataset and model are fixed.

**Figure 10.** MNIST steering with soft-labeled references. The same model generates ones or zeros…
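The caption does not say how soft labels enter. One plausible reading, assumed here and not taken from the paper, is that each reference carries a positive prior weight, which turns a uniform reference set into a weighted mixture. Under a Gaussian source and the linear interpolant, the weight simply adds log p_i to the posterior logits. The helper name is hypothetical.

```python
import numpy as np

def weighted_endpoint_mean(x, t, refs, weights):
    """mu_t(x) when x_1 = refs[i] with prior probability proportional to weights[i].

    Hypothetical helper. Assumes x_0 ~ N(0, I) and x_t = (1 - t) x_0 + t x_1;
    weights must be strictly positive (log(0) would give -inf logits).
    x: (n, d); refs: (m, d); weights: (m,); requires t < 1.
    """
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()                                                   # normalize the prior
    sq_dist = ((x[:, None, :] - t * refs[None, :, :]) ** 2).sum(-1)  # (n, m)
    logits = np.log(p)[None, :] - sq_dist / (2.0 * (1.0 - t) ** 2)  # log prior + log likelihood
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ refs
```

At t = 0 the posterior equals the prior, so the initial velocity points toward the weighted average of the references; as t grows the likelihood term dominates. Shifting weight between, say, references of zeros and ones shifts the endpoint mean, and with it the flow.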

**Figure 11.** The three β_t schedules evaluated in this ablation, shown for β_0 = 1 before the shared late-time cutoff at t = 0.85: constant (β_t = β_0), quadratic decay (β_t = β_0(1 − t)^2), and bell-shaped (β_t = 4β_0 t(1 − t)). Constant applies uniform guidance until the cutoff; quadratic decay front-loads guidance and vanishes at t = 1, avoiding the (1 − t)^(-1) instability; bell-shaped guidance peaks at t = 0.5 and suppress…
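The three schedules in the caption are simple enough to write down directly. A minimal sketch, assuming (the caption does not state it outright) that guidance is switched off entirely, β_t = 0, for t beyond the 0.85 cutoff; the function name is hypothetical.

```python
import numpy as np

T_CUTOFF = 0.85  # shared late-time cutoff from the Figure 11 caption

def beta_schedule(t, beta0=1.0, kind="quadratic", cutoff=T_CUTOFF):
    """Guidance strength beta_t for the constant, quadratic, and bell schedules.

    Assumption: beta_t = 0 for t > cutoff. Accepts a scalar or an array of times.
    """
    t = np.asarray(t, dtype=float)
    if kind == "constant":
        beta = np.full_like(t, beta0)          # beta_t = beta_0
    elif kind == "quadratic":
        beta = beta0 * (1.0 - t) ** 2          # beta_t = beta_0 (1 - t)^2
    elif kind == "bell":
        beta = 4.0 * beta0 * t * (1.0 - t)     # beta_t = 4 beta_0 t (1 - t), peak beta_0 at t = 0.5
    else:
        raise ValueError(f"unknown schedule: {kind}")
    return np.where(t <= cutoff, beta, 0.0)
```

Because the velocity formula carries a 1/(1 − t) factor, a correction scaled by β_t grows like β_t/(1 − t) near the end of the trajectory. Only the quadratic schedule keeps this ratio, β_0(1 − t), bounded all the way to t = 1, which matches the caption's rationale; the hard cutoff bounds the other two.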

**Figure 12.** Ablation of guidance strength β_0 for the constant schedule. The constant schedule applies uniform guidance at every step. At moderate β_0 the target attribute transfers cleanly, but above β_0 ≈ 1 artifacts appear near t = 1, visible as oversaturated colors and structural distortion.

E.4 Reference-Set Size. We study the effect of the reference-set size on the diversity of generated outputs. We fix the prompt,…

**Figure 13.** Ablation of guidance strength β_0 for the bell-shaped schedule. The bell-shaped schedule concentrates guidance around the midpoint of the trajectory and suppresses both early and late corrections. Relative to the constant schedule, it delays attribute transfer slightly but remains stable at larger β_0 values.

…a single mode. This behavior contrasts with guidance methods that often reduce diversity as control…

**Figure 14.** Ablation of guidance strength β_0 for the quadratic decay schedule. The quadratic schedule front-loads guidance and decays to zero at t = 1, cancelling the late-time divergence. Across the full β_0 range it provides the cleanest attribute transfer with the fewest late-time artifacts, which is why this schedule is used in the main experiments.

E.5 Number of Function Evaluations (NFE). We study the effect of t…
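The schedule captions imply that the guidance correction carries a (1 − t)^(-1) factor which the quadratic schedule cancels. A minimal sketch of one way this could look, under assumptions that are not taken from the paper: the frozen model's implied endpoint mean is recovered by inverting v = (μ − x)/(1 − t), and that mean is moved a fraction β_t toward a reference mean μ_ref. The velocity correction is then β_t(μ_ref − μ_model)/(1 − t), which with β_t = β_0(1 − t)^2 becomes β_0(1 − t)(μ_ref − μ_model) and vanishes at t = 1. The function name and arguments are hypothetical.

```python
import numpy as np

def guided_velocity(v_model, x, t, mu_ref, beta0=1.0):
    """Hypothetical reference-mean correction of a frozen model's velocity.

    Assumptions (illustrative, not the paper's exact method):
      * the model's implied endpoint mean is mu_model = x + (1 - t) * v_model,
      * the endpoint mean is shifted by beta_t * (mu_ref - mu_model),
      * beta_t = beta0 * (1 - t)**2 (quadratic decay).
    The shift changes the velocity by beta_t / (1 - t) * (mu_ref - mu_model);
    with quadratic decay that factor is beta0 * (1 - t), so no division by 1 - t
    is needed and the correction is well defined at t = 1.
    """
    mu_model = x + (1.0 - t) * v_model                   # invert v = (mu - x) / (1 - t)
    return v_model + beta0 * (1.0 - t) * (mu_ref - mu_model)
```

Writing the correction in its cancelled form, rather than computing a shifted mean and dividing by (1 − t), keeps the step finite at the end of the trajectory, which is the behavior the quadratic-decay caption describes.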

**Figure 15.** Reference-set size ablation. LPIPS diversity increases with the number of reference…

**Figure 16.** Ring-leap control task across guidance strengths and solver budgets. Columns vary NFE,…

**Figure 17.** Prompt–reference interaction. Rows change the prompt; columns change the reference…

**Figure 18.** Quantitative controllability under reference composition for the prompt…

**Figure 19.** Qualitative controllability for the prompt…

**Figure 20.** Qualitative controllability for the prompt…

**Figure 21.** SPG diversity as a function of reference-set size. Average pairwise LPIPS increases with…

**Figure 22.** White-background reference-bank comparison. The reference bank consists of examples…

**Figure 23.** Reference bank of 20 images of pink elephants.

**Figure 24.** Reference bank of 20 images of blue elephants.

**Figure 25.** Reference bank of 20 images of giraffes.

**Figure 26.** Reference bank of 20 images of zebras.

**Figure 27.** Reference bank of 20 images of elephants.

**Figure 28.** Reference bank of 20 images of keyholes.

**Figure 29.** Reference bank of 20 Van Gogh-style images.

**Figure 30.** Reference bank of 20 pencil-sketch house images.

**Figure 31.** Reference bank of 20 cinematic house images.

**Figure 32.** Reference bank of three hand-pose images used for the sign-of-the-horns experiment.

Original abstract

Existing approaches to controllable generation typically rely on fine-tuning, auxiliary networks, or test-time search. We show that flow matching admits a different control interface: adaptation through examples. For deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean; shifting this mean shifts the flow itself. This yields a simple principle for controllable generation: steer a pretrained model by changing the reference set it follows. We instantiate this idea in two forms. Reference-Mean Guidance is training-free: it computes a closed-form endpoint-mean correction from a reference bank and applies it to a frozen FLUX.2-klein (4B) model, enabling control of color, identity, style, and structure while keeping the prompt, seed, and weights fixed. Semi-Parametric Guidance amortizes the same idea through an explicit mean anchor and learned residual refiner, matching unconditional DiT-B/4 quality on AFHQv2 while allowing the reference set to be swapped at inference time. These results point to a broader direction: generative models that adapt through data, not parameter updates.
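The abstract describes Semi-Parametric Guidance only as an explicit mean anchor plus a learned residual refiner. A minimal sketch of that structure, under loudly stated assumptions: the anchor is the closed-form endpoint mean of the current reference set (Gaussian source, linear interpolant), the refiner corrects the endpoint estimate rather than the velocity, and `residual_fn` is a hypothetical stand-in for the trained network. It is not the paper's architecture; it only shows how the reference set can enter through the anchor rather than through the weights.

```python
import numpy as np

def endpoint_mean(x, t, refs):
    # Closed-form E[x_1 | x_t = x] for x_1 uniform over refs, x_0 ~ N(0, I),
    # linear interpolant; x: (n, d), refs: (m, d), requires t < 1.
    sq_dist = ((x[:, None, :] - t * refs[None, :, :]) ** 2).sum(-1)
    logits = -sq_dist / (2.0 * (1.0 - t) ** 2)
    logits -= logits.max(axis=1, keepdims=True)
    w = np.exp(logits)
    w /= w.sum(axis=1, keepdims=True)
    return w @ refs

def spg_velocity(x, t, refs, residual_fn):
    """Hypothetical semi-parametric velocity: explicit mean anchor + learned residual.

    residual_fn(x, t, anchor) stands in for the learned refiner; its signature and
    the choice to refine the endpoint (not the velocity) are assumptions.
    """
    anchor = endpoint_mean(x, t, refs)             # non-parametric: swappable at inference
    endpoint = anchor + residual_fn(x, t, anchor)  # parametric refinement of the anchor
    return (endpoint - x) / (1.0 - t)

def zero_residual(x, t, anchor):
    # Placeholder refiner; a trained network would go here.
    return np.zeros_like(anchor)
```

With the placeholder refiner the velocity reduces to the closed-form reference-set flow; swapping `refs` changes the anchor, and hence the flow, while any learned parameters stay untouched.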

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper reduces flow-matching control to shifting a reference endpoint mean and shows it works on a frozen FLUX model, but the derivation and quantitative checks are still thin.

The core claim is that deterministic flow matching lets you steer generation by changing the conditional mean of the endpoint drawn from a reference set, and the authors give a closed-form way to do this on a pretrained model without touching its weights. They also sketch a semi-parametric version that keeps an explicit mean anchor plus a learned refiner. That reduction to endpoint-mean steering is the new piece, and it follows directly from the linear interpolant setup where velocity is set by the expected endpoint. Applying it to FLUX.2-klein for color, identity, and style control while keeping prompt and seed fixed is a practical demonstration that the idea can be used at scale with no fine-tuning. The semi-parametric route also shows you can swap reference sets at inference time and still match unconditional DiT quality on AFHQv2. Those are the parts that land cleanly. The soft spot is that the central step—claiming the velocity field is governed solely by the conditional endpoint mean with no leftover schedule or marginal dependence—needs the full derivation and error analysis to hold up. The current writeup gives qualitative results on one model but no ablations against standard guidance baselines or checks for schedule sensitivity, so the generality is not yet clear. The stress-test concern about implicit dependencies on the noise schedule or reference covariance is worth checking explicitly. This is for people working on controllable generation who want training-free options in flow or diffusion models. A reader who needs a simple steering knob for large frozen models will get something usable from the principle even before the details are tightened. It deserves a serious referee because the idea is straightforward to implement and the evidence, while preliminary, points to a direction worth verifying rather than dismissing.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that flow matching with deterministic interpolants allows the velocity field to be controlled solely through the conditional endpoint mean, enabling reference-guided generation by shifting this mean using example sets. This is instantiated as training-free Reference-Mean Guidance applied to a frozen FLUX.2-klein model for attribute control, and as Semi-Parametric Guidance that amortizes the approach while maintaining quality on AFHQv2.

Significance. If the central theoretical claim holds and is supported by rigorous derivation and experiments, this work could offer a significant advance in controllable generation for flow-based models by providing a simple, training-free adaptation mechanism based on reference data rather than parameter updates or auxiliary models. The application to a large-scale pretrained model like FLUX.2-klein highlights practical potential, though stronger quantitative evidence is needed to establish the method's reliability.

major comments (3)

The core assertion that 'for deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean' lacks a detailed derivation showing that the proposed closed-form correction implements exactly v_t(x_t) = (E[x_1 | x_t] - x_t)/(1-t) with no residual terms from p_t(x_t), reference marginals, or the noise schedule; this is load-bearing for the claim that shifting the mean shifts the flow itself.
In the Reference-Mean Guidance instantiation on the frozen FLUX.2-klein (4B) model, the manuscript does not verify that the endpoint-mean correction avoids injecting schedule-dependent scaling or reference-set covariance effects into the effective velocity, as required by the skeptic's concern on implicit dependencies.
The claim that Semi-Parametric Guidance matches unconditional DiT-B/4 quality on AFHQv2 while allowing reference-set swapping is stated without quantitative metrics, ablations, or error analysis, which is necessary to substantiate that the amortized mean anchor preserves fidelity without introducing new dependencies.

minor comments (2)

Clarify notation for 'FLUX.2-klein (4B)' and 'reference bank' for consistency across sections.
The abstract would benefit from a brief mention of any evaluation metrics used for the qualitative control results.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below with clarifications, additional derivations, and planned revisions to strengthen the manuscript.

Referee: The core assertion that 'for deterministic interpolants, the velocity field is solely governed by a conditional endpoint mean' lacks a detailed derivation showing that the proposed closed-form correction implements exactly v_t(x_t) = (E[x_1 | x_t] - x_t)/(1-t) with no residual terms from p_t(x_t), reference marginals, or the noise schedule; this is load-bearing for the claim that shifting the mean shifts the flow itself.

Authors: We agree that a self-contained derivation is necessary for rigor. In the revision we will add a full proof in the appendix establishing that, for deterministic linear interpolants, the flow-matching velocity reduces exactly to v_t(x_t) = (E[x_1 | x_t] - x_t)/(1-t) with no residual dependence on the marginal p_t(x_t), reference marginals, or noise schedule. The proof proceeds by substituting the deterministic interpolant into the conditional expectation and showing that all other terms cancel. revision: yes
Referee: In the Reference-Mean Guidance instantiation on the frozen FLUX.2-klein (4B) model, the manuscript does not verify that the endpoint-mean correction avoids injecting schedule-dependent scaling or reference-set covariance effects into the effective velocity, as required by the skeptic's concern on implicit dependencies.

Authors: We acknowledge the need for explicit verification. In the revised manuscript we will insert a dedicated analysis subsection that substitutes the closed-form mean correction into the velocity expression and algebraically confirms the absence of schedule-dependent scaling and reference-set covariance terms. We will also add targeted empirical diagnostics on the FLUX.2-klein outputs to corroborate that no unintended dependencies are introduced. revision: yes
Referee: The claim that Semi-Parametric Guidance matches unconditional DiT-B/4 quality on AFHQv2 while allowing reference-set swapping is stated without quantitative metrics, ablations, or error analysis, which is necessary to substantiate that the amortized mean anchor preserves fidelity without introducing new dependencies.

Authors: We agree that quantitative evidence is required. In the revision we will report FID scores comparing Semi-Parametric Guidance against the unconditional DiT-B/4 baseline on AFHQv2, include ablations isolating the mean-anchor and residual-refiner components, and provide error analysis demonstrating that reference-set swapping preserves fidelity without introducing new dependencies beyond those of the base model. revision: yes

Circularity Check

0 steps flagged

No significant circularity; core claim follows from standard deterministic flow-matching properties

The paper derives the control principle directly from a mathematical property of deterministic linear interpolants in flow matching. Under the path x_t = (1-t)x_0 + t x_1, the marginal velocity is the conditional expectation E[x_1 - x_0 | x_t], and eliminating x_0 = (x_t - t x_1)/(1-t) gives v_t(x_t) = (E[x_1|x_t] - x_t)/(1-t). This is presented as an external fact of the interpolant construction rather than a fitted parameter or self-referential equation. No load-bearing step reduces to a self-citation, an ansatz smuggled in via prior work, or a renaming of a known result; the reference-mean guidance is an application of this property to a frozen model. The derivation is self-contained relative to external flow-matching theory and does not force the target result by construction from its own inputs.
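The identity the audit relies on is easy to check numerically. This sketch assumes a Gaussian source and a finite reference set as the endpoint distribution; it averages the per-pair velocities x_1 − x_0 under the posterior over references and compares the result with (μ_t(x) − x)/(1 − t). The function name is hypothetical.

```python
import numpy as np

def check_mean_identity(x, t, refs):
    """Return E[x_1 - x_0 | x_t = x] and (mu_t(x) - x) / (1 - t) for one point x (t < 1)."""
    # Posterior over which reference produced x: x_t | x_1 = r ~ N(t r, (1 - t)^2 I)
    logits = -((x[None, :] - t * refs) ** 2).sum(-1) / (2.0 * (1.0 - t) ** 2)
    w = np.exp(logits - logits.max())
    w /= w.sum()
    # On the path, x fixes x_0 once x_1 = r is known: x_0 = (x - t r) / (1 - t)
    x0 = (x[None, :] - t * refs) / (1.0 - t)
    lhs = w @ (refs - x0)          # posterior average of per-pair velocities
    mu = w @ refs                  # conditional endpoint mean mu_t(x)
    rhs = (mu - x) / (1.0 - t)
    return lhs, rhs
```

The two quantities agree to floating-point precision for any point and any t < 1, because each per-pair term r − x_0 already equals (r − x)/(1 − t).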

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the deterministic interpolant property of flow matching and the assumption that the velocity field depends only on the conditional endpoint mean.

axioms (1)

domain assumption: Deterministic interpolants govern the velocity field solely via the conditional endpoint mean.
Invoked to justify that shifting the reference mean directly steers the flow without additional terms.

pith-pipeline@v0.9.0 · 5500 in / 1126 out tokens · 41545 ms · 2026-05-13T06:10:13.891989+00:00 · methodology

Review history (2 revisions)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

For the commonly used linear interpolant ... u_t(x) = (μ_t(x) − x)/(1 − t), μ_t(x) := E[x_1 | x_t = x]

IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_injective · unclear

the velocity field is solely governed by a conditional endpoint mean; shifting this mean shifts the flow itself

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.