Branching Flows: Discrete, Continuous, and Manifold Flow Matching with Splits and Deletions
Pith reviewed 2026-05-17 22:09 UTC · model grok-4.3
The pith
Branching Flows extends flow matching so elements can branch and delete to control output size during generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Branching Flows is a generative modeling framework that transports a simple distribution to the data distribution where the elements in the state evolve over a forest of binary trees, branching and dying stochastically with rates that are learned by the model, allowing control of the number of elements, and composes with any flow matching base process on discrete sets, continuous Euclidean spaces, smooth manifolds, and multimodal product spaces that mix these components.
What carries the argument
A forest of binary trees whose nodes branch and delete at learned stochastic rates, which dynamically changes the number of elements while the base flow matching process evolves their individual features.
If this is right
- The model generates outputs whose size is drawn from the data distribution rather than fixed in advance.
- The same framework applies without change to discrete, continuous, manifold, and multimodal data by swapping the base flow matching process.
- Training stays stable using the standard flow matching objective while adding the branching component.
- New capabilities appear in domains that require variable-length outputs such as small-molecule and protein generation.
Where Pith is reading between the lines
- The branching process could be added as a modular layer on top of existing flow matching codebases to handle open-ended structures.
- Element count might become correlated with other properties through the learned rates, offering a built-in way to model joint distributions over size and features.
- The same tree structure might extend to other transport-based generative methods beyond flow matching by replacing the base dynamics.
Load-bearing premise
Stochastic branching and deletion rates can be learned from data such that the generative process matches the target distribution without instability or systematic bias when element counts vary.
What would settle it
After training on a dataset with varying element counts such as antibody sequences, generate samples and compare the empirical distribution of their lengths to the training distribution; a clear mismatch would show the rates did not learn the correct marginal over sizes.
Figures
read the original abstract
Diffusion and flow matching approaches to generative modeling have shown promise in domains where the state space is continuous, such as image generation or protein folding & design, and discrete, exemplified by diffusion large language models. They offer a natural fit when the number of elements in a state is fixed in advance (e.g. images), but require ad hoc solutions when, for example, the length of a response from a large language model, or the number of amino acids in a protein chain is not known a priori. Here we propose Branching Flows, a generative modeling framework that, like diffusion and flow matching approaches, transports a simple distribution to the data distribution. But in Branching Flows, the elements in the state evolve over a forest of binary trees, branching and dying stochastically with rates that are learned by the model. This allows the model to control, during generation, the number of elements in the sequence. We also show that Branching Flows can compose with any flow matching base process on discrete sets, continuous Euclidean spaces, smooth manifolds, and `multimodal' product spaces that mix these components. We demonstrate this in three domains: small molecule generation (multimodal), antibody sequence generation (discrete), and protein backbone generation (multimodal), and show that Branching Flows is a capable distribution learner with a stable learning objective, and that it enables new capabilities.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes Branching Flows, a generative modeling framework extending flow matching to variable-length states. Elements evolve over a forest of binary trees and branch or delete stochastically according to learned rates, allowing the model to control cardinality during generation. The approach is claimed to compose with arbitrary base flow-matching dynamics on discrete sets, continuous Euclidean spaces, smooth manifolds, and multimodal product spaces. Empirical demonstrations are provided on small-molecule generation (multimodal), antibody sequence generation (discrete), and protein backbone generation (multimodal), with assertions of a stable objective and capable distribution learning.
Significance. If the central claims hold, the work would address a practical limitation of fixed-cardinality flow matching and diffusion models in domains such as molecular design and protein engineering. The compositionality across discrete, continuous, manifold, and product spaces is a potentially useful modular feature. Credit is due for attempting to handle variable length in a principled stochastic manner rather than through ad-hoc padding or truncation.
major comments (3)
- [§3.2] §3.2 (Branching Dynamics): No derivation is supplied showing that the learned branching and deletion rates, when combined with the base flow-matching vector field, yield a marginal process whose terminal distribution equals the data distribution. The abstract asserts stability and correctness, but the absence of an explicit combined loss or marginalization argument leaves the central claim unverified.
- [§4] §4 (Experiments): The reported results on molecules, antibodies, and protein backbones provide no quantitative metrics, ablation studies on the branching rates, or comparisons against standard flow matching with padding/truncation or other variable-length baselines. This makes it impossible to assess whether the branching mechanism improves distribution matching or merely adds degrees of freedom.
- [§3.3] §3.3 (Composition with Base Processes): The claim that the framework composes with any flow-matching base process on multimodal product spaces is stated without a proof that the branching dynamics commute with the product-space vector field or preserve the independent marginals.
minor comments (2)
- [Figure 1] Figure 1 (schematic of the forest): The visual depiction of simultaneous branching and deletion events is difficult to follow; adding time-step annotations or a small numerical example would improve clarity.
- Notation: The distinction between the tree-indexed state and the observed sequence is introduced without a clear mapping; a short table relating symbols to their meanings would help readers.
Simulated Author's Rebuttal
We thank the referee for their thoughtful and constructive review of our manuscript. We address each major comment in detail below and describe the revisions we plan to incorporate to strengthen the theoretical grounding and empirical evaluation.
read point-by-point responses
-
Referee: [§3.2] §3.2 (Branching Dynamics): No derivation is supplied showing that the learned branching and deletion rates, when combined with the base flow-matching vector field, yield a marginal process whose terminal distribution equals the data distribution. The abstract asserts stability and correctness, but the absence of an explicit combined loss or marginalization argument leaves the central claim unverified.
Authors: We thank the referee for this observation. The branching dynamics are defined in §3.2 as a stochastic process on the forest of binary trees whose rates are learned jointly with the base vector field; the construction ensures that the terminal marginal recovers the data distribution by design of the rate functions. However, we agree that an explicit marginalization argument and combined loss would improve clarity. In the revised manuscript we will add a detailed derivation showing that the joint process (branching rates composed with the base flow-matching field) has the correct terminal distribution, including the explicit form of the overall objective obtained by taking the expectation over tree structures. revision: yes
-
Referee: [§4] §4 (Experiments): The reported results on molecules, antibodies, and protein backbones provide no quantitative metrics, ablation studies on the branching rates, or comparisons against standard flow matching with padding/truncation or other variable-length baselines. This makes it impossible to assess whether the branching mechanism improves distribution matching or merely adds degrees of freedom.
Authors: We acknowledge that the current experimental section would benefit from more rigorous quantitative evaluation. In the revised version we will report standard quantitative metrics (e.g., validity/uniqueness/novelty for molecules, sequence recovery and perplexity for antibodies, and structural fidelity measures such as RMSD for protein backbones). We will also include ablation studies isolating the effect of the learned branching and deletion rates, together with direct comparisons against flow-matching baselines that use padding or truncation as well as other variable-length generative approaches. revision: yes
-
Referee: [§3.3] §3.3 (Composition with Base Processes): The claim that the framework composes with any flow-matching base process on multimodal product spaces is stated without a proof that the branching dynamics commute with the product-space vector field or preserve the independent marginals.
Authors: The branching process operates exclusively on cardinality and tree topology, while the base flow-matching dynamics act component-wise on the attributes of each element. Because the branching rates are defined independently of the base vector field, the two processes commute by construction and the product-space marginals remain independent. To make this rigorous, we will add a formal proposition together with its proof (placed in an appendix) demonstrating commutation with the product-space vector field and preservation of the independent marginals under the multimodal measure. revision: yes
Circularity Check
No significant circularity detected
full rationale
The paper presents Branching Flows as a generative framework that overlays learned stochastic branching and deletion rates on a forest of binary trees, allowing variable-length outputs while composing with any standard flow-matching base process across discrete, continuous, manifold, and product spaces. The abstract and strongest claims describe a stable objective and empirical results on molecules, antibodies, and protein backbones, but contain no equations that define a target quantity in terms of itself, no fitted parameters renamed as predictions, and no load-bearing self-citations or uniqueness theorems imported from prior author work. The derivation chain therefore remains independent of its own outputs and does not reduce by construction to its inputs.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Branching Flows augments a base Markov generator on an element space with a branching and deletion process. A forest of trees... elements evolve independently along each branch but duplicate and decouple at bifurcations.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
The Conditional Branching Flows (CBF) loss is the sum of three Bregman divergences: one for the base process, one against the split intensity, and one against the deletion intensity.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
Trans-dimensional generative modeling via jump diffusion models, 2023
Albano, G. and Giorno, V. (2020). Inference on the effect of non homogeneous inputs in ornstein-uhlenbeck neuronal modeling.Mathematical Biosciences and Engineering, 17(1):328–348. Besard, T., Foket, C., and De Sutter, B. (2018). Effective extensible programming: Unleashing Julia on GPUs.IEEE Transactions on Parallel and Distributed Systems. Bezanson, J.,...
-
[2]
The expression for the rates thus follows from ˙α1 −α 1 ˙α3 α3 = ( ˙κ1 t −κ 1 t0 ˙κ3 t κ3 t0 )−(κ 1 t −κ 1 t0 κ3 t κ3 t0 ) ˙κ3 t κ3 t = ˙κ1 t −κ 1 t ˙κ3 t κ3 t and analagously ˙α2 −α 2 ˙α3 α3 = ˙κ2 t −κ 2 t ˙κ3 t κ3 t . C The Ornstein-Uhlenbeck process with Time Dependent Diffusion Coefficient Consider the stochastic differential equation dXt =θ(µ−X t)dt+...
work page 2020
-
[3]
Let G be discrete and let 26 (Xt, Gt) be an S× G -valued process under the GM-regularity assumptions
but in the more general context of Generator Matching (GM) (Holderrieth et al., 2024), we extend the theory to allow for conditioning on discrete processes. Let G be discrete and let 26 (Xt, Gt) be an S× G -valued process under the GM-regularity assumptions. Here, we suppose that for each fixedg∈ Gthere is an operatorL g t :T →C(S;R) generatingp Xt,Gt(dx,...
work page 2024
-
[4]
Its total mass λt(x) = Z Qt(dy;x) gives the jump intensity or jump rate, i.e
and Flow Matching Guide (FMG) (Lipman et al., 2024), we model jumps by a time-dependent kernel Qt(dy; x) that, for each state x∈S , assigns a positive measure onS\ {x}. Its total mass λt(x) = Z Qt(dy;x) gives the jump intensity or jump rate, i.e. the instantaneous hazard of leaving x, and we require λt(x) <∞ . Whenλ t(x)>0, normalizingQ t yields the jump ...
work page 2024
-
[5]
we consider atomic time-dependent jump kernels: Qt(dy;x) = nX i=1 λi(t, x)δΓi(t,x)(dy), specifyingjump targetsΓ i(t, x) andjump ratesλ i(t, x). The total rate is λtotal(t, x) := Z Qt(dy;x) = nX i=1 λi(t, x) and the normalized jump distributionJ t(dy;x) is Jt(dy;x) = 1 λtotal(t, x) nX i=1 λi(t, x)δΓi(t,x)(dy), so that ifY∼J t(dy|x), thenP(Y= Γ i(t, x)) = λ...
work page 2023
-
[6]
E.2 Antibodies E.2.1 Antibody Sequence Generation: Metrics and Comparisons To evaluate Branching Flows, 10 000 sequences were generated with 1 000 uniformly-spaced steps from Branching Flows, Oracle Length, and Edit Flows models and then compared to natural ones. Sequence diversity was assessed by computing the minimum pairwise cosine distance (over vecto...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.