FlowInOne unifies text-to-image, layout editing, and visual instruction tasks into one image-to-image flow matching pipeline using converted visual prompts.
While smaller in scale compared to semantic edits (comprising specifically curated classes like balls_poke at ∼11K and wind at ∼9K), this subset is of exceptionally high fidelity
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
FlowInOne: Unifying Multimodal Generation as Image-in, Image-out Flow Matching