FLUX.2: Frontier Visual Intelligence. https://bfl.ai/blog/flux-2

Black Forest Labs · 2025

7 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary: background 2

citation-polarity summary: background 2

representative citing papers

Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

CLVR framework adds closed-loop visual verification, proxy prompt reinforcement learning, and delta-space weight merge to improve complex text-to-image generation over single-step or unverified multi-step baselines.

Early Semantic Grounding in Image Editing Models for Zero-Shot Referring Image Segmentation

cs.CV · 2026-05-13 · unverdicted · novelty 6.0

Pretrained instruction-based image editing models exhibit early foreground-background separability that enables a training-free framework for zero-shot referring image segmentation using a single denoising step.

Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

Mean-Variance Split residuals separate centered variation from mean updates to prevent collapse and enable stable training of 1000-layer Diffusion Transformers.

Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

Lens is a 3.8B-parameter text-to-image model that reaches competitive or superior performance to >6B-parameter systems using 19.3% of the training compute of Z-Image through a densely captioned 800M dataset, multi-resolution batching, semantic VAE, strong language encoder, RL fine-tuning, and 4-step

MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

MONET is an open 104.9M image-text pair dataset created via safety filtering, deduplication, and multi-VLM recaptioning from 2.9B raw pairs, validated by training a competitive 4B-parameter latent diffusion model.

RLDX-1 Technical Report

cs.RO · 2026-05-05 · unverdicted · novelty 4.0 · 2 refs

RLDX-1 outperforms frontier VLAs such as π0.5 and GR00T N1.6 on dexterous manipulation benchmarks, reaching 86.8% success on ALLEX humanoid tasks versus around 40% for the baselines.

Follow the Mean: Reference-Guided Flow Matching

cs.LG · 2026-05-11 · 2 refs

citing papers explorer

Showing 7 of 7 citing papers.

Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning
cs.CV · 2026-05-14 · unverdicted · none · ref 24
Early Semantic Grounding in Image Editing Models for Zero-Shot Referring Image Segmentation
cs.CV · 2026-05-13 · unverdicted · none · ref 17
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers
cs.LG · 2026-05-07 · unverdicted · none · ref 27
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
cs.CV · 2026-05-20 · unverdicted · none · ref 3
MONET: A Massive, Open, Non-redundant and Enriched Text-to-image dataset
cs.CV · 2026-05-20 · unverdicted · none · ref 50
RLDX-1 Technical Report
cs.RO · 2026-05-05 · unverdicted · none · ref 16 · 2 links
Follow the Mean: Reference-Guided Flow Matching
cs.LG · 2026-05-11 · unreviewed · ref 5 · 2 links
