{"paper":{"title":"Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"CLVR couples verified logical planning with pixel diffusion, uses proxy reinforcement learning on distilled histories, and merges weights to cut inference to 4 NFEs while outperforming open-source T2I models on complex benchmarks.","cross_cats":["cs.AI"],"primary_cat":"cs.CV","authors_text":"Hanbo Cheng, Jun Du, Limin Lin, Ruo Zhang, Yicheng Pan","submitted_at":"2026-05-14T14:22:25Z","abstract_excerpt":"Despite rapid advancements, current text-to-image (T2I) models predominantly rely on a single-step generation paradigm, which struggles with complex semantics and faces diminishing returns from parameter scaling.\n  While recent multi-step reasoning approaches show promise, they are hindered by ungrounded planning hallucinations lacking verification, monolithic post-hoc reflection, long-context optimization instabilities, and prohibitive inference latency.\n  To overcome these bottlenecks, we propose the Closed-Loop Visual Reasoning (CLVR) framework, a comprehensive system that deeply couples visu"},"claims":{"count":3,"items":[{"kind":"strongest_claim","text":"CLVR outperforms existing open-source baselines across multiple benchmarks and approaches the performance of proprietary commercial models, unlocking general test-time scaling capabilities for complex visual generation.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The automated data engine with step-level visual verification can reliably synthesize reasoning trajectories that are free of planning hallucinations and representative of real user prompts.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"CLVR couples verified logical planning with pixel diffusion, uses proxy reinforcement learning on distilled histories, and merges weights to cut inference to 4 NFEs while outperforming open-source T2I models on complex benchmarks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"}],"snapshot_sha256":"c470e8d3690210acc38f4f959d4f805c48cc86fc5f456d441c3cae7e3c870fba"},"source":{"id":"2605.14876","kind":"arxiv","version":1},"verdict":{"id":"5d7aa2f4-a04e-4fe4-9df5-b8dcc4a341ad","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T03:20:08.744051Z","strongest_claim":"CLVR outperforms existing open-source baselines across multiple benchmarks and approaches the performance of proprietary commercial models, unlocking general test-time scaling capabilities for complex visual generation.","one_line_summary":"CLVR couples verified logical planning with pixel diffusion, uses proxy reinforcement learning on distilled histories, and merges weights to cut inference to 4 NFEs while outperforming open-source T2I models on complex benchmarks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The automated data engine with step-level visual verification can reliably synthesize reasoning trajectories that are free of planning hallucinations and representative of real user prompts.","pith_extraction_headline":""},"references":{"count":0,"sample":[],"resolved_work":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57","internal_anchors":0},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}