arxiv: 2508.16644 · v4 · submitted 2025-08-18 · 💻 cs.CV

CountLoop: Training-Free High-Instance Image Generation via Iterative Agent Guidance

keywords: countloop, object, achieves, attention, benchmarks, counts, feedback, instance

Diffusion models excel at photorealistic synthesis but struggle with precise object counts, especially in high-density settings. We introduce COUNTLOOP, a training-free framework that achieves precise instance control through iterative, structured feedback. Our method alternates between synthesis and evaluation: a VLM-based planner generates structured scene layouts, while a VLM-based critic provides explicit feedback on object counts, spatial arrangements, and visual quality, which is used to refine the layout iteratively. Instance-driven attention masking and cumulative attention composition further prevent semantic leakage, ensuring clear object separation even in densely occluded scenes. Evaluations on COCO-Count, T2I-CompBench, and two newly introduced high-instance benchmarks show that COUNTLOOP reduces counting error by up to 57% and achieves the highest or comparable spatial quality scores across all benchmarks, while maintaining photorealism.
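
The alternation between synthesis and evaluation described in the abstract can be pictured as a simple control loop. The following is a minimal sketch, not the authors' implementation: the names `count_loop`, `Feedback`, `plan`, `synthesize`, and `critique` are hypothetical, the stopping rule (exact count match or an iteration cap) is an assumption, and the toy stand-ins replace the VLM planner, the diffusion model, and the VLM critic so the loop can run on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# A layout is a list of boxes (x0, y0, x1, y1) in normalized coordinates.
Box = Tuple[float, float, float, float]
Layout = List[Box]


@dataclass
class Feedback:
    """Hypothetical critic output: observed count plus free-form notes."""
    counted: int
    notes: str = ""


def count_loop(
    target: int,
    plan: Callable[[int, Layout, Optional[Feedback]], Layout],
    synthesize: Callable[[Layout], object],
    critique: Callable[[object, int], Feedback],
    max_iters: int = 5,  # assumed cap; the abstract does not state one
) -> Tuple[Layout, List[int]]:
    """Alternate plan -> synthesize -> critique until the count matches."""
    layout: Layout = []
    feedback: Optional[Feedback] = None
    history: List[int] = []
    for _ in range(max_iters):
        layout = plan(target, layout, feedback)   # planner proposes or refines
        image = synthesize(layout)                # generator renders the layout
        feedback = critique(image, target)        # critic reports what it sees
        history.append(feedback.counted)
        if feedback.counted == target:            # assumed stopping rule
            break
    return layout, history


# Toy stand-ins (assumptions for illustration only).
def toy_plan(target: int, layout: Layout, feedback: Optional[Feedback]) -> Layout:
    if feedback is None:
        n = max(target - 2, 0)                    # simulate an initial undercount
    else:
        n = len(layout) + (target - feedback.counted)  # correct by the deficit
    width = 1.0 / max(n, 1)
    return [(i * width, 0.0, (i + 1) * width, 1.0) for i in range(n)]


def toy_synthesize(layout: Layout) -> Layout:
    return layout                                 # the "image" is just the layout


def toy_critique(image: Layout, target: int) -> Feedback:
    return Feedback(counted=len(image))           # count boxes instead of objects


layout, history = count_loop(10, toy_plan, toy_synthesize, toy_critique)
print(len(layout), history)
```

In this toy run the planner starts two instances short and the critic's count drives the correction on the next pass. In a real system the critic would also comment on spatial arrangement and visual quality, which this sketch carries only as the unused `notes` field.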
