CountLoop: Training-Free High-Instance Image Generation via Iterative Agent Guidance

Anindya Mondal, Ayan Banerjee, Sauradip Nag, Josep Llados, Xiatian Zhu, Anjan Dutta

classification cs.CV

keywords countloop, object, achieves, attention, benchmarks, counts, feedback, instance

Diffusion models excel at photorealistic synthesis but struggle with precise object counts, especially in high-density settings. We introduce COUNTLOOP, a training-free framework that achieves precise instance control through iterative, structured feedback. Our method alternates between synthesis and evaluation: a VLM-based planner generates structured scene layouts, while a VLM-based critic provides explicit feedback on object counts, spatial arrangements, and visual quality, which is used to refine the layout iteratively. Instance-driven attention masking and cumulative attention composition further prevent semantic leakage, ensuring clear object separation even in densely occluded scenes. Evaluations on COCO-Count, T2I-CompBench, and two newly introduced high-instance benchmarks show that COUNTLOOP reduces counting error by up to 57% and achieves the highest or comparable spatial quality scores across all benchmarks, while maintaining photorealism.
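
The alternation between synthesis and evaluation described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the names `refine_loop`, `toy_plan`, `toy_generate`, `toy_critique`, the `Feedback` type, the box-list layout format, and the `max_iters` parameter are all hypothetical, and the planner, generator, and critic are simple stand-ins for the VLM and diffusion components, which the abstract does not specify in detail. Only count feedback is modelled; spatial and quality feedback and the attention masking are omitted.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical layout format: one normalized (x, y, w, h) box per instance.
Box = Tuple[float, float, float, float]


@dataclass
class Feedback:
    count_error: int  # observed count minus target count


def make_grid(n: int) -> List[Box]:
    """Place n boxes on a square grid (stand-in for a structured layout)."""
    cols = max(1, math.ceil(math.sqrt(n)))
    cell = 1.0 / cols
    return [((i % cols) * cell, (i // cols) * cell, cell, cell) for i in range(n)]


def toy_plan(target: int, layout: Optional[List[Box]], fb: Optional[Feedback]) -> List[Box]:
    """Stand-in planner: the first draft deliberately undercounts by two,
    later drafts correct the layout using the critic's count error."""
    if layout is None or fb is None:
        return make_grid(max(0, target - 2))
    return make_grid(max(0, len(layout) - fb.count_error))


def toy_generate(layout: List[Box]) -> List[Box]:
    """Stand-in generator: the 'image' is simply the list of drawn boxes."""
    return list(layout)


def toy_critique(image: List[Box], target: int) -> Feedback:
    """Stand-in critic: counts instances in the 'image' and reports the error."""
    return Feedback(count_error=len(image) - target)


def refine_loop(target, plan, generate, critique, max_iters: int = 5):
    """Alternate synthesis and evaluation until the count matches or the
    iteration budget runs out. Returns (image, layout, per-round errors)."""
    layout = plan(target, None, None)
    image = generate(layout)
    errors = []
    for _ in range(max_iters):
        fb = critique(image, target)
        errors.append(fb.count_error)
        if fb.count_error == 0:
            break
        layout = plan(target, layout, fb)  # refine layout from feedback
        image = generate(layout)
    return image, layout, errors


if __name__ == "__main__":
    image, layout, errors = refine_loop(7, toy_plan, toy_generate, toy_critique)
    print(len(layout), errors)
```

In this sketch the loop stops as soon as the critic reports a zero count error; a real system would presumably also weigh the spatial and visual-quality feedback mentioned in the abstract before accepting a result.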
