ShotCrop uses three-stage training (CoT SFT, pseudo-label semi-supervised, GRPO-S) to produce triple-shot compositions and reports 2.82x better shot localization than GPT-5 on a 1.2k expert benchmark.
Zoomer: Adaptive image focus optimization for black-box mllm
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
ShotCrop$^3$: Cropping Human-Centric Images into Cinematic Triple-Shot Compositions
ShotCrop uses three-stage training (CoT SFT, pseudo-label semi-supervised, GRPO-S) to produce triple-shot compositions and reports 2.82x better shot localization than GPT-5 on a 1.2k expert benchmark.