Zoomer: Adaptive image focus optimization for black-box MLLM

Jiaxu Qian, Chendong Wang, Yifan Yang, Chaoyun Zhang, Huiqiang Jiang, Xufang Luo, Yu Kang, Qingwei Lin, Anlan Zhang, Shiqi Jiang, et al. · 2025 · arXiv 2505.00742

1 Pith paper cites this work.


representative citing papers

ShotCrop$^3$: Cropping Human-Centric Images into Cinematic Triple-Shot Compositions

cs.CV · 2026-06-04 · unverdicted · novelty 6.0 · ref 14

ShotCrop uses three-stage training (CoT SFT, semi-supervised training on pseudo-labels, and GRPO-S) to produce triple-shot compositions, and reports 2.82x better shot localization than GPT-5 on a 1.2k expert benchmark.
