Zoom in, click out: Unlocking and evaluating the potential of zooming for gui grounding.arXiv preprint arXiv:2512.05941

Zhiyuan Jiang, Shenghao Xie, Wenyi Li, Wenqiang Zu, Peihang Li, Jiahao Qiu, Siqi Pei, Lei Ma, Tiejun Huang, Mengdi Wang, et al · 2025 · arXiv 2512.05941

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

read on arXiv browse 3 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs

cs.CV · 2026-05-10 · conditional · novelty 7.0

GUI grounding in VLMs is bottlenecked by prefill-stage candidate selection that decoding cannot fix, so Re-Prefill uses attention to extract and re-inject target tokens for up to 4.3% gains on ScreenSpot-Pro.

AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding

cs.CV · 2026-05-04 · unverdicted · novelty 6.0

AutoFocus converts token perplexity into an anisotropic Gaussian uncertainty field to drive region proposals and shape-aware zooming for improved GUI grounding in VLMs.

Zoom Consistency: A Free Confidence Signal in Multi-Step Visual Grounding Pipelines

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

Zoom consistency provides a geometric, cross-model confidence signal in zoom-in grounding pipelines that correlates with prediction correctness and enables modest gains in specialist-generalist routing.

citing papers explorer

Showing 3 of 3 citing papers.

What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs cs.CV · 2026-05-10 · conditional · none · ref 13
GUI grounding in VLMs is bottlenecked by prefill-stage candidate selection that decoding cannot fix, so Re-Prefill uses attention to extract and re-inject target tokens for up to 4.3% gains on ScreenSpot-Pro.
AutoFocus: Uncertainty-Aware Active Visual Search for GUI Grounding cs.CV · 2026-05-04 · unverdicted · none · ref 14
AutoFocus converts token perplexity into an anisotropic Gaussian uncertainty field to drive region proposals and shape-aware zooming for improved GUI grounding in VLMs.
Zoom Consistency: A Free Confidence Signal in Multi-Step Visual Grounding Pipelines cs.CV · 2026-04-15 · unverdicted · none · ref 2
Zoom consistency provides a geometric, cross-model confidence signal in zoom-in grounding pipelines that correlates with prediction correctness and enables modest gains in specialist-generalist routing.

Zoom in, click out: Unlocking and evaluating the potential of zooming for gui grounding.arXiv preprint arXiv:2512.05941

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer