MVP: Multiple view prediction improves GUI grounding. arXiv preprint arXiv:2512.08529, 2025

Yunzhu Zhang, Zeyu Pan, Zhengwen Zeng, Shuheng Shen, Changhua Meng, Linchao Zhu · 2025 · arXiv 2512.08529

1 Pith paper cites this work. Polarity classification is still indexing.


representative citing papers

What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs

cs.CV · 2026-05-10 · conditional · novelty 7.0

GUI grounding in VLMs is bottlenecked by prefill-stage candidate selection that decoding cannot fix, so Re-Prefill uses attention to extract and re-inject target tokens for up to 4.3% gains on ScreenSpot-Pro.

citing papers explorer


What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs cs.CV · 2026-05-10 · conditional · none · ref 16
