pith:YLYZW2EP
What Happens Before Decoding? Prefill Determines GUI Grounding in VLMs
GUI grounding in VLMs follows a two-stage process: the prefill stage selects candidate UI elements, and the decoding stage cannot correct errors in that selection.
arxiv:2605.12549 v1 · 2026-05-10 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{YLYZW2EP2NILYIOZY2B3LFFY3C}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
We show that grounding follows a two-stage paradigm: the prefill stage determines candidate UI elements, while the decoding stage subsequently refines the final coordinates. This asymmetry establishes prefill as the critical step, as errors in candidate selection cannot be effectively corrected during decoding.
The approach assumes that visual tokens receiving consistently high attention from the query (final) position across layers form a reliable preliminary target hypothesis, and that re-appending them with instruction hidden states enables effective re-thinking without adding noise or bias.
GUI grounding in VLMs is bottlenecked by prefill-stage candidate selection that decoding cannot fix, so Re-Prefill uses attention to extract and re-inject target tokens for up to 4.3% gains on ScreenSpot-Pro.
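The candidate-selection idea in the claims above can be illustrated with a small sketch. It is not the paper's implementation: the function name `select_candidate_tokens`, the per-layer `top_k` vote, and the `min_layer_frac` threshold are all assumptions chosen to make "consistently high attention across layers" concrete. The re-appending step is not shown because the record does not specify how the tokens are laid out.

```python
from typing import List


def select_candidate_tokens(
    attn: List[List[float]],
    top_k: int,
    min_layer_frac: float = 0.5,
) -> List[int]:
    """Pick visual tokens that rank highly in many layers.

    attn[l][i] is the attention weight from the final (query) position
    to visual token i at layer l. The voting rule is a hypothetical
    stand-in for whatever criterion the paper actually uses.
    """
    num_layers = len(attn)
    num_tokens = len(attn[0])
    votes = [0] * num_tokens

    for layer in attn:
        # Rank visual tokens by attention weight, highest first.
        ranked = sorted(range(num_tokens), key=lambda i: layer[i], reverse=True)
        for i in ranked[:top_k]:
            votes[i] += 1

    # Keep tokens that were in the top_k for enough layers.
    threshold = min_layer_frac * num_layers
    return [i for i, v in enumerate(votes) if v >= threshold]


# Toy attention map: 3 layers, 4 visual tokens.
toy_attn = [
    [0.10, 0.70, 0.20, 0.00],
    [0.05, 0.60, 0.30, 0.05],
    [0.40, 0.50, 0.05, 0.05],
]
strict = select_candidate_tokens(toy_attn, top_k=2, min_layer_frac=1.0)
loose = select_candidate_tokens(toy_attn, top_k=2, min_layer_frac=0.5)
```

With `min_layer_frac=1.0` only tokens in the top `top_k` of every layer survive; lowering it admits tokens that are highly attended in most but not all layers.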
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:10:02.164922Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
c2f19b688fd350bc21d9c683b594b8d89422b33c9315cffdad8c87e6c88f5d5b
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/YLYZW2EP2NILYIOZY2B3LFFY3C \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: c2f19b688fd350bc21d9c683b594b8d89422b33c9315cffdad8c87e6c88f5d5b
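The hashing step of the command above can also be written as a standalone Python function. This is a minimal sketch that mirrors the one-liner's serialization settings (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding); the function names `canonicalize`, `canonical_sha256`, and `matches` are illustrative and not part of any Pith API, and fetching the record over HTTP is left out.

```python
import hashlib
import json


def canonicalize(record: dict) -> bytes:
    """Serialize a record the same way the shell one-liner does."""
    text = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters unescaped
    )
    return text.encode()         # UTF-8 by default


def canonical_sha256(record: dict) -> str:
    """Return the hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonicalize(record)).hexdigest()


def matches(record: dict, expected_hex: str) -> bool:
    """Compare a record's digest with a published canonical hash."""
    return canonical_sha256(record) == expected_hex.lower()


# Key order in the input does not change the canonical bytes or the digest.
a = {"schema_version": "1.0", "source": {"kind": "arxiv", "version": 1}}
b = {"source": {"version": 1, "kind": "arxiv"}, "schema_version": "1.0"}
same_digest = canonical_sha256(a) == canonical_sha256(b)
```

To check this record, pass the parsed `canonical_record` object and the canonical hash listed above to `matches`.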
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "39b07e6b1acba186aa8e866003a84fa071b9bb523e7eec9096a8a203f7dc0a28",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-10T07:04:07Z",
    "title_canon_sha256": "07560c2ec7a79bc099ccfc84abaad10b7e4cf1a639680238e83555b1289482bf"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.12549",
    "kind": "arxiv",
    "version": 1
  }
}