pith:WCZCW6CG
Learning to See What You Need: Gaze Attention for Multimodal Large Language Models
Multimodal LLMs can match or exceed full dense attention by dynamically restricting focus to a small number of task-relevant gaze regions and using up to 90 percent fewer visual key-value entries.
arxiv:2605.13080 v1 · 2026-05-13 · cs.CV
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{WCZCW6CGUOLLC5I6HYOF7TDI5X}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Gaze Attention matches or surpasses dense-attention baselines, while using up to 90% fewer visual KV entries in the attention computation.
This rests on the premise that spatially grouping visual embeddings into compact gaze regions, dynamically selecting them via lightweight descriptors, and appending learnable context tokens is sufficient to preserve all task-critical information without performance loss.
Gaze Attention groups visual embeddings into selectable regions and dynamically restricts attention to task-relevant ones, matching dense baselines with up to 90% fewer visual KV entries via added context tokens.
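The selection mechanism described in the claims can be illustrated with a toy example. This is only a minimal sketch under stated assumptions, not the paper's implementation: the function name `gaze_attention`, the use of contiguous fixed-size chunks as regions, mean-pooled region descriptors, dot-product scoring, and reusing the embeddings as both keys and values are all hypothetical choices made for illustration. The context tokens here are plain vectors standing in for the learnable ones.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mean_vector(vectors):
    # Column-wise mean; used as a cheap (assumed) region descriptor.
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def gaze_attention(query, visual, region_size, top_k, context):
    """Attend only over the top_k regions that best match the query,
    plus a few context tokens (all design details are assumptions)."""
    # 1. Group visual embeddings into contiguous regions.
    regions = [visual[i:i + region_size]
               for i in range(0, len(visual), region_size)]
    # 2. Score each region through its descriptor.
    scores = [dot(query, mean_vector(r)) for r in regions]
    # 3. Keep the top_k regions, restored to their original order.
    chosen = sorted(sorted(range(len(regions)),
                           key=lambda i: scores[i], reverse=True)[:top_k])
    kept_visual = [v for i in chosen for v in regions[i]]
    # 4. Scaled dot-product attention over kept entries and context tokens.
    kept = kept_visual + list(context)
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in kept])
    output = [sum(w * k[j] for w, k in zip(weights, kept))
              for j in range(len(query))]
    return output, chosen, len(kept_visual)

# Toy input: 16 visual tokens in 4 regions of 4, one context token.
visual = ([[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
          + [[-1.0, 0.0]] * 4 + [[0.0, -1.0]] * 4)
context = [[0.0, 0.0]]
query = [0.0, 2.0]

out, chosen, n_visual_kept = gaze_attention(
    query, visual, region_size=4, top_k=1, context=context)
# Fraction of visual KV entries left out of the attention computation.
reduction = 1 - n_visual_kept / len(visual)
```

In this toy setup one of four regions is kept, so 4 of the 16 visual entries enter the attention computation and `reduction` is 0.75. Reaching the 90% figure quoted in the claims would correspond to keeping roughly one visual entry in ten; the context tokens are counted separately from the visual entries here.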
Receipt and verification
| First computed | 2026-05-18T03:08:58.705781Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
b0b22b7846a396b1751e3e1c5fcc68ede9e8841e21e0449badf7ef3201478b77
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/WCZCW6CGUOLLC5I6HYOF7TDI5X \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: b0b22b7846a396b1751e3e1c5fcc68ede9e8841e21e0449badf7ef3201478b77
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "d5061722eb07690452f1db69d5249046c32d2804a577bf13591dc33afc8bd85e",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-13T06:54:09Z",
    "title_canon_sha256": "b87ba34be4779e971d0208fedc4745687a3fcfe54e350e1e099b5b6c8fe747f8"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.13080",
    "kind": "arxiv",
    "version": 1
  }
}
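The hashing step of the shell pipeline can also be run offline against a local copy of the record. The sketch below mirrors the serialization used in the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 encoding); the function name `canonical_sha256` is a hypothetical helper, and the embedded `record` is transcribed from this page, so it is assumed rather than guaranteed to be byte-identical to what the API returns.

```python
import hashlib
import json

def canonical_sha256(record):
    """SHA-256 over the compact, key-sorted JSON serialization of record."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8 bytes
    ).encode()                   # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()

# Local copy of the canonical record shown on this page (transcribed by hand).
record = {
    "metadata": {
        "abstract_canon_sha256": "d5061722eb07690452f1db69d5249046c32d2804a577bf13591dc33afc8bd85e",
        "cross_cats_sorted": [],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-13T06:54:09Z",
        "title_canon_sha256": "b87ba34be4779e971d0208fedc4745687a3fcfe54e350e1e099b5b6c8fe747f8",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.13080", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)
```

Compare the printed digest with the value listed under Canonical hash. Because keys are sorted before hashing, the indentation and key order of the local copy do not matter, but any difference in the values themselves will change the digest.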