pith:D7L6RRL4
GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents
GUI-R1 applies reinforcement learning to vision-language models so they act as GUI agents after training on only 3,000 examples.
arxiv:2504.10458 v4 · 2025-04-14 · cs.CV · cs.CL · cs.HC
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{D7L6RRL4ZNI4SWIVTKUBGYSMR3}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
GUI-R1 achieves superior performance using only 0.02% of the data (3K vs. 13M) compared to previous state-of-the-art methods like OS-Atlas across eight benchmarks spanning three different platforms (mobile, desktop, and web).
A small set of carefully curated, high-quality cross-platform data, combined with unified action-space rule modeling, is taken to be sufficient for generalization to unseen interfaces without extensive supervised fine-tuning.
GUI-R1 uses reinforcement fine-tuning with GRPO on a small curated dataset to create a generalist vision-language action model that outperforms prior GUI agent methods across mobile, desktop, and web benchmarks using only 0.02% of the data.
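The "0.02%" figure in the claims can be checked against the two dataset sizes they quote. The snippet below is a minimal arithmetic sketch; the variable names are illustrative, and the counts are the rounded "3K" and "13M" values from the claims, not exact dataset sizes.

```python
# Rounded dataset sizes as quoted in the claims ("3K vs. 13M").
gui_r1_examples = 3_000
os_atlas_examples = 13_000_000

# Share of the larger dataset that the smaller one represents, in percent.
percent = gui_r1_examples / os_atlas_examples * 100

print(f"{percent:.3f}%")  # 0.023%
```

The ratio comes out to roughly 0.023%, which the claims report as 0.02%.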
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:53.851729Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
1fd7e8c57ccb51c959159aa813624c8ecbd2b1d5da4cfbd037db48d7752e4a17
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/D7L6RRL4ZNI4SWIVTKUBGYSMR3 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1fd7e8c57ccb51c959159aa813624c8ecbd2b1d5da4cfbd037db48d7752e4a17
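The hashing step of the command above can also be written as a small Python function, which may be easier to reuse or test than the one-liner. This is a sketch under the same serialization settings the command uses (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes). The function name `canonical_sha256` and the two sample dictionaries are hypothetical and are not part of any Pith API.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """Serialize a record deterministically, then return its SHA-256 hex digest."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order in the input must not affect the hash
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as-is before encoding
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical sample data: the same content with different key order.
first = {"b": 1, "a": {"y": 2, "x": 3}}
second = {"a": {"x": 3, "y": 2}, "b": 1}

print(canonical_sha256(first) == canonical_sha256(second))  # True
```

To check this record, load the canonical record JSON into a dictionary, pass it to `canonical_sha256`, and compare the result with the canonical hash listed on this page. Because keys are sorted and whitespace is dropped before hashing, the indentation or key order of the JSON you start from should not change the digest.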
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "766fa50426543522f6bbe166c68311d89e93e4581ff4e5aef84c10e3f04d4d16",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.HC"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-04-14T17:45:54Z",
    "title_canon_sha256": "9f87fd9acac35043bf980f040ecdea7c52fff6a27a16621dbdde637356fb3443"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.10458",
    "kind": "arxiv",
    "version": 4
  }
}