Pith Number

pith:D7L6RRL4

pith:2025:D7L6RRL4ZNI4SWIVTKUBGYSMR3
not attested · not anchored · not stored · refs resolved

GUI-R1 : A Generalist R1-Style Vision-Language Action Model For GUI Agents

Jiaming Li, Longze Chen, Lu Wang, Run Luo, Wanwei He, Xiaobo Xia

GUI-R1 applies reinforcement learning to vision-language models so they act as GUI agents after training on only 3,000 examples.

arxiv:2504.10458 v4 · 2025-04-14 · cs.CV · cs.CL · cs.HC

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{D7L6RRL4ZNI4SWIVTKUBGYSMR3}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 strongest claim

GUI-R1 achieves superior performance using only 0.02% of the data (3K vs. 13M) compared to previous state-of-the-art methods like OS-Atlas across eight benchmarks spanning three different platforms (mobile, desktop, and web).
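
The 0.02% figure follows from the two dataset sizes quoted in the claim. A quick check, assuming "3K" and "13M" stand for exactly 3,000 and 13,000,000 examples (the claim gives only these rounded values):

```python
# Rounded dataset sizes quoted in the claim; the exact counts are an assumption.
gui_r1_examples = 3_000
os_atlas_examples = 13_000_000

# Share of the larger dataset that the smaller one represents, in percent.
percent = gui_r1_examples / os_atlas_examples * 100

print(f"{percent:.3f}%")  # prints 0.023%
```

Rounded to two decimal places this gives the 0.02% stated in the claim, that is, roughly one example for every 4,300 in the larger dataset.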

C2 weakest assumption

The assumption that a small, carefully curated set of high-quality cross-platform data, combined with unified action-space rule modeling, is sufficient for generalization to unseen interfaces without extensive supervised fine-tuning.

C3 one-line summary

GUI-R1 uses reinforcement fine-tuning with GRPO on a small curated dataset to create a generalist vision-language action model that outperforms prior GUI agent methods across mobile, desktop, and web benchmarks using only 0.02% of the data.

References

30 extracted · 30 resolved · 12 Pith anchors

[1] OS-ATLAS: A Foundation Action Model for Generalist GUI Agents · 2024 · arXiv:2410.23218
[2] UI-TARS: Pioneering Automated GUI Interaction with Native Agents · 2025 · arXiv:2501.12326
[3] SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents · 2024 · arXiv:2401.10935
[4] Qwen2.5-VL Technical Report · 2025 · arXiv:2502.13923
[5] Visual-RFT: Visual Reinforcement Fine-Tuning · 2025 · arXiv:2503.01785

Formal links

3 machine-checked theorem links

Cited by

42 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:53.851729Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

1fd7e8c57ccb51c959159aa813624c8ecbd2b1d5da4cfbd037db48d7752e4a17

Aliases

arxiv: 2504.10458 · arxiv_version: 2504.10458v4 · doi: 10.48550/arxiv.2504.10458 · pith_short_12: D7L6RRL4ZNI4 · pith_short_16: D7L6RRL4ZNI4SWIV · pith_short_8: D7L6RRL4
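
In this record, the three short aliases are prefixes of the full 26-character Pith Number, and the number itself coincides with the first 26 characters of the standard base32 encoding (RFC 4648 alphabet) of the canonical hash. The sketch below checks both relationships. It is an observation about the values shown on this page, not a documented rule of the pith-number/v1.0 schema, and the helper names `base32_prefix` and `short_alias` are hypothetical.

```python
import base64

# Values copied from this record.
CANONICAL_HASH = "1fd7e8c57ccb51c959159aa813624c8ecbd2b1d5da4cfbd037db48d7752e4a17"
PITH_NUMBER = "D7L6RRL4ZNI4SWIVTKUBGYSMR3"


def base32_prefix(hex_digest: str, length: int = 26) -> str:
    """Hypothetical helper: first `length` base32 characters of a hex digest."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_alias(pith_number: str, length: int) -> str:
    """Hypothetical helper: a short alias taken as a plain prefix."""
    return pith_number[:length]


print(base32_prefix(CANONICAL_HASH) == PITH_NUMBER)
for n in (8, 12, 16):
    print(f"pith_short_{n}: {short_alias(PITH_NUMBER, n)}")
```

If the relationship holds in general, a short alias identifies a record only as far as its prefix is unique, so the full number is the safer key when looking a record up.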
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/D7L6RRL4ZNI4SWIVTKUBGYSMR3 \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 1fd7e8c57ccb51c959159aa813624c8ecbd2b1d5da4cfbd037db48d7752e4a17
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "766fa50426543522f6bbe166c68311d89e93e4581ff4e5aef84c10e3f04d4d16",
    "cross_cats_sorted": [
      "cs.CL",
      "cs.HC"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-04-14T17:45:54Z",
    "title_canon_sha256": "9f87fd9acac35043bf980f040ecdea7c52fff6a27a16621dbdde637356fb3443"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2504.10458",
    "kind": "arxiv",
    "version": 4
  }
}
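
The hashing step of the shell pipeline above can also be run offline. The following is a minimal sketch that applies the same serialization as the one-liner (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes) to the record shown here. The function name `canonical_sha256` is hypothetical, and the sketch assumes that the JSON above is exactly what the API returns under `.canonical_record`; it does not assert the resulting digest.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 hex digest of a record serialized with sorted keys and no whitespace."""
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # UTF-8 by default, matching the shell one-liner
    return hashlib.sha256(payload).hexdigest()


# Canonical record as displayed on this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "766fa50426543522f6bbe166c68311d89e93e4581ff4e5aef84c10e3f04d4d16",
        "cross_cats_sorted": ["cs.CL", "cs.HC"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-04-14T17:45:54Z",
        "title_canon_sha256": "9f87fd9acac35043bf980f040ecdea7c52fff6a27a16621dbdde637356fb3443",
    },
    "schema_version": "1.0",
    "source": {"id": "2504.10458", "kind": "arxiv", "version": 4},
}

# Compare this value by eye with the canonical hash listed in the record.
print(canonical_sha256(record))
```

Because the keys are sorted before hashing, the digest does not depend on the order in which the fields appear in the downloaded JSON, which is what lets a mirror recompute the same value.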