Pith Number

pith:OZAGJFTR

pith:2025:OZAGJFTRZKRQMIICMHPS45JVFT
not attested · not anchored · not stored · refs resolved

UI-R1: Enhancing Efficient Action Prediction of GUI Agents by Reinforcement Learning

Guanjing Xiong, Han Xiao, Hao Wang, Hongsheng Li, Liang Liu, Shuai Ren, Xi Yin, Yaxuan Guo, Yuxiang Chai, Zhengxi Lu

Rule-based RL on 136 GUI tasks raises a 3B multimodal model's action-prediction accuracy by up to 22% over its base model.

arxiv:2503.21620 v5 · 2025-03-27 · cs.AI

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{OZAGJFTRZKRQMIICMHPS45JVFT}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1. Bitcoin timestamp
2. Internet Archive
3. Author claim: open
4. Citations: open
5. Replications: open
Portable graph bundle: live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.

Claims

C1 · strongest claim

UI-R1-3B achieves significant improvements over the base model (Qwen2.5-VL-3B) on both in-domain and out-of-domain tasks, with average accuracy gains of 22.1% on ScreenSpot, 6.0% on ScreenSpot-Pro, and 12.7% on AndroidControl.

C2 · weakest assumption

The rule-based action reward provides sufficient and unbiased supervision for policy optimization across diverse GUI tasks without post-hoc adjustments or hidden data selection.

C3 · one-line summary

UI-R1 shows rule-based RL with GRPO on 136 GUI tasks improves a 3B MLLM's action prediction accuracy by 6-22% over its base model and matches larger SFT-trained models.
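
GRPO, named in the summary above, scores each sampled response relative to the other responses drawn for the same prompt. The sketch below is a minimal illustration of that group-relative normalization paired with a toy rule-based reward. The reward rule (one point for a matching action type, one point for a click inside a target box), the sample data, and the names `rule_reward` and `group_advantages` are assumptions made for illustration; they are not the paper's reward definition. The population standard deviation is used here, and implementations may differ on that detail.

```python
from statistics import mean, pstdev

def rule_reward(pred, target):
    """Toy rule-based reward (assumed, not the paper's definition)."""
    score = 0.0
    # One point when the predicted action type matches the target.
    if pred["action"] == target["action"]:
        score += 1.0
    # One point when the predicted point lies inside the target box.
    x, y = pred["point"]
    x1, y1, x2, y2 = target["box"]
    if x1 <= x <= x2 and y1 <= y <= y2:
        score += 1.0
    return score

def group_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical target and four sampled predictions for a single prompt.
target = {"action": "click", "box": (10, 10, 50, 50)}
samples = [
    {"action": "click", "point": (20, 20)},
    {"action": "click", "point": (90, 90)},
    {"action": "scroll", "point": (0, 0)},
    {"action": "click", "point": (30, 40)},
]

rewards = [rule_reward(s, target) for s in samples]
advantages = group_advantages(rewards)
for r, a in zip(rewards, advantages):
    print(f"reward={r:.1f} advantage={a:+.3f}")
```

Samples that beat the group mean receive positive advantages and the rest negative ones, so no separate learned value model is needed to rank responses within a group.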

References

18 extracted · 18 resolved · 9 Pith anchors

[1] L1: Controlling how long a reasoning model thinks with reinforcement learning
[2] arXiv:2407.17490
[3] VisRL: Intention-driven visual perception via reinforced reasoning (2025)
[4] Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents · arXiv:2410.05243
[5] DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning · arXiv:2501.12948

Formal links

2 machine-checked theorem links

Cited by

30 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:48.089144Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

7640649671caa306210261df2e75352cead9fd9a1f5cd95a5f808466d5097937

Aliases

arxiv: 2503.21620 · arxiv_version: 2503.21620v5 · doi: 10.48550/arxiv.2503.21620 · pith_short_12: OZAGJFTRZKRQ · pith_short_16: OZAGJFTRZKRQMIIC · pith_short_8: OZAGJFTR
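
For this record, the full Pith Number coincides with the first 26 characters of the RFC 4648 base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of that number. This relationship is read off the values shown on this page rather than taken from a published specification, so the sketch below treats it as an assumption. The helper names `pith_number_from_hash` and `short_alias` are hypothetical.

```python
import base64

# Canonical hash as listed on this record.
CANONICAL_HASH = "7640649671caa306210261df2e75352cead9fd9a1f5cd95a5f808466d5097937"

def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Assumed derivation: base32-encode the digest bytes, keep a prefix."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]

def short_alias(pith_number: str, n: int) -> str:
    """Assumed rule: a short alias is the first n characters."""
    return pith_number[:n]

full = pith_number_from_hash(CANONICAL_HASH)
aliases = {n: short_alias(full, n) for n in (8, 12, 16)}

print(full)
for n, alias in aliases.items():
    print(f"pith_short_{n}: {alias}")
```

Comparing the printed values against the identifiers and aliases listed on this page gives a quick offline consistency check that needs no network access.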
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/OZAGJFTRZKRQMIICMHPS45JVFT \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 7640649671caa306210261df2e75352cead9fd9a1f5cd95a5f808466d5097937
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "996e1401c55cb49fa859097baae52738828675a6956b602244bf691972e2c774",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-03-27T15:39:30Z",
    "title_canon_sha256": "abfd2d349b1b9ba15cf5fb12e0034980e0086d0d412dfc67f35fc9c4a60ab860"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.21620",
    "kind": "arxiv",
    "version": 5
  }
}
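
The shell pipeline above can also be mirrored offline in plain Python. The sketch below copies the serialization settings from that one-liner (sorted keys, compact separators, `ensure_ascii=False`) and applies them to the record as printed on this page. It assumes the JSON shown here is identical to the `canonical_record` field returned by the API, which this page does not state outright. `canonical_sha256` is a hypothetical helper name.

```python
import hashlib
import json

# Canonical record copied from this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "996e1401c55cb49fa859097baae52738828675a6956b602244bf691972e2c774",
    "cross_cats_sorted": [],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-03-27T15:39:30Z",
    "title_canon_sha256": "abfd2d349b1b9ba15cf5fb12e0034980e0086d0d412dfc67f35fc9c4a60ab860"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2503.21620",
    "kind": "arxiv",
    "version": 5
  }
}
"""

def canonical_sha256(record: dict) -> str:
    """Serialize with the same settings as the one-liner, then hash."""
    payload = json.dumps(
        record,
        sort_keys=True,          # key order must not affect the digest
        separators=(",", ":"),   # no whitespace between tokens
        ensure_ascii=False,      # keep non-ASCII characters as UTF-8
    ).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

record = json.loads(RECORD_JSON)
digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store or emit the fields.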