pith:QHECGYVL
MobiBench: Multi-Branch, Modular Benchmark for Mobile GUI Agents
MobiBench is a modular offline benchmark for mobile GUI agents that reaches 94.72 percent agreement with human evaluators.
arxiv:2512.12634 v3 · 2025-12-14 · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{QHECGYVLDT7OHOERAC35FBGPR7}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
MobiBench achieves 94.72 percent agreement with human evaluators, on par with carefully engineered online benchmarks, while preserving the scalability and reproducibility of static offline benchmarks.
Assumption: the multi-path annotations comprehensively capture all valid alternative actions that human evaluators would accept, with no systematic omissions that could affect agreement rates.
MobiBench is the first modular multi-path offline benchmark for mobile GUI agents, achieving 94.72% agreement with human evaluators while allowing component-level analysis.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-18T03:09:32.623517Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
81c82362ab1cfee3b89100b7d284cf8fdd2331bbd8ad4e3c50084163ad07cdbe
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QHECGYVLDT7OHOERAC35FBGPR7 \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 81c82362ab1cfee3b89100b7d284cf8fdd2331bbd8ad4e3c50084163ad07cdbe
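The same check can be sketched in Python without curl or jq. This is a minimal sketch, not the registry's reference implementation: the function names (`fetch_canonical_record`, `canonical_bytes`, `canonical_sha256`, `verify`) are hypothetical, and it assumes, as the command above suggests, that the endpoint returns JSON with a `canonical_record` key and that the hash is the SHA-256 of that record serialized with sorted keys, compact separators, and unescaped UTF-8.

```python
import hashlib
import json
import urllib.request


def fetch_canonical_record(url: str) -> dict:
    """Fetch the record and return its `canonical_record` field.

    Assumption: the endpoint answers an `Accept: application/ld+json`
    request with a JSON object that contains a `canonical_record` key,
    mirroring the curl | jq pipeline above.
    """
    request = urllib.request.Request(url, headers={"Accept": "application/ld+json"})
    with urllib.request.urlopen(request) as response:
        return json.load(response)["canonical_record"]


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys, compact separators, and raw Unicode."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_sha256(record: dict) -> str:
    """Hex SHA-256 digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


def verify(record: dict, expected_hex: str) -> bool:
    """Compare the recomputed digest with the published canonical hash."""
    return canonical_sha256(record) == expected_hex.strip().lower()
```

To use it, pass the URL from the command above to `fetch_canonical_record` and then call `verify(record, "<canonical hash>")`. Because keys are sorted before hashing, the order in which the server emits the fields does not affect the result.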
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "1974e3286eef3c7f833714c06a065c85214ebf5b5ac70cc3196a453ce2f2dbe1",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2025-12-14T10:41:39Z",
    "title_canon_sha256": "446390125615900b34aef5e039d642b5de22165a8a8970ee94f2aacaa577efac"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2512.12634",
    "kind": "arxiv",
    "version": 3
  }
}