Pith Number

pith:NLIXKHKP

pith:2023:NLIXKHKPME3NHIOJQ4YVW5FPMV
not attested · not anchored · not stored · refs resolved

AppAgent: Multimodal Agents as Smartphone Users

Bin Fu, Chi Zhang, Gang Yu, Jiaxuan Liu, Xin Chen, Yucheng Han, Zebiao Huang, Zhao Yang

AppAgent lets large language models operate diverse smartphone apps via visual interactions and learns app usage from exploration or demonstrations.

arxiv:2312.13771 v2 · 2023-12-21 · cs.CV

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{NLIXKHKPME3NHIOJQ4YVW5FPMV}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
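
The page does not spell out the merge algorithm, so the following is only a minimal sketch of what a deterministic merge can look like. The event fields (`at`, `field`, `value`), the last-writer-wins rule, and the function name `merge_events` are all hypothetical, and signature checking is left out entirely. The point it illustrates is that deduplicating events and sorting them into a total order before folding gives every mirror the same state regardless of the order in which events arrived.

```python
import hashlib
import json


def merge_events(*event_logs):
    """Fold event logs into one state, independent of arrival order.

    Hypothetical event shape: {"at": ISO timestamp, "field": str, "value": any}.
    """
    # Deduplicate by content hash so a replayed event counts only once.
    unique = {}
    for log in event_logs:
        for event in log:
            blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode()
            unique[hashlib.sha256(blob).hexdigest()] = event

    # Total order: timestamp first, content hash as the tie-breaker.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["at"], kv[0]))

    state = {}
    for _, event in ordered:
        state[event["field"]] = event["value"]  # assumed rule: last writer wins
    return state


a = [{"at": "2026-05-17T23:38:14Z", "field": "author_claim", "value": "open"}]
b = [{"at": "2026-05-18T09:00:00Z", "field": "author_claim", "value": "claimed"}]

# Two mirrors that received the logs in different orders agree on the result.
assert merge_events(a, b) == merge_events(b, a, a)
print(merge_events(a, b))
```

The content hash as tie-breaker matters: without it, two events sharing a timestamp could be applied in different orders on different mirrors.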

Claims

C1 · strongest claim

Our framework enables the agent to operate smartphone applications through a simplified action space, mimicking human-like interactions such as tapping and swiping. This novel approach bypasses the need for system back-end access, thereby broadening its applicability across diverse apps.

C2 · weakest assumption

The agent can reliably learn to navigate and execute tasks in new apps through autonomous exploration or human demonstrations, producing a knowledge base that generalizes across applications.

C3 · one line summary

AppAgent lets large language models operate diverse smartphone apps via visual interactions and learns app usage from exploration or demonstrations.
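
To make the "simplified action space" of claim C1 concrete, here is a rough sketch of two such actions, tap and swipe, expressed as small data types and translated into `adb shell input` commands. This is not the paper's implementation: this page does not say how AppAgent issues its actions, and the class names, the default swipe duration, and the use of adb are assumptions made for illustration. The function only builds the argument lists and does not run them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tap:
    x: int
    y: int


@dataclass(frozen=True)
class Swipe:
    x1: int
    y1: int
    x2: int
    y2: int
    duration_ms: int = 300  # assumed default, not taken from the source


def to_adb_args(action):
    """Translate an action into an `adb shell input` argument list."""
    if isinstance(action, Tap):
        return ["adb", "shell", "input", "tap", str(action.x), str(action.y)]
    if isinstance(action, Swipe):
        return [
            "adb", "shell", "input", "swipe",
            str(action.x1), str(action.y1),
            str(action.x2), str(action.y2),
            str(action.duration_ms),
        ]
    raise TypeError(f"unsupported action: {action!r}")


print(to_adb_args(Tap(540, 1200)))
print(to_adb_args(Swipe(540, 1600, 540, 400)))
```

Because both actions are plain screen coordinates, nothing in this sketch depends on an app's internals, which is the property the claim emphasizes.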

References

286 extracted · 286 resolved · 19 Pith anchors


Formal links

2 machine-checked theorem links

Cited by

27 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:14.405660Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05) · public key
Schema: pith-number/v1.0

Canonical hash

6ad1751d4f6136d3a1c987315b74af6554e2880aa8cf1bc0bf625382143509aa

Aliases

arxiv: 2312.13771 · arxiv_version: 2312.13771v2 · doi: 10.48550/arxiv.2312.13771 · pith_short_12: NLIXKHKPME3N · pith_short_16: NLIXKHKPME3NHIOJ · pith_short_8: NLIXKHKP
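
The values on this page suggest how the identifier relates to the canonical hash: the 26-character Pith Number matches the first 26 characters of the base32 encoding of the SHA-256 digest, and the `pith_short_*` aliases are prefixes of it. This is an observation from the values shown here, not a documented rule, and the helper name `pith_number` is made up for the sketch.

```python
import base64

CANONICAL_HASH = "6ad1751d4f6136d3a1c987315b74af6554e2880aa8cf1bc0bf625382143509aa"


def pith_number(hex_digest, length=26):
    """Base32-encode a hex digest and keep the first `length` characters."""
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


full = pith_number(CANONICAL_HASH)
print(full)       # NLIXKHKPME3NHIOJQ4YVW5FPMV
print(full[:16])  # NLIXKHKPME3NHIOJ  (pith_short_16)
print(full[:12])  # NLIXKHKPME3N      (pith_short_12)
print(full[:8])   # NLIXKHKP          (pith_short_8)
```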
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/NLIXKHKPME3NHIOJQ4YVW5FPMV \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 6ad1751d4f6136d3a1c987315b74af6554e2880aa8cf1bc0bf625382143509aa
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "9a32a980081b8d77ebaf7885933ae6cda5952055956415891cc187a99082d678",
    "cross_cats_sorted": [],
    "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2023-12-21T11:52:45Z",
    "title_canon_sha256": "4af3c8c63187b7042d09f7707a621ba4ff661c5067efaa5e0c6ce20838930e23"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2312.13771",
    "kind": "arxiv",
    "version": 2
  }
}
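
For checking the record without a network call, the canonicalization step from the one-liner above can be written as a small function: sorted keys, compact separators, no ASCII escaping, UTF-8 bytes, then SHA-256. The function name `canonical_sha256` is an assumption for this sketch. Whether the printed digest equals the canonical hash depends on the record shown above being exactly what the service hashes, so the comparison is printed rather than asserted.

```python
import hashlib
import json

EXPECTED = "6ad1751d4f6136d3a1c987315b74af6554e2880aa8cf1bc0bf625382143509aa"


def canonical_sha256(record):
    """Hash a record using the same serialization as the curl one-liner."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = {
    "metadata": {
        "abstract_canon_sha256": "9a32a980081b8d77ebaf7885933ae6cda5952055956415891cc187a99082d678",
        "cross_cats_sorted": [],
        "license": "http://arxiv.org/licenses/nonexclusive-distrib/1.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2023-12-21T11:52:45Z",
        "title_canon_sha256": "4af3c8c63187b7042d09f7707a621ba4ff661c5067efaa5e0c6ce20838930e23",
    },
    "schema_version": "1.0",
    "source": {"id": "2312.13771", "kind": "arxiv", "version": 2},
}

digest = canonical_sha256(record)
print(digest)
print("matches canonical hash:", digest == EXPECTED)
```

Sorting keys before hashing is what makes the digest independent of the order in which a client happens to hold the fields.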