
arxiv: 2606.31612 · v1 · pith:TLI6UBLSnew · submitted 2026-06-30 · 💻 cs.CV

What Memory Do GUI Agents Really Need? From Passive Records to Active Task-Driving States

Pith reviewed 2026-07-01 05:24 UTC · model grok-4.3

classification 💻 cs.CV
keywords GUI agents · memory management · active task memory · reinforcement learning · long-horizon tasks · mobile benchmarks · workflow state

The pith

GUI agents perform long-horizon tasks more reliably when memory actively maintains each value's role and status instead of passively accumulating records.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Current memory methods for GUI agents accumulate observations as passive storage, forcing the agent to reconstruct whether a value should be used now, has already been used, or must wait for a later step. This reconstruction becomes unreliable in long trajectories with similar fields, repeated values, distractors, and outdated states, leading to repeated or missed operations. The paper introduces Active Task Driving Memory (ATMem), which maintains task-relevant information as a continually updated execution state linking each value to its role and current status. This state directly supports action selection based on the current workflow state. The authors also present STR-GRPO, an online reinforcement learning method that learns selective memory use by contrasting memory-on and memory-off rollouts and applying a memory-cost-aware reward.

Core claim

ATMem shifts GUI-agent memory from passive storage to an actively maintained execution state that links each value to its role and current status, enabling action selection based on the current workflow state rather than implicit reconstruction from accumulated records.

What carries the argument

Active Task Driving Memory (ATMem), which maintains an execution state linking values to roles and statuses for direct workflow-based decisions.
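
One way to picture such an execution state is a small record per task value. The following is a minimal sketch, not the paper's implementation: the names `Status`, `Item`, and `ExecutionState`, the three status values, and the contact example are all hypothetical. Only the ingredients (constraints, schema fields, item content, item-level status) are taken from the Figure 2 caption.

```python
from dataclasses import dataclass, field
from enum import Enum


class Status(Enum):
    PENDING = "pending"      # still needs an action
    COMPLETED = "completed"  # already acted on; must not be repeated
    REJECTED = "rejected"    # out of scope for the instruction


@dataclass
class Item:
    content: dict            # schema field -> value, e.g. {"name": "Ana"}
    role: str                # what the value is for in the workflow
    status: Status = Status.PENDING


@dataclass
class ExecutionState:
    constraints: list        # task constraints in plain text
    schema: list             # field names shared by all items
    items: list = field(default_factory=list)

    def next_pending(self):
        """Return the first item that still needs an action, or None."""
        for item in self.items:
            if item.status is Status.PENDING:
                return item
        return None

    def mark(self, item, status):
        """Update an item's status after the agent acts on or rejects it."""
        item.status = status


# Hypothetical task: add only the contacts that belong to the Sales team.
state = ExecutionState(
    constraints=["only contacts from the Sales team"],
    schema=["name", "team"],
    items=[
        Item({"name": "Ana", "team": "Sales"}, role="contact to add"),
        Item({"name": "Ben", "team": "Legal"}, role="contact to add"),
    ],
)
state.mark(state.items[1], Status.REJECTED)   # violates the constraint
first = state.next_pending()                  # the Sales contact
state.mark(first, Status.COMPLETED)
remaining = state.next_pending()              # nothing left to do
```

In this sketch the agent never re-derives from raw history whether a value was already used. It asks the state for the next pending item, which is the behaviour the core claim attributes to ATMem. Who assigns the statuses, and how reliably, is left open here, as it is in the referee's first major comment.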

If this is right

  • Agents can select actions without inferring value relevance from raw records, reducing errors in complex tasks.
  • STR-GRPO enables learning when to use memory to improve task completion while minimizing unnecessary costs.
  • The new benchmark evaluates whether agents complete all in-scope work and avoid out-of-scope actions over long horizons.
  • Memory use becomes tied to actual contribution to execution success.
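
The memory-on versus memory-off contrast and the cost-aware reward can be illustrated with a few numbers. This is a sketch under stated assumptions: the paper's actual reward and advantage formulas are not reproduced on this page, so `cost_aware_reward`, the `memory_cost` weight, and the sample outcomes are hypothetical. The normalization is the generic GRPO-style form (reward minus group mean, divided by group standard deviation), not necessarily the exact STR-GRPO objective.

```python
from statistics import mean, pstdev


def cost_aware_reward(success, used_memory, memory_cost=0.1):
    """Task reward minus a penalty when memory was used (hypothetical form)."""
    return float(success) - (memory_cost if used_memory else 0.0)


def memory_utility(on_rewards, off_rewards):
    """Estimated benefit of memory: mean reward with memory on minus off."""
    return mean(on_rewards) - mean(off_rewards)


def group_relative_advantages(rewards, eps=1e-8):
    """Generic GRPO-style advantages: standardize rewards within one group."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Balanced group for one task: two rollouts with memory on, two with it off.
on = [cost_aware_reward(True, True), cost_aware_reward(True, True)]
off = [cost_aware_reward(True, False), cost_aware_reward(False, False)]

utility = memory_utility(on, off)                 # positive: memory helped on average
advantages = group_relative_advantages(on + off)  # one advantage per rollout
```

In this toy group the memory-off rollout that still succeeded receives the largest advantage, because it finished the task without paying the memory cost, while the failed memory-off rollout receives the lowest. That is the intuition behind tying memory use to its actual contribution to execution success.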

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • ATMem could be adapted to non-mobile GUI environments or other agent domains with similar long-horizon challenges.
  • Explicit state maintenance might interact with LLM context limits in ways that reduce overall token usage.
  • Future work might test if the role-status linking reduces the frequency of hallucinated or outdated actions.
  • The contrastive RL approach in STR-GRPO may apply to other memory or tool-use decisions in agents.

Load-bearing premise

Explicitly linking each value to its role and current status will allow reliable action selection without introducing new inference errors or excessive overhead.

What would settle it

An experiment showing that agents with ATMem still repeat operations or miss required actions in trajectories containing similar fields, repeated values, and distractors would falsify the benefit of the active state.

Figures

Figures reproduced from arXiv: 2606.31612 by Chen Liu, Hanzhang Zhou, Ling Chen, Panrong Tong, Quyu Kong, Steven Hoi, Wenhao Wang, Xin Yu, Xu Zhang, Yue Wang.

Figure 1: Motivation. Passive records preserve past snippets but do not provide stable execution-state awareness, leading to missed, repeated, or over-scoped operations. (c) AndroidWorld statistics show that 83% of tasks involve data operations. (d) With the same GPT-5 planner and UIIns grounding framework, ATMem improves over flat-memory and full-history baselines, and failure analysis attributes flat-memory errors…
Figure 2: Overview of our methodology. (a) ATMem organizes task-relevant data into a structured execution state with constraints, schema fields, item content, and item-level status. (b) Verified SFT data are synthesized through task-template instantiation, teacher-agent rollouts, and environment validation. (c) STR-GRPO uses balanced memory-ON/OFF interventions to estimate ATMem utility and learn selective memory in…
Figure 3: DataScope statistics and controlled difficulty scaling. (a) Bars show the average numbers of actionable target entries and same-schema distractors per task from DC-V1 to DC-V3, while the line shows the distractor-to-target ratio. The controlled increase in both quantities raises the ratio from 1.39× to 3.22×, testing whether agents can maintain target coverage while filtering increasingly confusable data w…
Figure 4: Execution case of ATMem-UI on AndroidWorld. The figure shows how our agent collects task-relevant data, maintains their structured execution states, and uses these states to guide subsequent actions. By tracking which data items are pending or completed, the agent reliably progresses from data collection to task execution and completes the long-horizon workflow.
Figure 5: Execution trajectory comparison between recording-centric memory and ATMem. The flat-memory agent records task information as unstructured notes, but fails to explicitly track which data items have been completed or still require action, leading to repeated search and stuck execution. In contrast, the ATMem-based agent maintains structured task data and item-level execution status, enabling more stable pro…
Figure 6: Representative failure case on our data-scope benchmark. On a data-scope workflow from our benchmark, MAI-UI-8B identifies the target contact information and begins adding the contact at step 25, but then repeats the same contact-creation operation until step 60. This stuck-loop behavior suggests that the agent can recover relevant values, but fails to track whether the data operation has already been comp…
Figure 7: Further analysis of STR-GRPO training dynamics and DataScope failure cases.
Original abstract

Mobile GUI agents increasingly face long-horizon tasks that require reading, updating, and reusing task-relevant data across pages and applications. Existing memory methods treat memory largely as passive storage, where past observations are accumulated and retrieved when needed. Yet retrieving a value does not reveal its current role in the workflow. The agent must still infer from accumulated records whether the value should be used now, has already been used, or must wait for a later dependency. This implicit reconstruction becomes unreliable in long trajectories with similar fields, repeated values, distractors, and outdated states, causing repeated or missed operations. We propose Active Task Driving Memory (ATMem), which shifts GUI-agent memory from passive storage to an actively maintained execution state. ATMem maintains task-relevant information as a continually updated execution state that links each value to its role and current status, enabling action selection based on the current workflow state. We therefore introduce STR-GRPO, an online reinforcement learning method that learns to use ATMem selectively according to its contribution to task completion. STR-GRPO contrasts memory-on and memory-off rollouts to estimate when memory use improves execution, while memory-cost-aware reward discourages costly memory usage that does not improve execution. To evaluate whether agents can complete all in-scope work while avoiding out-of-scope actions over long-horizon execution, we build a challenging mobile benchmark. From a list of near-identical entries, agents must act on every entry that satisfies the instruction and reject entries that violate its constraints.
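
The benchmark's success condition, acting on every in-scope entry and on no out-of-scope entry, can be read as a set comparison. The sketch below is only an illustration of that reading: the function `score_scope`, its output fields, and the entry identifiers are hypothetical, and the benchmark's actual metrics are not specified in the abstract.

```python
def score_scope(acted_on, in_scope):
    """Compare the entries an agent acted on with the in-scope targets.

    Both arguments are iterables of entry identifiers (hypothetical IDs).
    """
    acted, targets = set(acted_on), set(in_scope)
    missed = targets - acted          # in-scope entries never handled
    over_scoped = acted - targets     # distractors acted on by mistake
    coverage = len(targets & acted) / len(targets) if targets else 1.0
    return {
        "coverage": coverage,
        "missed": sorted(missed),
        "over_scoped": sorted(over_scoped),
        "success": not missed and not over_scoped,
    }


# The agent handled e1 and e2, skipped e3, and wrongly acted on distractor e5.
result = score_scope(acted_on=["e1", "e2", "e5"], in_scope=["e1", "e2", "e3"])
```

Under this reading a run counts as successful only when both error lists are empty, so high coverage alone is not enough if the agent also touches distractors.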

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that passive memory in GUI agents forces unreliable implicit inference of value roles and statuses in long-horizon tasks with distractors and repeated values, and proposes Active Task Driving Memory (ATMem) as an actively maintained execution state that explicitly links each value to its workflow role and current status. It introduces STR-GRPO, an online RL algorithm that contrasts memory-on and memory-off rollouts with a memory-cost-aware reward to learn selective use of ATMem, and presents a new mobile GUI benchmark requiring agents to act on every in-scope entry while rejecting out-of-scope ones.

Significance. If the empirical results hold, the work could usefully shift GUI-agent memory design toward explicit, actively updated state representations rather than passive retrieval. The benchmark's emphasis on complete coverage without extraneous actions addresses a relevant evaluation gap. STR-GRPO's on/off contrast provides a falsifiable way to measure memory utility, which is a methodological strength.

major comments (2)
  1. [§3] §3 (ATMem definition): the central claim that ATMem 'links each value to its role and current status' enabling reliable action selection is not supported by any description of the mechanism that populates or corrects those links. If role/status assignment is performed by the same LLM policy already shown to struggle with distractors, repeated values, and outdated states, the explicit representation relocates rather than removes the inference problem; this is load-bearing for the claim that ATMem improves execution over passive records.
  2. [§4] §4 (STR-GRPO): the memory-on vs memory-off contrast assumes that the difference isolates the benefit of explicit state maintenance, but without evidence that role/status links are assigned independently of the policy's inference errors, the contrast may simply compare two error-prone processes; this undermines the interpretation of the reward signal.
minor comments (2)
  1. [§3] The abstract and introduction repeatedly use 'execution state' without a formal definition or pseudocode; add a concise definition or diagram in §3.
  2. Table or figure captions for the benchmark should explicitly state the number of trajectories, average length, and distractor density to allow reproducibility assessment.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed review. The comments highlight important aspects of our claims regarding ATMem and STR-GRPO. We address each major comment below.

  1. Referee: [§3] §3 (ATMem definition): the central claim that ATMem 'links each value to its role and current status' enabling reliable action selection is not supported by any description of the mechanism that populates or corrects those links. If role/status assignment is performed by the same LLM policy already shown to struggle with distractors, repeated values, and outdated states, the explicit representation relocates rather than removes the inference problem; this is load-bearing for the claim that ATMem improves execution over passive records.

    Authors: We agree that §3 currently presents ATMem at a conceptual level without sufficient detail on the population and correction mechanisms. The manuscript describes ATMem as a structured execution state that is actively updated during task execution, with the policy deciding updates based on new observations and workflow progress. The explicit linking is intended to reduce repeated implicit inference from raw records. However, we acknowledge the referee's point that without explicit mechanisms (e.g., update rules or examples), the claim risks relocating rather than resolving the inference burden. We will revise §3 to include a more precise description of the state update process, including how the policy interacts with the structured fields. revision: yes

  2. Referee: [§4] §4 (STR-GRPO): the memory-on vs memory-off contrast assumes that the difference isolates the benefit of explicit state maintenance, but without evidence that role/status links are assigned independently of the policy's inference errors, the contrast may simply compare two error-prone processes; this undermines the interpretation of the reward signal.

    Authors: The memory-on versus memory-off design in STR-GRPO is meant to isolate the value of explicit state access by giving the memory-off condition the same raw observations but without the structured ATMem representation. The reward signal derives from task completion metrics on the benchmark, which penalizes both missed in-scope actions and extraneous out-of-scope actions. We recognize that the links are not assigned by an independent oracle and that policy errors can affect both conditions; the contrast therefore measures net utility rather than pure isolation of maintenance quality. The empirical gains on long-horizon tasks with distractors support that the structured state provides a measurable advantage. We will add a limitations paragraph discussing this interpretive caveat while retaining the current experimental framing. revision: partial

Circularity Check

0 steps flagged

No circularity: proposal of new memory structure and RL method with no derivations or self-referential reductions.


The paper proposes ATMem as an actively maintained execution state linking values to roles/status and STR-GRPO as an RL method contrasting memory-on/off rollouts with cost-aware rewards. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided text. The central claim is a design shift from passive to active memory, justified by described limitations of prior approaches rather than reducing to its own inputs by construction. The benchmark and evaluation are presented as external tests. This is a standard non-circular proposal of an architectural change.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 2 invented entities

Review based solely on abstract; no explicit free parameters, axioms, or invented entities beyond the named methods are described.

invented entities (2)
  • ATMem (no independent evidence)
    purpose: Active task-driving memory maintaining execution state with roles and status
    Introduced in abstract as the core proposed mechanism.
  • STR-GRPO (no independent evidence)
    purpose: Online RL method contrasting memory-on and memory-off rollouts with cost-aware reward
    Introduced in abstract as the training procedure for selective memory use.

pith-pipeline@v0.9.1-grok · 5828 in / 1233 out tokens · 30945 ms · 2026-07-01T05:24:43.836035+00:00 · methodology

