pith. machine review for the scientific record.

arxiv: 2604.27955 · v1 · submitted 2026-04-30 · 💻 cs.AI · cs.CV

Recognition: unknown

GUI Agents with Reinforcement Learning: Toward Digital Inhabitants


Pith reviewed 2026-05-07 05:43 UTC · model grok-4.3

classification 💻 cs.AI cs.CV
keywords: GUI agents · reinforcement learning · taxonomy · reward engineering · world models · digital inhabitants · offline RL · hybrid strategies

The pith

The survey organizes reinforcement-learning methods for GUI agents into offline, online, and hybrid categories on the path to digital inhabitants.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

GUI agents perceive and act on computer screens through visual interfaces, yet supervised training alone cannot manage extended sequences of actions or adapt safely to changing conditions. This paper reviews the role of reinforcement learning in overcoming those limits by surveying current methods and grouping them under a new taxonomy. The taxonomy places approaches into offline learning from stored data, online learning through live interaction, and hybrid combinations of the two. It further examines how rewards are crafted, how training data can be used more efficiently, and which technical choices are gaining traction. If the patterns hold, GUI agents could progress from narrow automation tools to more autonomous systems capable of sustained operation in digital environments.

Core claim

The authors establish that reinforcement learning is central to GUI agent development because supervised fine-tuning cannot handle long-horizon credit assignment, distribution shifts, and safe exploration in irreversible environments. They introduce a principled taxonomy that organizes existing methods into Offline RL, Online RL, and Hybrid Strategies, and they complement the taxonomy with examinations of reward engineering, data efficiency, and key technical innovations. Their analysis identifies emerging trends: composite multi-tier reward architectures that address the reliability-scalability tension, world-model-based training that overcomes GUI I/O latency, and the spontaneous emergence of System-2-style deliberation, which suggests that explicit reasoning supervision may be unnecessary when sufficiently rich reward signals are available.

What carries the argument

A three-category taxonomy of Offline RL, Online RL, and Hybrid Strategies that classifies reinforcement learning methods for GUI agents and supports analysis of reward architectures and world models.

If this is right

  • Composite multi-tier reward architectures can balance reliability and scalability in agent training.
  • World-model-based training can deliver substantial performance gains by reducing reliance on slow GUI input-output operations.
  • Rich reward signals can produce emergent System-2 deliberation without the need for explicit reasoning supervision.
  • Future progress depends on advancing process rewards, continual reinforcement learning, cognitive architectures, and safe deployment practices.
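The composite-reward trend above is easiest to see in code. A minimal sketch of a two-tier reward, with invented function names, weights, and fields (nothing here comes from the paper; it only illustrates blending a reliable rule check with a scalable learned score):

```python
# Hypothetical two-tier composite reward for a GUI agent episode.
# Tier 1 is exact but narrow; tier 2 stands in for a learned judge.

def rule_reward(final_state: dict, goal: dict) -> float:
    """Tier 1: reliable rule check -- did the target field reach the goal value?"""
    return 1.0 if final_state.get(goal["field"]) == goal["value"] else 0.0

def model_reward(trajectory: list) -> float:
    """Tier 2: toy proxy for a learned trajectory scorer; here it simply
    prefers shorter episodes. A real system would call a reward model."""
    return max(0.0, 1.0 - 0.05 * len(trajectory))

def composite_reward(final_state: dict, goal: dict, trajectory: list,
                     w_rule: float = 0.7, w_model: float = 0.3) -> float:
    """Blend the tiers: reliability from rules, scalability from the model."""
    return (w_rule * rule_reward(final_state, goal)
            + w_model * model_reward(trajectory))

# Example: the agent set the right field in a 4-step episode.
r = composite_reward({"username": "alice"},
                     {"field": "username", "value": "alice"},
                     ["click", "type", "type", "submit"])
```

The weights encode the reliability-scalability trade: pushing `w_model` up generalizes further but inherits the judge's noise.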

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Agents built under this taxonomy could eventually transfer skills across different software applications without full retraining.
  • The three-way split may highlight gaps where new hybrid designs are needed to combine data efficiency with real-time adaptability.
  • Emergent deliberation implies that complex internal reasoning could arise naturally once reward signals are sufficiently detailed.
  • Safe deployment methods will be required once agents begin handling real user data and irreversible actions on personal devices.

Load-bearing premise

The existing literature on reinforcement learning for GUI agents can be comprehensively and accurately captured by the proposed offline, online, and hybrid taxonomy without significant omissions or selection bias.

What would settle it

A search that identifies multiple prominent RL-based GUI agent papers that cannot be placed in any of the three taxonomy categories, or experiments showing that world-model training produces no performance gains over direct GUI interaction.
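The world-model half of that test is, in principle, a timing comparison. A toy sketch, with everything invented for illustration: the "world model" is a free transition function, and the "real GUI" step carries a simulated screenshot-and-input delay.

```python
# Illustrative timing of rollouts against a real GUI vs. a learned world model.
import time

def real_gui_step(state: list, action: str) -> list:
    time.sleep(0.01)  # stand-in for screenshot capture + input latency
    return state + [action]

def world_model_step(state: list, action: str) -> list:
    return state + [action]  # learned transition: no I/O at all

def rollout(step_fn, n_steps: int = 10):
    state, t0 = [], time.perf_counter()
    for i in range(n_steps):
        state = step_fn(state, f"a{i}")
    return state, time.perf_counter() - t0

traj_real, t_real = rollout(real_gui_step)
traj_model, t_model = rollout(world_model_step)
```

The gap is where the claimed performance gains would come from; the settling experiment asks whether they survive the world model's prediction error.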

Figures

Figures reproduced from arXiv: 2604.27955 by Dazhao Du, Jian Li, Jian Liu, Jiarui Hu, Jingxiang Lai, Junan Hu, Shuang Chen, Song Guo, Yiwei Sheng.

Figure 1: Overview of the survey structure. Sections 1 and 2 introduce the background of GUI agent; Sec… view at source ↗
Figure 2: Overview of the RL training pipeline for GUI agents. The agent perceives the GUI environment… view at source ↗
Figure 3: A taxonomy of representative GUI agent papers organised along five dimensions… view at source ↗
Figure 4: Timeline of GUI Agent Development. Grounding-specialized models. Visual grounding—precise mapping from natural language to screen coordinates—has emerged as a specialized focus for RL optimization, building on foundational work in universal visual grounding (Gou et al., 2024), unified pure vision agents (Xu et al., 2024d; Chen et al., 2026c), Aria-UI (Yang et al., 2025c), and Phi-Ground (Zhang et al., 2025… view at source ↗
Figure 5: The Reward Engineering Pyramid balances accuracy and generality for GUI Agents: rule-based… view at source ↗
Figure 6: This pyramid depicts a four-stage data-training pipeline for agent capability, progressing from… view at source ↗
Figure 7: An asynchronous distributed architecture for GUI RL agent training, addressing slow environment… view at source ↗
Original abstract

Graphical User Interface (GUI) agents have emerged as a promising paradigm for intelligent systems that perceive and interact with graphical interfaces visually. Yet supervised fine-tuning alone cannot handle long-horizon credit assignment, distribution shifts, and safe exploration in irreversible environments, making Reinforcement Learning (RL) a central methodology for advancing automation. In this work, we present the first comprehensive overview of the intersection between RL and GUI agents, and examine how this research direction may evolve toward digital inhabitants. We propose a principled taxonomy that organizes existing methods into Offline RL, Online RL, and Hybrid Strategies, and complement it with analyses of reward engineering, data efficiency, and key technical innovations. Our analysis reveals several emerging trends: the tension between reliability and scalability is motivating the adoption of composite, multi-tier reward architectures; GUI I/O latency bottlenecks are accelerating the shift toward world-model-based training, which can yield substantial performance gains; and the spontaneous emergence of System-2-style deliberation suggests that explicit reasoning supervision may not be necessary when sufficiently rich reward signals are available. We distill these findings into a roadmap covering process rewards, continual RL, cognitive architectures, and safe deployment, aiming to guide the next generation of robust GUI automation and its agent-native infrastructure.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims to offer the first comprehensive overview of reinforcement learning applied to GUI agents. It introduces a principled taxonomy categorizing methods into Offline RL, Online RL, and Hybrid Strategies. Complementary analyses cover reward engineering, data efficiency, and technical innovations. Emerging trends identified include the use of composite multi-tier reward architectures to balance reliability and scalability, a shift to world-model-based training to mitigate GUI I/O latency issues, and the spontaneous emergence of System-2-style deliberation from rich reward signals. The work concludes with a roadmap for future directions such as process rewards, continual RL, cognitive architectures, and safe deployment toward digital inhabitants.

Significance. If the proposed taxonomy proves to be comprehensive and the trends are based on a representative sample of the literature, this survey would be highly significant. It would provide a structured framework for understanding the current state of RL in GUI agents, highlight key challenges and innovations, and offer a forward-looking roadmap that could guide research in developing more advanced, autonomous GUI agents. This could accelerate progress in the field by standardizing approaches and identifying promising research avenues. The focus on evolving toward 'digital inhabitants' adds a visionary aspect that may inspire new lines of inquiry.

major comments (2)
  1. [Taxonomy (Section 3)] The central claim rests on a 'principled taxonomy' that partitions existing RL-for-GUI methods into Offline RL, Online RL, and Hybrid Strategies. However, the manuscript does not specify the categorization criteria, provide a mapping table of all cited methods to these categories, or discuss potential overlaps or methods that fall outside these bins (e.g., pure imitation learning approaches). This is load-bearing for the comprehensiveness claim and the validity of the derived trends.
  2. [Literature Review Methodology (Introduction or Section 2)] No search protocol, databases, keywords, date range, or inclusion criteria are described. This omission is critical because the trends (composite rewards, world-model shift, emergent deliberation) depend on the selected literature being representative without selection bias. The paper should either detail the review process or acknowledge limitations in coverage.
minor comments (2)
  1. [Abstract] The abstract mentions 'analyses of reward engineering, data efficiency, and key technical innovations' but the corresponding sections could benefit from more quantitative comparisons or summary tables to support the qualitative trends.
  2. [Roadmap section] The roadmap is presented at a high level; including specific open problems or example research questions would enhance its utility.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We sincerely thank the referee for their constructive and insightful review. We appreciate the positive assessment of the paper's potential significance in providing a structured framework for RL applied to GUI agents and the forward-looking roadmap toward digital inhabitants. We agree that the taxonomy requires explicit criteria and a mapping to support the comprehensiveness claim, and that the literature review process should be documented to address potential selection bias concerns. We will undertake major revisions to incorporate these elements, as detailed in our point-by-point responses below.

Point-by-point responses
  1. Referee: [Taxonomy (Section 3)] The central claim rests on a 'principled taxonomy' that partitions existing RL-for-GUI methods into Offline RL, Online RL, and Hybrid Strategies. However, the manuscript does not specify the categorization criteria, provide a mapping table of all cited methods to these categories, or discuss potential overlaps or methods that fall outside these bins (e.g., pure imitation learning approaches). This is load-bearing for the comprehensiveness claim and the validity of the derived trends.

    Authors: We agree that explicitly defining the categorization criteria is necessary to substantiate the taxonomy and derived trends. In the revised manuscript, we will add a dedicated subsection in Section 3 ('Taxonomy Criteria and Scope') that specifies the partitioning rules: Offline RL covers methods that train exclusively on pre-collected, static GUI interaction datasets using offline algorithms without further live interaction (to avoid unsafe exploration in irreversible environments); Online RL includes methods that perform iterative, direct interaction with GUI environments (real or simulated) during training; and Hybrid Strategies combine both phases, such as offline pre-training on large-scale data followed by online fine-tuning or adaptation. We will insert a new mapping table (Table 1) that enumerates all cited methods, assigns each to its primary category with a one-sentence justification, and flags any boundary cases. Regarding overlaps and out-of-scope methods, we will add explicit discussion noting that pure imitation learning approaches are classified under Offline RL when they rely on demonstration data for behavioral cloning without reward-driven optimization, but hybrid cases (e.g., imitation-initialized RL) are placed in Hybrid; methods falling outside (e.g., non-RL supervised fine-tuning or non-GUI agents) are excluded per the survey scope focused on RL for GUI agents. These additions will directly support the validity of trends such as composite rewards and world-model shifts. revision: yes
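The partitioning rules promised in that response reduce to a small decision procedure. A hedged sketch, with attribute names invented for illustration (the authors' Table 1 would be the authoritative mapping):

```python
# Toy classifier for the Offline / Online / Hybrid taxonomy described above.
# A method is characterized by whether it trains on pre-collected static data
# and whether it interacts with a live GUI environment during training.

def classify(method: dict) -> str:
    offline = method.get("trains_on_static_data", False)
    online = method.get("interacts_during_training", False)
    if offline and online:
        return "Hybrid"          # e.g. offline pre-training + online fine-tuning
    if online:
        return "Online RL"       # iterative live interaction (real or simulated)
    if offline:
        return "Offline RL"      # includes imitation on demonstrations alone
    return "out of scope"        # e.g. non-RL supervised fine-tuning

label = classify({"trains_on_static_data": True,
                  "interacts_during_training": True})
```

Boundary cases (imitation-initialized RL, say) land in Hybrid under these rules, which is exactly the kind of assignment the proposed mapping table would have to justify per method.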

  2. Referee: [Literature Review Methodology (Introduction or Section 2)] No search protocol, databases, keywords, date range, or inclusion criteria are described. This omission is critical because the trends (composite rewards, world-model shift, emergent deliberation) depend on the selected literature being representative without selection bias. The paper should either detail the review process or acknowledge limitations in coverage.

    Authors: We acknowledge that documenting the review methodology is essential for a survey claiming comprehensiveness and to mitigate concerns about selection bias in the identified trends. In the revised manuscript, we will insert a new subsection titled 'Literature Review Methodology' (placed in Section 2 or immediately following the introduction) that details: databases searched (arXiv, Google Scholar, IEEE Xplore, ACL Anthology, and major conference proceedings from 2020 onward); search keywords and Boolean queries (e.g., ('GUI agent' OR 'graphical user interface agent') AND ('reinforcement learning' OR 'RL') AND ('visual' OR 'screenshot')); date range (primarily January 2018–December 2024, capturing the emergence of modern GUI agents while including foundational works); and inclusion criteria (empirical papers applying RL to GUI agents with visual perception-action loops, reporting metrics on task completion or efficiency; exclusion of pure prompting/LLM-only methods without learning, non-visual interfaces, or non-agent UI design papers). We will also add a limitations paragraph noting that while the sampled literature represents prominent and highly-cited works in the field, the rapidly evolving nature of the area may omit the most recent preprints, and trends are derived from this representative but not exhaustive set. This will strengthen confidence in the analyses of reward engineering, data efficiency, and emerging patterns. revision: yes
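The inclusion criteria in that response likewise reduce to a predicate over paper metadata. A sketch with illustrative keys; a real protocol would run this over database query results rather than hand-built dicts:

```python
# Hypothetical encoding of the survey's proposed inclusion criteria.
# Keys are invented; the date range follows the rebuttal (2018-2024).

def include(paper: dict) -> bool:
    in_range = 2018 <= paper.get("year", 0) <= 2024
    uses_rl = paper.get("uses_rl", False)                  # excludes prompting-only
    visual_gui = paper.get("visual_gui_agent", False)      # excludes non-visual UIs
    has_metrics = paper.get("reports_task_metrics", False) # task completion/efficiency
    return in_range and uses_rl and visual_gui and has_metrics

kept = include({"year": 2023, "uses_rl": True,
                "visual_gui_agent": True, "reports_task_metrics": True})
dropped = include({"year": 2023, "uses_rl": False,  # LLM prompting only
                   "visual_gui_agent": True, "reports_task_metrics": True})
```

Making the filter explicit is what lets a reader audit the representativeness claim the trends depend on.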

Circularity Check

0 steps flagged

No circularity: survey taxonomy organizes external literature without self-referential derivation

Full rationale

This paper is a literature survey that proposes a taxonomy (Offline RL, Online RL, Hybrid Strategies) to organize existing published methods and identifies interpretive trends in reward engineering and world models. No new mathematical derivations, fitted parameters, or predictions are generated from the paper's own data or definitions. The central claims rest on citations to external prior work rather than reducing to self-citations, ansatzes, or fitted inputs presented as novel results. No equations or self-definitional loops appear in the provided abstract or framing. The absence of an explicit search protocol affects completeness but does not create circularity by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claims rest on the assumption that supervised fine-tuning is insufficient for GUI agent challenges and that a three-category taxonomy plus trend analysis can comprehensively capture the field; these are domain assumptions and author-proposed structures without independent verification from the abstract.

axioms (2)
  • domain assumption Supervised fine-tuning alone cannot handle long-horizon credit assignment, distribution shifts, and safe exploration in irreversible environments
    Invoked in the abstract to position RL as central for GUI agents.
  • ad hoc to paper Existing methods can be organized into a principled taxonomy of Offline RL, Online RL, and Hybrid Strategies
    Proposed by the authors as the organizing framework for the overview.
invented entities (1)
  • digital inhabitants (no independent evidence)
    purpose: To frame the long-term goal of advanced, autonomous GUI agents
    New conceptual term introduced to describe the evolutionary direction of the research.

pith-pipeline@v0.9.0 · 5533 in / 1496 out tokens · 38123 ms · 2026-05-07T05:43:46.207186+00:00 · methodology


Reference graph

Works this paper leans on

88 extracted references · 82 canonical work pages · 25 internal anchors

  1. [1]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774,

  2. [2]

    Automated reinforcement learning: An overview.arXiv preprint arXiv:2201.05000,

    Reza Refaei Afshar, Yingqian Zhang, Joaquin Vanschoren, and Uzay Kaymak. Automated reinforcement learning: An overview.arXiv preprint arXiv:2201.05000,

  3. [3]

    Agent S: An Open Agentic Framework that Uses Computers Like a Human

    Saaket Agashe, Jiuzhou Han, Shuyu Gan, Jiachen Yang, Ang Li, and Xin Eric Wang. Agent S: An open agentic framework that uses computers like a human. arXiv preprint arXiv:2410.08164,

  4. [4]

    Agent S2: A compositional generalist-specialist framework for computer use agents.arXiv preprint arXiv:2504.00906, 2025

    Saaket Agashe, Kyle Wong, Vincent Tu, Jiachen Yang, Ang Li, and Xin Eric Wang. Agent s2: A composi- tional generalist-specialist framework for computer use agents.arXiv preprint arXiv:2504.00906,

  5. [5]

    Agent-X: Evaluating Deep Multimodal Reasoning in Vision-Centric Agentic Tasks

    Tajamul Ashraf, Amal Saqib, Hanan Ghani, Muhra AlMahri, Yuhao Li, Noor Ahsan, Umair Nawaz, Jean Lahoud, Hisham Cholakkal, Mubarak Shah, et al. Agent-X: Evaluating deep multimodal reasoning in vision-centric agentic tasks. arXiv preprint arXiv:2505.24876,

  6. [6]

    Digi-Q: Learning Q-Value Functions for Training Device-Control Agents

    Hao Bai, Yifei Zhou, Li Erran Li, Sergey Levine, and Aviral Kumar. Digi-Q: Learning Q-value functions for training device-control agents. arXiv preprint arXiv:2502.15760, 2025a.

  7. [7]

    Terminal Agents Suffice for Enterprise Automation

    Patrice Bechard, Orlando Marquez Ayala, Emily Chen, Jordan Skelton, Sagar Davasam, Srinivas Sunkara, Vikas Yadav, and Sai Rajeswar. Terminal agents suffice for enterprise automation.arXiv preprint arXiv:2604.00073,

  8. [8]

    Windows Agent Arena: Evaluating Multi-Modal OS Agents at Scale

    Rogerio Bonatti, Dan Zhao, Francesco Bonacci, Dillon Dupont, Sara Abdali, Yinheng Li, Yadong Lu, Justin Wagle, Kazuhito Koishida, Arthur Bucker, et al. Windows Agent Arena: Evaluating multi-modal OS agents at scale. arXiv preprint arXiv:2409.08264,

  9. [9]

    SmartSnap: Proactive Evidence Seeking for Self-Verifying Agents

    S. Cai, Y. Qin, H. Lin, Z. Xu, G. Li, Y. Shi, Z. Li, Y. Mao, S. Cai, X. Tan, Y. Liang, K. Li, and X. Sun. SmartSnap: Proactive evidence seeking for self-verifying agents. arXiv preprint arXiv:2512.22322,

  10. [10]

    GUI-Eyes: Tool-Augmented Perception for Visual Grounding in GUI Agents

    C. Chen, J. Shao, D. Lu, H. Hu, X. Liu, H. Yao, and W. Liu. GUI-Eyes: Tool-augmented perception for visual grounding in GUI agents. arXiv preprint arXiv:2601.09770, 2026a.

  11. [11]

    Unify-Agent: A Unified Multimodal Agent for World-Grounded Image Synthesis

    Shuang Chen, Quanxin Shou, Hangting Chen, Yucheng Zhou, Kaituo Feng, Wenbo Hu, Yi-Fan Zhang, Yunlong Lin, Wenxuan Huang, Mingyang Song, et al. Unify-Agent: A unified multimodal agent for world-grounded image synthesis. arXiv preprint arXiv:2603.29620, 2026c.

  12. [12]

    Elmur: External layer memory with up- date/rewrite for long-horizon rl.arXiv preprint arXiv:2510.07151,

    Egor Cherepanov, Alexey K Kovalev, and Aleksandr I Panov. Elmur: External layer memory with up- date/rewrite for long-horizon rl.arXiv preprint arXiv:2510.07151,

  13. [13]

    The browsergym ecosystem for web agent research.arXiv preprint arXiv:2412.05467,

    De Chezelles, Thibault Le Sellier, Sahar Omidi Shayegan, Lawrence Keunho Jang, Xing Han Lù, Ori Yoran, Dehan Kong, Frank F Xu, Siva Reddy, Quentin Cappart, et al. The browsergym ecosystem for web agent research.arXiv preprint arXiv:2412.05467,

  14. [14]

    Gemini 2.5: Pushing the Frontier with Advanced Reasoning, Multimodality, Long Context, and Next Generation Agentic Capabilities

    Gheorghe Comanici, Eric Bieber, Mike Schaekermann, Ice Pasupat, Noveen Sachdeva, Inderjit Dhillon, Marcel Blistein, Ori Ram, Dan Zhang, Evan Rosen, et al. Gemini 2.5: Pushing the frontier with ad- vanced reasoning, multimodality, long context, and next generation agentic capabilities.arXiv preprint arXiv:2507.06261,

  15. [15]

    Agentic reward modeling: Verifying gui agent via online proactive interaction.arXiv preprint arXiv:2602.00575,

    Chaoqun Cui, Jing Huang, Shijing Wang, Liming Zheng, Qingchao Kong, and Zhixiong Zeng. Agentic reward modeling: Verifying gui agent via online proactive interaction.arXiv preprint arXiv:2602.00575,

  16. [16]

    Process Reinforcement through Implicit Rewards

    Ganqu Cui, Lifan Yuan, Zefan Wang, Hanbin Wang, Yuchen Zhang, Jiacheng Chen, Wendi Li, Bingxiang He, Yuchen Fan, Tianyu Yu, et al. Process reinforcement through implicit rewards.arXiv preprint arXiv:2502.01456,

  17. [17]

    G. Dai, S. Jiang, T. Cao, Y. Yang, Y. Li, R. Tan, M. Li, and L. Qiu. Prore: A proactive reward system for gui agents via reasoner–actor collaboration.arXiv preprint arXiv:2509.21823,

  18. [18]

    Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar

    [Accessed 09-02-2026]. Biplab Deka, Zifeng Huang, Chad Franzen, Joshua Hibschman, Daniel Afergan, Yang Li, Jeffrey Nichols, and Ranjitha Kumar. Rico: A mobile app dataset for building data-driven design applications. InProceedings of the 30th annual ACM symposium on user interface software and technology, pp. 845–854,

  19. [19]

    Simura: A world-model-driven simulative reasoning architecture for general goal-oriented agents.arXiv preprint arXiv:2507.23773,

    Mingkai Deng, Jinyu Hou, Zhiting Hu, and Eric Xing. Simura: A world-model-driven simulative reasoning architecture for general goal-oriented agents.arXiv preprint arXiv:2507.23773,

  20. [20]

    DynaWeb: Model-Based Reinforcement Learning of Web Agents

    Hang Ding, Peidong Liu, Junqiao Wang, Ziwei Ji, Meng Cao, Rongzhao Zhang, Lynn Ai, Eric Yang, Tianyu Shi, and Lei Yu. Dynaweb: Model-based reinforcement learning of web agents.arXiv preprint arXiv:2601.22149,

  21. [21]

    Agentic Entropy-Balanced Policy Optimization

    Guanting Dong, Licheng Bao, Zhongyuan Wang, Kangzhi Zhao, Xiaoxi Li, Jiajie Jin, Jinghan Yang, Hangyu Mao, Fuzheng Zhang, Kun Gai, et al. Agentic entropy-balanced policy optimization. arXiv preprint arXiv:2510.14545,

  22. [22]

    Memr3: Memory retrieval via reflective reasoning for llm agents.arXiv preprint arXiv:2512.20237,

    Xingbo Du, Loka Li, Duzhen Zhang, and Le Song. Memr3: Memory retrieval via reflective reasoning for llm agents.arXiv preprint arXiv:2512.20237,

  23. [23]

    Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks

    Lutfi Eren Erdogan, Nicholas Lee, Sehoon Kim, Suhong Moon, Hiroki Furuta, Gopala Anumanchipalli, Kurt Keutzer, and Amir Gholami. Plan-and-Act: Improving planning of agents for long-horizon tasks. arXiv preprint arXiv:2503.09572,

  24. [24]

    Gui-bee: Align gui action grounding to novel environments via autonomous exploration.arXiv preprint arXiv:2501.13896,

    Yue Fan, Handong Zhao, Ruiyi Zhang, Yu Shen, Xin Eric Wang, and Gang Wu. Gui-bee: Align gui action grounding to novel environments via autonomous exploration.arXiv preprint arXiv:2501.13896,

  25. [25]

    Group-in-Group Policy Optimization for LLM Agent Training

    Lang Feng, Zhenghai Xue, Tingcong Liu, and Bo An. Group-in-group policy optimization for llm agent training.arXiv preprint arXiv:2505.10978,

  26. [26]

    Mano Technical Report

    Tianyu Fu, Anyang Su, Chenxu Zhao, Hanning Wang, Minghui Wu, Zhe Yu, Fei Hu, Mingjia Shi, Wei Dong, Jiayao Wang, et al. Mano technical report. arXiv preprint arXiv:2509.17336, 2025a.

  27. [27]

    Websynthesis: World-model-guided mcts for efficient webui-trajectory synthesis.arXiv preprint arXiv:2507.04370,

    Yifei Gao, Junhong Ye, Jiaqi Wang, and Jitao Sang. Websynthesis: World-model-guided mcts for efficient webui-trajectory synthesis.arXiv preprint arXiv:2507.04370,

  28. [28]

    End-to-end navigation with vision language models: Transforming spatial reasoning into question-answering,

    Dylan Goetting, Himanshu Gaurav Singh, and Antonio Loquercio. End-to-end navigation with vision lan- guage models: Transforming spatial reasoning into question-answering.arXiv preprint arXiv:2411.05755,

  29. [29]

    Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

    Boyu Gou, Ruohan Wang, Boyuan Zheng, Yanan Xie, Cheng Chang, Yiheng Shu, Huan Sun, and Yu Su. Navigating the digital world as humans do: Universal visual grounding for GUI agents. arXiv preprint arXiv:2410.05243,

  30. [30]

    Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

    Yu Gu, Kai Zhang, Yuting Ning, Boyuan Zheng, Boyu Gou, Tianci Xue, Cheng Chang, Sanjari Srivastava, Yanan Xie, Peng Qi, et al. Is your LLM secretly a world model of the internet? Model-based planning for web agents. arXiv preprint arXiv:2411.06559,

  31. [31]

    Seed1.5-VL Technical Report

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Ruoyu Zhang, Runxin Xu, Qihao Zhu, Shirong Ma, Peiyi Wang, Xiao Bi, et al. DeepSeek-R1: Incentivizing reasoning capability in LLMs via reinforcement learning. Nature, 2025a. Dong Guo, Faming Wu, Feida Zhu, Fuxing Leng, Guang Shi, Haobin Chen, Haoqi Fan, Jian Wang, Jianyu Jiang, Jiawei Wang, et al. Seed1.5-VL technical report…

  32. [32]

    Hierarchy-of-groups policy optimization for long-horizon agentic tasks.arXiv preprint arXiv:2602.22817,

    Shuo He, Lang Feng, Qi Wei, Xin Cheng, Lei Feng, and Bo An. Hierarchy-of-groups policy optimization for long-horizon agentic tasks.arXiv preprint arXiv:2602.22817,

  33. [33]

    A data-driven approach for learning to control computers.arXiv preprint arXiv:2202.08137,

    Peter C Humphreys, David Raposo, Toby Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Alex Goldin, Adam Santoro, and Timothy Lillicrap. A data-driven approach for learning to control computers.arXiv preprint arXiv:2202.08137,

  34. [34]

    OpenAI o1 System Card

    Aaron Jaech, Adam Kalai, Adam Lerer, Adam Richardson, Ahmed El-Kishky, Aiden Low, Alec Helyar, Alek- sander Madry, Alex Beutel, Alex Carney, et al. Openai o1 system card.arXiv preprint arXiv:2412.16720,

  35. [35]

    Osworld-mcp: Benchmarking mcp tool invocation in computer-use agents.arXiv preprint arXiv:2510.24563, 2025

    Hongrui Jia, Jitong Liao, Xi Zhang, Haiyang Xu, Tianbao Xie, Chaoya Jiang, Ming Yan, Si Liu, Wei Ye, and Fei Huang. Osworld-mcp: Benchmarking mcp tool invocation in computer-use agents.arXiv preprint arXiv:2510.24563,

  36. [36]

    Enhancing Cooperative Multi-Agent Reinforcement Learning with State Modelling and Adversarial Exploration

    Andreas Kontogiannis, Konstantinos Papathanasiou, Yi Shen, Giorgos Stamou, Michael M Zavlanos, and George Vouros. Enhancing cooperative multi-agent reinforcement learning with state modelling and adversarial exploration. arXiv preprint arXiv:2505.05262,

  37. [37]

    Os-harm: A benchmark for measuring safety of computer use agents

    Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, Zico Kolter, Nicolas Flammarion, and Maksym Andriushchenko. Os-harm: A benchmark for measuring safety of computer use agents.arXiv preprint arXiv:2506.14866,

  38. [38]

    Computerrl: Scaling end-to-end online reinforcement learning for computer use agents.arXiv preprint arXiv:2508.14040,

    Hanyu Lai, Xiao Liu, Yanxiao Zhao, Han Xu, Hanchen Zhang, Bohao Jing, Yanyu Ren, Shuntian Yao, Yuxiao Dong, and Jie Tang. Computerrl: Scaling end-to-end online reinforcement learning for computer use agents.arXiv preprint arXiv:2508.14040,

  39. [39]

    Nested Browser-Use Learning for Agentic Information Seeking

    Baixuan Li, Jialong Wu, Wenbiao Yin, Kuan Li, Zhongwang Zhang, Huifeng Yin, Zhengwei Tao, Liwen Zhang, Pengjun Xie, Jingren Zhou, et al. Nested browser-use learning for agentic information seeking. arXiv preprint arXiv:2512.23647, 2025a.

  40. [40]

    UI-AGILE: Advancing GUI Agents with Effective Reinforcement Learning and Precise Inference-Time Grounding

    Zhangheng Li, Keen You, Haotian Zhang, Di Feng, Harsh Agrawal, Xiujun Li, Mohana Prasad Sathya Moor- thy, Jeff Nichols, Yinfei Yang, and Zhe Gan. Ferret-ui 2: Mastering universal user interface understanding across platforms. InInternational Conference on Learning Representations (ICLR), 2024b. Shuquan Lian, Yuhang Wu, Jia Ma, Yifan Ding, Zihan Song, Bing...

  41. [41]

    ShowUI: One Vision-Language-Action Model for GUI Visual Agent

    Kevin Qinghong Lin, Linjie Li, Difei Gao, Zhengyuan Yang, Shiwei Wu, Zechen Bai, Stan Weixian Lei, Lijuan Wang, and Mike Zheng Shou. ShowUI: One vision-language-action model for GUI visual agent. In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 19498–19508,

  42. [42]

    InfiGUIAgent: A Multimodal Generalist GUI Agent with Native Reasoning and Reflection

    Yuhang Liu, Pengxiang Li, Zishu Wei, Congkai Xie, Xueyu Hu, Xinchen Xu, Shengyu Zhang, Xiaotian Han, Hongxia Yang, and Fei Wu. InfiGUIAgent: A multimodal generalist GUI agent with native reasoning and reflection. arXiv preprint arXiv:2501.04575, 2025b.

  43. [43]

    Webchorearena: Evaluating web browsing agents on realistic tedious web tasks.arXiv preprint arXiv:2506.01952,

    Atsuyuki Miyai, Zaiying Zhao, Kazuki Egashira, Atsuki Sato, Tatsumi Sunada, Shota Onohara, Hiromasa Yamanishi, Mashiro Toyooka, Kunato Nishina, Ryoma Maeda, et al. Webchorearena: Evaluating web browsing agents on realistic tedious web tasks.arXiv preprint arXiv:2506.01952,

  44. [44]

    Playing Atari with Deep Reinforcement Learning

    Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, and Martin Riedmiller. Playing atari with deep reinforcement learning. arXiv preprint arXiv:1312.5602, 2013.

  45. [45]

    WebGPT: Browser-assisted question-answering with human feedback

    Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, et al. Webgpt: Browser-assisted question-answering with human feedback. arXiv preprint arXiv:2112.09332, 2021.

  46. [46]

    Ui-vision: A desktop-centric gui benchmark for visual perception and interaction

    Shravan Nayak, Xiangru Jian, Kevin Qinghong Lin, Juan A Rodriguez, Montek Kalsi, Rabiul Awal, Nicolas Chapados, M Tamer Özsu, Aishwarya Agrawal, David Vazquez, et al. Ui-vision: A desktop-centric gui benchmark for visual perception and interaction. arXiv preprint arXiv:2503.15661, 2025.

  47. [47]

    Gui agents: A survey

    Dang Nguyen, Jian Chen, Yu Wang, Gang Wu, Namyong Park, Zhengmian Hu, Hanjia Lyu, Junda Wu, Ryan Aponte, Yu Xia, et al. Gui agents: A survey. In Findings of the Association for Computational Linguistics: ACL 2025, pp. 22522–22538, 2025.

  48. [48]

    OpenAI. Computer-using agent. https://openai.com/index/computer-using-agent/, 2025a. [Accessed 09-02-2026].

    OpenAI. Computer-using agent, 2025b. URL https://openai.com/index/computer-using-agent/.

    Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina ...

  49. [49]

    Explorer: Scaling exploration-driven web trajectory synthesis for multimodal web agents

    Vardaan Pahuja, Yadong Lu, Corby Rosset, Boyu Gou, Arindam Mitra, Spencer Whitehead, Yu Su, and Ahmed Hassan. Explorer: Scaling exploration-driven web trajectory synthesis for multimodal web agents. In Findings of the Association for Computational Linguistics: ACL 2025, pp. 6300–6323, 2025.

  50. [50]

    Zehan Qi, Xiao Liu, Iat Long Iong, Hanyu Lai, Xueqiao Sun, Wenyi Zhao, Yu Yang, Xinyue Yang, Jiadai Sun, Shuntian Yao, et al

    Yichen Pan, Dehan Kong, Sida Zhou, Cheng Cui, Yifei Leng, Bing Jiang, Hangyu Liu, Yanyi Shang, Shuyan Zhou, Tongshuang Wu, et al. Webcanvas: Benchmarking web agents in online environments. arXiv preprint arXiv:2406.12373, 2024.

  51. [51]

    Ica: Information-aware credit assignment for visually grounded long-horizon information-seeking agents

    Cong Pang, Xuyu Feng, Yujie Yi, Zixuan Chen, Jiawei Hong, Tiankuo Yao, Nang Yuan, Jiapeng Luo, Lewei Lu, and Xin Lou. Ica: Information-aware credit assignment for visually grounded long-horizon information-seeking agents. arXiv preprint arXiv:2602.10863, 2026.

  52. [52]

    Mapping natural language commands to web elements

    Panupong Pasupat, Tian-Shun Jiang, Evan Liu, Kelvin Guu, and Percy Liang. Mapping natural language commands to web elements. In Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 4970–4976, 2018.

  53. [53]

    HiPER: Hierarchical Reinforcement Learning with Explicit Credit Assignment for Large Language Model Agents

    Jiangweizhi Peng, Yuanxin Liu, Ruida Zhou, Charles Fleming, Zhaoran Wang, Alfredo Garcia, and Mingyi Hong. Hiper: Hierarchical reinforcement learning with explicit credit assignment for large language model agents. arXiv preprint arXiv:2602.16165, 2026.

  54. [54]

    Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning

    Xue Bin Peng, Aviral Kumar, Grace Zhang, and Sergey Levine. Advantage-weighted regression: Simple and scalable off-policy reinforcement learning. arXiv preprint arXiv:1910.00177, 2019.

  55. [55]

    Agent q: Advanced reasoning and learning for autonomous ai agents

    Pranav Putta, Edmund Mills, Naman Garg, Sumeet Motwani, Chelsea Finn, Divyansh Garg, and Rafael Rafailov. Agent q: Advanced reasoning and learning for autonomous ai agents. arXiv preprint arXiv:2408.07199, 2024.

  56. [56]

    UI-TARS: Pioneering Automated GUI Interaction with Native Agents

    Yujia Qin, Yining Ye, Junjie Fang, Haoming Wang, Shihao Liang, Shizuo Tian, Junda Zhang, Jiahao Li, Yunxin Li, Shijue Huang, et al. Ui-tars: Pioneering automated gui interaction with native agents. arXiv preprint arXiv:2501.12326, 2025.

  57. [57]

    Scaling synthetic task generation for agents via exploration

    Ram Ramrakhya, Andrew Szot, Omar Attia, Yuhao Yang, Anh Nguyen, Bogdan Mazoure, Zhe Gan, Harsh Agrawal, and Alexander Toshev. Scaling synthetic task generation for agents via exploration. arXiv preprint arXiv:2509.25047, 2025.

  58. [58]

    A Comprehensive Survey of Agents for Computer Use: Foundations, Challenges, and Future Directions

    Pascal J Sager, Benjamin Meyer, Peng Yan, Rebekka von Wartburg-Kottler, Layan Etaiwi, Aref Enayati, Gabriel Nobel, Ahmed Abdulkadir, Benjamin F Grewe, and Thilo Stadelmann. A comprehensive survey of agents for computer use: Foundations, challenges, and future directions. arXiv preprint arXiv:2501.16150, 2025.

  59. [59]

    Proximal Policy Optimization Algorithms

    John Schulman, Filip Wolski, Prafulla Dhariwal, Alec Radford, and Oleg Klimov. Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347, 2017.

  60. [60]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models. arXiv preprint arXiv:2402.03300, 2024.

  61. [61]

    Falcon-ui: Understanding gui before following user instructions

    Huawen Shen, Chang Liu, Gengluo Li, Xinlong Wang, Yu Zhou, Can Ma, and Xiangyang Ji. Falcon-ui: Understanding gui before following user instructions. arXiv preprint arXiv:2412.09362, 2024.

  62. [62]

    Experiential reinforcement learning

    Taiwei Shi, Sihao Chen, Bowen Jiang, Linxin Song, Longqi Yang, and Jieyu Zhao. Experiential reinforcement learning. arXiv preprint arXiv:2602.13949, 2026.

  63. [63]

    Mobilegui-rl: Advancing mobile gui agent through reinforcement learning in online environment

    Yucheng Shi, Wenhao Yu, Zaitang Li, Yonglin Wang, Hongming Zhang, Ninghao Liu, Haitao Mi, and Dong Yu. Mobilegui-rl: Advancing mobile gui agent through reinforcement learning in online environment. arXiv preprint arXiv:2507.05720, 2025.

  64. [64]

    Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism

    Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, and Bryan Catanzaro. Megatron-lm: Training multi-billion parameter language models using model parallelism. arXiv preprint arXiv:1909.08053, 2019.

  65. [65]

    Chan Hee Song, Yiwen Song, Palash Goyal, Yu Su, Oriana Riva, Hamid Palangi, and Tomas Pfister. Watch and learn: Learning to use computers from online videos. arXiv preprint arXiv:2510.04673, 2025a.

  66. [66]

    Magnet: Towards adaptive gui agents with memory-driven knowledge evolution

    Libo Sun, Jiwen Zhang, Siyuan Wang, and Zhongyu Wei. Magnet: Towards adaptive gui agents with memory-driven knowledge evolution. arXiv preprint arXiv:2601.19199, 2026.

  67. [67]

    F. Tang, Z. Gu, Z. Lu, X. Liu, S. Shen, C. Meng, W. Wang, W. Zhang, Y. Shen, W. Lu, J. Xiao, and Y. Zhuang. Gui-g2: Gaussian reward modeling for gui grounding. arXiv preprint arXiv:2507.15846, 2025a.

  68. [68]

    LPO: Towards Accurate GUI Agent Interaction via Location Preference Optimization

    Jiaqi Tang, Yu Xia, Yi-Feng Wu, Yuwei Hu, Yuhui Chen, Qing-Guo Chen, Xiaogang Xu, Xiangyu Wu, Hao Lu, Yanqing Ma, et al. Lpo: Towards accurate gui agent interaction via location preference optimization. arXiv preprint arXiv:2506.09373, 2025d.

    Liujian Tang, Shaokang Dong, Yijia Huang, Minqi Xiang, Hongtao Ruan, Bin Wang, Shuo Li, Zhiheng Xi, Zhihui Cao, Ha...

  69. [69]

    AgentProg: Empowering Long-Horizon GUI Agents with Program-Guided Context Management

    Shizuo Tian, Hao Wen, Yuxuan Chen, Jiacheng Liu, Shanhui Zhao, Guohong Liu, Ju Ren, Yunxin Liu, and Yuanchun Li. Agentprog: Empowering long-horizon gui agents with program-guided context management. arXiv preprint arXiv:2512.10371, 2025.

  70. [70]

    Sagar Gubbi Venkatesh, Partha Talukdar, and Srini Narayanan

    Daniel Toyama, Philippe Hamel, Anita Gergely, Gheorghe Comanici, Amelia Glaese, Zafarali Ahmed, Tyler Jackson, Shibl Mourad, and Doina Precup. Androidenv: A reinforcement learning platform for android. arXiv preprint arXiv:2105.13231, 2021.

  71. [71]

    UI-TARS-2 Technical Report: Advancing GUI Agent with Multi-Turn Reinforcement Learning

    Haoming Wang, Haoyang Zou, Huatong Song, Jiazhan Feng, Junjie Fang, Junting Lu, Longxiang Liu, Qinyu Luo, Shihao Liang, Shijue Huang, et al. Ui-tars-2 technical report: Advancing gui agent with multi-turn reinforcement learning. arXiv preprint arXiv:2509.02544, 2025a.

    Peng Wang, Shuai Bai, Sinan Tan, Shijie Wang, Zhihao Fan, Jinze Bai, Keqin Chen, Xuejing ...

  72. [72]

    Probabilistic subgoal representations for hierarchical reinforcement learning

    Vivienne Huiling Wang, Tinghuai Wang, Wenyan Yang, Joni-Kristian Kämäräinen, and Joni Pajarinen. Probabilistic subgoal representations for hierarchical reinforcement learning. arXiv preprint arXiv:2406.16707, 2024e.

    Weiyun Wang, Zhangwei Gao, Lixin Gu, Hengjun Pu, Long Cui, Xingguang Wei, Zhaoyang Liu, Linglin Jing, Shenglong Ye, Jie Shao, et al. Internv...

  73. [73]

    Xinming Wei, Jiahao Zhang, Haoran Li, Jiayu Chen, Rui Qu, Maoliang Li, Xiang Chen, and Guojie Luo. Agent.xpu: Efficient scheduling of agentic llm workloads on heterogeneous soc. arXiv preprint arXiv:2506.24045, 2025a.

    Z. Wei, W. Yao, Y. Liu, W. Zhang, Q. Lu, L. Qiu, C. Yu, P. Xu, C. Zhang, B. Yin, H. Yun, and L. Li. Webagent-r1: Training web agents via en...

  74. [74]

    Uisim: An interactive image-based ui simulator for dynamic mobile environments

    Jiannan Xiang, Yun Zhu, Lei Shu, Maria Wang, Lijun Yu, Gabriel Barcik, James Lyon, Srinivas Sunkara, and Jindong Chen. Uisim: An interactive image-based ui simulator for dynamic mobile environments. arXiv preprint arXiv:2509.21733, 2025.

  75. [75]

    Webworld: A large-scale world model for web agent training

    Zikai Xiao, Jianhong Tu, Chuhang Zou, Yuxin Zuo, Zhi Li, Peng Wang, Bowen Yu, Fei Huang, Junyang Lin, and Zuozhu Liu. Webworld: A large-scale world model for web agent training. arXiv preprint arXiv:2602.14721, 2026.

  76. [76]

    TheAgentCompany: Benchmarking LLM Agents on Consequential Real World Tasks

    Frank F Xu, Yufan Song, Boxuan Li, Yuxuan Tang, Kritanjali Jain, Mengxue Bao, Zora Z Wang, Xuhui Zhou, Zhitong Guo, Murong Cao, et al. Theagentcompany: benchmarking llm agents on consequential real world tasks. arXiv preprint arXiv:2412.14161, 2024a.

    Si Xu, Zixiao Huang, Yan Zeng, Shengen Yan, Xuefei Ning, Quanlu Zhang, Haolin Ye, Sipei Gu, Chunsheng Shui,...

  77. [77]

    Gta1: Gui test-time scaling agent

    Yan Yang, Dongxu Li, Yutong Dai, Yuhao Yang, Ziyang Luo, Zirui Zhao, Zhiyuan Hu, Junzhe Huang, Amrita Saha, Zeyuan Chen, et al. Gta1: Gui test-time scaling agent. arXiv preprint arXiv:2507.05791, 2025b.

    Yuhao Yang, Yue Wang, Dongxu Li, Ziyang Luo, Bei Chen, Chao Huang, and Junnan Li. Aria-ui: Visual grounding for gui instructions. In Findings of the Associa...

  78. [78]

    Mobile-agent-v3: Fundamental agents for gui automation

    Jiabo Ye, Xi Zhang, Haiyang Xu, Haowei Liu, Junyang Wang, Zhaoqing Zhu, Ziwei Zheng, Feiyu Gao, Junjie Cao, Zhengxi Lu, et al. Mobile-agent-v3: Fundamental agents for gui automation. arXiv preprint arXiv:2508.15144, 2025.

  79. [79]

    Ovm, outcome-supervised value models for planning in mathematical reasoning

    Fei Yu, Anningzhe Gao, and Benyou Wang. Ovm, outcome-supervised value models for planning in mathematical reasoning. In Findings of the Association for Computational Linguistics: NAACL 2024, pp. 858–875, 2024.

  80. [80]

    MemSearcher: Training LLMs to Reason, Search and Manage Memory via End-to-End Reinforcement Learning

    Qianhao Yuan, Jie Lou, Zichao Li, Jiawei Chen, Yaojie Lu, Hongyu Lin, Le Sun, Debing Zhang, and Xianpei Han. Memsearcher: Training llms to reason, search and manage memory via end-to-end reinforcement learning. arXiv preprint arXiv:2511.02805, 2025.

Showing first 80 references.