Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
Pith reviewed 2026-05-17 00:51 UTC · model grok-4.3
The pith
Personal LLM Agents will become the dominant software paradigm for end-users through deep integration with personal data and devices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Personal LLM Agents are LLM-based agents that are deeply integrated with personal data and personal devices and used for personal assistance. The authors argue that these agents will become a major software paradigm for end-users. To support this vision, they summarize key architecture components and design choices, present expert opinions on the topic, identify the main challenges in making the agents intelligent, efficient, and secure, and survey representative solutions that address those challenges.
What carries the argument
Personal LLM Agents, defined as LLM-based agents deeply integrated with personal data and personal devices for assistance.
If this is right
- Agents gain the ability to understand user intent, plan tasks, use tools, and manage personal data at a level beyond current IPAs.
- Computing and sensing devices become part of a unified personal assistance system rather than isolated tools.
- Efficiency and security challenges must be solved before the paradigm can scale to everyday use.
- Expert consensus on architecture choices will guide how future agents combine LLMs with personal context.
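The capability stack in these bullets (intent understanding, task planning, tool use) can be sketched as a minimal loop. This is an illustrative stub only, not code or architecture from the paper: the planner is hard-coded in place of an LLM, and all names (`TOOLS`, `stub_planner`, `run_agent`) are hypothetical.

```python
# Illustrative sketch: intent -> plan -> tool use, with an LLM replaced by a stub.
from typing import Callable

# Tool registry: personal-device capabilities exposed to the agent.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_calendar": lambda arg: "meeting at 10:00 with Alex",
    "send_message": lambda arg: f"sent: {arg}",
}

def stub_planner(request: str) -> list[tuple[str, str]]:
    """Stand-in for LLM task planning: map a request to (tool, argument) steps."""
    if "reschedule" in request:
        return [
            ("read_calendar", "today"),
            ("send_message", "Can we move our 10:00 meeting?"),
        ]
    return []  # unrecognized intent: no steps planned

def run_agent(request: str) -> list[str]:
    """Execute each planned step against the tool registry, collecting results."""
    return [TOOLS[tool](arg) for tool, arg in stub_planner(request)]

print(run_agent("reschedule my morning meeting"))
# -> ['meeting at 10:00 with Alex', 'sent: Can we move our 10:00 meeting?']
```

The point of the sketch is only that the agent, not the user, sequences the device capabilities; a real Personal LLM Agent would replace `stub_planner` with model-driven planning over personal context.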
Where Pith is reading between the lines
- Adoption at this scale would move most user software interaction from tapping apps to conversational, context-aware agents.
- Privacy and data-control mechanisms would need to evolve beyond today's app permission models.
- Developers might start designing new services specifically as tool interfaces for these agents rather than standalone apps.
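The last point above, services designed as tool interfaces rather than standalone apps, might in minimal form amount to a declarative schema an agent can call against. The sketch below is hypothetical, loosely modeled on common LLM function-calling formats; `book_ride`, `validate_call`, and the parameter names are invented for illustration.

```python
import json

# Hypothetical tool manifest a service might publish for agents,
# loosely modeled on common LLM function-calling schemas.
BOOK_RIDE_TOOL = {
    "name": "book_ride",
    "description": "Book a ride from the user's current location.",
    "parameters": {
        "type": "object",
        "properties": {
            "destination": {"type": "string"},
            "pickup_time": {"type": "string", "description": "ISO 8601 timestamp"},
        },
        "required": ["destination"],
    },
}

def validate_call(tool: dict, arguments: dict) -> bool:
    """Check a structured agent call: all required parameters present, no unknown ones."""
    required = tool["parameters"]["required"]
    allowed = tool["parameters"]["properties"]
    return all(k in arguments for k in required) and all(k in allowed for k in arguments)

# An agent would emit a structured call against the schema, e.g.:
call = {"destination": "Airport"}
print(json.dumps({"name": BOOK_RIDE_TOOL["name"], "arguments": call}))
print(validate_call(BOOK_RIDE_TOOL, call))  # complete call
print(validate_call(BOOK_RIDE_TOOL, {"pickup_time": "2024-01-01T09:00"}))  # missing required field
```

Under this model, the schema plus a handler replaces the app's UI as the service's primary surface, which is exactly the shift the bullet anticipates.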
Load-bearing premise
Large language models supply semantic understanding and reasoning strong enough for agents to solve complex personal problems without constant human direction.
What would settle it
Real-world tests showing that integrated LLM agents still cannot autonomously complete multi-step personal tasks at scale, even after addressing the surveyed challenges.
read the original abstract
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have been one of the key technologies that researchers and engineers have focused on, aiming to help users efficiently obtain information and execute tasks, and provide users with more intelligent, convenient, and rich interaction experiences. With the development of smartphones and IoT, computing and sensing devices have become ubiquitous, greatly expanding the boundaries of IPAs. However, due to the lack of capabilities such as user intent understanding, task planning, tool using, and personal data management etc., existing IPAs still have limited practicality and scalability. Recently, the emergence of foundation models, represented by large language models (LLMs), brings new opportunities for the development of IPAs. With the powerful semantic understanding and reasoning capabilities, LLM can enable intelligent agents to solve complex problems autonomously. In this paper, we focus on Personal LLM Agents, which are LLM-based agents that are deeply integrated with personal data and personal devices and used for personal assistance. We envision that Personal LLM Agents will become a major software paradigm for end-users in the upcoming era. To realize this vision, we take the first step to discuss several important questions about Personal LLM Agents, including their architecture, capability, efficiency and security. We start by summarizing the key components and design choices in the architecture of Personal LLM Agents, followed by an in-depth analysis of the opinions collected from domain experts. Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys Personal LLM Agents—LLM-based systems deeply integrated with personal data and devices for user assistance. It summarizes architecture components and design choices, analyzes opinions from domain experts, discusses key challenges in capability, efficiency, and security, and reviews representative solutions, while envisioning these agents as a future major software paradigm for end-users.
Significance. If the survey is comprehensive and balanced, the work could meaningfully advance HCI research on intelligent personal assistants by mapping LLM opportunities against practical constraints in efficiency and security. The inclusion of expert opinions provides a useful bridge between technical capabilities and user-centered design, though the forward-looking vision would gain strength from clearer grounding in adoption barriers.
major comments (2)
- [Abstract and survey sections] Abstract and survey of representative solutions: the claim of a 'comprehensive survey' is not supported by any description of literature search strategy, inclusion/exclusion criteria, or total works reviewed; without these, it is impossible to assess whether the coverage of capability, efficiency, and security solutions is representative or omits important counter-examples.
- [Expert opinions analysis] Analysis of opinions collected from domain experts: no details are given on expert selection process, number of participants, question protocol, or handling of dissenting views; this weakens the reliability of the insights used to support the central vision of Personal LLM Agents as a major paradigm.
minor comments (2)
- [Abstract] The abstract uses 'etc.' when listing IPA limitations; replace with an explicit enumeration to improve precision.
- [Introduction] Terminology for 'foundation models' and 'Personal LLM Agents' should be defined on first use for readers outside core LLM research.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive overall assessment of our survey on Personal LLM Agents. We appreciate the opportunity to improve the transparency of our methodology and will revise the manuscript accordingly to strengthen these aspects.
read point-by-point responses
-
Referee: [Abstract and survey sections] Abstract and survey of representative solutions: the claim of a 'comprehensive survey' is not supported by any description of literature search strategy, inclusion/exclusion criteria, or total works reviewed; without these, it is impossible to assess whether the coverage of capability, efficiency, and security solutions is representative or omits important counter-examples.
Authors: We agree that explicitly documenting the literature search process would improve the rigor and reproducibility of the survey. The current manuscript presents a curated selection of representative solutions based on relevance to the core themes of capability, efficiency, and security, drawn from recent publications in top venues and repositories. In the revised version, we will add a dedicated 'Survey Methodology' subsection that describes the search strategy (including keywords such as 'personal LLM agents', 'LLM-based personal assistants', 'on-device LLMs', and 'LLM privacy/security'), databases queried (Google Scholar, arXiv, ACM Digital Library, IEEE Xplore), time period (primarily post-2022), inclusion criteria (focus on systems integrating personal data/devices with LLMs), and approximate scale of review (over 100 works screened, with ~40 representative solutions analyzed in depth). This addition will help readers evaluate coverage and identify any potential gaps. revision: yes
-
Referee: [Expert opinions analysis] Analysis of opinions collected from domain experts: no details are given on expert selection process, number of participants, question protocol, or handling of dissenting views; this weakens the reliability of the insights used to support the central vision of Personal LLM Agents as a major paradigm.
Authors: We acknowledge that additional details on the expert consultation process are needed to support the reliability of this analysis. The opinions were synthesized from discussions with domain experts in AI, HCI, and security. To address this, the revised manuscript will expand the relevant section to specify the selection process (experts with demonstrated expertise via publications or industry roles in LLM agents or personal computing), the number of participants, the question protocol (structured prompts on architecture challenges, efficiency trade-offs, and security risks), and how both supportive and dissenting perspectives were balanced in the synthesis to inform the vision of Personal LLM Agents as a future paradigm. revision: yes
Circularity Check
No significant circularity in survey and vision paper
full rationale
The paper is a survey and forward-looking vision piece that summarizes existing architectures, expert opinions, and solutions for Personal LLM Agents without presenting any mathematical derivations, equations, fitted parameters, or quantitative predictions. Its central claim is explicitly an envisioning statement rather than a result derived from prior definitions or self-citations. No load-bearing steps reduce to inputs by construction, and the LLM opportunity premise functions as motivation, not as a proven or self-referential result. The content is self-contained through literature review and discussion.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Large language models have powerful semantic understanding and reasoning capabilities that enable autonomous solving of complex problems.
Forward citations
Cited by 18 Pith papers
-
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations
GroupMemBench shows leading LLM memory systems reach only 46% average accuracy on multi-party tasks, with a simple BM25 baseline matching or beating most of them.
-
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
-
TriEx: A Game-based Tri-View Framework for Explaining Internal Reasoning in Multi-Agent LLMs
TriEx is a tri-view explainability framework that reveals systematic mismatches between what multi-agent LLMs say, believe, and do in imperfect-information strategic games.
-
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models
InfiniLoRA decouples LoRA execution from base-model inference and reports 3.05x higher request throughput plus 54% more adapters meeting strict latency SLOs.
-
ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems in the Wild
ProAgent uses on-demand tiered perception and context-aware LLM reasoning to deliver proactive assistance on AR glasses, achieving up to 27.7% higher prediction accuracy and 20.5% lower false detections than baselines.
-
Do LLMs Need to See Everything? A Benchmark and Study of Failures in LLM-driven Smartphone Automation using Screentext vs. Screenshots
A new benchmark shows LLM smartphone agents achieve comparable success with screen text alone as with screenshots, but both fail often due to UI accessibility and reasoning gaps.
-
Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy
YoloFS is an agent-native filesystem that stages mutations for review, provides snapshots for agent self-correction, and uses progressive permissions to reduce user interruptions while matching baseline task success.
-
Chain-of-Authorization: Embedding authorization into large language models
LLMs fine-tuned to output authorization trajectories as a prerequisite for responses achieve high rejection rates for unauthorized prompts while preserving utility in allowed scenarios.
-
Can LLM Agents Simulate Dynamic Networks? A Case Study on Email Networks with Phishing Synthesis
LLM multi-agent systems augmented with data-driven event triggers and Hawkes processes simulate both micro-level interactions and macroscopic topologies in dynamic email networks for realistic phishing synthesis.
-
HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling
HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower...
-
Agentic AI for Trip Planning Optimization Application
An orchestrated multi-agent AI framework for trip planning optimization paired with a new ground-truth dataset achieves 77.4% accuracy on the TOP Benchmark, outperforming single-agent and workflow baselines.
-
Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
A persona-based simulation framework applied to Replika shows narrow emotional responses and frequent normalization of unsafe content such as self-harm and violent fantasies across 1674 dialogues with mental health personas.
-
Agentic Insight Generation in VSM Simulations
A two-step agentic system for extracting insights from VSM simulations achieves up to 86% accuracy with top LLMs by using progressive data discovery and slim context.
-
Does a Global Perspective Help Prune Sparse MoEs Elegantly?
GRAPE is a global redundancy-aware pruning strategy for sparse MoEs that dynamically allocates pruning budgets across layers and improves average accuracy by 1.40% over the best local baseline across tested models and...
-
PSI: Shared State as the Missing Layer for Coherent AI-Generated Instruments in Personal AI Agents
PSI uses a shared personal-context bus to publish state and write-back affordances, turning isolated AI-generated modules into synchronized, chat-accessible instruments.
-
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
-
LLM Harms: A Taxonomy and Discussion
This paper proposes a taxonomy of LLM harms in five categories and suggests mitigation strategies plus a dynamic auditing system for responsible development.
-
A Survey on the Memory Mechanism of Large Language Model based Agents
A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.
Reference graph
Works this paper leans on
-
[1]
Apple. Siri. https://www.apple.com/siri/, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[2]
Google. Google assistant for android. https://developer.android.com/guide/app-actions/overview, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[3]
Amazon. Alexa. https://www.alexa.com, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[4]
Mapping natural language instructions to mobile ui action sequences, 2020
Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, and Jason Baldridge. Mapping natural language instructions to mobile ui action sequences, 2020
work page 2020
-
[5]
Glider: A reinforcement learning approach to extract ui scripts from websites
Yuanchun Li and Oriana Riva. Glider: A reinforcement learning approach to extract ui scripts from websites. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’21, page 1420–1430, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450380379. doi: 10.1145/3404835.3462905
-
[6]
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, and Percy Liang. Reinforcement learning on web interfaces using workflow-guided exploration. ArXiv, abs/1802.08802, 2018
work page 2018
-
[7]
A survey of large language models, 2023
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language models, 2023
work page 2023
-
[8]
IBM. Ibm shoebox. https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[9]
The harpy speech recognition system: performance with large vocabularies
Bruce Lowerre and R Reddy. The harpy speech recognition system: performance with large vocabularies. The Journal of the Acoustical Society of America, 60(S1):S10–S11, 1976
work page 1976
-
[10]
Helene Cerf-Danon, Steven DeGennaro, Marco Ferretti, Jorge Gonzalez, and Eric Keppel. TANGORA - a large vocabulary speech recognition system for five languages. In Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), pages 183–192, 1991. doi: 10.21437/Eurospeech.1991-44
-
[11]
L. Rabiner and B. Juang. An introduction to hidden markov models. IEEE ASSP Magazine, 3(1):4–16, 1986. doi: 10.1109/MASSP.1986.1165342
-
[12]
The dragon continuous speech recognition system: A real-time implementation
Paul G. Bamberg, Yen lu Chow, Larry Gillick, Robert Roth, and Dean G. Sturtevant. The dragon continuous speech recognition system: A real-time implementation. In Human Language Technology - The Baltic Perspectiv, 1990
work page 1990
-
[13]
Wikipedia. Speakable items. https://en.wikipedia.org/wiki/Speakable_items, 2023. [Online; accessed January 5, 2023]
work page 2023
-
[14]
Medspeak: Report creation with continuous speech recognition
Jennifer Lai and John Vergo. Medspeak: Report creation with continuous speech recognition. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, CHI ’97, page 431–438, New York, NY, USA, 1997. Association for Computing Machinery. ISBN 0897918029. doi: 10.1145/258549.258829
-
[15]
Speech transcript - jim allchin, winhec 2002
Microsoft. Speech transcript - jim allchin, winhec 2002. https://news.microsoft.com/speeches/speech-transcript-jim-allchin-winhec-2002/, 2002. [Online; accessed January 5, 2023]
work page 2002
-
[16]
Google is taking questions (spoken, via iphone)
John Markoff. Google is taking questions (spoken, via iphone). https://www.nytimes.com/2008/11/14/technology/internet/14voice.html, 2008. [Online; accessed January 5, 2024]
work page 2008
-
[17]
Microsoft. Cortana. https://www.microsoft.com/en-us/cortana, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[18]
OpenAI. Introduce chatgpt. https://openai.com/blog/chatgpt, 2022. [Online; accessed November 28, 2023]
work page 2022
-
[19]
Announcing microsoft copilot, your everyday ai companion
Microsoft. Announcing microsoft copilot, your everyday ai companion. https://blogs.microsoft.com/blog/2023/09/21/announcing-microsoft-copilot-your-everyday-ai-companion/, 2023. [Online; accessed December 4, 2023]
work page 2023
-
[20]
Apple. Sirikit: Empower users to interact with their devices through voice, intelligent suggestions, and personalized workflows. https://developer.apple.com/documentation/sirikit/, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[21]
Apple. Shortcuts user guide. https://support.apple.com/en-hk/guide/shortcuts/welcome/ios, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[22]
work page 2023
-
[23]
Tasker: Total automation for android
Joaoapps. Tasker: Total automation for android. https://tasker.joaoapps.com, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[24]
Absinthe. Anywhere shortcuts. https://play.google.com/store/apps/details?id=com.absinthe.anywhere_&hl=en_US&pli=1, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[25]
Programming iot devices by demonstration using mobile apps
Toby Jia-Jun Li, Yuanchun Li, Fanglin Chen, and Brad A Myers. Programming iot devices by demonstration using mobile apps. In End-User Development: 6th International Symposium, IS-EUD 2017, Eindhoven, The Netherlands, June 13-15, 2017, Proceedings 6, pages 3–17. Springer, 2017
work page 2017
-
[26]
Ulink: Enabling user-defined deep linking to app content
Tanzirul Azim, Oriana Riva, and Suman Nath. Ulink: Enabling user-defined deep linking to app content. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’16, page 305–318, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450342698. doi: 10.1145/2906388.2906416
-
[27]
Benjamin R. Cowan, Nadia Pantidi, David Coyle, Kellie Morrissey, Peter Clarke, Sara Al-Shehri, David Earley, and Natasha Bandeira. "what can i help you with?": Infrequent users’ experiences of intelligent personal assistants. In Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services, MobileHCI ’17, ...
-
[28]
A mixed-methods approach to understanding user trust after voice assistant failures
Amanda Baughan, Xuezhi Wang, Ariel Liu, Allison Mercurio, Jilin Chen, and Xiao Ma. A mixed-methods approach to understanding user trust after voice assistant failures. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394215. doi: 10.1145/354...
-
[29]
Ewa Luger and Abigail Sellen. "like having a really bad pa": The gulf between user expectation and experience of conversational agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, page 5286–5297, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450333627. doi: 10.1145/2858036.2858288
-
[30]
Matthew B. Hoy. Alexa, siri, cortana, and more: An introduction to voice assistants. Medical Reference Services Quarterly, 37(1):81–88, 2018. doi: 10.1080/02763869.2018.1404391. PMID: 29327988
-
[31]
Humanoid: A deep learning-based approach to automated black-box android app testing
Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. Humanoid: A deep learning-based approach to automated black-box android app testing. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1070–1073. IEEE, 2019
work page 2019
-
[32]
Attention is all you need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 6000–6010, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964
work page 2017
-
[33]
Zecheng He, Srinivas Sunkara, Xiaoxue Zang, Ying Xu, Lijuan Liu, Nevan Wichers, Gabriel Schubiner, Ruby B. Lee, and Jindong Chen. Actionbert: Leveraging user actions for semantic understanding of user interfaces. In AAAI Conference on Artificial Intelligence, 2020
work page 2020
-
[34]
Understanding mobile gui: from pixel-words to screen-sentences
Jingwen Fu, Xiaoyi Zhang, Yuwang Wang, Wenjun Zeng, Sam Yang, and Grayson Hilliard. Understanding mobile gui: from pixel-words to screen-sentences. ArXiv, abs/2105.11941, 2021. URL https://api.semanticscholar.org/CorpusID:235187035
- [35]
-
[36]
Uibert: Learning generic multimodal representations for ui understanding
Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, and Blaise Agüera y Arcas. Uibert: Learning generic multimodal representations for ui understanding. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 1705–1712. International Joint Conferences on Artificial Intel...
-
[37]
Spotlight: Mobile ui understanding using vision-language models with a focus
Gang Li and Yang Li. Spotlight: Mobile ui understanding using vision-language models with a focus. ArXiv, abs/2209.14927, 2022
-
[38]
Lexi: Self-supervised learning of the ui language
Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, and Oriana Riva. Lexi: Self-supervised learning of the ui language. ArXiv, abs/2301.10165, 2023
-
[39]
Uinav: A maker of ui automation agents
Wei Li, Fu-Lin Hsu, Will Bishop, Folawiyo Campbell-Ajala, Oriana Riva, and Max Lin. Uinav: A maker of ui automation agents. arXiv preprint arXiv:2312.10170, 2023
-
[40]
World of bits: An open-domain platform for web-based agents
Tianlin Tim Shi, Andrej Karpathy, Linxi Jim Fan, Jonathan Hernandez, and Percy Liang. World of bits: An open-domain platform for web-based agents. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, page 3135–3144. JMLR.org, 2017
work page 2017
-
[41]
Izzeddin Gur, Ulrich Rückert, Aleksandra Faust, and Dilek Z. Hakkani-Tür. Learning to navigate the web. ArXiv, abs/1812.09195, 2018
work page 2018
-
[42]
DOM-Q-NET: Grounded RL on Structured Language
Sheng Jia, Jamie Ryan Kiros, and Jimmy Ba. Dom-q-net: Grounded rl on structured language. ArXiv, abs/1902.07257, 2019
work page 2019
-
[43]
A data-driven approach for learning to control computers
Peter C Humphreys, David Raposo, Tobias Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Adam Santoro, and Timothy Lillicrap. A data-driven approach for learning to control computers. In International Conference on Machine Learning, pages 9466–9482. PMLR, 2022
work page 2022
-
[44]
Scaling laws for neural language models, 2020
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models, 2020
work page 2020
-
[45]
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback, 2022
work page 2022
-
[46]
Deep reinforcement learning from human preferences, 2023
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences, 2023
work page 2023
-
[47]
Toolformer: Language models can teach themselves to use tools, 2023
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools, 2023
work page 2023
-
[48]
Webgpt: Browser-assisted question-answering with human feedback, 2022
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, and John Schulman. Webgpt: Browser-assisted question-answering with human feedback, 2022
work page 2022
-
[49]
Multimodal web navigation with instruction-finetuned foundation models
Hiroki Furuta, Ofir Nachum, Kuang-Huei Lee, Yutaka Matsuo, Shixiang Shane Gu, and Izzeddin Gur. Multimodal web navigation with instruction-finetuned foundation models. ArXiv, abs/2305.11854, 2023
-
[50]
Progprompt: Generating situated robot task plans using large language models
Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg. Progprompt: Generating situated robot task plans using large language models. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 11523–11530. IEEE, 2023
work page 2023
-
[51]
Yue Zhen, Sheng Bi, Lu Xing-tong, Pan Wei-qin, Shi Hai-peng, Chen Zi-rui, and Fang Yi-shu. Robot task planning based on large language model representing knowledge with directed graph structures, 2023
work page 2023
-
[52]
Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022
Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022
work page 2022
-
[53]
Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face, 2023
Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face, 2023
work page 2023
-
[54]
Mathcoder: Seamless code integration in llms for enhanced mathematical reasoning
Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, and Hongsheng Li. Mathcoder: Seamless code integration in llms for enhanced mathematical reasoning. ArXiv, abs/2310.03731, 2023
-
[55]
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, I. Evtimov, Joanna Bitton, Manish P Bhatt, Cristian Cantón Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre D’efossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Sci...
work page 2023
-
[56]
Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, and Hongsheng Li. Solving challenging math word problems using gpt-4 code interpreter with code-based self-verification, 2023
work page 2023
- [57]
-
[58]
Reinventing search with a new ai-powered microsoft bing and edge, your copilot for the web
Microsoft. Reinventing search with a new ai-powered microsoft bing and edge, your copilot for the web. https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/, 2023. [Online; accessed December 8, 2023]
work page 2023
-
[59]
work page 2023
-
[60]
Bard: A conversational ai tool by google
Google. Bard: A conversational ai tool by google. https://bard.google.com, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[61]
Introducing gemini: our largest and most capable ai model
Google. Introducing gemini: our largest and most capable ai model. https://blog.google/technology/ai/google-gemini-ai/, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[62]
Reshaping industries with ai: Huawei cloud launches pangu models 3.0 and ascend ai cloud services
Huawei. Reshaping industries with ai: Huawei cloud launches pangu models 3.0 and ascend ai cloud services. https://www.huaweicloud.com/intl/en-us/news/20230707180809498.html, 2023. [Online; accessed November 28, 2023]
-
[63]
XiaoMi. Milm-6b. https://github.com/XiaoMi/MiLM-6B, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[64]
Sayed Naem Bokhari. The linux operating system. Computer, 28(8):74–79, 1995
work page 1995
-
[65]
Wikipedia. Borda count. https://en.wikipedia.org/wiki/Borda_count, 2023. [Online; accessed December 13, 2023]
work page 2023
-
[66]
Recent advances in end-to-end automatic speech recognition, 2022
Jinyu Li. Recent advances in end-to-end automatic speech recognition, 2022
work page 2022
-
[67]
End-to-end speech recognition: A survey, 2023
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, and Shinji Watanabe. End-to-end speech recognition: A survey, 2023
work page 2023
-
[68] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Ji-Rong Wen. A survey on large language model based autonomous agents, 2023.
[69] Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, a... The rise and potential of large language model based agents: A survey, 2023.
[70] Zhuosheng Zhang, Yao Yao, Aston Zhang, Xiangru Tang, Xinbei Ma, Zhiwei He, Yiming Wang, Mark Gerstein, Rui Wang, Gongshen Liu, and Hai Zhao. Igniting language intelligence: The hitchhiker's guide from chain-of-thought reasoning to language agents, 2023.
[71] Steve Young, Milica Gašić, Blaise Thomson, and Jason D. Williams. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5):1160–1179, 2013. doi: 10.1109/JPROC.2012.2225812.
[72] Abhinav Rastogi, Raghav Gupta, and Dilek Hakkani-Tur. Multi-task learning for joint language understanding and dialogue state tracking. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pages 376–384, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-5045.
[73] Toby Jia-Jun Li and Oriana Riva. Kite: Building conversational bots from mobile apps. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '18, pages 96–109, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450357203. doi: 10.1145/3210240.3210339.
[74] Toby Jia-Jun Li, Amos Azaria, and Brad A. Myers. Sugilite: Creating multimodal smartphone automation by demonstration. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI '17, pages 6038–6049, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450346559. doi: 10.1145/3025453.3025483.
[75] Sang-Woo Lee, Sungdong Kim, Donghyeon Ko, Donghoon Ham, Youngki Hong, Shin Ah Oh, Hyunhoon Jung, Wangkyo Jung, Kyunghyun Cho, Donghyun Kwak, Hyungsuk Noh, and Woomyoung Park. Can current task-oriented dialogue models automate real-world scenarios in the wild?, 2023.
[76] Willy Chung, Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, and Pascale Fung. InstructTODS: Large language models for end-to-end task-oriented dialogue systems, 2023.
[77] Zhiyuan Hu, Yue Feng, Yang Deng, Zekun Li, See-Kiong Ng, Anh Tuan Luu, and Bryan Hooi. Enhancing large language model induced task-oriented dialogue systems through look-forward motivated goals, 2023.
[78] Vojtěch Hudeček and Ondřej Dušek. Are LLMs all you need for task-oriented dialogue?, 2023.
[79] Zhiyuan Hu, Yue Feng, Anh Tuan Luu, Bryan Hooi, and Aldo Lipani. Unlocking the potential of user feedback: Leveraging large language model as user simulators to enhance dialogue system. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM '23, pages 3953–3957, New York, NY, USA, 2023. Association for Computing Machinery.
[80] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwi..., 2020.