Personal LLM Agents: Insights and Survey about the Capability, Efficiency and Security
Pith reviewed 2026-05-17 00:51 UTC · model grok-4.3
The pith
Personal LLM Agents will become the dominant software paradigm for end-users through deep integration with personal data and devices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Personal LLM Agents are LLM-based agents that are deeply integrated with personal data and personal devices and used for personal assistance. The authors argue that these agents will become a major software paradigm for end-users. To support this vision, they summarize key architecture components and design choices, present expert opinions on the topic, identify the main challenges in making the agents intelligent, efficient, and secure, and survey representative solutions that address those challenges.
What carries the argument
Personal LLM Agents, defined as LLM-based agents deeply integrated with personal data and personal devices for assistance.
If this is right
- Agents gain the ability to understand user intent, plan tasks, use tools, and manage personal data at a level beyond current IPAs.
- Computing and sensing devices become part of a unified personal assistance system rather than isolated tools.
- Efficiency and security challenges must be solved before the paradigm can scale to everyday use.
- Expert consensus on architecture choices will guide how future agents combine LLMs with personal context.
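The capability stack in these bullets (intent understanding, task planning, tool use) can be sketched as a minimal loop. This is an illustrative stub only, not code or architecture from the paper: the planner is hard-coded in place of an LLM, and all names (`TOOLS`, `stub_planner`, `run_agent`) are hypothetical.

```python
# Illustrative sketch: intent -> plan -> tool use, with an LLM replaced by a stub.
from typing import Callable

# Tool registry: personal-device capabilities exposed to the agent.
TOOLS: dict[str, Callable[[str], str]] = {
    "read_calendar": lambda arg: "meeting at 10:00 with Alex",
    "send_message": lambda arg: f"sent: {arg}",
}

def stub_planner(request: str) -> list[tuple[str, str]]:
    """Stand-in for LLM task planning: map a request to (tool, argument) steps."""
    if "reschedule" in request:
        return [
            ("read_calendar", "today"),
            ("send_message", "Can we move our 10:00 meeting?"),
        ]
    return []  # unrecognized intent: no steps planned

def run_agent(request: str) -> list[str]:
    """Execute each planned step against the tool registry, collecting results."""
    return [TOOLS[tool](arg) for tool, arg in stub_planner(request)]

print(run_agent("reschedule my morning meeting"))
# -> ['meeting at 10:00 with Alex', 'sent: Can we move our 10:00 meeting?']
```

The point of the sketch is only that the agent, not the user, sequences the device capabilities; a real Personal LLM Agent would replace `stub_planner` with model-driven planning over personal context.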
Where Pith is reading between the lines
- Adoption at this scale would move most user software interaction from tapping apps to conversational, context-aware agents.
- Privacy and data-control mechanisms would need to evolve beyond today's app permission models.
- Developers might start designing new services specifically as tool interfaces for these agents rather than standalone apps.
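The last point above, services designed as tool interfaces rather than standalone apps, might in minimal form amount to a declarative schema an agent can call against. The sketch below is hypothetical, loosely modeled on common LLM function-calling formats; `book_ride`, `validate_call`, and the parameter names are invented for illustration.

```python
import json

# Hypothetical tool manifest a service might publish for agents,
# loosely modeled on common LLM function-calling schemas.
BOOK_RIDE_TOOL = {
    "name": "book_ride",
    "description": "Book a ride from the user's current location.",
    "parameters": {
        "type": "object",
        "properties": {
            "destination": {"type": "string"},
            "pickup_time": {"type": "string", "description": "ISO 8601 timestamp"},
        },
        "required": ["destination"],
    },
}

def validate_call(tool: dict, arguments: dict) -> bool:
    """Check a structured agent call: all required parameters present, no unknown ones."""
    required = tool["parameters"]["required"]
    allowed = tool["parameters"]["properties"]
    return all(k in arguments for k in required) and all(k in allowed for k in arguments)

# An agent would emit a structured call against the schema, e.g.:
call = {"destination": "Airport"}
print(json.dumps({"name": BOOK_RIDE_TOOL["name"], "arguments": call}))
print(validate_call(BOOK_RIDE_TOOL, call))  # complete call
print(validate_call(BOOK_RIDE_TOOL, {"pickup_time": "2024-01-01T09:00"}))  # missing required field
```

Under this model, the schema plus a handler replaces the app's UI as the service's primary surface, which is exactly the shift the bullet anticipates.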
Load-bearing premise
Large language models supply semantic understanding and reasoning strong enough for agents to solve complex personal problems without constant human direction.
What would settle it
Real-world tests showing that integrated LLM agents still cannot autonomously complete multi-step personal tasks at scale, even after addressing the surveyed challenges.
read the original abstract
Since the advent of personal computing devices, intelligent personal assistants (IPAs) have been one of the key technologies that researchers and engineers have focused on, aiming to help users efficiently obtain information and execute tasks, and provide users with more intelligent, convenient, and rich interaction experiences. With the development of smartphones and IoT, computing and sensing devices have become ubiquitous, greatly expanding the boundaries of IPAs. However, due to the lack of capabilities such as user intent understanding, task planning, tool using, and personal data management etc., existing IPAs still have limited practicality and scalability. Recently, the emergence of foundation models, represented by large language models (LLMs), brings new opportunities for the development of IPAs. With the powerful semantic understanding and reasoning capabilities, LLM can enable intelligent agents to solve complex problems autonomously. In this paper, we focus on Personal LLM Agents, which are LLM-based agents that are deeply integrated with personal data and personal devices and used for personal assistance. We envision that Personal LLM Agents will become a major software paradigm for end-users in the upcoming era. To realize this vision, we take the first step to discuss several important questions about Personal LLM Agents, including their architecture, capability, efficiency and security. We start by summarizing the key components and design choices in the architecture of Personal LLM Agents, followed by an in-depth analysis of the opinions collected from domain experts. Next, we discuss several key challenges to achieve intelligent, efficient and secure Personal LLM Agents, followed by a comprehensive survey of representative solutions to address these challenges.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper surveys Personal LLM Agents—LLM-based systems deeply integrated with personal data and devices for user assistance. It summarizes architecture components and design choices, analyzes opinions from domain experts, discusses key challenges in capability, efficiency, and security, and reviews representative solutions, while envisioning these agents as a future major software paradigm for end-users.
Significance. If the survey is comprehensive and balanced, the work could meaningfully advance HCI research on intelligent personal assistants by mapping LLM opportunities against practical constraints in efficiency and security. The inclusion of expert opinions provides a useful bridge between technical capabilities and user-centered design, though the forward-looking vision would gain strength from clearer grounding in adoption barriers.
major comments (2)
- [Abstract and survey sections] Abstract and survey of representative solutions: the claim of a 'comprehensive survey' is not supported by any description of literature search strategy, inclusion/exclusion criteria, or total works reviewed; without these, it is impossible to assess whether the coverage of capability, efficiency, and security solutions is representative or omits important counter-examples.
- [Expert opinions analysis] Analysis of opinions collected from domain experts: no details are given on expert selection process, number of participants, question protocol, or handling of dissenting views; this weakens the reliability of the insights used to support the central vision of Personal LLM Agents as a major paradigm.
minor comments (2)
- [Abstract] The abstract uses 'etc.' when listing IPA limitations; replace with an explicit enumeration to improve precision.
- [Introduction] Terminology for 'foundation models' and 'Personal LLM Agents' should be defined on first use for readers outside core LLM research.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive overall assessment of our survey on Personal LLM Agents. We appreciate the opportunity to improve the transparency of our methodology and will revise the manuscript accordingly to strengthen these aspects.
read point-by-point responses
-
Referee: [Abstract and survey sections] Abstract and survey of representative solutions: the claim of a 'comprehensive survey' is not supported by any description of literature search strategy, inclusion/exclusion criteria, or total works reviewed; without these, it is impossible to assess whether the coverage of capability, efficiency, and security solutions is representative or omits important counter-examples.
Authors: We agree that explicitly documenting the literature search process would improve the rigor and reproducibility of the survey. The current manuscript presents a curated selection of representative solutions based on relevance to the core themes of capability, efficiency, and security, drawn from recent publications in top venues and repositories. In the revised version, we will add a dedicated 'Survey Methodology' subsection that describes the search strategy (including keywords such as 'personal LLM agents', 'LLM-based personal assistants', 'on-device LLMs', and 'LLM privacy/security'), databases queried (Google Scholar, arXiv, ACM Digital Library, IEEE Xplore), time period (primarily post-2022), inclusion criteria (focus on systems integrating personal data/devices with LLMs), and approximate scale of review (over 100 works screened, with ~40 representative solutions analyzed in depth). This addition will help readers evaluate coverage and identify any potential gaps. revision: yes
-
Referee: [Expert opinions analysis] Analysis of opinions collected from domain experts: no details are given on expert selection process, number of participants, question protocol, or handling of dissenting views; this weakens the reliability of the insights used to support the central vision of Personal LLM Agents as a major paradigm.
Authors: We acknowledge that additional details on the expert consultation process are needed to support the reliability of this analysis. The opinions were synthesized from discussions with domain experts in AI, HCI, and security. To address this, the revised manuscript will expand the relevant section to specify the selection process (experts with demonstrated expertise via publications or industry roles in LLM agents or personal computing), the number of participants, the question protocol (structured prompts on architecture challenges, efficiency trade-offs, and security risks), and how both supportive and dissenting perspectives were balanced in the synthesis to inform the vision of Personal LLM Agents as a future paradigm. revision: yes
Circularity Check
No significant circularity in survey and vision paper
full rationale
The paper is a survey and forward-looking vision piece that summarizes existing architectures, expert opinions, and solutions for Personal LLM Agents without presenting any mathematical derivations, equations, fitted parameters, or quantitative predictions. Its central claim is explicitly an envisioning statement rather than a result derived from prior definitions or self-citations. No load-bearing steps reduce to inputs by construction, and the LLM opportunity premise functions as motivation, not as a proven or self-referential result. The content is self-contained through literature review and discussion.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: Large language models have powerful semantic understanding and reasoning capabilities that enable autonomous solving of complex problems.
Forward citations
Cited by 18 Pith papers
-
GroupMemBench: Benchmarking LLM Agent Memory in Multi-Party Conversations
GroupMemBench shows leading LLM memory systems reach only 46% average accuracy on multi-party tasks, with a simple BM25 baseline matching or beating most of them.
-
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
-
TriEx: A Game-based Tri-View Framework for Explaining Internal Reasoning in Multi-Agent LLMs
TriEx is a tri-view explainability framework that reveals systematic mismatches between what multi-agent LLMs say, believe, and do in imperfect-information strategic games.
-
InfiniLoRA: Disaggregated Multi-LoRA Serving for Large Language Models
InfiniLoRA decouples LoRA execution from base-model inference and reports 3.05x higher request throughput plus 54% more adapters meeting strict latency SLOs.
-
ProAgent: Harnessing On-Demand Sensory Contexts for Proactive LLM Agent Systems in the Wild
ProAgent uses on-demand tiered perception and context-aware LLM reasoning to deliver proactive assistance on AR glasses, achieving up to 27.7% higher prediction accuracy and 20.5% lower false detections than baselines.
-
Do LLMs Need to See Everything? A Benchmark and Study of Failures in LLM-driven Smartphone Automation using Screentext vs. Screenshots
A new benchmark shows LLM smartphone agents achieve comparable success with screen text alone as with screenshots, but both fail often due to UI accessibility and reasoning gaps.
-
Don't Let AI Agents YOLO Your Files: Shifting Information and Control to Filesystems for Agent Safety and Autonomy
YoloFS is an agent-native filesystem that stages mutations for review, provides snapshots for agent self-correction, and uses progressive permissions to reduce user interruptions while matching baseline task success.
-
Chain-of-Authorization: Embedding authorization into large language models
LLMs fine-tuned to output authorization trajectories as a prerequisite for responses achieve high rejection rates for unauthorized prompts while preserving utility in allowed scenarios.
-
Can LLM Agents Simulate Dynamic Networks? A Case Study on Email Networks with Phishing Synthesis
LLM multi-agent systems augmented with data-driven event triggers and Hawkes processes simulate both micro-level interactions and macroscopic topologies in dynamic email networks for realistic phishing synthesis.
-
HyMem: Hybrid Memory Architecture with Dynamic Retrieval Scheduling
HyMem introduces dual-granular memory storage with a lightweight summary module for fast responses and selective activation of a deep LLM module for complex queries, outperforming full-context baselines by 92.6% lower...
-
Agentic AI for Trip Planning Optimization Application
An orchestrated multi-agent AI framework for trip planning optimization paired with a new ground-truth dataset achieves 77.4% accuracy on the TOP Benchmark, outperforming single-agent and workflow baselines.
-
Persona-Grounded Safety Evaluation of AI Companions in Multi-Turn Conversations
A persona-based simulation framework applied to Replika shows narrow emotional responses and frequent normalization of unsafe content such as self-harm and violent fantasies across 1674 dialogues with mental health personas.
-
Agentic Insight Generation in VSM Simulations
A two-step agentic system for extracting insights from VSM simulations achieves up to 86% accuracy with top LLMs by using progressive data discovery and slim context.
-
Does a Global Perspective Help Prune Sparse MoEs Elegantly?
GRAPE is a global redundancy-aware pruning strategy for sparse MoEs that dynamically allocates pruning budgets across layers and improves average accuracy by 1.40% over the best local baseline across tested models and...
-
PSI: Shared State as the Missing Layer for Coherent AI-Generated Instruments in Personal AI Agents
PSI uses a shared personal-context bus to publish state and write-back affordances, turning isolated AI-generated modules into synchronized, chat-accessible instruments.
-
From LLM Reasoning to Autonomous AI Agents: A Comprehensive Review
A survey consolidating benchmarks, agent frameworks, real-world applications, and protocols for LLM-based autonomous agents into a proposed taxonomy with recommendations for future research.
-
LLM Harms: A Taxonomy and Discussion
This paper proposes a taxonomy of LLM harms in five categories and suggests mitigation strategies plus a dynamic auditing system for responsible development.
-
A Survey on the Memory Mechanism of Large Language Model based Agents
A systematic review of memory designs, evaluation methods, applications, limitations, and future directions for LLM-based agents.
Reference graph
Works this paper leans on
-
[1]
Apple. Siri. https://www.apple.com/siri/, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[2]
Google. Google assistant for android. https://developer.android.com/guide/app-actions/overview, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[3]
Amazon. Alexa. https://www.alexa.com, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[4]
Mapping natural language instructions to mobile ui action sequences, 2020
Yang Li, Jiacong He, Xin Zhou, Yuan Zhang, and Jason Baldridge. Mapping natural language instructions to mobile ui action sequences, 2020
work page 2020
-
[5]
Glider: A reinforcement learning approach to extract ui scripts from websites
Yuanchun Li and Oriana Riva. Glider: A reinforcement learning approach to extract ui scripts from websites. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’21, page 1420–1430, New York, NY, USA, 2021. Association for Computing Machinery. ISBN 9781450380379. doi: 10.1145/3404835.3462905
-
[6]
Reinforcement Learning on Web Interfaces Using Workflow-Guided Exploration
Evan Zheran Liu, Kelvin Guu, Panupong Pasupat, Tianlin Shi, and Percy Liang. Reinforcement learning on web interfaces using workflow-guided exploration. ArXiv, abs/1802.08802, 2018
work page 2018
-
[7]
A survey of large language models, 2023
Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. A survey of large language models, 2023
work page 2023
-
[8]
IBM. Ibm shoebox. https://www.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[9]
The harpy speech recognition system: performance with large vocabularies
Bruce Lowerre and R Reddy. The harpy speech recognition system: performance with large vocabularies. The Journal of the Acoustical Society of America, 60(S1):S10–S11, 1976
work page 1976
-
[10]
Helene Cerf-Danon, Steven DeGennaro, Marco Ferretti, Jorge Gonzalez, and Eric Keppel. TANGORA - a large vocabulary speech recognition system for five languages. In Proc. 2nd European Conference on Speech Communication and Technology (Eurospeech 1991), pages 183–192, 1991. doi: 10.21437/Eurospeech.1991-44
-
[11]
L. Rabiner and B. Juang. An introduction to hidden markov models. IEEE ASSP Magazine, 3(1):4–16, 1986. doi: 10.1109/MASSP.1986.1165342
-
[12]
The dragon continuous speech recognition system: A real-time implementation
Paul G. Bamberg, Yen lu Chow, Larry Gillick, Robert Roth, and Dean G. Sturtevant. The dragon continuous speech recognition system: A real-time implementation. In Human Language Technology - The Baltic Perspectiv, 1990
work page 1990
-
[13]
Wikipedia. Speakable items. https://en.wikipedia.org/wiki/Speakable_items, 2023. [Online; accessed January 5, 2023]
work page 2023
-
[14]
Medspeak: Report creation with continuous speech recognition
Jennifer Lai and John Vergo. Medspeak: Report creation with continuous speech recognition. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems, CHI ’97, page 431–438, New York, NY, USA, 1997. Association for Computing Machinery. ISBN 0897918029. doi: 10.1145/258549.258829
-
[15]
Speech transcript - jim allchin, winhec 2002
Microsoft. Speech transcript - jim allchin, winhec 2002. https://news.microsoft.com/speeches/speech-transcript-jim-allchin-winhec-2002/, 2002. [Online; accessed January 5, 2023]
work page 2002
-
[16]
Google is taking questions (spoken, via iphone)
John Markoff. Google is taking questions (spoken, via iphone). https://www.nytimes.com/2008/11/14/technology/internet/14voice.html, 2008. [Online; accessed January 5, 2024]
work page 2008
-
[17]
Microsoft. Cortana. https://www.microsoft.com/en-us/cortana, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[18]
OpenAI. Introduce chatgpt. https://openai.com/blog/chatgpt, 2022. [Online; accessed November 28, 2023]
work page 2022
-
[19]
Announcing microsoft copilot, your everyday ai companion
Microsoft. Announcing microsoft copilot, your everyday ai companion. https://blogs.microsoft.com/blog/2023/09/21/announcing-microsoft-copilot-your-everyday-ai-companion/, 2023. [Online; accessed December 4, 2023]
work page 2023
-
[20]
Apple. Sirikit: Empower users to interact with their devices through voice, intelligent suggestions, and personalized workflows. https://developer.apple.com/documentation/sirikit/, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[21]
Apple. Shortcuts user guide. https://support.apple.com/en-hk/guide/shortcuts/welcome/ios, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[22]
work page 2023
-
[23]
Tasker: Total automation for android
Joaoapps. Tasker: Total automation for android. https://tasker.joaoapps.com, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[24]
Absinthe. Anywhere shortcuts. https://play.google.com/store/apps/details?id=com.absinthe.anywhere_&hl=en_US&pli=1, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[25]
Programming iot devices by demonstration using mobile apps
Toby Jia-Jun Li, Yuanchun Li, Fanglin Chen, and Brad A Myers. Programming iot devices by demonstration using mobile apps. In End-User Development: 6th International Symposium, IS-EUD 2017, Eindhoven, The Netherlands, June 13-15, 2017, Proceedings 6, pages 3–17. Springer, 2017
work page 2017
-
[26]
Ulink: Enabling user-defined deep linking to app content
Tanzirul Azim, Oriana Riva, and Suman Nath. Ulink: Enabling user-defined deep linking to app content. In Proceedings of the 14th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys ’16, page 305–318, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450342698. doi: 10.1145/2906388.2906416
-
[27]
Benjamin R. Cowan, Nadia Pantidi, David Coyle, Kellie Morrissey, Peter Clarke, Sara Al-Shehri, David Earley, and Natasha Bandeira. "what can i help you with?": Infrequent users’ experiences of intelligent personal assistants. In Proceedings of the 19th International Conference on Human-Computer Interaction with Mobile Devices and Services, MobileHCI ’17, ...
-
[28]
A mixed-methods approach to understanding user trust after voice assistant failures
Amanda Baughan, Xuezhi Wang, Ariel Liu, Allison Mercurio, Jilin Chen, and Xiao Ma. A mixed-methods approach to understanding user trust after voice assistant failures. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, CHI ’23, New York, NY, USA, 2023. Association for Computing Machinery. ISBN 9781450394215. doi: 10.1145/354...
-
[29]
Ewa Luger and Abigail Sellen. "like having a really bad pa": The gulf between user expectation and experience of conversational agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, CHI ’16, page 5286–5297, New York, NY, USA, 2016. Association for Computing Machinery. ISBN 9781450333627. doi: 10.1145/2858036.2858288
-
[30]
Matthew B. Hoy. Alexa, siri, cortana, and more: An introduction to voice assistants. Medical Reference Services Quarterly, 37(1):81–88, 2018. doi: 10.1080/02763869.2018.1404391. PMID: 29327988
-
[31]
Humanoid: A deep learning-based approach to automated black-box android app testing
Yuanchun Li, Ziyue Yang, Yao Guo, and Xiangqun Chen. Humanoid: A deep learning-based approach to automated black-box android app testing. In 2019 34th IEEE/ACM International Conference on Automated Software Engineering (ASE), pages 1070–1073. IEEE, 2019
work page 2019
-
[32]
Attention is all you need
Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, NIPS’17, page 6000–6010, Red Hook, NY, USA, 2017. Curran Associates Inc. ISBN 9781510860964
work page 2017
-
[33]
Zecheng He, Srinivas Sunkara, Xiaoxue Zang, Ying Xu, Lijuan Liu, Nevan Wichers, Gabriel Schubiner, Ruby B. Lee, and Jindong Chen. Actionbert: Leveraging user actions for semantic understanding of user interfaces. In AAAI Conference on Artificial Intelligence, 2020
work page 2020
-
[34]
Understanding mobile gui: from pixel-words to screen-sentences
Jingwen Fu, Xiaoyi Zhang, Yuwang Wang, Wenjun Zeng, Sam Yang, and Grayson Hilliard. Understanding mobile gui: from pixel-words to screen-sentences. ArXiv, abs/2105.11941, 2021. URL https://api.semanticscholar.org/CorpusID:235187035
- [35]
-
[36]
Uibert: Learning generic multimodal representations for ui understanding
Chongyang Bai, Xiaoxue Zang, Ying Xu, Srinivas Sunkara, Abhinav Rastogi, Jindong Chen, and Blaise Agüera y Arcas. Uibert: Learning generic multimodal representations for ui understanding. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, IJCAI-21, pages 1705–1712. International Joint Conferences on Artificial Intel...
-
[37]
Spotlight: Mobile ui understanding using vision-language models with a focus
Gang Li and Yang Li. Spotlight: Mobile ui understanding using vision-language models with a focus. ArXiv, abs/2209.14927, 2022
-
[38]
Lexi: Self-supervised learning of the ui language
Pratyay Banerjee, Shweti Mahajan, Kushal Arora, Chitta Baral, and Oriana Riva. Lexi: Self-supervised learning of the ui language. ArXiv, abs/2301.10165, 2023
-
[39]
Uinav: A maker of ui automation agents
Wei Li, Fu-Lin Hsu, Will Bishop, Folawiyo Campbell-Ajala, Oriana Riva, and Max Lin. Uinav: A maker of ui automation agents. arXiv preprint arXiv:2312.10170, 2023
-
[40]
World of bits: An open-domain platform for web-based agents
Tianlin Tim Shi, Andrej Karpathy, Linxi Jim Fan, Jonathan Hernandez, and Percy Liang. World of bits: An open-domain platform for web-based agents. In Proceedings of the 34th International Conference on Machine Learning - Volume 70, ICML’17, page 3135–3144. JMLR.org, 2017
work page 2017
-
[41]
Izzeddin Gur, Ulrich Rückert, Aleksandra Faust, and Dilek Z. Hakkani-Tür. Learning to navigate the web. ArXiv, abs/1812.09195, 2018
work page 2018
-
[42]
DOM-Q-NET: Grounded RL on Structured Language
Sheng Jia, Jamie Ryan Kiros, and Jimmy Ba. Dom-q-net: Grounded rl on structured language. ArXiv, abs/1902.07257, 2019
work page 2019
-
[43]
A data-driven approach for learning to control computers
Peter C Humphreys, David Raposo, Tobias Pohlen, Gregory Thornton, Rachita Chhaparia, Alistair Muldal, Josh Abramson, Petko Georgiev, Adam Santoro, and Timothy Lillicrap. A data-driven approach for learning to control computers. In International Conference on Machine Learning, pages 9466–9482. PMLR, 2022
work page 2022
-
[44]
Scaling laws for neural language models, 2020
Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B. Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models, 2020
work page 2020
-
[45]
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, and Ryan Lowe. Training language models to follow instructions with human feedback, 2022
work page 2022
-
[46]
Deep reinforcement learning from human preferences, 2023
Paul Christiano, Jan Leike, Tom B. Brown, Miljan Martic, Shane Legg, and Dario Amodei. Deep reinforcement learning from human preferences, 2023
work page 2023
-
[47]
Toolformer: Language models can teach themselves to use tools, 2023
Timo Schick, Jane Dwivedi-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools, 2023
work page 2023
-
[48]
Webgpt: Browser-assisted question-answering with human feedback, 2022
Reiichiro Nakano, Jacob Hilton, Suchir Balaji, Jeff Wu, Long Ouyang, Christina Kim, Christopher Hesse, Shantanu Jain, Vineet Kosaraju, William Saunders, Xu Jiang, Karl Cobbe, Tyna Eloundou, Gretchen Krueger, Kevin Button, Matthew Knight, Benjamin Chess, and John Schulman. Webgpt: Browser-assisted question-answering with human feedback, 2022
work page 2022
-
[49]
Multimodal web navigation with instruction-finetuned foundation models
Hiroki Furuta, Ofir Nachum, Kuang-Huei Lee, Yutaka Matsuo, Shixiang Shane Gu, and Izzeddin Gur. Multimodal web navigation with instruction-finetuned foundation models. ArXiv, abs/2305.11854, 2023
-
[50]
Progprompt: Generating situated robot task plans using large language models
Ishika Singh, Valts Blukis, Arsalan Mousavian, Ankit Goyal, Danfei Xu, Jonathan Tremblay, Dieter Fox, Jesse Thomason, and Animesh Garg. Progprompt: Generating situated robot task plans using large language models. In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 11523–11530. IEEE, 2023
work page 2023
-
[51]
Yue Zhen, Sheng Bi, Lu Xing-tong, Pan Wei-qin, Shi Hai-peng, Chen Zi-rui, and Fang Yi-shu. Robot task planning based on large language model representing knowledge with directed graph structures, 2023
work page 2023
-
[52]
Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022
Wenlong Huang, Pieter Abbeel, Deepak Pathak, and Igor Mordatch. Language models as zero-shot planners: Extracting actionable knowledge for embodied agents, 2022
work page 2022
-
[53]
Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face, 2023
Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. Hugginggpt: Solving ai tasks with chatgpt and its friends in hugging face, 2023
work page 2023
-
[54]
Mathcoder: Seamless code integration in llms for enhanced mathematical reasoning
Ke Wang, Houxing Ren, Aojun Zhou, Zimu Lu, Sichun Luo, Weikang Shi, Renrui Zhang, Linqi Song, Mingjie Zhan, and Hongsheng Li. Mathcoder: Seamless code integration in llms for enhanced mathematical reasoning. ArXiv, abs/2310.03731, 2023
-
[55]
Baptiste Rozière, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Tan, Yossi Adi, Jingyu Liu, Tal Remez, Jérémy Rapin, Artyom Kozhevnikov, I. Evtimov, Joanna Bitton, Manish P Bhatt, Cristian Cantón Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre D’efossez, Jade Copet, Faisal Azhar, Hugo Touvron, Louis Martin, Nicolas Usunier, Thomas Sci...
work page 2023
-
[56]
Aojun Zhou, Ke Wang, Zimu Lu, Weikang Shi, Sichun Luo, Zipeng Qin, Shaoqing Lu, Anya Jia, Linqi Song, Mingjie Zhan, and Hongsheng Li. Solving challenging math word problems using gpt-4 code interpreter with code-based self-verification, 2023
work page 2023
- [57]
-
[58]
Reinventing search with a new ai-powered microsoft bing and edge, your copilot for the web
Microsoft. Reinventing search with a new ai-powered microsoft bing and edge, your copilot for the web. https://blogs.microsoft.com/blog/2023/02/07/reinventing-search-with-a-new-ai-powered-microsoft-bing-and-edge-your-copilot-for-the-web/, 2023. [Online; accessed December 8, 2023]
work page 2023
-
[59]
work page 2023
-
[60]
Bard: A conversational ai tool by google
Google. Bard: A conversational ai tool by google. https://bard.google.com, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[61]
Introducing gemini: our largest and most capable ai model
Google. Introducing gemini: our largest and most capable ai model. https://blog.google/technology/ai/google-gemini-ai/, 2023. [Online; accessed December 26, 2023]
work page 2023
-
[62]
Reshaping industries with ai: Huawei cloud launches pangu models 3.0 and ascend ai cloud services
Huawei. Reshaping industries with ai: Huawei cloud launches pangu models 3.0 and ascend ai cloud services. https://www.huaweicloud.com/intl/en-us/news/20230707180809498.html, 2023. [Online; accessed November 28, 2023]
-
[63]
XiaoMi. Milm-6b. https://github.com/XiaoMi/MiLM-6B, 2023. [Online; accessed December 24, 2023]
work page 2023
-
[64]
Sayed Naem Bokhari. The linux operating system. Computer, 28(8):74–79, 1995
work page 1995
-
[65]
Wikipedia. Borda count. https://en.wikipedia.org/wiki/Borda_count, 2023. [Online; accessed December 13, 2023]
work page 2023
-
[66]
Recent advances in end-to-end automatic speech recognition, 2022
Jinyu Li. Recent advances in end-to-end automatic speech recognition, 2022
work page 2022
-
[67]
End-to-end speech recognition: A survey, 2023
Rohit Prabhavalkar, Takaaki Hori, Tara N. Sainath, Ralf Schlüter, and Shinji Watanabe. End-to-end speech recognition: A survey, 2023
work page 2023
-
[68] Lei Wang, Chen Ma, Xueyang Feng, Zeyu Zhang, Hao Yang, Jingsen Zhang, Zhiyuan Chen, Jiakai Tang, Xu Chen, Yankai Lin, Wayne Xin Zhao, Zhewei Wei, and Ji-Rong Wen. A survey on large language model based autonomous agents, 2023.
[69] Zhiheng Xi, Wenxiang Chen, Xin Guo, Wei He, Yiwen Ding, Boyang Hong, Ming Zhang, Junzhe Wang, Senjie Jin, Enyu Zhou, Rui Zheng, Xiaoran Fan, Xiao Wang, Limao Xiong, Yuhao Zhou, Weiran Wang, Changhao Jiang, Yicheng Zou, Xiangyang Liu, Zhangyue Yin, Shihan Dou, Rongxiang Weng, Wensen Cheng, Qi Zhang, Wenjuan Qin, Yongyan Zheng, Xipeng Qiu, Xuanjing Huang, a... The rise and potential of large language model based agents: A survey, 2023.
[70] Zhuosheng Zhang, Yao Yao, Aston Zhang, Xiangru Tang, Xinbei Ma, Zhiwei He, Yiming Wang, Mark Gerstein, Rui Wang, Gongshen Liu, and Hai Zhao. Igniting language intelligence: The hitchhiker's guide from chain-of-thought reasoning to language agents, 2023.
[71] Steve Young, Milica Gašić, Blaise Thomson, and Jason D. Williams. POMDP-based statistical spoken dialog systems: A review. Proceedings of the IEEE, 101(5):1160–1179, 2013. doi: 10.1109/JPROC.2012.2225812.
[72] Abhinav Rastogi, Raghav Gupta, and Dilek Hakkani-Tur. Multi-task learning for joint language understanding and dialogue state tracking. In Proceedings of the 19th Annual SIGdial Meeting on Discourse and Dialogue, pages 376–384, Melbourne, Australia, July 2018. Association for Computational Linguistics. doi: 10.18653/v1/W18-5045.
[73] Toby Jia-Jun Li and Oriana Riva. Kite: Building conversational bots from mobile apps. In Proceedings of the 16th Annual International Conference on Mobile Systems, Applications, and Services, MobiSys '18, pages 96–109, New York, NY, USA, 2018. Association for Computing Machinery. ISBN 9781450357203. doi: 10.1145/3210240.3210339.
[74] Toby Jia-Jun Li, Amos Azaria, and Brad A. Myers. Sugilite: Creating multimodal smartphone automation by demonstration. In Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems, CHI '17, pages 6038–6049, New York, NY, USA, 2017. Association for Computing Machinery. ISBN 9781450346559. doi: 10.1145/3025453.3025483.
[75] Sang-Woo Lee, Sungdong Kim, Donghyeon Ko, Donghoon Ham, Youngki Hong, Shin Ah Oh, Hyunhoon Jung, Wangkyo Jung, Kyunghyun Cho, Donghyun Kwak, Hyungsuk Noh, and Woomyoung Park. Can current task-oriented dialogue models automate real-world scenarios in the wild?, 2023.
[76] Willy Chung, Samuel Cahyawijaya, Bryan Wilie, Holy Lovenia, and Pascale Fung. InstructTODS: Large language models for end-to-end task-oriented dialogue systems, 2023.
[77] Zhiyuan Hu, Yue Feng, Yang Deng, Zekun Li, See-Kiong Ng, Anh Tuan Luu, and Bryan Hooi. Enhancing large language model induced task-oriented dialogue systems through look-forward motivated goals, 2023.
[78] Vojtěch Hudeček and Ondřej Dušek. Are LLMs all you need for task-oriented dialogue?, 2023.
[79] Zhiyuan Hu, Yue Feng, Anh Tuan Luu, Bryan Hooi, and Aldo Lipani. Unlocking the potential of user feedback: Leveraging large language model as user simulators to enhance dialogue system. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM '23, pages 3953–3957, New York, NY, USA, 2023. Association for Computing Machinery.
[80] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, Tom Henighan, Rewon Child, Aditya Ramesh, Daniel M. Ziegler, Jeffrey Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwi..., 2020.