Recognition: no theorem link
MAESTRO: Adapting GUIs and Guiding Navigation with User Preferences in Conversational Agents with GUIs
Pith reviewed 2026-05-10 19:57 UTC · model grok-4.3
The pith
MAESTRO extracts user preferences from dialogue to adapt GUIs and guide navigation in conversational agents.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MAESTRO extends the agent's role from execution to decision support. It maintains a shared preference memory that extracts preferences, together with their strength, from natural-language utterances, and it provides two mechanisms. Preference-Grounded GUI Adaptation applies in-place operators (augment, sort, filter, and highlight) to the existing GUI according to preference strength, supporting within-stage comparison. Preference-Guided Workflow Navigation detects conflicts between preferences and available options, proposes backtracking, and records failed paths to avoid revisiting dead ends.
What carries the argument
A shared preference memory that extracts preferences and their strength from user utterances, enabling Preference-Grounded GUI Adaptation through in-place operators and Preference-Guided Workflow Navigation through conflict detection and backtracking proposals.
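The paper does not publish its data structures, but the claim is concrete enough to sketch. A minimal illustration of a shared preference memory holding preferences with scalar strengths (all names and the [0, 1] strength scale are assumptions, not taken from the paper):

```python
from dataclasses import dataclass, field

@dataclass
class Preference:
    """One extracted preference, e.g. 'comedy' for the 'genre' attribute."""
    attribute: str   # GUI-relevant attribute, e.g. "genre" (hypothetical name)
    value: str       # preferred value, e.g. "comedy"
    strength: float  # assumed scalar in [0, 1]; higher = stronger preference

@dataclass
class PreferenceMemory:
    """Shared store consulted by both GUI adaptation and navigation."""
    prefs: dict = field(default_factory=dict)

    def update(self, pref: Preference) -> None:
        # Later utterances about the same attribute overwrite earlier ones.
        self.prefs[pref.attribute] = pref

    def strongest_first(self) -> list:
        # Adaptation would consult preferences in order of strength.
        return sorted(self.prefs.values(), key=lambda p: -p.strength)

mem = PreferenceMemory()
mem.update(Preference("genre", "comedy", 0.9))
mem.update(Preference("showtime", "evening", 0.5))
assert [p.attribute for p in mem.strongest_first()] == ["genre", "showtime"]
```

The overwrite-on-update rule is one plausible memory policy; the paper does not say how conflicting restatements of the same preference are reconciled.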
If this is right
- GUI elements can be augmented, sorted, filtered, or highlighted in place according to the strength of extracted preferences.
- Conflicts between stored preferences and current options trigger automatic proposals to backtrack in the workflow.
- Failed paths are recorded so the agent can steer users away from the same dead ends in later steps.
- The same preference memory supports both text and voice input modes without changing the core adaptation logic.
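How strength might gate the in-place operators can be sketched as follows; the thresholds, field names, and the mapping from strong preferences to filtering are illustrative assumptions, and the augment operator (adding preference-relevant information to items) is omitted:

```python
def adapt_items(items, pref_attr, pref_value, strength):
    """Apply one in-place operator to a list of GUI item dicts by strength.
    Assumed policy: strong prefs filter, moderate prefs sort, weak prefs
    highlight. None of these thresholds come from the paper."""
    matches = lambda it: it.get(pref_attr) == pref_value
    if strength >= 0.8:
        # filter: drop items that do not satisfy the preference
        return [it for it in items if matches(it)]
    if strength >= 0.5:
        # sort: matching items first (False sorts before True)
        return sorted(items, key=lambda it: not matches(it))
    for it in items:
        # highlight: keep everything, mark matches for visual emphasis
        it["highlight"] = matches(it)
    return items

movies = [{"title": "A", "genre": "drama"}, {"title": "B", "genre": "comedy"}]
assert adapt_items(movies, "genre", "comedy", 0.9) == [{"title": "B", "genre": "comedy"}]
```

Because every operator rewrites the existing list in place rather than generating a new screen, within-stage comparison is preserved, which matches the paper's stated design goal.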
Where Pith is reading between the lines
- The same memory structure could be reused across separate sessions so preferences accumulate rather than reset each time a new booking starts.
- Navigation guidance might remain useful even when visual GUI adaptation is unavailable, such as in voice-only settings.
- Recording dead ends creates a lightweight form of negative feedback that other agents could consult when ranking future suggestions.
Load-bearing premise
Preferences can be reliably extracted from natural-language utterances together with their strength, and applying in-place GUI operators and backtracking suggestions based on them produces measurable improvements in user experience for multi-step, preference-driven tasks.
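A minimal sketch of the navigation side of this premise, assuming conflict detection is a simple mismatch between a strongly held preference and the options available at the current stage (the paper's actual conflict logic and dead-end representation are unspecified):

```python
def check_stage(stage, options, prefs, dead_ends):
    """Return a navigation suggestion for the current workflow stage.
    prefs maps attribute -> (value, strength); dead_ends records
    (stage, value) combinations already found unsatisfiable."""
    viable = [o for o in options if (stage, o) not in dead_ends]
    for attr, (value, strength) in prefs.items():
        if value not in viable and strength >= 0.7:
            # Strong preference cannot be met here: record the dead end
            # so this stage/value pair is not revisited, then propose
            # backtracking to an earlier decision point.
            dead_ends.add((stage, value))
            return {"action": "backtrack", "reason": f"no {attr}={value}"}
    return {"action": "proceed", "options": viable}

dead_ends = set()
prefs = {"format": ("IMAX", 0.9)}
out = check_stage("showtime", ["2D 18:00", "3D 20:00"], prefs, dead_ends)
assert out["action"] == "backtrack"
```

The 0.7 threshold and exact-match conflict test are placeholders; the real system presumably needs fuzzier matching between free-form preferences and structured options.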
What would settle it
A within-subjects study of the movie-booking agent that finds no significant difference in task completion time, number of restarts, or user satisfaction scores between MAESTRO and the baseline condition would show the mechanisms do not deliver the claimed benefits.
Original abstract
Modern task-oriented chatbots present GUI elements alongside natural-language dialogue, yet the agent's role has largely been limited to interpreting natural-language input as GUI actions and following a linear workflow. In preference-driven, multi-step tasks such as booking a flight or reserving a restaurant, earlier choices constrain later options and may force users to restart from scratch. User preferences serve as the key criteria for these decisions, yet existing agents do not systematically leverage them. We present MAESTRO, which extends the agent's role from execution to decision support. MAESTRO maintains a shared preference memory that extracts preferences from natural-language utterances with their strength, and provides two mechanisms. Preference-Grounded GUI Adaptation applies in-place operators (augment, sort, filter, and highlight) to the existing GUI according to preference strength, supporting within-stage comparison. Preference-Guided Workflow Navigation detects conflicts between preferences and available options, proposes backtracking, and records failed paths to avoid revisiting dead ends. We evaluated MAESTRO in a movie-booking Conversational Agent with GUI (CAG) through a within-subjects study with two conditions (Baseline vs. MAESTRO) and two modes (Text vs. Voice), with N = 33 participants.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents MAESTRO, a system extending conversational agents with GUIs (CAGs) beyond linear execution. It maintains a shared preference memory that extracts preferences and their strengths from natural-language utterances, then applies two mechanisms: Preference-Grounded GUI Adaptation (in-place operators: augment, sort, filter, highlight) for within-stage comparison and Preference-Guided Workflow Navigation (conflict detection, backtracking proposals, failed-path recording) to avoid dead ends. The approach is evaluated in a movie-booking CAG via a within-subjects study (Baseline vs. MAESTRO; Text vs. Voice) with N=33 participants.
Significance. If the missing evaluation results demonstrate reliable preference extraction and measurable UX gains in multi-step preference-driven tasks, the work would advance CAG design by shifting agents from passive interpreters to active decision-support systems that systematically leverage user preferences across GUI adaptation and workflow guidance. The explicit separation of preference memory from execution and the concrete operators provide a clear, extensible framework.
major comments (2)
- [Abstract / Evaluation] Abstract and evaluation description: the within-subjects study with N=33 is described (two conditions, two modes) but supplies no quantitative results, error bars, statistical tests, exclusion criteria, or qualitative findings, so the central claim that the mechanisms support decision-making cannot be assessed.
- [System Description (preference memory)] Shared preference memory and extraction: the entire system (both GUI adaptation and navigation) rests on an unspecified extractor that pulls preferences plus scalar strength from free-form utterances, yet no accuracy metrics, validation set, or error analysis of extraction is reported; low precision would render downstream operators and conflict detection unreliable.
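The component-level evaluation this comment asks for would be straightforward to report. A sketch of the standard precision/recall/F1 computation over extracted (attribute, value) preference pairs, assuming gold annotations per utterance (the data here is hypothetical):

```python
def extraction_prf(gold, predicted):
    """Precision, recall, and F1 for extracted (attribute, value)
    preference pairs against gold annotations."""
    gold, predicted = set(gold), set(predicted)
    tp = len(gold & predicted)  # correctly extracted pairs
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

gold = [("genre", "comedy"), ("time", "evening")]
pred = [("genre", "comedy"), ("seat", "aisle")]
assert extraction_prf(gold, pred) == (0.5, 0.5, 0.5)
```

Strength estimation would need a separate check (e.g. correlation with human-rated strength), since set overlap ignores the scalar component.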
minor comments (1)
- [Abstract] The abstract ends abruptly after stating N=33 without any summary of outcomes or conclusions.
Simulated Author's Rebuttal
We thank the referee for their insightful comments on our manuscript. We address each of the major comments in detail below and indicate the revisions we will make to strengthen the paper.
Point-by-point responses
-
Referee: [Abstract / Evaluation] Abstract and evaluation description: the within-subjects study with N=33 is described (two conditions, two modes) but supplies no quantitative results, error bars, statistical tests, exclusion criteria, or qualitative findings, so the central claim that the mechanisms support decision-making cannot be assessed.
Authors: We agree that the abstract and evaluation description, as currently written, do not include quantitative results or supporting statistical information. Since we conducted the within-subjects study with 33 participants, we have data on task completion, preference usage, user satisfaction, and other metrics, along with statistical tests and qualitative insights. In the revised manuscript, we will update the abstract to include key quantitative findings and expand the evaluation section to report error bars, statistical tests, exclusion criteria, and qualitative findings. This will allow readers to fully assess the support for our claims regarding decision-making.
revision: yes
-
Referee: [System Description (preference memory)] Shared preference memory and extraction: the entire system (both GUI adaptation and navigation) rests on an unspecified extractor that pulls preferences plus scalar strength from free-form utterances, yet no accuracy metrics, validation set, or error analysis of extraction is reported; low precision would render downstream operators and conflict detection unreliable.
Authors: We agree that the preference extraction component needs to be specified more clearly. We will revise the system description to detail the method used for extracting preferences and their strengths from natural-language utterances. Regarding accuracy metrics, a validation set, and error analysis: these were not included in the original evaluation, which focused on overall system performance in the user study rather than isolated component testing. We will add an analysis of extraction performance based on the collected study data where feasible, but a comprehensive validation would require additional experiments not performed in this work.
revision: partial
- Not addressed in the revision: detailed accuracy metrics and a validation set for the preference extractor, since a dedicated evaluation of extraction accuracy was not conducted as part of the study.
Circularity Check
No circularity: empirical system description without derivation or fitted parameters
Full rationale
The paper presents MAESTRO as a system that maintains a shared preference memory and applies defined GUI operators (augment, sort, filter, highlight) plus workflow-navigation features (conflict detection, backtracking). These mechanisms are introduced directly as design choices in the abstract and full text, with evaluation via a within-subjects user study (N = 33). No equations, mathematical derivations, parameter fitting, or predictions that reduce to inputs appear, and no load-bearing self-citations, uniqueness theorems, or ansatzes are invoked. The work is self-contained as an empirical HCI systems paper; its claims rest on implementation and study results rather than on any tautological reduction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: user preferences can be extracted from natural-language utterances together with a numeric strength value.
invented entities (1)
- Shared preference memory (no independent evidence)
Reference graph
Works this paper leans on
- [1] Rifat Mehreen Amin, Oliver Hans Kühle, Daniel Buschek, and Andreas Butz. 2025. PromptCanvas: Composable Prompting Workspaces Using Dynamic Widgets for Exploration and Iteration in Creative Writing. doi:10.48550/ARXIV.2506.03741
- [2] Sanghwan Bae, Donghyun Kwak, Soyoung Kang, Min Young Lee, Sungdong Kim, Yuin Jeong, Hyeri Kim, Sang-Woo Lee, Woomyoung Park, and Nako Sung. 2022. Keep Me Updated! Memory Management in Long-Term Conversations. In Findings of the Association for Computational Linguistics: EMNLP 2022, 3769–3787.
- [3] Bank of America. 2024. Erica: Your Virtual Financial Assistant from Bank of America. https://info.bankofamerica.com/en/digital-banking/erica
- [4] Booking.com. 2023. Booking.com Launches New AI Trip Planner to Enhance Travel Planning Experience. https://news.booking.com/bookingcom-launches-new-ai-trip-planner-to-enhance-travel-planning-experience/
- [5] Victor S. Bursztyn, Jennifer Healey, Eunyee Koh, Nedim Lipka, and Larry Birnbaum. 2021. Developing a Conversational Recommendation System for Navigating Limited Options. Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems (May 2021), 1–6. doi:10.1145/3411763.3451596
- [6] Yining Cao, Peiling Jiang, and Haijun Xia. 2025. Generative and Malleable User Interfaces with Generative and Evolving Task-Driven Data Model. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (CHI '25). Association for Computing Machinery, New York, NY, USA, 1–20. doi:10.1145/3706598.3713285
- [7] Jiaqi Chen, Yanzhe Zhang, Yutong Zhang, Yijia Shao, and Diyi Yang. 2025. Generative Interfaces for Language Models. arXiv:2508.19227 [cs]. doi:10.48550/arXiv.2508.19227
- [8] Weihao Chen, Xiaoyu Liu, Jiacheng Zhang, Ian Iong Lam, Zhicheng Huang, Rui Dong, Xinyu Wang, and Tianyi Zhang. 2023. MIWA: Mixed-Initiative Web Automation for Better User Control and Confidence. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23). Association for Computing Machinery, New York, NY, USA, 1–15.
- [9] Trishul Chilimbi, Alexandre Alves, Anita Vila, AI Conversational, and Burak Gozluklu. 2024. The Technology behind Amazon's GenAI-Powered Shopping Assistant, Rufus. Amazon Science (Oct. 2024). https://www.amazon.science/blog/the-technology-behind-amazons-genai-powered-shopping-assistant-rufus
- [10] Konstantina Christakopoulou, Filip Radlinski, and Katja Hofmann. 2016. Towards Conversational Recommender Systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, San Francisco, CA, USA, 815–824. doi:10.1145/2939672.2939746
- [11] Vera Demberg, Andi Winterboer, and Johanna D. Moore. 2011. A Strategy for Information Presentation in Spoken Dialog Systems. Computational Linguistics 37, 3 (Sept. 2011), 489–539. doi:10.1162/COLI_a_00064
- [12] Dawn Dutton, Selina Chu, James Hubbell, Marilyn Walker, and Shrikanth Narayanan. 2001. Amount of Information Presented in a Complex List: Effects on User Performance. In Proceedings of the First International Conference on Human Language Technology Research (HLT '01), 1–6. doi:10.3115/1072133.1072137
- [13] Layla El Asri, Hannes Schulz, Shikhar Sharma, Jeremie Zumer, Justin Harris, Emery Fine, Rahul Mehrotra, and Kaheer Suleman. 2017. Frames: A Corpus for Adding Memory to Goal-Oriented Dialogue Systems. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, Kristiina Jokinen, Manfred Stede, David DeVault, and Annie Louis (Eds.). Associat...
- [14] Yue Feng, Shuchang Liu, Zhenghai Xue, Qingpeng Cai, Lantao Hu, Peng Jiang, Kun Gai, and Fei Sun. 2023. A Large Language Model Enhanced Conversational Recommender System. ArXiv (Aug. 2023).
- [15] L. Friedman, Sameer Ahuja, David Allen, Zhenning Tan, Hakim Sidahmed, Changbo Long, Jun Xie, Gabriel Schubiner, Ajay Patel, Harsh Lara, Brian Chu, Zexiang Chen, and Manoj Tiwari. 2023. Leveraging Large Language Models in Conversational Recommender Systems. ArXiv (May 2023).
- [16] Yunfan Gao, Tao Sheng, Youlin Xiang, Yun Xiong, Haofen Wang, and Jiawei Zhang. 2023. Chat-REC: Towards Interactive and Explainable LLMs-Augmented Recommender System. ArXiv (March 2023).
- [17] Nobukatsu Hojo, Kazutoshi Shinoda, Yoshihiro Yamazaki, Keita Suzuki, Hiroaki Sugiyama, Kyosuke Nishida, and Kuniko Saito. 2025. GenerativeGUI: Dynamic GUI Generation Leveraging LLMs for Enhanced User Interaction on Chat Interfaces. In Proceedings of the Extended Abstracts of the CHI Conference on Human Factors in Computing Systems (CHI EA '25). Association...
- [18] Weiyin Hong, James Y. L. Thong, and Kar Yan Tam. 2004. Designing Product Listing Pages on E-Commerce Websites: An Examination of Presentation Mode and Information Format. International Journal of Human-Computer Studies 61, 4 (Oct. 2004), 481–503. doi:10.1016/j.ijhcs.2004.01.006
- [20] Dietmar Jannach, Ahtsham Manzoor, Wanling Cai, and Li Chen. 2022. A Survey on Conversational Recommender Systems. Comput. Surveys 54, 5 (June 2022), 1–36. doi:10.1145/3453154
- [21] Bowen Jiang, Zhuoqun Hao, Young-Min Cho, Bryan Li, Yuan Yuan, Sihao Chen, Lyle Ungar, Camillo J. Taylor, and Dan Roth. 2025. Know Me, Respond to Me: Benchmarking LLMs for Dynamic User Profiling and Personalized Responses at Scale. doi:10.48550/ARXIV.2504.14225
- [22] Tae Soo Kim, DaEun Choi, Yoonseo Choi, and Juho Kim. 2022. Stylette: Styling the Web with Natural Language. In Proceedings of the 2022 CHI Conference on Human Factors in Computing Systems (CHI '22). Association for Computing Machinery, New York, NY, USA, 1–17. doi:10.1145/3491102.3501931
- [23] Tae Soo Kim, Yoonjoo Lee, Yoonah Park, Jiho Kim, Young-Ho Kim, and Juho Kim. 2025. CUPID: Evaluating Personalized and Contextualized Alignment of LLMs from Interactions. doi:10.48550/ARXIV.2508.01674
- [25] Heejin Kook, Junyoung Kim, Seongmin Park, and Jongwuk Lee. 2025. Empowering Retrieval-based Conversational Recommendation with Contrasting User Preferences. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). Association for Compu...
- [26] Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, and Percy Liang. 2024. Lost in the Middle: How Language Models Use Long Contexts. Transactions of the Association for Computational Linguistics 12 (2024), 157–173. doi:10.1162/tacl_a_00638
- [27] Samuel Louvan and Bernardo Magnini. 2020. Recent Neural Methods on Slot Filling and Intent Classification for Task-Oriented Dialogue Systems: A Survey. In Proceedings of the 28th International Conference on Computational Linguistics, Donia Scott, Nuria Bel, and Chengqing Zong (Eds.). International Committee on Computational Linguistics, Barcelona, Spain (O...
- [28] Ewa Luger and Abigail Sellen. 2016. "Like Having a Really Bad PA": The Gulf between User Expectation and Experience of Conversational Agents. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems. ACM, San Jose, CA, USA, 5286–5297. doi:10.1145/2858036.2858288
- [29] Nikola Marangunić and Andrina Granić. 2015. Technology Acceptance Model: A Literature Review from 1986 to 2013. Universal Access in the Information Society 14, 1 (2015), 81–95.
- [30] Damien Masson, Sylvain Malacria, Géry Casiez, and Daniel Vogel. 2024. DirectGPT: A Direct Manipulation Interface to Interact with Large Language Models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24). Association for Computing Machinery, New York, NY, USA, 1–16. doi:10.1145/3613904.3642462
- [31] Palash Nandy, Sigurdur Orn Adalgeirsson, Anoop K. Sinha, Tanya Kraljic, Mike Cleron, Lei Shi, Angad Singh, Ashish Chaudhary, Ashwin Ganti, Christopher A Melancon, Shudi Zhang, David Robishaw, Horia Ciurdar, Justin Secor, Kenneth Aleksander Robertsen, Kirsten Climer, Madison Le, Mathangi Venkatesan, Peggy Chi, Peixin Li, Peter F McDermott, Rachel Shim, Sel...
- [32] Quynh N. Nguyen, Anna Sidorova, and Russell Torres. 2022. User Interactions with Chatbot Interfaces vs. Menu-based Interfaces: An Empirical Study. Computers in Human Behavior 128 (March 2022), 107093. doi:10.1016/j.chb.2021.107093
- [33] Joon Sung Park, Joseph O'Brien, Carrie Jun Cai, Meredith Ringel Morris, Percy Liang, and Michael S. Bernstein. 2023. Generative Agents: Interactive Simulacra of Human Behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (UIST '23). Association for Computing Machinery, New York, NY, USA, 1–22. doi:10.1145/35861...
- [34] Yingzhe Peng, Xiaoting Qin, Zhiyang Zhang, Jue Zhang, Qingwei Lin, Xu Yang, Dongmei Zhang, Saravan Rajmohan, and Qi Zhang. 2025. Navigating the Unknown: A Chat-Based Collaborative Interface for Personalized Exploratory Tasks. In Proceedings of the 30th International Conference on Intelligent User Interfaces (IUI '25). Association for Computing Machinery, N...
- [35] Yi-Hao Peng, Dingzeyu Li, Jeffrey P Bigham, and Amy Pavel. 2025. Morae: Proactively Pausing UI Agents for User Choices. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST '25). Association for Computing Machinery, New York, NY, USA, 1–14. doi:10.1145/3746059.3747797
- [36] Tian Qin, Felix Bai, Ting-Yao Hu, Raviteja Vemulapalli, Hema Swetha Koppula, Zhiyang Xu, Bowen Jin, Mert Cemri, Jiarui Lu, Zirui Wang, and Meng Cao. 2025. COMPASS: A Multi-Turn Benchmark for Tool-Mediated Planning & Preference Optimization. doi:10.48550/ARXIV.2510.07043
- [38] Marcel Ruoff, Brad A. Myers, and Alexander Maedche. 2025. MALACHITE: Enabling Users to Teach GUI-Aware Natural Language Interfaces. ACM Trans. Interact. Intell. Syst. 15, 2 (April 2025), 7:1–7:29. doi:10.1145/3716141
- [40] Omar Shaikh, Shardul Sapkota, Shan Rizvi, Eric Horvitz, Joon Sung Park, Diyi Yang, and Michael S. Bernstein. 2025. Creating General User Models from Computer Use. In Proceedings of the 38th Annual ACM Symposium on User Interface Software and Technology (UIST '25). Association for Computing Machinery, New York, NY, USA, Article 35, 23 pages. doi:10.1145/374...
- [41] Hyewon Suh, Nina Shahriaree, Eric B Hekler, and Julie A Kientz. 2016. Developing and Validating the User Burden Scale: A Tool for Assessing User Burden in Computing Systems. In Proceedings of the 2016 CHI Conference on Human Factors in Computing Systems, 3988–3999.
- [42] Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, and Kai Yu. 2022. META-GUI: Towards Multi-modal Conversational Agents on Mobile GUI. arXiv:2205.11029 [cs]. doi:10.48550/arXiv.2205.11029
- [44] Vinhtuan Thai, Pierre-Yves Rouille, and Siegfried Handschuh. 2012. Visual Abstraction and Ordering in Faceted Browsing of Text Collections. ACM Trans. Intell. Syst. Technol. 3, 2 (Feb. 2012), 21:1–21:24. doi:10.1145/2089094.2089097
- [45] C. A. Thompson, M. H. Goker, and P. Langley. 2004. A Personalized System for Conversational Recommendations. Journal of Artificial Intelligence Research 21 (March 2004), 393–428. doi:10.1613/jair.1318
- [46] Priyan Vaithilingam, Elena L. Glassman, Jeevana Priya Inala, and Chenglong Wang. 2024. DynaVis: Dynamically Synthesized UI Widgets for Visualization Editing. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24). Association for Computing Machinery, New York, NY, USA, 1–17. doi:10.1145/3613904.3642639
- [47] Yun Wan, Satya Menon, and Arkalgud Ramaprasad. 2009. The Paradoxical Nature of Electronic Decision Aids on Comparison-Shopping: The Experiments and Analysis. Journal of Theoretical and Applied Electronic Commerce Research 4, 3 (Dec. 2009), 80–96. doi:10.4067/S0718-18762009000300008
- [48] Lu Wang, Fangkai Yang, Chaoyun Zhang, Junting Lu, Jiaxu Qian, Shilin He, Pu Zhao, Bo Qiao, Ray Huang, Si Qin, Qisheng Su, Jiayi Ye, Yudi Zhang, Jian-Guang Lou, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, and Qi Zhang. 2025. Large Action Models: From Inception to Implementation. arXiv:2412.10047 [cs]. doi:10.48550/arXiv.2412.10047
- [49] Daniel Karl I. Weidele, Mauro Martino, Abel N. Valente, Gaetano Rossiello, Hendrik Strobelt, Loraine Franke, Kathryn Alvero, Shayenna Misko, Robin Auer, Sugato Bagchi, Nandana Mihindukulasooriya, Faisal Chowdhury, Gregory Bramble, Horst Samulowitz, Alfio Gliozzo, and Lisa Amini. 2024. Empirical Evidence on Conversational Control of GUI in Semantic Automat...
- [50] Henry Weld, Xiaoqi Huang, Siqu Long, Josiah Poon, and Soyeon Caren Han. 2022. A Survey of Joint Intent Detection and Slot Filling Models in Natural Language Understanding. ACM Comput. Surv. 55, 8 (Dec. 2022), 156:1–156:38. doi:10.1145/3547138
- [52] Ja Eun Yu and Debaleena Chattopadhyay. 2024. Reducing the Search Space on Demand Helps Older Adults Find Mobile UI Features Quickly, on Par with Younger Adults. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems (CHI '24). Association for Computing Machinery, New York, NY, USA, 1–22. doi:10.1145/3613904.3642796
- [53] Ja Eun Yu, Natalie Parde, and Debaleena Chattopadhyay. 2023. "Where Is History": Toward Designing a Voice Assistant to Help Older Adults Locate Interface Features Quickly. In Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems (CHI '23). Association for Computing Machinery, New York, NY, USA, 1–19. doi:10.1145/3544548.3581447
- [54] Chaoyun Zhang, Shilin He, Jiaxu Qian, Bowen Li, Liqun Li, Si Qin, Yu Kang, Minghua Ma, Guyue Liu, Qingwei Lin, Saravan Rajmohan, Dongmei Zhang, and Qi Zhang. 2025. Large Language Model-Brained GUI Agents: A Survey. arXiv:2411.18279 [cs]. doi:10.48550/arXiv.2411.18279
- [55] Shuning Zhang, Jingruo Chen, Zhiqi Gao, Jiajing Gao, Xin Yi, and Hewu Li. 2025. Characterizing Unintended Consequences in Human-GUI Agent Collaboration for Web Browsing. doi:10.48550/ARXIV.2505.09875
- [56] Siyan Zhao, Mingyi Hong, Yang Liu, Devamanyu Hazarika, and Kaixiang Lin. 2025. Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs. doi:10.48550/ARXIV.2502.09597
- [58] Henry Peng Zou, Wei-Chieh Huang, Yaozu Wu, Yankai Chen, Chunyu Miao, Hoang Nguyen, Yue Zhou, Weizhi Zhang, Liancheng Fang, Langzhou He, Yangning Li, Yuwei Cao, Dongyuan Li, Renhe Jiang, and Philip S Yu. 2025. A Survey on Large Language Model Based Human-Agent Systems. doi:10.36227/techrxiv.174612962.26131807/v2