pith. machine review for the scientific record.

arxiv: 2604.06134 · v1 · submitted 2026-04-07 · 💻 cs.HC

Recognition: no theorem link

MAESTRO: Adapting GUIs and Guiding Navigation with User Preferences in Conversational Agents with GUIs


Pith reviewed 2026-05-10 19:57 UTC · model grok-4.3

classification 💻 cs.HC
keywords conversational agents, GUI adaptation, user preferences, preference memory, workflow navigation, decision support, multi-step tasks, human-computer interaction

The pith

MAESTRO extracts user preferences from dialogue to adapt GUIs and guide navigation in conversational agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MAESTRO, which extends conversational agents with GUIs beyond the linear execution of commands. It maintains a shared memory that extracts preferences, along with their strength, from natural-language statements. These stored preferences then trigger in-place changes to the visible interface and, when choices conflict, suggestions to move backward in the workflow. The goal is to support multi-step tasks in which early decisions constrain later options, such as booking a flight or a movie ticket, without forcing users to start over. The evaluation is a within-subjects study in which 33 participants used a movie-booking agent in both text and voice modes.

Core claim

MAESTRO extends the agent's role from execution to decision support. It maintains a shared preference memory that extracts preferences from natural-language utterances with their strength, and provides two mechanisms: Preference-Grounded GUI Adaptation applies in-place operators (augment, sort, filter, and highlight) to the existing GUI according to preference strength, supporting within-stage comparison; Preference-Guided Workflow Navigation detects conflicts between preferences and available options, proposes backtracking, and records failed paths to avoid revisiting dead ends.
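The claim pairs a strength-annotated preference memory with four in-place operators. The report does not say how strength maps to an operator, so the sketch below is only an illustration under stated assumptions: strength is a float in [0, 1], each preference is an (attribute, value) equality test on option records, and the thresholds in the hypothetical `choose_operator` are invented. None of the names (`Preference`, `PreferenceMemory`, `adapt`) come from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Preference:
    attribute: str   # option field the preference refers to, e.g. "format"
    value: object    # preferred value of that field, e.g. "IMAX"
    strength: float  # assumed scale: 0.0 (mild wish) to 1.0 (hard constraint)


@dataclass
class PreferenceMemory:
    prefs: list = field(default_factory=list)

    def add(self, pref: Preference) -> None:
        # A newer statement about the same attribute replaces the older one.
        self.prefs = [p for p in self.prefs if p.attribute != pref.attribute]
        self.prefs.append(pref)


def choose_operator(strength: float) -> str:
    # Hypothetical thresholds: stronger preferences get more intrusive operators.
    if strength >= 0.9:
        return "filter"
    if strength >= 0.6:
        return "sort"
    if strength >= 0.3:
        return "highlight"
    return "augment"


def matches(option: dict, pref: Preference) -> bool:
    return option.get(pref.attribute) == pref.value


def adapt(options: list, memory: PreferenceMemory):
    """Return an adapted view of the options plus the attributes to surface."""
    view = [dict(o, highlighted=False) for o in options]  # leave source data intact
    surfaced = set()
    # Weakest first: Python's sort is stable, so a later (stronger) sort wins
    # and earlier orderings only break ties.
    for pref in sorted(memory.prefs, key=lambda p: p.strength):
        surfaced.add(pref.attribute)  # augment: show this attribute in every case
        op = choose_operator(pref.strength)
        if op == "filter":
            view = [o for o in view if matches(o, pref)]
        elif op == "sort":
            view.sort(key=lambda o: not matches(o, pref))  # matching options first
        elif op == "highlight":
            for o in view:
                o["highlighted"] = o["highlighted"] or matches(o, pref)
    return view, surfaced
```

Applying weaker preferences first is one way to let the strongest preference dominate the final layout while weaker ones still break ties; a real system could equally resolve operator conflicts in other ways.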

What carries the argument

Shared preference memory that extracts preferences with strength from utterances, enabling Preference-Grounded GUI Adaptation through in-place operators and Preference-Guided Workflow Navigation through conflict detection and backtracking proposals.

If this is right

  • GUI elements can be augmented, sorted, filtered, or highlighted in place according to the strength of extracted preferences.
  • Conflicts between stored preferences and current options trigger automatic proposals to backtrack in the workflow.
  • Failed paths are recorded so the agent can steer users away from the same dead ends in later steps.
  • The same preference memory supports both text and voice input modes without changing the core adaptation logic.
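The second and third consequences (conflict-triggered backtracking and recorded dead ends) can be sketched as a small navigator over a staged workflow. This is a hypothetical illustration, not the paper's mechanism: it assumes a fixed stage order, a "hard" threshold of 0.9 on strength, a policy of proposing the previous stage, and a dead end keyed by the tuple of choices made so far.

```python
from dataclasses import dataclass, field

HARD_CONSTRAINT = 0.9  # hypothetical strength at or above which a preference must be met


@dataclass
class Navigator:
    stages: list                                    # e.g. ["movie", "theater", "showtime", "seat"]
    failed_paths: set = field(default_factory=set)  # recorded dead ends

    def unmet_hard_prefs(self, options, prefs):
        """Hard preferences relevant to this stage that no option satisfies."""
        unmet = []
        for p in prefs:
            relevant = any(p["attribute"] in o for o in options)
            satisfied = any(o.get(p["attribute"]) == p["value"] for o in options)
            if p["strength"] >= HARD_CONSTRAINT and relevant and not satisfied:
                unmet.append(p["attribute"])
        return unmet

    def step(self, path, options, prefs):
        """Decide what to offer at the stage reached by `path`.

        path: tuple of choices already made, one per completed stage.
        Returns ("continue", viable_options), ("backtrack", stage_name),
        or ("relax", unmet_attributes) when there is no earlier stage.
        """
        unmet = self.unmet_hard_prefs(options, prefs)
        if unmet and not path:
            return "relax", unmet  # nothing to go back to: ask the user to relax
        if unmet:
            self.failed_paths.add(path)  # remember this dead end
            return "backtrack", self.stages[len(path) - 1]  # stage of the last choice
        # Hide options whose resulting path is already known to fail.
        viable = [o for o in options if path + (o["name"],) not in self.failed_paths]
        return "continue", viable
```

For example, if no showtime at "Cinema A" satisfies a hard "evening" preference, `step` proposes going back to the theater stage and, on return there, no longer offers "Cinema A" for that movie.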

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same memory structure could be reused across separate sessions so preferences accumulate rather than reset each time a new booking starts.
  • Navigation guidance might remain useful even when visual GUI adaptation is unavailable, such as in voice-only settings.
  • Recording dead ends creates a lightweight form of negative feedback that other agents could consult when ranking future suggestions.

Load-bearing premise

That preferences can be reliably extracted from natural-language utterances together with their strength, and that applying in-place GUI operators and backtracking suggestions based on them will produce measurable improvements in user experience for multi-step, preference-driven tasks.

What would settle it

A within-subjects study of the movie-booking agent that finds no significant difference in task completion time, number of restarts, or user satisfaction scores between MAESTRO and the baseline condition would show the mechanisms do not deliver the claimed benefits.
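Because every participant experiences both conditions, the natural comparison is on per-participant differences. The sketch below is a generic Monte Carlo sign-flip permutation test for one paired measure (for example, completion time under Baseline vs. MAESTRO); it is an illustrative choice, not the analysis the paper reports. It ignores the Text vs. Voice factor; a full treatment of the 2×2 design would need a factorial model such as a repeated-measures ANOVA or a mixed-effects model.

```python
import random
from statistics import fmean


def paired_permutation_test(baseline, maestro, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on per-participant differences.

    Under the null hypothesis of no condition effect, each participant's
    difference is equally likely to carry either sign.
    """
    if len(baseline) != len(maestro) or not baseline:
        raise ValueError("need two equal-length, non-empty paired samples")
    diffs = [m - b for b, m in zip(baseline, maestro)]
    observed = abs(fmean(diffs))
    rng = random.Random(seed)  # fixed seed makes the p-value reproducible
    extreme = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if abs(fmean(flipped)) >= observed:
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)
```

A large p-value on the measures named above, with adequate power, is the kind of null result this section describes; a single test per measure also raises a multiple-comparisons question across the five dependent variables.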

Figures

Figures reproduced from arXiv: 2604.06134 by Adnan Abbas, Sang Won Lee, Sangwook Lee, Yan Chen, Young-Ho Kim.

Figure 1: Overview of MAESTRO at the theater selection stage. Initial GUI: The default GUI presents three theater options with …
Figure 2: MAESTRO layout at the theater selection stage.
Figure 3: Four-step adaptation policy illustrated on a movie-selection example.
Figure 4: Baseline condition interface at the theater selection …
Figure 5: Interaction plots for five dependent variables used in the evaluation: Violation Count (a), Unpreferred Selection …
Figure 6: A screenshot of the user study interface showing MAESTRO in text mode at the seat-selection stage. The left panel …
Original abstract

Modern task-oriented chatbots present GUI elements alongside natural-language dialogue, yet the agent's role has largely been limited to interpreting natural-language input as GUI actions and following a linear workflow. In preference-driven, multi-step tasks such as booking a flight or reserving a restaurant, earlier choices constrain later options and may force users to restart from scratch. User preferences serve as the key criteria for these decisions, yet existing agents do not systematically leverage them. We present MAESTRO, which extends the agent's role from execution to decision support. MAESTRO maintains a shared preference memory that extracts preferences from natural-language utterances with their strength, and provides two mechanisms. Preference-Grounded GUI Adaptation applies in-place operators (augment, sort, filter, and highlight) to the existing GUI according to preference strength, supporting within-stage comparison. Preference-Guided Workflow Navigation detects conflicts between preferences and available options, proposes backtracking, and records failed paths to avoid revisiting dead ends. We evaluated MAESTRO in a movie-booking Conversational Agent with GUI (CAG) through a within-subjects study with two conditions (Baseline vs. MAESTRO) and two modes (Text vs. Voice), with N = 33 participants.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper presents MAESTRO, a system extending conversational agents with GUIs (CAGs) beyond linear execution. It maintains a shared preference memory that extracts preferences and their strengths from natural-language utterances, then applies two mechanisms: Preference-Grounded GUI Adaptation (in-place operators: augment, sort, filter, highlight) for within-stage comparison and Preference-Guided Workflow Navigation (conflict detection, backtracking proposals, failed-path recording) to avoid dead ends. The approach is evaluated in a movie-booking CAG via a within-subjects study (Baseline vs. MAESTRO; Text vs. Voice) with N=33 participants.

Significance. If the missing evaluation results demonstrate reliable preference extraction and measurable UX gains in multi-step preference-driven tasks, the work would advance CAG design by shifting agents from passive interpreters to active decision-support systems that systematically leverage user preferences across GUI adaptation and workflow guidance. The explicit separation of preference memory from execution and the concrete operators provide a clear, extensible framework.

major comments (2)
  1. [Abstract / Evaluation] Abstract and evaluation description: the within-subjects study with N=33 is described (two conditions, two modes) but supplies no quantitative results, error bars, statistical tests, exclusion criteria, or qualitative findings, so the central claim that the mechanisms support decision-making cannot be assessed.
  2. [System Description (preference memory)] Shared preference memory and extraction: the entire system (both GUI adaptation and navigation) rests on an unspecified extractor that pulls preferences plus scalar strength from free-form utterances, yet no accuracy metrics, validation set, or error analysis of extraction is reported; low precision would render downstream operators and conflict detection unreliable.
minor comments (1)
  1. [Abstract] The abstract ends abruptly after stating N=33 without any summary of outcomes or conclusions.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for their insightful comments on our manuscript. We address each of the major comments in detail below and indicate the revisions we will make to strengthen the paper.

  1. Referee: [Abstract / Evaluation] Abstract and evaluation description: the within-subjects study with N=33 is described (two conditions, two modes) but supplies no quantitative results, error bars, statistical tests, exclusion criteria, or qualitative findings, so the central claim that the mechanisms support decision-making cannot be assessed.

    Authors: We agree that the abstract and the evaluation description as currently written do not include the quantitative results or supporting statistical information. Since we conducted the within-subjects study with 33 participants, we have the data on task completion, preference usage, user satisfaction, and other metrics along with statistical tests and qualitative insights. In the revised manuscript, we will update the abstract to include key quantitative findings and expand the evaluation section to report error bars, statistical tests, exclusion criteria, and qualitative findings. This will allow readers to fully assess the support for our claims regarding decision-making. revision: yes

  2. Referee: [System Description (preference memory)] Shared preference memory and extraction: the entire system (both GUI adaptation and navigation) rests on an unspecified extractor that pulls preferences plus scalar strength from free-form utterances, yet no accuracy metrics, validation set, or error analysis of extraction is reported; low precision would render downstream operators and conflict detection unreliable.

    Authors: We agree that the preference extraction component needs to be specified more clearly. We will revise the system description section to detail the method used for extracting preferences and their strengths from natural language utterances. Regarding accuracy metrics, validation set, and error analysis, these were not included in the original evaluation, which focused on the overall system performance in the user study rather than isolated component testing. We will add an analysis of extraction performance based on the collected study data where feasible, but a comprehensive validation would require additional experiments not performed in this work. revision: partial

standing simulated objections not resolved
  • Detailed accuracy metrics and validation set for the preference extractor, since a dedicated evaluation of the extraction accuracy was not conducted as part of the study.
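The rebuttal offers an extraction analysis "based on the collected study data where feasible". Once a subset of utterances is hand-annotated, a minimal version could look like the sketch below. It assumes a representation the report does not specify: each preference is an (attribute, value) key mapped to a strength in [0, 1]. Precision and recall score the keys; strength is scored only on keys both sides agree on.

```python
def extraction_scores(predicted: dict, gold: dict) -> dict:
    """Compare extracted preferences with hand-annotated gold labels.

    predicted, gold: dicts mapping (attribute, value) -> strength in [0, 1].
    Returns precision, recall and F1 over (attribute, value) keys, plus the
    mean absolute strength error on keys found in both (None if none match).
    """
    shared = predicted.keys() & gold.keys()
    tp = len(shared)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    strength_mae = sum(abs(predicted[k] - gold[k]) for k in shared) / tp if tp else None
    return {"precision": precision, "recall": recall, "f1": f1,
            "strength_mae": strength_mae}
```

Separating key accuracy from strength error matters here: a correct preference with a badly wrong strength would still choose the wrong GUI operator downstream.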

Circularity Check

0 steps flagged

No circularity: empirical system description without derivation or fitted parameters

full rationale

The paper presents MAESTRO as a system that maintains a shared preference memory and applies defined GUI operators (augment, sort, filter, highlight) plus workflow navigation features (conflict detection, backtracking). These mechanisms are introduced directly as design choices in the abstract and full text, with evaluation via a within-subjects user study (N=33). No equations, mathematical derivations, parameter fitting, or predictions that reduce to inputs appear. No self-citation load-bearing uniqueness theorems or ansatzes are invoked. The work is self-contained as an empirical HCI system paper; claims rest on implementation and study results rather than any tautological reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that natural-language utterances contain extractable, strength-annotated preferences that can be used to drive GUI changes and navigation decisions without introducing new entities or fitted parameters.

axioms (1)
  • domain assumption: User preferences can be extracted from natural-language utterances together with a numeric strength value
    Invoked in the description of the shared preference memory and the two adaptation mechanisms
invented entities (1)
  • Shared preference memory (no independent evidence)
    purpose: Store extracted preferences with strength for use by GUI adaptation and workflow navigation
    New component introduced by the system; no independent evidence outside the paper is provided
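To see why the axiom is load-bearing, consider the simplest conceivable strength extractor: a cue-word lexicon. The sketch below is a deliberately naive, hypothetical baseline; the report does not describe the paper's extractor, and the cue list and scores here are invented. Its failure on negation shows the kind of error that would propagate into operator choice and conflict detection.

```python
import re

# Hypothetical cue lexicon, checked strongest first; scores use an assumed 0-1 scale.
CUES = [
    (r"\b(must|need to|have to|only)\b", 1.0),
    (r"\b(really want|strongly prefer)\b", 0.8),
    (r"\b(prefer|would like|rather)\b", 0.6),
    (r"\b(maybe|if possible|ideally)\b", 0.3),
]


def cue_strength(utterance: str, default: float = 0.5) -> float:
    """Return the score of the first matching cue, or `default` if none match."""
    text = utterance.lower()
    for pattern, strength in CUES:
        if re.search(pattern, text):
            return strength
    return default
```

"I'd prefer an evening show" scores 0.6 and "Maybe IMAX if possible" scores 0.3, but "I don't need to sit up front" scores 1.0 because the lexicon ignores negation, which would turn a relaxed constraint into a hard filter.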

pith-pipeline@v0.9.0 · 5532 in / 1479 out tokens · 33582 ms · 2026-05-10T19:57:19.221738+00:00 · methodology

