arxiv: 2604.06134 · v1 · submitted 2026-04-07 · 💻 cs.HC


MAESTRO: Adapting GUIs and Guiding Navigation with User Preferences in Conversational Agents with GUIs

Sangwook Lee, Sang Won Lee, Adnan Abbas, Young-Ho Kim, Yan Chen


Pith reviewed 2026-05-10 19:57 UTC · model grok-4.3

classification 💻 cs.HC

keywords: conversational agents, GUI adaptation, user preferences, preference memory, workflow navigation, decision support, multi-step tasks, human-computer interaction


The pith

MAESTRO extracts user preferences from dialogue to adapt GUIs and guide navigation in conversational agents.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces MAESTRO to extend conversational agents with GUIs beyond the linear execution of commands. It maintains a shared memory that extracts preferences, along with their strength, from natural-language statements. These stored preferences then drive in-place changes to the visible interface and, when choices conflict, suggestions to move backward in the workflow. The goal is to support multi-step tasks in which early decisions limit later options, such as booking a flight or a movie ticket, without forcing users to start over. The evaluation is a study with 33 participants using a movie-booking agent in both text and voice modes.
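To make the "shared preference memory" concrete, here is a minimal Python sketch. Everything in it is an illustrative assumption, not the paper's schema:

- the three-level `Strength` scale;
- the `Preference` and `PreferenceMemory` names;
- the rule that a newer statement about the same attribute overwrites the older one.

The extraction step itself, which turns an utterance into a `Preference`, is not modeled here.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Strength(IntEnum):
    """Assumed ordinal scale; the paper's actual strength levels may differ."""
    NICE_TO_HAVE = 1
    PREFERRED = 2
    MUST = 3


@dataclass
class Preference:
    attribute: str       # hypothetical slot name, e.g. "showtime"
    value: str           # what the user asked for, e.g. "evening"
    strength: Strength
    utterance: str = ""  # source text, kept so the agent can explain itself


@dataclass
class PreferenceMemory:
    """Single store read by both GUI adaptation and workflow navigation."""
    prefs: dict[str, Preference] = field(default_factory=dict)

    def upsert(self, pref: Preference) -> None:
        # Assumption: a later statement about the same attribute replaces the earlier one.
        self.prefs[pref.attribute] = pref

    def at_least(self, minimum: Strength) -> list[Preference]:
        # Preferences whose strength meets or exceeds the given level.
        return [p for p in self.prefs.values() if p.strength >= minimum]


memory = PreferenceMemory()
memory.upsert(Preference("showtime", "evening", Strength.PREFERRED, "I'd rather go in the evening"))
memory.upsert(Preference("format", "IMAX", Strength.MUST, "It has to be IMAX"))
memory.upsert(Preference("showtime", "after 20:00", Strength.MUST, "Definitely after eight"))
hard = [p.attribute for p in memory.at_least(Strength.MUST)]
```

Keeping one entry per attribute means the memory stays small and a user's correction simply replaces the earlier statement. A real system might instead keep a history to detect contradictions.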

Core claim

MAESTRO extends the agent's role from execution to decision support. It maintains a shared preference memory that extracts preferences from natural-language utterances with their strength, and provides two mechanisms: Preference-Grounded GUI Adaptation applies in-place operators (augment, sort, filter, and highlight) to the existing GUI according to preference strength, supporting within-stage comparison; Preference-Guided Workflow Navigation detects conflicts between preferences and available options, proposes backtracking, and records failed paths to avoid revisiting dead ends.
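The four in-place operators can be illustrated with a small function over option cards, such as the theaters shown at one stage. The mapping from strength to operator below is an assumption chosen for illustration; the paper's own four-step adaptation policy (Figure 3) is not reproduced here. The assumed mapping is:

- hard preferences filter;
- regular preferences sort;
- nice-to-have preferences highlight;
- every satisfied preference is shown as a badge (augment).

```python
from dataclasses import dataclass, field, replace


@dataclass
class Option:
    """One GUI option card, e.g. a theater (hypothetical fields)."""
    label: str
    attrs: dict
    badges: list = field(default_factory=list)  # filled by "augment"
    highlighted: bool = False                   # set by "highlight"


def matches(option, attribute, value):
    return option.attrs.get(attribute) == value


def adapt(options, prefs):
    """prefs: (attribute, value, strength) triples, strength in {"must", "prefer", "nice"}."""
    hard = [(a, v) for a, v, s in prefs if s == "must"]
    soft = [(a, v) for a, v, s in prefs if s == "prefer"]
    nice = [(a, v) for a, v, s in prefs if s == "nice"]

    # filter: hide options that violate any hard preference
    shown = [o for o in options if all(matches(o, a, v) for a, v in hard)]
    # augment + highlight: build new cards so the caller's list is left untouched
    shown = [
        replace(
            o,
            badges=[f"{a}={v}" for a, v, _ in prefs if matches(o, a, v)],
            highlighted=any(matches(o, a, v) for a, v in nice),
        )
        for o in shown
    ]
    # sort: options satisfying more "prefer" preferences come first (stable sort)
    shown.sort(key=lambda o: -sum(matches(o, a, v) for a, v in soft))
    return shown


theaters = [
    Option("Downtown", {"format": "IMAX", "parking": False}),
    Option("Mall", {"format": "IMAX", "parking": True}),
    Option("Riverside", {"format": "2D", "parking": True}),
]
shown = adapt(theaters, [("format", "IMAX", "must"), ("parking", True, "prefer")])
```

Because the sort is stable, options that tie on soft preferences keep the order the original GUI gave them. This is in the spirit of adapting the existing interface rather than regenerating it.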

What carries the argument

Shared preference memory that extracts preferences with strength from utterances, enabling Preference-Grounded GUI Adaptation through in-place operators and Preference-Guided Workflow Navigation through conflict detection and backtracking proposals.

If this is right

GUI elements can be augmented, sorted, filtered, or highlighted in place according to the strength of extracted preferences.
Conflicts between stored preferences and current options trigger automatic proposals to backtrack in the workflow.
Failed paths are recorded so the agent can steer users away from the same dead ends in later steps.
The same preference memory supports both text and voice input modes without changing the core adaptation logic.
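The navigation behavior in the list above can be sketched as follows.

- A conflict means that no option at the current stage satisfies every hard preference.
- On a conflict, the agent proposes returning to the previous stage and records the current path as a dead end.
- Later candidate lists exclude choices that would re-enter a recorded dead end.

The `Navigator` class, its method names, and the "go back one stage" heuristic are hypothetical. The paper may choose the backtracking target differently.

```python
from dataclasses import dataclass, field


@dataclass
class Navigator:
    """Hypothetical helper for a linear booking workflow."""
    stages: list                                  # e.g. ["movie", "theater", "showtime"]
    failed_paths: set = field(default_factory=set)

    def has_conflict(self, options, hard_prefs):
        """True when no available option satisfies every hard preference."""
        return not any(
            all(opt.get(a) == v for a, v in hard_prefs) for opt in options
        )

    def propose_backtrack(self, path, options, hard_prefs):
        """path: choices made so far, one per completed stage.

        Returns the stage to revisit, or None when there is no conflict
        or no earlier stage to return to.
        """
        if not self.has_conflict(options, hard_prefs):
            return None
        self.failed_paths.add(tuple(path))  # remember this dead end
        # Assumed heuristic: revisit the most recently completed stage.
        return self.stages[len(path) - 1] if path else None

    def viable(self, path, candidates):
        """Drop candidates whose extended path is already a recorded dead end."""
        return [c for c in candidates if tuple(path) + (c,) not in self.failed_paths]


nav = Navigator(["movie", "theater", "showtime"])
showtimes = [{"time": "18:00", "format": "2D"}, {"time": "19:30", "format": "2D"}]
stage = nav.propose_backtrack(["Dune", "Downtown"], showtimes, [("format", "IMAX")])
remaining = nav.viable(["Dune"], ["Downtown", "Mall"])
```

A recorded dead end is only valid relative to the hard preferences in force when it was recorded. If the user relaxes a constraint, the stored paths would need to be cleared or re-checked.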

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same memory structure could be reused across separate sessions so preferences accumulate rather than reset each time a new booking starts.
Navigation guidance might remain useful even when visual GUI adaptation is unavailable, such as in voice-only settings.
Recording dead ends creates a lightweight form of negative feedback that other agents could consult when ranking future suggestions.
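The first and third extensions both amount to persisting the memory beyond a single session. The sketch below uses the standard-library `json` module and makes two assumptions about formats, both hypothetical: preferences are already plain dicts, and dead ends are tuples of choices. JSON has no tuple or set type, so these are converted on the way out and restored on the way back in.

```python
import json


def save_memory(prefs, failed_paths):
    """Serialize preferences and recorded dead ends (hypothetical schema)."""
    return json.dumps({
        "preferences": prefs,                                # list of dicts
        "failed_paths": [list(p) for p in sorted(failed_paths)],
    })


def load_memory(blob):
    """Inverse of save_memory: returns (preferences, set of dead-end tuples)."""
    data = json.loads(blob)
    return data["preferences"], {tuple(p) for p in data["failed_paths"]}


prefs = [{"attribute": "format", "value": "IMAX", "strength": "must"}]
failed = {("Dune", "Downtown")}
restored_prefs, restored_failed = load_memory(save_memory(prefs, failed))
```

Whether stale preferences or dead ends should carry over unchanged is a design question. Showtimes and availability change between sessions, so a real system would likely expire or re-validate them.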

Load-bearing premise

That preferences can be reliably extracted from natural-language utterances together with their strength, and that applying in-place GUI operators and backtracking suggestions based on them will produce measurable improvements in user experience for multi-step, preference-driven tasks.

What would settle it

A within-subjects study of the movie-booking agent that finds no significant difference in task completion time, number of restarts, or user satisfaction scores between MAESTRO and the baseline condition would show the mechanisms do not deliver the claimed benefits.
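The comparison described here is paired: the same participants are measured under both conditions. The abstract does not say which analysis the authors ran, and with two factors (condition and mode) a repeated-measures model would be the usual choice. As a simpler illustration, the sketch below runs a two-sided sign-flip permutation test on one outcome, comparing each participant's Baseline and MAESTRO scores. The function name, seed, and resample count are arbitrary assumptions.

```python
import random


def paired_permutation_test(baseline, maestro, n_resamples=10_000, seed=0):
    """Two-sided sign-flip permutation test on per-participant differences.

    Under the null hypothesis of no condition effect, each participant's
    difference is equally likely to be positive or negative.
    Returns (mean difference maestro - baseline, p-value).
    """
    if len(baseline) != len(maestro):
        raise ValueError("need one paired score per participant")
    diffs = [m - b for b, m in zip(baseline, maestro)]
    observed = abs(sum(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        flipped = sum(d if rng.random() < 0.5 else -d for d in diffs)
        if abs(flipped) >= observed:
            extreme += 1
    # add-one correction keeps the Monte Carlo p-value strictly positive
    return sum(diffs) / len(diffs), (extreme + 1) / (n_resamples + 1)
```

For example, `paired_permutation_test(baseline_restarts, maestro_restarts)` would take 33 restart counts per condition. A non-significant result from a test like this is weak evidence of "no effect" unless the study is powered to detect the effect sizes of interest.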

Figures

Figures reproduced from arXiv: 2604.06134 by Adnan Abbas, Sang Won Lee, Sangwook Lee, Yan Chen, Young-Ho Kim.

**Figure 1.** Overview of MAESTRO at the theater selection stage. Initial GUI: The default GUI presents three theater options with …

**Figure 2.** MAESTRO layout at the theater selection stage.

**Figure 3.** Four-step adaptation policy illustrated on a movie-selection example.

**Figure 4.** Baseline condition interface at the theater selection …

**Figure 5.** Interaction plots for five dependent variables used in the evaluation: Violation Count (a), Unpreferred Selection …

**Figure 6.** A screenshot of the user study interface showing MAESTRO in text mode at the seat-selection stage. The left panel …

Original abstract

Modern task-oriented chatbots present GUI elements alongside natural-language dialogue, yet the agent's role has largely been limited to interpreting natural-language input as GUI actions and following a linear workflow. In preference-driven, multi-step tasks such as booking a flight or reserving a restaurant, earlier choices constrain later options and may force users to restart from scratch. User preferences serve as the key criteria for these decisions, yet existing agents do not systematically leverage them. We present MAESTRO, which extends the agent's role from execution to decision support. MAESTRO maintains a shared preference memory that extracts preferences from natural-language utterances with their strength, and provides two mechanisms. Preference-Grounded GUI Adaptation applies in-place operators (augment, sort, filter, and highlight) to the existing GUI according to preference strength, supporting within-stage comparison. Preference-Guided Workflow Navigation detects conflicts between preferences and available options, proposes backtracking, and records failed paths to avoid revisiting dead ends. We evaluated MAESTRO in a movie-booking Conversational Agent with GUI (CAG) through a within-subjects study with two conditions (Baseline vs. MAESTRO) and two modes (Text vs. Voice), with N = 33 participants.
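The study crosses two conditions with two modes, so each participant experiences four condition cells. The abstract does not say how order effects were handled. One common approach, shown here only as an illustration, is a balanced (Williams) Latin square. In such a square, each cell appears once in every position and immediately follows every other cell exactly once. The cell labels below are hypothetical.

```python
def balanced_latin_square(n):
    """Williams-style balanced Latin square for an even number of conditions.

    Row i is the presentation order for participants assigned to that row.
    """
    if n % 2:
        raise ValueError("this construction assumes an even number of conditions")
    # first row: 0, 1, n-1, 2, n-2, ...
    first = [0]
    lo, hi = 1, n - 1
    while len(first) < n:
        first.append(lo)
        lo += 1
        if len(first) < n:
            first.append(hi)
            hi -= 1
    # each later row shifts every entry of the first row by i (mod n)
    return [[(c + i) % n for c in first] for i in range(n)]


CELLS = ["Baseline/Text", "Baseline/Voice", "MAESTRO/Text", "MAESTRO/Voice"]
square = balanced_latin_square(len(CELLS))
# round-robin assignment of 33 participants to the four row orders
orders = [[CELLS[c] for c in square[p % len(square)]] for p in range(33)]
```

With N = 33 and four rows, round-robin assignment uses the first row nine times and each other row eight times, so ordering is only approximately balanced.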

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

MAESTRO adds preference memory, GUI adaptation, and workflow guidance to conversational agents, but the abstract reports no study results, so the practical gains remain unproven.


The core of this paper is a system that tracks user preferences pulled from chat messages, including their strength, then uses them to tweak the on-screen GUI and steer the overall flow. It introduces two concrete pieces: in-place operators that augment, sort, filter, or highlight options based on those preferences, and navigation logic that flags conflicts, offers backtracking, and logs dead ends to avoid repeats. That combination looks new compared to the linear agents mentioned in the abstract, and it directly targets the restart problem in booking-style tasks. The description of how preferences feed both adaptation and navigation is clear enough to give implementers something to build on. The within-subjects setup with 33 participants and text-versus-voice modes is a reasonable way to test this kind of interface. The paper does a straightforward job of naming the mechanisms and tying them to a real user pain point. The main gap is that the abstract supplies zero outcome numbers, error rates, or statistical comparisons, so there is no way to judge whether the preference extraction holds up or whether the changes actually reduce restarts or improve satisfaction. The stress-test point about extraction accuracy is fair; if that step is unreliable, the rest of the system has nothing solid to work with, and the abstract does not report any separate validation for it. This is the kind of work that HCI researchers building conversational agents with GUIs would find useful for design ideas, even if they end up needing to add their own evaluation. It deserves a serious referee because the mechanisms are defined independently and the study design is in place, but the authors will need to show the data before the claims can be assessed.

Referee Report

2 major / 1 minor

Summary. The paper presents MAESTRO, a system extending conversational agents with GUIs (CAGs) beyond linear execution. It maintains a shared preference memory that extracts preferences and their strengths from natural-language utterances, then applies two mechanisms: Preference-Grounded GUI Adaptation (in-place operators: augment, sort, filter, highlight) for within-stage comparison and Preference-Guided Workflow Navigation (conflict detection, backtracking proposals, failed-path recording) to avoid dead ends. The approach is evaluated in a movie-booking CAG via a within-subjects study (Baseline vs. MAESTRO; Text vs. Voice) with N=33 participants.

Significance. If the missing evaluation results demonstrate reliable preference extraction and measurable UX gains in multi-step preference-driven tasks, the work would advance CAG design by shifting agents from passive interpreters to active decision-support systems that systematically leverage user preferences across GUI adaptation and workflow guidance. The explicit separation of preference memory from execution and the concrete operators provide a clear, extensible framework.

major comments (2)

[Abstract / Evaluation] Abstract and evaluation description: the within-subjects study with N=33 is described (two conditions, two modes) but supplies no quantitative results, error bars, statistical tests, exclusion criteria, or qualitative findings, so the central claim that the mechanisms support decision-making cannot be assessed.
[System Description (preference memory)] Shared preference memory and extraction: the entire system (both GUI adaptation and navigation) rests on an unspecified extractor that pulls preferences plus scalar strength from free-form utterances, yet no accuracy metrics, validation set, or error analysis of extraction is reported; low precision would render downstream operators and conflict detection unreliable.
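The second major comment can be made concrete. One way to validate the extractor in isolation is to compare its output against hand-annotated preferences from study transcripts. This sketch assumes each extracted preference is an (attribute, value, strength) triple. It reports micro-averaged precision, recall, and F1 in two modes:

- strict: strength must also match;
- lenient: attribute and value only.

The annotation scheme and the strict/lenient split are assumptions, not something the paper reports.

```python
def extraction_scores(predicted, gold, strength_must_match=True):
    """Micro precision/recall/F1 of extracted preferences against gold labels.

    predicted, gold: lists (one entry per utterance) of sets of
    (attribute, value, strength) triples.
    """
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold must cover the same utterances")
    tp = fp = fn = 0
    for pred, ref in zip(predicted, gold):
        if not strength_must_match:
            # lenient mode: ignore strength, compare (attribute, value) only
            pred = {(a, v) for a, v, _ in pred}
            ref = {(a, v) for a, v, _ in ref}
        tp += len(pred & ref)
        fp += len(pred - ref)
        fn += len(ref - pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


gold = [{("format", "IMAX", "must")}, {("time", "evening", "prefer")}]
pred = [{("format", "IMAX", "must")}, {("time", "evening", "must"), ("seat", "aisle", "prefer")}]
strict = extraction_scores(pred, gold)
lenient = extraction_scores(pred, gold, strength_must_match=False)
```

Reporting both modes separates two failure types: preferences that were missed or invented outright, and preferences that were found but assigned the wrong strength. The second type matters here because strength decides which operator is applied.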

minor comments (1)

[Abstract] The abstract ends abruptly after stating N=33 without any summary of outcomes or conclusions.

Simulated Authors' Rebuttal

2 responses · 1 unresolved

We thank the referee for their insightful comments on our manuscript. We address each of the major comments in detail below and indicate the revisions we will make to strengthen the paper.


Referee: [Abstract / Evaluation] Abstract and evaluation description: the within-subjects study with N=33 is described (two conditions, two modes) but supplies no quantitative results, error bars, statistical tests, exclusion criteria, or qualitative findings, so the central claim that the mechanisms support decision-making cannot be assessed.

Authors: We agree that the abstract and the evaluation description as currently written do not include the quantitative results or supporting statistical information. Because we conducted the within-subjects study with 33 participants, we have data on task completion, preference usage, user satisfaction, and other metrics, along with statistical tests and qualitative insights. In the revised manuscript, we will update the abstract to include key quantitative findings and expand the evaluation section to report error bars, statistical tests, exclusion criteria, and qualitative findings. This will allow readers to fully assess the support for our claims regarding decision-making. revision: yes
Referee: [System Description (preference memory)] Shared preference memory and extraction: the entire system (both GUI adaptation and navigation) rests on an unspecified extractor that pulls preferences plus scalar strength from free-form utterances, yet no accuracy metrics, validation set, or error analysis of extraction is reported; low precision would render downstream operators and conflict detection unreliable.

Authors: We agree that the preference extraction component needs to be specified more clearly. We will revise the system description section to detail the method used for extracting preferences and their strengths from natural language utterances. Regarding accuracy metrics, validation set, and error analysis, these were not included in the original evaluation, which focused on the overall system performance in the user study rather than isolated component testing. We will add an analysis of extraction performance based on the collected study data where feasible, but a comprehensive validation would require additional experiments not performed in this work. revision: partial

standing simulated objections not resolved

Detailed accuracy metrics and validation set for the preference extractor, since a dedicated evaluation of the extraction accuracy was not conducted as part of the study.

Circularity Check

0 steps flagged

No circularity: empirical system description without derivation or fitted parameters

full rationale

The paper presents MAESTRO as a system that maintains a shared preference memory and applies defined GUI operators (augment, sort, filter, highlight) plus workflow navigation features (conflict detection, backtracking). These mechanisms are introduced directly as design choices in the abstract and full text, with evaluation via a within-subjects user study (N=33). No equations, mathematical derivations, parameter fitting, or predictions that reduce to inputs appear. No self-citation load-bearing uniqueness theorems or ansatzes are invoked. The work is self-contained as an empirical HCI system paper; claims rest on implementation and study results rather than any tautological reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim rests on the domain assumption that natural-language utterances contain extractable, strength-annotated preferences that can drive GUI changes and navigation decisions. No fitted parameters are involved; the only new entity is the shared preference memory itself.

axioms (1)

domain assumption: User preferences can be extracted from natural-language utterances together with a numeric strength value
Invoked in the description of the shared preference memory and the two adaptation mechanisms

invented entities (1)

Shared preference memory (no independent evidence)
purpose: Store extracted preferences with strength for use by GUI adaptation and workflow navigation
New component introduced by the system; no independent evidence outside the paper is provided

pith-pipeline@v0.9.0 · 5532 in / 1479 out tokens · 33582 ms · 2026-05-10T19:57:19.221738+00:00 · methodology

