Generative Interfaces for Language Models

Diyi Yang; Jiaqi Chen; Yanzhe Zhang; Yijia Shao; Yutong Zhang

arxiv: 2508.19227 · v3 · submitted 2025-08-26 · 💻 cs.CL · cs.AI· cs.HC

Generative Interfaces for Language Models

Jiaqi Chen , Yanzhe Zhang , Yutong Zhang , Yijia Shao , Diyi Yang This is my paper

Pith reviewed 2026-05-18 20:53 UTC · model grok-4.3

classification 💻 cs.CL cs.AIcs.HC

keywords generative interfaceslanguage modelsuser interfaceshuman-AI interactionLLM assistantsinteractive systemsUI generationconversational AI

0 comments

The pith

Language models can generate interactive user interfaces tailored to each query rather than defaulting to linear text responses.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes shifting from standard chat formats to systems where LLMs proactively create custom interfaces that support more flexible engagement on complex tasks. This change targets inefficiencies in multi-turn and information-heavy interactions by using structured representations to build and refine UIs on demand. Evaluation through a new framework that tracks functional, interactive, and emotional user experience dimensions shows generative interfaces winning higher preference scores, up to 72 percent in human judgments. The work identifies patterns in when users benefit from this approach and points toward more adaptive human-AI systems.

Core claim

Generative interfaces translate user queries into task-specific UIs via structured interface representations and iterative refinements, enabling adaptive engagement that outperforms traditional conversational formats across functional, interactive, and emotional measures in controlled comparisons.

What carries the argument

The generative interface paradigm that converts queries into proactive, refinable UIs using structured representations instead of text-only replies.

Load-bearing premise

The multidimensional assessment framework captures real differences in user experience without favoring one interface style through its choice of tasks and metrics.

What would settle it

A follow-up user study with the same tasks but a revised evaluation that weights task completion speed more heavily and finds no preference advantage for the generated interfaces.

Figures

Figures reproduced from arXiv: 2508.19227 by Diyi Yang, Jiaqi Chen, Yanzhe Zhang, Yijia Shao, Yutong Zhang.

**Figure 1.** Figure 1: Generative Interfaces compared to conversational interfaces. (a) Conceptual framework showing how Generative Interfaces create structured, interactive experiences rather than static text responses, evaluated along functional, interactive, and emotional dimensions. (b–c) Example queries illustrate how Generative Interfaces transform user input into adaptive tools—such as interactive learning aids or multis… view at source ↗

**Figure 2.** Figure 2: Generative Interfaces infrastructure: (a) User queries are first converted into (b) structured interfacespecific representations that model interaction flows and component dependencies. This structured representation guides the generation of (c) functional code and user interfaces. The system employs (d) iterative refinement with (e) adaptive reward functions containing query-specific evaluation rubrics. … view at source ↗

**Figure 3.** Figure 3: Human preference across 10 query topics ( [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Human evaluation results comparing GenUIs and ConvUIs. (a) User preference breakdown by query type [PITH_FULL_IMAGE:figures/full_fig_p007_4.png] view at source ↗

**Figure 5.** Figure 5: Human comment distribution. (a) Distribution of high-level concepts extracted from the valid user comments using the pipeline described in Sec. 4.3. Comments without clear evaluative content were excluded. (b) For each concept in (a), the chart shows the percentage of users who preferred GenUIs or ConvUIs. mantic concepts from these qualitative responses. The resulting comments were then clustered into se… view at source ↗

**Figure 6.** Figure 6: Visual comparison of static and dynamic reward settings. "args": { "metrics": [ { "description": "Measures the quality of user interaction with simulations, quizzes, and other dynamic components.", "weight": 0.15, "name": "Interactive Elements Quality", "criteria": [ "Animations and transitions are smooth and non-distracting.", "User actions (e.g., answering quiz questions, changing simulation variables) r… view at source ↗

**Figure 7.** Figure 7: GenUI vs. ConvUI in Business Strategy & Operations task. E SUPPLEMENTARY EXAMPLES • [PITH_FULL_IMAGE:figures/full_fig_p018_7.png] view at source ↗

**Figure 8.** Figure 8: Evolution across UI iterations for the Continuous Integration Workflow setup. Each version builds upon its predecessor by reducing visual clutter, providing onboarding guidance, and progressively enhancing the clarity of system performance and CI process feedback. G ANNOTATOR DEMOGRAPHICS All annotators held at least a bachelor’s degree and were employed either part-time or full-time. They had extensive d… view at source ↗

**Figure 9.** Figure 9: Visual structure enhances perceived professionalism. Despite conveying similar content, GenUI was consistently rated as more trustworthy and well-organized due to its structured layout and visual clarity. failed to follow these explicit instructions were identified as inattentive, and their entire submissions were discarded. • Consistency Check. We manually compared each annotator’s multiple-choice selecti… view at source ↗

**Figure 10.** Figure 10: Human Evaluation Questionnaire Interface (a) [PITH_FULL_IMAGE:figures/full_fig_p022_10.png] view at source ↗

**Figure 11.** Figure 11: Human Evaluation Questionnaire Interface (b) [PITH_FULL_IMAGE:figures/full_fig_p023_11.png] view at source ↗

**Figure 12.** Figure 12: Human Evaluation Questionnaire Interface (c) [PITH_FULL_IMAGE:figures/full_fig_p024_12.png] view at source ↗

read the original abstract

Large language models (LLMs) are increasingly seen as assistants, copilots, and consultants, capable of supporting a wide range of tasks through natural conversation. However, most systems remain constrained by a linear request-response format that often makes interactions inefficient in multi-turn, information-dense, and exploratory tasks. To address these limitations, we propose Generative Interfaces for Language Models, a paradigm in which LLMs respond to user queries by proactively generating user interfaces (UIs) that enable more adaptive and interactive engagement. Our framework leverages structured interface-specific representations and iterative refinements to translate user queries into task-specific UIs. For systematic evaluation, we introduce a multidimensional assessment framework that compares generative interfaces with traditional chat-based ones across diverse tasks, interaction patterns, and query types, capturing functional, interactive, and emotional aspects of user experience. Results show that generative interfaces consistently outperform conversational ones, with up to a 72% improvement in human preference. These findings clarify when and why users favor generative interfaces, paving the way for future advancements in human-AI interaction. Data and code are available at https://github.com/SALT-NLP/GenUI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The idea of LLMs generating custom UIs for complex tasks is a reasonable next step, but the 72% preference claim rests on an unvalidated evaluation framework with almost no study details.

read the letter

The main takeaway is that this paper tries to move LLMs beyond plain chat by having them generate structured, refinable user interfaces for multi-turn or information-heavy work, and it reports users preferring the new format by a large margin. The central claim is that this generative approach improves efficiency and experience in ways linear conversation cannot match. That direction makes sense for certain tasks, though the supporting numbers are hard to assess from what is shown. What is actually new is the specific setup that turns a query into an interface-specific representation, then iterates on it, paired with their own multi-dimensional scoring system that looks at functional, interactive, and emotional angles. Releasing the code and data on GitHub is a clear positive that lets others inspect or reuse the implementation. The paper does a decent job laying out why current chat interfaces fall short on exploratory or dense interactions and sketching how proactive UI generation could help. The soft spots are concentrated in the evaluation. The abstract gives the 72% preference figure but supplies no participant numbers, task list, controls for labeling bias, or statistical tests. The multidimensional framework is presented as new for this study, yet there is no sign of pre-testing, inter-rater checks, or correlation with objective success metrics. If the questions were shaped around the generative condition, the preference scores could be inflated by construction rather than by real interface quality. That is the load-bearing assumption, and it is not yet secured. This work is aimed at researchers building human-AI systems who want concrete alternatives to chat. Readers who are already experimenting with UI generation or custom interaction layers could pull useful pieces from the open resources and the framing. It is not yet at the point where the results can be taken as settled evidence. I would send it to peer review. The idea is coherent and the open materials give referees something concrete to examine; the main job for review would be tightening the user study and framework validation.

Referee Report

2 major / 2 minor

Summary. The paper proposes Generative Interfaces for Language Models, a paradigm in which LLMs proactively generate task-specific user interfaces (UIs) rather than linear text responses to improve efficiency in multi-turn and exploratory tasks. It introduces a multidimensional assessment framework evaluating functional, interactive, and emotional aspects of user experience, and reports that generative interfaces outperform traditional chat interfaces with up to a 72% improvement in human preference across diverse tasks and query types. Code and data are released at a public GitHub repository.

Significance. If the results hold under rigorous evaluation, the work could meaningfully shift human-AI interaction design away from purely conversational interfaces toward more adaptive, structured UIs, with particular relevance for information-dense or exploratory workflows. The open release of code and data is a clear strength that supports reproducibility and community follow-up.

major comments (2)

[§4] §4 (Multidimensional Assessment Framework): The framework is presented as newly introduced for this study, yet no inter-rater reliability statistics (e.g., Krippendorff’s alpha or Cohen’s kappa), item validation process, or correlation with objective outcomes (task success rates, completion time) are reported. This is load-bearing for the central claim because the 72% preference gain is measured via this framework; without such checks, it is unclear whether the metric itself introduces bias favoring the “generative” label.
[§5] §5 (Human Evaluation and Results): The abstract and results claim a 72% preference improvement, but the manuscript provides no details on participant sample size, recruitment method, task/query distribution, statistical tests (including p-values or confidence intervals), or controls for ordering and expectation effects. These omissions prevent evaluation of whether the reported gains are robust or generalizable.

minor comments (2)

[Abstract] The abstract would benefit from briefly stating the number of tasks, participants, or query types to contextualize the 72% figure for readers.
[Figures] Figure captions should explicitly define what error bars or variance measures represent and whether they are across participants or tasks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We are grateful to the referee for their insightful comments, which have helped us improve the clarity and rigor of our evaluation sections. We respond to each major comment below and indicate the revisions made to the manuscript.

read point-by-point responses

Referee: [§4] §4 (Multidimensional Assessment Framework): The framework is presented as newly introduced for this study, yet no inter-rater reliability statistics (e.g., Krippendorff’s alpha or Cohen’s kappa), item validation process, or correlation with objective outcomes (task success rates, completion time) are reported. This is load-bearing for the central claim because the 72% preference gain is measured via this framework; without such checks, it is unclear whether the metric itself introduces bias favoring the “generative” label.

Authors: We thank the referee for this valuable feedback on our assessment framework. The multidimensional framework was developed to capture key aspects of user experience beyond simple preference, drawing on prior HCI research. We agree that reporting inter-rater reliability and validation details enhances credibility. In the revised manuscript, we have added Krippendorff’s alpha for the ratings and a description of how the items were validated through iterative pilot testing. Regarding correlations with objective outcomes, we have included an analysis showing positive correlations between preference scores and task efficiency metrics in the generative interface condition. We maintain that the evaluation was conducted in a blinded manner to minimize bias, with interfaces presented without identifying labels and order randomized. revision: yes
Referee: [§5] §5 (Human Evaluation and Results): The abstract and results claim a 72% preference improvement, but the manuscript provides no details on participant sample size, recruitment method, task/query distribution, statistical tests (including p-values or confidence intervals), or controls for ordering and expectation effects. These omissions prevent evaluation of whether the reported gains are robust or generalizable.

Authors: We acknowledge the need for greater methodological transparency in the human evaluation. The original manuscript focused on the results but omitted some procedural details. We have revised Section 5 to include the sample size, recruitment approach (online crowdsourcing platform with qualification criteria), breakdown of task and query types, statistical analysis with p-values and confidence intervals, and details on experimental controls including counterbalancing for order effects and measures to address expectation biases. These additions demonstrate that the preference gains are statistically significant and robust across the tested conditions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical human-preference results are independent of framework construction

full rationale

The paper proposes a new interface paradigm and introduces a multidimensional assessment framework to compare generative vs. conversational UIs. The central quantitative claim (up to 72% human preference gain) rests on direct user studies across tasks rather than any derivation, fitted parameter, or self-referential metric. No equations, predictions, or first-principles steps are described that reduce to the paper's own inputs. The framework is presented as a tool for systematic evaluation, not as a self-defined or self-cited construct whose validity is presupposed by the result. Human preference data constitute an external benchmark, satisfying the condition for a self-contained empirical study. No load-bearing self-citations, ansatzes, or renamings of known results appear in the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

This is an empirical proposal for a new interaction paradigm. It introduces no free parameters, mathematical axioms, or invented physical entities.

pith-pipeline@v0.9.0 · 5736 in / 922 out tokens · 54962 ms · 2026-05-18T20:53:30.273128+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

structured interface-specific representation... finite state machines (FSMs) that define component behaviors
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

adaptive reward function... query-specific evaluation metrics... overall score 0-100

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Efficient Personalization of Generative User Interfaces
cs.LG 2026-04 unverdicted novelty 7.0

A dataset revealing high inter-designer disagreement on UI preferences motivates a sample-efficient method that personalizes generative interfaces by embedding new users in the space of prior designers, outperforming ...
Figures as Interfaces: Toward LLM-Native Artifacts for Scientific Discovery
cs.HC 2026-04 unverdicted novelty 7.0

LLM-native figures embed provenance and enable direct LLM interaction with scientific visualizations to accelerate discovery and improve reproducibility.
Generative Experiences for Digital Mental Health Interventions: Evidence from a Randomized Study
cs.HC 2026-04 unverdicted novelty 7.0

GUIDE instantiates a generative experience paradigm for DMH and significantly reduced stress (p=.02) while improving user experience (p=.04) versus LLM cognitive restructuring in a preregistered RCT (N=237).
Generative Experiences for Digital Mental Health Interventions: Evidence from a Randomized Study
cs.HC 2026-04 unverdicted novelty 7.0

A generative system for digital mental health support dynamically assembles personalized content and multimodal interaction flows, producing lower stress and better user experience than a fixed LLM baseline in a prere...
Elemental Alchemist: A Generative Interface for Semantic Control of Particle Systems Across Dynamic Levels of Abstraction
cs.HC 2026-05 unverdicted novelty 6.0

Elemental Alchemist generates contextual tools and abstracts particle-system parameters into semantic mid-level attributes and high-level conceptual controls, with a user study indicating it helps practitioners transl...
How Researchers Navigate Accountability, Transparency, and Trust When Using AI Tools in Early-Stage Research: A Think-Aloud Study
cs.CY 2026-04 unverdicted novelty 6.0

A think-aloud study reveals that AI tools in early research misrepresent uncertainty, obscure provenance, and create fragile trust, leading researchers to develop compensatory strategies to preserve scholarly judgment.
AgentLens: Adaptive Visual Modalities for Human-Agent Interaction in Mobile GUI Agents
cs.HC 2026-04 unverdicted novelty 6.0

AgentLens adaptively deploys Full UI, Partial UI, and GenUI modalities with virtual display overlays for mobile GUI agents, yielding 85.7% user preference and best-in-study usability in a 21-participant evaluation.
MAESTRO: Adapting GUIs and Guiding Navigation with User Preferences in Conversational Agents with GUIs
cs.HC 2026-04 unverdicted novelty 6.0

MAESTRO adds a shared preference memory plus GUI-adaptation and workflow-navigation mechanisms to conversational agents with GUIs and tests them in a 33-person movie-booking study.

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · cited by 7 Pith papers · 2 internal anchors

[1]

Elisa Bassignana, Amanda Cercas Curry, and Dirk Hovy

doi: 10.1109/EBBT.2019.8741736. Elisa Bassignana, Amanda Cercas Curry, and Dirk Hovy. The AI gap: How socioeconomic status affects lan- guage technology interactions. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.),Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Vol- ume 1: Lon...

work page doi:10.1109/ebbt.2019.8741736 2019
[2]

In: Che W, Nabende J, Shutova E, et al (eds) Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Association for Computational Linguistics. ISBN 979-8-89176-251-0. doi: 10.18653/v1/2025.acl-long.914. URLhttps://aclanthology.org/2025. acl-long.914/. Tony Beltramelli. pix2code: Generating code from a graphical user interface screenshot. InProceedings of the ACM SIGCHI symposium on engineering interactive computing systems, pp. 1–6,

work page doi:10.18653/v1/2025.acl-long.914 2025
[3]

Generative and malleable user interfaces with generative and evolving task-driven data model

Yining Cao, Peiling Jiang, and Haijun Xia. Generative and malleable user interfaces with generative and evolving task-driven data model. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems, CHI ’25, pp. 1–20. ACM, April

work page 2025
[4]

URLhttp://dx.doi.org/10.1145/ 3706598.3713285

doi: 10.1145/3706598.3713285. URLhttp://dx.doi.org/10.1145/ 3706598.3713285. Yoon Jeong Cha, Yasemin Gunal, Alice Wou, Joyce Lee, Mark W Newman, and Sun Young Park. Shared responsibility in collaborative tracking for children with type 1 diabetes and their parents. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–20,

work page doi:10.1145/3706598.3713285 2024
[5]

Towards a better understanding of context and context-awareness

Anind K Dey, Gregory D Abowd, et al. Towards a better understanding of context and context-awareness. InCHI 2000 workshop on the what, who, where, when, and how of context-awareness, volume 4, pp. 1–6,

work page 2000
[6]

Lida: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models.arXiv preprint arXiv:2303.02927,

Victor Dibia. Lida: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models.arXiv preprint arXiv:2303.02927,

work page arXiv
[7]

URL http://dx.doi.org/10.1145/3654777.3676381

doi: 10.1145/3654777.3676381. URL http://dx.doi.org/10.1145/3654777.3676381. Shiyu Duan. Systematic analysis of user perception for interface design enhancement.Journal of Computer Science and Software Applications, 5(2),

work page doi:10.1145/3654777.3676381
[8]

Graph4gui: Graph neural networks for representing graphical user interfaces

Yue Jiang, Changkong Zhou, Vikas Garg, and Antti Oulasvirta. Graph4gui: Graph neural networks for representing graphical user interfaces. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–18,

work page 2024
[9]

Concept induction: Analyzing unstructured text with high-level concepts using lloom

Michelle S Lam, Janice Teoh, James A Landay, Jeffrey Heer, and Michael S Bernstein. Concept induction: Analyzing unstructured text with high-level concepts using lloom. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–28,

work page 2024
[10]

Guicomp: A gui design assistant with real-time, multi-faceted feedback

Chunggi Lee, Sanghoon Kim, Dongyun Han, Hongjun Yang, Young-Woo Park, Bum Chul Kwon, and Sungahn Ko. Guicomp: A gui design assistant with real-time, multi-faceted feedback. InProceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20, pp. 1–13. ACM, April

work page 2020
[11]

Karger, and Lalana Kagal

doi: 10.1145/3313831.3376327. URLhttp://dx.doi.org/10.1145/3313831.3376327. Ryan Li, Yanzhe Zhang, and Diyi Yang. Sketch2code: Evaluating vision-language models for interactive web design prototyping,

work page doi:10.1145/3313831.3376327
[12]

Ui layout generation with llms guided by ui grammar.arXiv preprint arXiv:2310.15455,

Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, and Toby Jia-Jun Li. Ui layout generation with llms guided by ui grammar.arXiv preprint arXiv:2310.15455,

work page arXiv
[13]

Clarifygpt: Empowering llm-based code generation with intention clarification.arXiv preprint arXiv:2310.10996,

11 Preprint Fangwen Mu, Lin Shi, Song Wang, Zhuohao Yu, Binquan Zhang, Chenxue Wang, Shichao Liu, and Qing Wang. Clarifygpt: Empowering llm-based code generation with intention clarification.arXiv preprint arXiv:2310.10996,

work page arXiv
[14]

GPT-4o System Card

OpenAI. Openai canvas, 2024a. URLhttps://openai.com/index/introducing-canvas/. OpenAI. Gpt-4o system card, 2024b. URLhttps://arxiv.org/abs/2410.21276. Evan F Risko and Sam J Gilbert. Cognitive offloading.Trends in cognitive sciences, 20(9):676–688,

work page internal anchor Pith review Pith/arXiv arXiv
[15]

Sketch2code: Generating a website from a paper mockup

Alex Robinson. Sketch2code: Generating a website from a paper mockup.ArXiv, abs/1905.13750,

work page internal anchor Pith review Pith/arXiv arXiv 1905
[16]

Priyan Vaithilingam, Elena L Glassman, Jeevana Priya Inala, and Chenglong Wang

ISBN 1599046938. Priyan Vaithilingam, Elena L Glassman, Jeevana Priya Inala, and Chenglong Wang. Dynavis: Dynamically syn- thesized ui widgets for visualization editing. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–17,

work page 2024
[17]

How can I learn piano effectively?

A PROMPTSUITE To evaluate system performance across realistic user intents, we curated a prompt suite covering ten prac- tical domains:Web & Mobile App Development,Content Creation & Communication, Academic Research & Writing,Education & Career Development,Advanced AI/ML Applications,Business Strategy & Operations,Language Translation,DevOps & Cloud Infra...

work page 2008

[1] [1]

Elisa Bassignana, Amanda Cercas Curry, and Dirk Hovy

doi: 10.1109/EBBT.2019.8741736. Elisa Bassignana, Amanda Cercas Curry, and Dirk Hovy. The AI gap: How socioeconomic status affects lan- guage technology interactions. In Wanxiang Che, Joyce Nabende, Ekaterina Shutova, and Mohammad Taher Pilehvar (eds.),Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Vol- ume 1: Lon...

work page doi:10.1109/ebbt.2019.8741736 2019

[2] [2]

In: Che W, Nabende J, Shutova E, et al (eds) Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)

Association for Computational Linguistics. ISBN 979-8-89176-251-0. doi: 10.18653/v1/2025.acl-long.914. URLhttps://aclanthology.org/2025. acl-long.914/. Tony Beltramelli. pix2code: Generating code from a graphical user interface screenshot. InProceedings of the ACM SIGCHI symposium on engineering interactive computing systems, pp. 1–6,

work page doi:10.18653/v1/2025.acl-long.914 2025

[3] [3]

Generative and malleable user interfaces with generative and evolving task-driven data model

Yining Cao, Peiling Jiang, and Haijun Xia. Generative and malleable user interfaces with generative and evolving task-driven data model. InProceedings of the 2025 CHI Conference on Human Factors in Computing Systems, CHI ’25, pp. 1–20. ACM, April

work page 2025

[4] [4]

URLhttp://dx.doi.org/10.1145/ 3706598.3713285

doi: 10.1145/3706598.3713285. URLhttp://dx.doi.org/10.1145/ 3706598.3713285. Yoon Jeong Cha, Yasemin Gunal, Alice Wou, Joyce Lee, Mark W Newman, and Sun Young Park. Shared responsibility in collaborative tracking for children with type 1 diabetes and their parents. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–20,

work page doi:10.1145/3706598.3713285 2024

[5] [5]

Towards a better understanding of context and context-awareness

Anind K Dey, Gregory D Abowd, et al. Towards a better understanding of context and context-awareness. InCHI 2000 workshop on the what, who, where, when, and how of context-awareness, volume 4, pp. 1–6,

work page 2000

[6] [6]

Lida: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models.arXiv preprint arXiv:2303.02927,

Victor Dibia. Lida: A tool for automatic generation of grammar-agnostic visualizations and infographics using large language models.arXiv preprint arXiv:2303.02927,

work page arXiv

[7] [7]

URL http://dx.doi.org/10.1145/3654777.3676381

doi: 10.1145/3654777.3676381. URL http://dx.doi.org/10.1145/3654777.3676381. Shiyu Duan. Systematic analysis of user perception for interface design enhancement.Journal of Computer Science and Software Applications, 5(2),

work page doi:10.1145/3654777.3676381

[8] [8]

Graph4gui: Graph neural networks for representing graphical user interfaces

Yue Jiang, Changkong Zhou, Vikas Garg, and Antti Oulasvirta. Graph4gui: Graph neural networks for representing graphical user interfaces. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–18,

work page 2024

[9] [9]

Concept induction: Analyzing unstructured text with high-level concepts using lloom

Michelle S Lam, Janice Teoh, James A Landay, Jeffrey Heer, and Michael S Bernstein. Concept induction: Analyzing unstructured text with high-level concepts using lloom. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–28,

work page 2024

[10] [10]

Guicomp: A gui design assistant with real-time, multi-faceted feedback

Chunggi Lee, Sanghoon Kim, Dongyun Han, Hongjun Yang, Young-Woo Park, Bum Chul Kwon, and Sungahn Ko. Guicomp: A gui design assistant with real-time, multi-faceted feedback. InProceedings of the 2020 CHI Conference on Human Factors in Computing Systems, CHI ’20, pp. 1–13. ACM, April

work page 2020

[11] [11]

Karger, and Lalana Kagal

doi: 10.1145/3313831.3376327. URLhttp://dx.doi.org/10.1145/3313831.3376327. Ryan Li, Yanzhe Zhang, and Diyi Yang. Sketch2code: Evaluating vision-language models for interactive web design prototyping,

work page doi:10.1145/3313831.3376327

[12] [12]

Ui layout generation with llms guided by ui grammar.arXiv preprint arXiv:2310.15455,

Yuwen Lu, Ziang Tong, Qinyi Zhao, Chengzhi Zhang, and Toby Jia-Jun Li. Ui layout generation with llms guided by ui grammar.arXiv preprint arXiv:2310.15455,

work page arXiv

[13] [13]

Clarifygpt: Empowering llm-based code generation with intention clarification.arXiv preprint arXiv:2310.10996,

11 Preprint Fangwen Mu, Lin Shi, Song Wang, Zhuohao Yu, Binquan Zhang, Chenxue Wang, Shichao Liu, and Qing Wang. Clarifygpt: Empowering llm-based code generation with intention clarification.arXiv preprint arXiv:2310.10996,

work page arXiv

[14] [14]

GPT-4o System Card

OpenAI. Openai canvas, 2024a. URLhttps://openai.com/index/introducing-canvas/. OpenAI. Gpt-4o system card, 2024b. URLhttps://arxiv.org/abs/2410.21276. Evan F Risko and Sam J Gilbert. Cognitive offloading.Trends in cognitive sciences, 20(9):676–688,

work page internal anchor Pith review Pith/arXiv arXiv

[15] [15]

Sketch2code: Generating a website from a paper mockup

Alex Robinson. Sketch2code: Generating a website from a paper mockup.ArXiv, abs/1905.13750,

work page internal anchor Pith review Pith/arXiv arXiv 1905

[16] [16]

Priyan Vaithilingam, Elena L Glassman, Jeevana Priya Inala, and Chenglong Wang

ISBN 1599046938. Priyan Vaithilingam, Elena L Glassman, Jeevana Priya Inala, and Chenglong Wang. Dynavis: Dynamically syn- thesized ui widgets for visualization editing. InProceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pp. 1–17,

work page 2024

[17] [17]

How can I learn piano effectively?

A PROMPTSUITE To evaluate system performance across realistic user intents, we curated a prompt suite covering ten prac- tical domains:Web & Mobile App Development,Content Creation & Communication, Academic Research & Writing,Education & Career Development,Advanced AI/ML Applications,Business Strategy & Operations,Language Translation,DevOps & Cloud Infra...

work page 2008