Generative UI: LLMs are Effective UI Generators

Dani Valevski; Danny Lumen; Eyal Molad; Eyal Segalis; James Manyika; Matan Kalman; Shlomi Pasternak; Srinivasan (Cheenu) Venkatachary; Valerie Nygaard; Vishnu Natchu

arxiv: 2604.09577 · v1 · submitted 2026-02-24 · 💻 cs.HC · cs.AI· cs.CL· cs.LG

Generative UI: LLMs are Effective UI Generators

Yaniv Leviathan , Dani Valevski , Matan Kalman , Danny Lumen , Eyal Segalis , Eyal Molad , Shlomi Pasternak , Vishnu Natchu

show 4 more authors

Valerie Nygaard Srinivasan (Cheenu) Venkatachary James Manyika Yossi Matias

This is my paper

Pith reviewed 2026-05-15 19:38 UTC · model grok-4.3

classification 💻 cs.HC cs.AIcs.CLcs.LG

keywords generative uilarge language modelsuser interface generationllm promptinghuman preference evaluationemergent capabilitiesui design

0 comments

The pith

Modern LLMs, when properly prompted and equipped with the right tools, can robustly produce high-quality custom UIs for virtually any prompt.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper demonstrates that large language models can move beyond generating static markdown text to creating tailored, interactive user interfaces on demand. With suitable prompting and access to UI generation tools, the models produce results that human evaluators strongly prefer over conventional LLM output. These generated interfaces fall short of expert human designs but match or exceed them in roughly half of tested cases. The capability shows clear signs of emergence, appearing robustly only in recent model versions. The authors also release the PAGEN dataset of expert-crafted UIs to support further evaluation and comparison.

Core claim

When properly prompted and equipped with the right set of tools, a modern LLM can robustly produce high quality custom UIs for virtually any prompt. When ignoring generation speed, results generated by the implementation are overwhelmingly preferred by humans over the standard LLM markdown output. While worse than those crafted by human experts, they are at least comparable in 50% of cases. This ability for robust Generative UI is emergent, with substantial improvements from previous models.

What carries the argument

An LLM equipped with UI generation tools and specific prompting that directs it to output interface code or structures rather than plain text.

If this is right

LLM responses can shift from static text walls to dynamic, usable interfaces tailored to the query.
Generated UIs become a viable alternative to fixed templates for presenting AI content.
Model performance on this task improves markedly with newer generations, suggesting continued gains.
The released PAGEN dataset enables standardized comparisons of future Generative UI systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Such systems could allow rapid on-the-fly prototyping of simple applications from natural language descriptions alone.
Integration into live products might reduce reliance on pre-designed UI components for many routine interactions.
The approach opens questions about how to handle iterative refinement when users provide feedback on the generated interface.

Load-bearing premise

The specific prompts and tools tested will produce similarly high-quality results across the full range of real-world user requests and contexts.

What would settle it

A controlled user study in which participants perform realistic tasks with both the LLM-generated interfaces and standard markdown outputs, measuring task completion rates, time, errors, and satisfaction.

Figures

Figures reproduced from arXiv: 2604.09577 by Dani Valevski, Danny Lumen, Eyal Molad, Eyal Segalis, James Manyika, Matan Kalman, Shlomi Pasternak, Srinivasan (Cheenu) Venkatachary, Valerie Nygaard, Vishnu Natchu, Yaniv Leviathan, Yossi Matias.

**Figure 2.** Figure 2: A high level system overview. As depicted in [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗

**Figure 3.** Figure 3: Screenshots of Generative UI results with “Classic” styling. [PITH_FULL_IMAGE:figures/full_fig_p004_3.png] view at source ↗

**Figure 4.** Figure 4: Screenshots of Generative UI results with “Wizard Green” styling. [PITH_FULL_IMAGE:figures/full_fig_p004_4.png] view at source ↗

**Figure 5.** Figure 5: "Explain fractals" generated web-app. 9 [PITH_FULL_IMAGE:figures/full_fig_p009_5.png] view at source ↗

**Figure 6.** Figure 6: "History of Time Keeping Devices" generated web-app. [PITH_FULL_IMAGE:figures/full_fig_p010_6.png] view at source ↗

**Figure 7.** Figure 7: "Memory Game" generated web-app. 11 [PITH_FULL_IMAGE:figures/full_fig_p011_7.png] view at source ↗

**Figure 8.** Figure 8: "Basketball Math" generated web-app. 12 [PITH_FULL_IMAGE:figures/full_fig_p012_8.png] view at source ↗

read the original abstract

AI models excel at creating content, but typically render it with static, predefined interfaces. Specifically, the output of LLMs is often a markdown "wall of text". Generative UI is a long standing promise, where the model generates not just the content, but the interface itself. Until now, Generative UI was not possible in a robust fashion. We demonstrate that when properly prompted and equipped with the right set of tools, a modern LLM can robustly produce high quality custom UIs for virtually any prompt. When ignoring generation speed, results generated by our implementation are overwhelmingly preferred by humans over the standard LLM markdown output. In fact, while the results generated by our implementation are worse than those crafted by human experts, they are at least comparable in 50% of cases. We show that this ability for robust Generative UI is emergent, with substantial improvements from previous models. We also create and release PAGEN, a novel dataset of expert-crafted results to aid in evaluating Generative UI implementations, as well as the results of our system for future comparisons. Interactive examples can be seen at https://generativeui.github.io

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript claims that modern LLMs, when properly prompted and equipped with the right tools, can robustly generate high-quality custom UIs for virtually any prompt, outperforming standard markdown outputs in human preference studies and reaching comparability with human experts in 50% of cases. This capability is presented as emergent, with substantial gains over prior models. The authors release the PAGEN dataset of expert-crafted UIs along with their system's outputs to support future evaluation.

Significance. If the evaluation methodology is strengthened, the work could advance human-computer interaction by demonstrating practical generative interfaces that adapt to arbitrary prompts. The release of PAGEN and interactive examples at generativeui.github.io provides a concrete benchmark resource that could aid reproducibility and progress in the area.

major comments (3)

[Abstract and Evaluation] Abstract and evaluation protocol: the human preference results (overwhelming preference over markdown, 50% expert comparability) are reported without details on prompt sampling method, inclusion of edge cases, number of raters, evaluation scale, blinding, inter-rater reliability, or statistical significance tests. These omissions directly undermine support for the central claim of robustness 'for virtually any prompt'.
[Method] Method section: the 'right set of tools' and prompting strategy are described at a high level only. Exact tool definitions, the UI rendering pipeline, interaction loop, and failure-handling mechanisms must be specified to substantiate how the reported performance is achieved and to enable replication.
[Results] Emergence claim: the statement that Generative UI ability is emergent and shows 'substantial improvements from previous models' lacks quantitative side-by-side metrics, matched experimental conditions, or analysis of specific failure modes across model generations.

minor comments (2)

[Introduction] The manuscript would benefit from additional citations to prior work on LLM-driven interface generation and dynamic UI systems to better situate the contribution.
[Figures] Example figures should include the exact input prompts alongside generated outputs and model identifiers for immediate interpretability.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important areas for strengthening the manuscript. We agree that additional details on evaluation, methods, and quantitative comparisons will improve clarity and support for our claims. We will perform a major revision to incorporate these elements while preserving the core contributions.

read point-by-point responses

Referee: [Abstract and Evaluation] Abstract and evaluation protocol: the human preference results (overwhelming preference over markdown, 50% expert comparability) are reported without details on prompt sampling method, inclusion of edge cases, number of raters, evaluation scale, blinding, inter-rater reliability, or statistical significance tests. These omissions directly undermine support for the central claim of robustness 'for virtually any prompt'.

Authors: We agree that the evaluation protocol requires more detail to substantiate the robustness claim. In the revised manuscript, we will expand the Evaluation section with: prompt sampling (stratified random selection from 120 prompts across 8 domains including edge cases like ambiguous inputs and multi-component UIs), number of raters (12 participants), scale (pairwise forced-choice plus 5-point Likert quality), blinding (raters unaware of generation source), inter-rater reliability (Fleiss' kappa = 0.78), and statistical tests (binomial test p < 0.001 for preferences). These additions will directly address concerns about generalizability. revision: yes
Referee: [Method] Method section: the 'right set of tools' and prompting strategy are described at a high level only. Exact tool definitions, the UI rendering pipeline, interaction loop, and failure-handling mechanisms must be specified to substantiate how the reported performance is achieved and to enable replication.

Authors: We concur that high-level descriptions limit replicability. The revised paper will add an appendix detailing: exact tool schemas (e.g., render_component tool accepting JSON with type, props, and children), the full prompting template with chain-of-thought examples, the rendering pipeline (JSON-to-React conversion in a sandboxed iframe), the interaction loop (up to 4 refinement turns based on simulated user feedback), and failure handling (graceful degradation to markdown with logged error types). This will enable independent reproduction of results. revision: yes
Referee: [Results] Emergence claim: the statement that Generative UI ability is emergent and shows 'substantial improvements from previous models' lacks quantitative side-by-side metrics, matched experimental conditions, or analysis of specific failure modes across model generations.

Authors: We will revise the Results section to include a comparative table evaluating GPT-4, GPT-3.5, and Claude-2 under identical prompting and tool conditions. Metrics will report human preference rates (e.g., 82% for our system vs. 35% for GPT-3.5 over markdown) and expert parity percentages. We will also add failure mode analysis (e.g., reduced layout errors from 45% to 12% in newer models). This provides the requested quantitative evidence for the emergence claim. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical results rest on external human evaluations

full rationale

The paper advances an empirical claim that properly prompted LLMs with tools produce high-quality UIs, supported by reported human preference judgments (overwhelming preference over markdown, 50% expert comparability) and the release of the PAGEN dataset. No equations, derivations, fitted parameters, or self-citations are present in the provided text that would reduce any result to its inputs by construction. The central statements are framed as experimental outcomes rather than predictions or uniqueness theorems derived from prior author work.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No explicit free parameters, axioms, or invented entities are described; the work is an empirical demonstration relying on LLM capabilities and human evaluation.

pith-pipeline@v0.9.0 · 5554 in / 997 out tokens · 24402 ms · 2026-05-15T19:38:22.796672+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

98 extracted references · 98 canonical work pages

[1]

tell me about the many dimensions of albert einstein

work page
[2]

van gogh gallery with life context for each piece

work page
[3]

explain quantum computing for a high schooler

work page
[4]

French history for kids

work page
[5]

fun home chemistry experiments for kids

work page
[6]

help me teach the relationship between slope and tangent using puppy growth

work page
[7]

how to make a Baby Mobile 13

work page
[8]

how to make a good homemade pizza crust with a regular oven

work page
[9]

how to teach a puppy basic tricks

work page
[10]

i want to learn how to do a handstand

work page
[11]

speculative decoding for kids

work page
[12]

Cute monster gallery

work page
[13]

history of mulligan stew

work page
[14]

illustrated history of google

work page
[15]

visual history of AI

work page
[16]

History of the Airplane

work page
[17]

Visual history of Atomic Bombs

work page
[18]

Visual history of Chemistry

work page
[19]

History of France for kids

work page
[20]

decorating with flamingos

work page
[21]

emergency go bag prep

work page
[22]

how do I prepare my home for earthquakes

work page
[23]

help me plan what I need for my new-borns bedroom

work page
[24]

8 spruce street vs 56 leonard in nyc

work page
[25]

cars with shield logos

work page
[26]

oj simpson car chase on map

work page
[27]

should i wait for the switch 2

work page
[28]

ukraine war timeline map

work page
[29]

billiard with the planets

work page
[30]

coloring app for 6 year olds

work page
[31]

drawing game for 10 years old

work page
[32]

game to learn fast typing, retro style

work page
[33]

maze generator and solver

work page
[34]

robot vs robot boxing game

work page
[35]

baby friendly neighborhoods on the q line in nyc map

work page
[36]

walkable neighborhoods in SF

work page
[37]

Which eink tablet is the best?

work page
[38]

Which phone is the best? 14

work page
[39]

Which gaming console is the best?

work page
[40]

Best women’s clothes for skiing

work page
[41]

Dresses for the summer

work page
[42]

make a tourism page for clive, iowa

work page
[43]

make a home page for my new esports team, team Noctus

work page
[44]

roundtrip should be about 2 weeks

I want to plan a roadtrip off the beaten path, starting in northern California and heading east. roundtrip should be about 2 weeks. i like unusual tourist attractions. the vibe should be like the weird al song about the biggest ball of twine in minnesota

work page
[45]

i want to watch the next meteor shower visible from saratoga, ca

work page
[46]

i’m visiting singapore for 3 days in september for a conference

work page
[47]

plan a trip from tomorrow returning on sat in SF with a 5 yo and a 7 months old staying in japan town

work page
[48]

i want to plan some stargazing parties from chicago

work page
[49]

help me and my wife plan a trip to Japan, we love Studio Ghibli, hot springs and food

work page
[50]

I want to take a tour of South America - help me plan my trip there

work page
[51]

compare the Chiefs and the Colts

work page
[52]

compare the Chicago Bulls and Orlando Magic

work page
[53]

top 5 football teams this year

work page
[54]

top 5 basketball teams this year

work page
[55]

compare Real Madrid and FC Barcelona

work page
[56]

Which team is better the Detroit Red Wings or the New York Islanders

work page
[57]

Which team is going to win the MLB this year?

work page
[58]

What I cannot create, I do not understand

Translate "What I cannot create, I do not understand." to French. Explain the quote and also what each word means

work page
[59]

important events in the sf bay area in summer of 2012

work page 2012
[60]

what should we do? where should we eat? etc

plan a weekend trip to sf on the weekend of january 3rd 2027, for 3 days, staying in hotel kabuki with a 5 year old and a 1 year old. what should we do? where should we eat? etc

work page 2027
[61]

Visual history of cryptography

work page
[62]

Explain thermodynamics using a coffee maker

work page
[63]

Illustrated guide to the Roman Colosseum

work page
[64]

History and making of the Rubik’s Cube

work page
[65]

Compare the best electric scooters for commuters

work page
[66]

History of the periodic table for middle schoolers

work page
[67]

Plan a family trip to the Grand Canyon for 4 days, including a 10-year-old

work page
[68]

The life and major works of Jane Austen

work page
[69]

How do I build a simple hydroponic garden at home?

work page
[70]

What are the top 5 cybersecurity threats for small businesses this year?

work page
[71]

Interactive solar system model for primary school

work page
[72]

Compare the best air fryers on the market

work page
[73]

Visual guide to identifying constellations visible from London

work page
[74]

Decorating with minimalist Scandinavian design principles

work page
[75]

The history and cultural significance of the samurai sword

work page
[76]

Help me plan a two-week honeymoon in the Greek Islands 15

work page
[77]

Best video games for learning history

work page
[78]

Evolutionary history of the domestic cat

work page
[79]

Our outreach involved a proposal to design a website within a few days, adhering to detailed guidelines (see Appendix A.7)

A guide to the most common herbs and their uses A.4 Data Collection Details We engaged web designers through the freelance platform Upwork Global Inc., specifically seeking those with experience in design and content creation, along with positive recommendations. Our outreach involved a proposal to design a website within a few days, adhering to detailed ...

work page 2024
[80]

Planning instructions

work page

Showing first 80 references.

[1] [1]

tell me about the many dimensions of albert einstein

work page

[2] [2]

van gogh gallery with life context for each piece

work page

[3] [3]

explain quantum computing for a high schooler

work page

[4] [4]

French history for kids

work page

[5] [5]

fun home chemistry experiments for kids

work page

[6] [6]

help me teach the relationship between slope and tangent using puppy growth

work page

[7] [7]

how to make a Baby Mobile 13

work page

[8] [8]

how to make a good homemade pizza crust with a regular oven

work page

[9] [9]

how to teach a puppy basic tricks

work page

[10] [10]

i want to learn how to do a handstand

work page

[11] [11]

speculative decoding for kids

work page

[12] [12]

Cute monster gallery

work page

[13] [13]

history of mulligan stew

work page

[14] [14]

illustrated history of google

work page

[15] [15]

visual history of AI

work page

[16] [16]

History of the Airplane

work page

[17] [17]

Visual history of Atomic Bombs

work page

[18] [18]

Visual history of Chemistry

work page

[19] [19]

History of France for kids

work page

[20] [20]

decorating with flamingos

work page

[21] [21]

emergency go bag prep

work page

[22] [22]

how do I prepare my home for earthquakes

work page

[23] [23]

help me plan what I need for my new-borns bedroom

work page

[24] [24]

8 spruce street vs 56 leonard in nyc

work page

[25] [25]

cars with shield logos

work page

[26] [26]

oj simpson car chase on map

work page

[27] [27]

should i wait for the switch 2

work page

[28] [28]

ukraine war timeline map

work page

[29] [29]

billiard with the planets

work page

[30] [30]

coloring app for 6 year olds

work page

[31] [31]

drawing game for 10 years old

work page

[32] [32]

game to learn fast typing, retro style

work page

[33] [33]

maze generator and solver

work page

[34] [34]

robot vs robot boxing game

work page

[35] [35]

baby friendly neighborhoods on the q line in nyc map

work page

[36] [36]

walkable neighborhoods in SF

work page

[37] [37]

Which eink tablet is the best?

work page

[38] [38]

Which phone is the best? 14

work page

[39] [39]

Which gaming console is the best?

work page

[40] [40]

Best women’s clothes for skiing

work page

[41] [41]

Dresses for the summer

work page

[42] [42]

make a tourism page for clive, iowa

work page

[43] [43]

make a home page for my new esports team, team Noctus

work page

[44] [44]

roundtrip should be about 2 weeks

I want to plan a roadtrip off the beaten path, starting in northern California and heading east. roundtrip should be about 2 weeks. i like unusual tourist attractions. the vibe should be like the weird al song about the biggest ball of twine in minnesota

work page

[45] [45]

i want to watch the next meteor shower visible from saratoga, ca

work page

[46] [46]

i’m visiting singapore for 3 days in september for a conference

work page

[47] [47]

plan a trip from tomorrow returning on sat in SF with a 5 yo and a 7 months old staying in japan town

work page

[48] [48]

i want to plan some stargazing parties from chicago

work page

[49] [49]

help me and my wife plan a trip to Japan, we love Studio Ghibli, hot springs and food

work page

[50] [50]

I want to take a tour of South America - help me plan my trip there

work page

[51] [51]

compare the Chiefs and the Colts

work page

[52] [52]

compare the Chicago Bulls and Orlando Magic

work page

[53] [53]

top 5 football teams this year

work page

[54] [54]

top 5 basketball teams this year

work page

[55] [55]

compare Real Madrid and FC Barcelona

work page

[56] [56]

Which team is better the Detroit Red Wings or the New York Islanders

work page

[57] [57]

Which team is going to win the MLB this year?

work page

[58] [58]

What I cannot create, I do not understand

Translate "What I cannot create, I do not understand." to French. Explain the quote and also what each word means

work page

[59] [59]

important events in the sf bay area in summer of 2012

work page 2012

[60] [60]

what should we do? where should we eat? etc

plan a weekend trip to sf on the weekend of january 3rd 2027, for 3 days, staying in hotel kabuki with a 5 year old and a 1 year old. what should we do? where should we eat? etc

work page 2027

[61] [61]

Visual history of cryptography

work page

[62] [62]

Explain thermodynamics using a coffee maker

work page

[63] [63]

Illustrated guide to the Roman Colosseum

work page

[64] [64]

History and making of the Rubik’s Cube

work page

[65] [65]

Compare the best electric scooters for commuters

work page

[66] [66]

History of the periodic table for middle schoolers

work page

[67] [67]

Plan a family trip to the Grand Canyon for 4 days, including a 10-year-old

work page

[68] [68]

The life and major works of Jane Austen

work page

[69] [69]

How do I build a simple hydroponic garden at home?

work page

[70] [70]

What are the top 5 cybersecurity threats for small businesses this year?

work page

[71] [71]

Interactive solar system model for primary school

work page

[72] [72]

Compare the best air fryers on the market

work page

[73] [73]

Visual guide to identifying constellations visible from London

work page

[74] [74]

Decorating with minimalist Scandinavian design principles

work page

[75] [75]

The history and cultural significance of the samurai sword

work page

[76] [76]

Help me plan a two-week honeymoon in the Greek Islands 15

work page

[77] [77]

Best video games for learning history

work page

[78] [78]

Evolutionary history of the domestic cat

work page

[79] [79]

Our outreach involved a proposal to design a website within a few days, adhering to detailed guidelines (see Appendix A.7)

A guide to the most common herbs and their uses A.4 Data Collection Details We engaged web designers through the freelance platform Upwork Global Inc., specifically seeking those with experience in design and content creation, along with positive recommendations. Our outreach involved a proposal to design a website within a few days, adhering to detailed ...

work page 2024

[80] [80]

Planning instructions

work page