UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization

· 2025 · cs.CV · arXiv 2511.08195

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

UI-to-code aims to translate UI screenshots into executable front-end code. Despite progress with vision-language models (VLMs), most existing methods formulate UI-to-code as a single-pass generation, which mismatches real-world UI development that is inherently iterative and feedback-driven. We reformulate UI-to-code as an interactive visual optimization problem, where code generation is embedded in a closed-loop process of execution, visual inspection, and iterative refinement driven by rendered visual feedback. To address the non-differentiability of visual objectives and the noise of absolute visual evaluators, we propose Relative Visual Policy Optimization (RVPO), a preference-based reinforcement learning method that optimizes relative visual rankings among rendered candidates under execution feedback. We instantiate this paradigm in UI2Code^N, an open-source 9B model trained via continual pre-training, supervised fine-tuning, and reinforcement learning. Experiments demonstrate state-of-the-art performance on UI drafting, UI polishing, and UI editing benchmarks, even outperforming larger models, with performance consistently improving through iterative visual optimization. Our code and models are available at https://github.com/zai-org/UI2Code_N.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding

cs.CL · 2026-02-02 · unverdicted · novelty 7.0

Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.

Self-Distillation Policy Optimization via Visual Feedback: Bridging Code and Visual Artifacts

cs.AI · 2026-06-09 · unverdicted · novelty 6.0

Visual-SDPO distills visual feedback from rendered code outputs into a student policy via grounded credit weighting and GRPO, yielding over 10-point gains on chart/UI/slide benchmarks.

Test-Time Scaling in Multimodal Foundation Models: A Comprehensive Survey of Generation and Reasoning

cs.CV · 2026-06-06 · unverdicted · novelty 5.0

A survey of test-time scaling for multimodal foundation models that introduces a three-way taxonomy of sampling, feedback, and search approaches along with applications and benchmarks.

citing papers explorer

Showing 1 of 1 citing paper after filters.

CodeOCR: On the Effectiveness of Vision Language Models in Code Understanding cs.CL · 2026-02-02 · unverdicted · none · ref 103 · internal anchor
Multimodal LLMs process code as images to achieve up to 8x token compression, with visual cues like syntax highlighting aiding tasks and clone detection remaining resilient or even improving under compression.

UI2Code^N: UI-to-Code Generation as Interactive Visual Optimization

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer