OmniDiagram: Advancing Unified Diagram Code Generation via Visual Interrogation Reward

Feibang Jiang; Haoyue Yang; Xuanle Zhao; Xuexin Liu; Yao Zhu

arxiv: 2604.05514 · v1 · submitted 2026-04-07 · 💻 cs.AI

Pith reviewed 2026-05-10 18:54 UTC · model grok-4.3

classification 💻 cs.AI

keywords diagram code generation · visual feedback · reinforcement learning · unified framework · diagram dataset · visual interrogation · code generation · state-of-the-art results

The pith

OmniDiagram trains code generators for many diagram languages by using self-generated visual questions to score rendered outputs in reinforcement learning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents OmniDiagram as a single framework that supports multiple diagram code languages and task types instead of restricting to narrow cases. It introduces Viva, a process in which the model creates its own visual inquiries to examine how faithfully a rendered diagram matches the intended structure and then uses that feedback to guide reinforcement learning updates. This approach removes the need for manually written ground-truth code during training. The authors also release M3²Diagram, a dataset of more than 196,000 examples. When supervised fine-tuning is followed by Viva-based reinforcement learning, the system records new state-of-the-art results on standard diagram code generation benchmarks.

Core claim

OmniDiagram is a unified framework incorporating diverse diagram code languages and task definitions. To align code logic with visual fidelity in reinforcement learning, it employs Visual Interrogation Verifies All (Viva), a generative strategy that actively produces targeted visual inquiries to scrutinize diagram visual fidelity and supplies fine-grained feedback for optimization. This enables a self-evolving training process that does not require manually annotated ground-truth code. Paired with supervised fine-tuning on the newly constructed M3²Diagram dataset of over 196k high-quality instances, the combination reaches new state-of-the-art performance across diagram code generation tasks.

What carries the argument

Visual Interrogation Verifies All (Viva), the mechanism that generates targeted visual inquiries about rendered diagrams to produce fine-grained rewards for reinforcement learning without ground-truth annotations.
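
As a rough illustration of how interrogation answers could become a scalar reward, the sketch below scores a rendered diagram by the weighted fraction of yes/no questions that a judge answers affirmatively. The names `Inquiry`, `viva_style_reward`, and the `judge` callable are hypothetical, and both the averaging rule and the zero reward for render failures are assumptions; this summary does not say how Viva aggregates its feedback.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Inquiry:
    """One yes/no question about the rendered diagram (hypothetical type)."""
    question: str
    weight: float = 1.0


def viva_style_reward(
    rendered_image: Optional[bytes],
    inquiries: Sequence[Inquiry],
    judge: Callable[[bytes, str], bool],
) -> float:
    """Weighted fraction of inquiries the judge answers 'yes'.

    Assumptions (not from the paper): code that fails to render gets 0.0,
    and per-question answers are combined by a weighted average.
    """
    if rendered_image is None or not inquiries:
        return 0.0
    total = sum(q.weight for q in inquiries)
    if total <= 0:
        return 0.0
    passed = sum(q.weight for q in inquiries if judge(rendered_image, q.question))
    return passed / total


# Toy judge standing in for a vision-language model: it consults a fixed
# set of "visible" facts instead of actually inspecting the image.
def toy_judge(image: bytes, question: str) -> bool:
    visible_facts = {"Is there a node labeled 'Cache'?"}
    return question in visible_facts


questions = [
    Inquiry("Is there a node labeled 'Cache'?"),
    Inquiry("Is there a node labeled 'Database'?"),
]
print(viva_style_reward(b"fake-png-bytes", questions, toy_judge))  # 0.5
```

In a real setting the judge would be a vision-language model and the inquiries would be generated per sample; the reliability of that judge is the premise the rest of this page questions.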

If this is right

A single model can now handle a broader set of diagram languages and task formulations than earlier specialized systems.
Training proceeds without paired ground-truth code annotations for every example.
The self-evolving loop allows performance to improve iteratively from visual structure feedback alone.
The released M3²Diagram dataset supplies scale for future training of diagram-related models.
SOTA numbers are established on existing diagram code benchmarks when SFT is followed by Viva-based RL.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same interrogation style of feedback could be adapted to other generation tasks where output is judged by rendered appearance, such as plot or UI code.
If Viva-style questions prove reliable across diagram types, the method might reduce dependence on large labeled datasets in related multimodal code tasks.
The unified framework opens the possibility of extending support to additional languages or tasks not covered in current benchmarks.
Measuring how well Viva inquiries align with human judgments on diagram correctness would provide an independent check on the reward quality.
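
One way such an alignment check might be run is a rank correlation between per-sample Viva rewards and human ratings of the same rendered diagrams. The sketch below computes Spearman's rank correlation in plain Python; the function names and all of the numbers are made up for illustration and are not results from the paper.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up example: four diagrams scored by the reward and by a human rater.
viva_rewards = [0.2, 0.5, 0.9, 0.7]
human_ratings = [1, 2, 5, 4]
print(spearman(viva_rewards, human_ratings))
```

A rank-based measure is used here because the reward and the human scale need not be linearly related; only their ordering of samples is compared.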

Load-bearing premise

The visual inquiries that Viva generates give accurate and unbiased feedback on diagram visual fidelity that reliably guides code improvements.

What would settle it

Training the same base model with Viva rewards replaced by syntax-only or random rewards and measuring whether performance on the benchmark suite stays at or above the reported SOTA level would directly test whether the visual interrogation step is necessary for the gains.
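
The control arms of that experiment can be sketched as reward functions sharing one signature, so that a training loop could swap them in for the Viva reward without other changes. Everything below is a hypothetical illustration: the `syntax_only_reward` check is a toy first-line test for a few Mermaid diagram keywords, not a real parser, and no training loop is included.

```python
import random
from typing import Callable, Dict, List

RewardFn = Callable[[str], float]


def syntax_only_reward(code: str) -> float:
    """Toy syntax check: 1.0 if the first line starts with a known Mermaid keyword."""
    stripped = code.strip()
    first_line = stripped.splitlines()[0].strip() if stripped else ""
    known_headers = ("flowchart", "graph", "sequenceDiagram", "mindmap")
    return 1.0 if first_line.startswith(known_headers) else 0.0


def make_random_reward(seed: int) -> RewardFn:
    """Reward that ignores the code entirely (a noise-only control)."""
    rng = random.Random(seed)
    return lambda code: rng.random()


def mean_reward(reward_fn: RewardFn, samples: List[str]) -> float:
    """Average reward over a batch of generated code strings."""
    return sum(reward_fn(s) for s in samples) / len(samples)


ablation_arms: Dict[str, RewardFn] = {
    "syntax_only": syntax_only_reward,
    "random": make_random_reward(seed=0),
}

samples = ["flowchart TD\n  A --> B", "not a diagram"]
for name, reward_fn in ablation_arms.items():
    print(name, mean_reward(reward_fn, samples))
```

If models trained under these arms matched the Viva-trained model on the benchmarks, the visual interrogation step would not be what drives the reported gains.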

Figures

Figures reproduced from arXiv: 2604.05514 by Feibang Jiang, Haoyue Yang, Xuanle Zhao, Xuexin Liu, Yao Zhu.

**Figure 2.** Overview of the OmniDiagram methodology. The framework illustrates the end-to-end flow from scalable ...

**Figure 3.** Breakdown of the 196k-sample M3 ...

**Figure 4.** Progression of the overall reward during the ...

**Figure 5.** Statistical breakdown of tasks and diagram ...

**Figure 6.** Qualitative showcase of our model across three modalities (LA ...

**Figure 7.** Qualitative example of visual verification questions for the Text-to-Code task.

**Figure 8.** Qualitative example of visual verification questions for the Diagram-to-Code task.

**Figure 9.** Qualitative example of visual verification questions for the Diagram Editing task.

**Figure 10.** The prompt template used for generating different topics to set scene limitations.

**Figure 11.** The prompt template used for generating diverse diagram scenario based on user topics and specific ...

**Figure 12.** The prompt template used for generating structured JSON data elements tailored to specific Mermaid ...

**Figure 13.** The prompt template used and applying structured JSON data into executable Mermaid Mindmap code.

**Figure 14.** The prompt used for the Diagram-to-Code task evaluation.

**Figure 15.** The prompt used for the Diagram Editing task evaluation.

**Figure 16.** The prompt used for Text-to-Code task evaluation.

Original abstract

The paradigm of programmable diagram generation is evolving rapidly, playing a crucial role in structured visualization. However, most existing studies are confined to a narrow range of task formulations and language support, constraining their applicability to diverse diagram types. In this work, we propose OmniDiagram, a unified framework that incorporates diverse diagram code languages and task definitions. To address the challenge of aligning code logic with visual fidelity in Reinforcement Learning (RL), we introduce a novel visual feedback strategy named Visual Interrogation Verifies All (Viva). Unlike brittle syntax-based rules or pixel-level matching, Viva rewards the visual structure of rendered diagrams through a generative approach. Specifically, Viva actively generates targeted visual inquiries to scrutinize diagram visual fidelity and provides fine-grained feedback for optimization. This mechanism facilitates a self-evolving training process, effectively obviating the need for manually annotated ground truth code. Furthermore, we construct M3²Diagram, the first large-scale diagram code generation dataset, containing over 196k high-quality instances. Experimental results confirm that the combination of SFT and our Viva-based RL allows OmniDiagram to establish a new state-of-the-art (SOTA) across diagram code generation benchmarks.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

OmniDiagram adds a large new dataset and a generative visual RL reward for diagram code generation, but the SOTA claim has no supporting numbers or validation of the feedback quality.

The one or two things to know are that the paper introduces a unified framework called OmniDiagram for diagram code generation across different languages and tasks, along with a new dataset and a visual interrogation reward for RL training. What is actually new is the M3²Diagram dataset containing over 196k high-quality instances and the Viva method, which uses a generative model to create targeted visual inquiries about rendered diagrams to give fine-grained rewards. This setup allows self-evolving training without needing manually annotated ground truth code, which is a practical step forward from brittle syntax rules or pixel matching.

The paper does well in laying out the problem of limited applicability in prior studies and in proposing this generative feedback as an alternative that could scale better.

The soft spots are around the experimental support. The abstract claims SOTA results from combining supervised fine-tuning with Viva-based RL, but it supplies no specific metrics, baselines, or ablation studies to back that up. This makes the central claim hard to evaluate. The stress-test point about Viva feedback accuracy is on target; there is no mention of any checks, like agreement with human raters or comparisons to other quality measures, to confirm that the inquiries provide unbiased and reliable signals. If that does not hold, the RL could reinforce wrong patterns.

This paper is for researchers in AI-assisted visualization and code generation from natural language or structured inputs. Readers working on reinforcement learning for visual tasks or building diagram tools could find the Viva concept and the dataset useful to explore, assuming the full paper has more implementation details.

It deserves a serious referee because the scale of the dataset and the new reward approach are substantive contributions that merit detailed feedback, even if revisions are needed to strengthen the evidence. I would recommend engaging with the work through peer review, focusing on getting the validation and results presented clearly.

Referee Report

2 major / 0 minor

Summary. The paper introduces OmniDiagram, a unified framework supporting diverse diagram code languages and task formulations. It proposes Viva (Visual Interrogation Verifies All), a generative visual feedback strategy for RL that generates targeted inquiries to assess rendered diagram fidelity and supply rewards, enabling training without ground-truth annotations. The authors construct the M3²Diagram dataset containing over 196k instances and claim that SFT combined with Viva-based RL achieves new state-of-the-art results on diagram code generation benchmarks.

Significance. If the Viva feedback mechanism can be empirically validated as reliable and unbiased, the work could meaningfully advance annotation-efficient RL for structured visualization tasks by addressing the code-to-visual alignment problem in a scalable way. The release of the large-scale M3²Diagram dataset is a clear positive contribution that may serve as a foundation for future benchmark studies in programmable diagram generation.

major comments (2)

Abstract: The central claim that SFT plus Viva-based RL establishes new SOTA performance is asserted without any reported metrics, baseline comparisons, ablation studies, or implementation details, leaving the experimental support for the primary result unverifiable from the manuscript text.
Viva description (Abstract and method sections): The approach depends on the unvalidated assumption that generative visual inquiries produce accurate, unbiased, and fine-grained signals of diagram visual fidelity. No checks—such as inter-rater agreement with humans, correlation with pixel-level metrics, or ablation on interrogator quality—are provided, which is load-bearing because unreliable rewards could reinforce artifacts rather than genuine improvements in the RL stage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and indicate the revisions we will make to strengthen the manuscript.

Referee: Abstract: The central claim that SFT plus Viva-based RL establishes new SOTA performance is asserted without any reported metrics, baseline comparisons, ablation studies, or implementation details, leaving the experimental support for the primary result unverifiable from the manuscript text.

Authors: We agree that the abstract's brevity makes the SOTA claim difficult to verify at a glance. The full manuscript reports these details in Section 4 (Experiments), including quantitative tables comparing against baselines, ablation studies on the RL component, and implementation specifics. To address the concern directly, we will revise the abstract to include key performance metrics (e.g., relative improvements on the primary benchmarks) and a brief mention of the evaluation setup.

revision: yes

Referee: Viva description (Abstract and method sections): The approach depends on the unvalidated assumption that generative visual inquiries produce accurate, unbiased, and fine-grained signals of diagram visual fidelity. No checks (such as inter-rater agreement with humans, correlation with pixel-level metrics, or ablation on interrogator quality) are provided, which is load-bearing because unreliable rewards could reinforce artifacts rather than genuine improvements in the RL stage.

Authors: We acknowledge that direct validation of the Viva reward signals is important for establishing reliability. The current manuscript supports Viva's utility through end-to-end performance gains over SFT-only training and qualitative examples of the generated inquiries. However, we agree that additional checks would strengthen the claims. We will add a dedicated validation subsection that reports correlation between Viva rewards and both human ratings and pixel-level structural metrics, plus an ablation varying the interrogator model.

revision: yes

Circularity Check

0 steps flagged

No circularity; derivation relies on novel Viva mechanism, new dataset, and external benchmarks

full rationale

The paper introduces OmniDiagram as a unified framework, proposes Viva as a generative visual interrogation reward for RL without ground-truth annotations, and constructs the M3²Diagram dataset. The SOTA claim is supported by experimental results on benchmarks rather than any self-definitional reduction, fitted parameter renamed as prediction, or load-bearing self-citation chain. The derivation chain remains self-contained with independent components and external evaluation.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the effectiveness of the Viva visual feedback loop and the quality of the newly constructed dataset; no free parameters are explicitly named in the abstract.

axioms (1)

domain assumption: Generative visual inquiries can reliably assess and improve the structural fidelity of rendered diagrams in an RL loop.
This is the core premise of the Viva strategy as described.

invented entities (1)

Viva (Visual Interrogation Verifies All) · no independent evidence
purpose: Provide fine-grained visual feedback for RL optimization of diagram code without ground-truth annotations
Newly introduced reward mechanism whose independent validation is not described in the abstract.

pith-pipeline@v0.9.0 · 5526 in / 1279 out tokens · 37061 ms · 2026-05-10T18:54:38.184685+00:00 · methodology

[1] [1]

Linzheng Chai, Jian Yang, Shukai Liu, Wei Zhang, Li- ran Wang, Ke Jin, Tao Sun, Congnan Liu, Chenchen Zhang, Hualei Zhu, and 1 others

Starflow: Generating structured work- flow outputs from sketch images.arXiv preprint arXiv:2503.21889. Linzheng Chai, Jian Yang, Shukai Liu, Wei Zhang, Li- ran Wang, Ke Jin, Tao Sun, Congnan Liu, Chenchen Zhang, Hualei Zhu, and 1 others. 2025. Multilingual multimodal software developer for code generation. arXiv preprint arXiv:2507.08719. Lei Chen, Xuanle...

work page arXiv 2025

[2] [2]

nvbench 2.0: A benchmark for natural language to visualization under ambi- guity.arXiv preprint arXiv:2503.12880, 2025

nvbench 2.0: Resolving ambiguity in text- to-visualization through stepwise reasoning.arXiv preprint arXiv:2503.12880. Yuansheng Ni, Songcheng Cai, Xiangchao Chen, Jiarong Liang, Zhiheng Lyu, Jiaqi Deng, Kai Zou, Ping Nie, Fei Yuan, Xiang Yue, and 1 others. 2025. Viscoder2: Building multi-language visualization coding agents.arXiv preprint arXiv:2510.2364...

work page arXiv 2025

[3] [3]

Start: User initiates VPN connection\

Is the top-most node labeled \"Start: User initiates VPN connection\" colored green with a thick green outline?

work page

[4] [4]

Authentication server validates credentials\

Is there a diamond-shaped node with the text \"Authentication server validates credentials\"?

work page

[5] [5]

Is there an arrow labeled \"Yes\" originating from the diamond-shaped node and pointing to a blue node?

work page

[6] [6]

Is there an arrow labeled \"No\" originating from the diamond-shaped node and pointing to a red node?

work page

[7] [7]

Device sends authentication request\

Do all process nodes, such as \"Device sends authentication request\" and \"Authentication successful\", have a blue fill and a thick blue outline?

work page

[8] [8]

Are the connecting lines black arrows that have a stealth arrowhead style?

work page

[9] [9]

Encrypted VPN tunnel established\

Is the node labeled \"Encrypted VPN tunnel established\" colored red with a thick red outline?

work page

[10] [10]

Is the overall layout of the main VPN connection flow primarily vertical, moving from top to bottom?

work page

[11] [11]

Is the diagram free of any overlapping text or lines, ensuring all elements are clearly readable?

work page

[12] [12]

Diagram-to-Code Input: Output(Rendered): Questions:

Do all visible nodes (start, process, decision, end) feature rounded corners and a drop shadow effect? Figure 7: Qualitative example of visual verification questions for the Text-to-Code task. Diagram-to-Code Input: Output(Rendered): Questions:

work page

[13] [13]

Does the diagram contain a top-level node labeled 'Load Balancer'?

work page

[14] [14]

Are there two nodes labeled 'Web Server 1' and 'Web Server 2' positioned below the Load Balancer?

work page

[15] [15]

Is the text 'HTTP Requests' visible on the connections originating from the Load Balancer?

work page

[16] [16]

Does the diagram feature a central node labeled 'Application Server’?

work page

[17] [17]

Do the arrows connecting the Web Servers to the Application Server contain the label 'API Calls’?

work page

[18] [18]

Is there a node labeled 'Cache' positioned at the bottom-left of the structure?

work page

[19] [19]

Is the text 'Cache Responses' present on the link connecting to the Cache node?

work page

[20] [20]

Does the diagram contain a node labeled 'Database' at the bottom- right?

work page

[21] [21]

Is the text 'Query Results' visible on the connection leading to the Database node?

work page

[22] [22]

E Evaluation E.1 Prompt Used in Evaluation To ensure reproducibility, we provide the exact sys- tem prompts used for our GPT-4.1-based evalua- tion

Are all nodes depicted as rounded rectangles with a light orange background and darker orange border? Figure 8: Qualitative example of visual verification questions for the Diagram-to-Code task. E Evaluation E.1 Prompt Used in Evaluation To ensure reproducibility, we provide the exact sys- tem prompts used for our GPT-4.1-based evalua- tion. Figure 14 ill...

work page 2022

[23] [23]

Do all the rectangular nodes display rounded corners instead of sharp 90-degree angles?

work page

[24] [24]

Is the interior fill color of the nodes a soft, light blue?

work page

[25] [25]

Is a darker blue border clearly visible outlining each node?

work page

[26] [26]

Does the border line width appear consistent and distinct across all nodes?

work page

[27] [27]

Is the diagram completely free of any sharp-cornered, white-filled nodes?

work page

[28] [28]

Does the 'Customer Relationship Management' node match the rounded blue style of the other nodes? 7.Are the text labels inside the nodes clearly legible against the light blue background?

work page

[29] [29]

Do the connecting arrows remain correctly attached to the boundaries of the modified nodes?

work page

[30] [30]

Is the diagram free of any broken or floating connections resulting from the style change?

work page

[31] [31]

cess rate of specific modifications requested by the user (e.g., color changes or node deletions), focus- ing strictly on the execution of the edit instruction

Does the overall diagram maintain a consistent visual theme across all block elements? Figure 9: Qualitative example of visual verification questions for the Diagram Editing task. cess rate of specific modifications requested by the user (e.g., color changes or node deletions), focus- ing strictly on the execution of the edit instruction. Content Preserva...

You MUST return a single, valid JSON object.

The JSON object MUST contain exactly one key: "topics".

The value of "topics" MUST be a JSON array of strings.

Each string in the array should be a 2-3 sentence topic description corresponding to one set of keywords from the user input.

The number of strings in the array MUST EXACTLY match the number of keyword sets provided.

User Prompt: Please generate topic descriptions for the following 3 characters. Return the result as a JSON object according to the system instructions.

Keywords: Name=Alex, Age=32, Profession=Software Engineer, Trait=innovative problem solving, Goal=to streamline a complex workflow

Keywords: Name=Jordan, Age=45, Profession=Product Manager, Trait=user-centric design, Goal=to map out a new user experience

Syntax Tax

Keywords: Name=Taylor, Age=28, Profession=Data Analyst, Trait=data-driven, Goal=to present data insights to stakeholders

Figure 10: The prompt template used for generating different topics to set scene limitations.

... size, pointing to a persistent bottleneck in handling strict domain-specific languages. These failures typically occur in diagrams featuring m...
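The output contract stated in the Figure 10 prompt (a single JSON object, exactly one key "topics", an array of strings whose length equals the number of keyword sets) can be checked mechanically before a response is accepted. The following is a minimal sketch of such a check; the function name `parse_topics` and its error messages are hypothetical and not part of the paper's pipeline.

```python
import json
from typing import List


def parse_topics(raw: str, expected_count: int) -> List[str]:
    """Validate a model response against the Figure 10 output rules.

    Hypothetical helper. Raises ValueError if any rule is violated
    (json.JSONDecodeError is itself a subclass of ValueError).
    """
    data = json.loads(raw)  # must be valid JSON
    # Must be an object with exactly one key, "topics".
    if not isinstance(data, dict) or set(data) != {"topics"}:
        raise ValueError('response must be an object with the single key "topics"')
    topics = data["topics"]
    # The value must be an array of strings.
    if not isinstance(topics, list) or not all(isinstance(t, str) for t in topics):
        raise ValueError('"topics" must be an array of strings')
    # One description per keyword set.
    if len(topics) != expected_count:
        raise ValueError(
            f"expected {expected_count} topics, got {len(topics)}"
        )
    return topics


if __name__ == "__main__":
    reply = '{"topics": ["Topic for Alex.", "Topic for Jordan.", "Topic for Taylor."]}'
    print(parse_topics(reply, expected_count=3))
```

The check covers only the structural rules; whether each description is 2-3 sentences long or matches its keywords would need a separate, softer test.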

Each topic is a high-level summary of the contents in the diagram with some design details, e.g., “a Sequence Diagram illustrating a user login process with two-factor authentication”.

The topics should be diverse to help me generate varied diagrams. Each topic should be unique and not overlap with others.

The topics are strictly conditioned on the Mermaid diagram type. Please ensure the topics you provided can be best visualized in “{figure_type}”.

All topics must be in English, even if the scenario is non-English.

List {num_topics} topics for “{scenario}” and separate them with a | character, e.g., topic1 | topic2 | ...... | topic{num_topics}. Do not include any additional text at the beginning or end of your response.

Figure 11: The prompt template used for generating diverse diagram scenario based on user topics and specific diagram types.

... natural language reasoning c...
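The Figure 11 prompt asks for exactly {num_topics} topics separated by a | character, with no extra text and no repeated topics. A response in that format can be split and sanity-checked as sketched below; `split_topics` is a hypothetical helper, and its uniqueness test only catches exact duplicates, not topics that overlap in meaning.

```python
from typing import List


def split_topics(response: str, num_topics: int) -> List[str]:
    """Split a '|'-separated topic list and check the requested count.

    Hypothetical helper for the response format requested in Figure 11.
    """
    topics = [part.strip() for part in response.split("|")]
    # A stray leading/trailing '|' or '||' would produce an empty entry.
    if any(not topic for topic in topics):
        raise ValueError("response contains an empty topic")
    if len(topics) != num_topics:
        raise ValueError(f"expected {num_topics} topics, got {len(topics)}")
    # Exact-duplicate check only; semantic overlap is not detected here.
    if len(set(topics)) != len(topics):
        raise ValueError("topics must be unique")
    return topics


if __name__ == "__main__":
    reply = "user login with two-factor authentication | password reset | order checkout"
    print(split_topics(reply, num_topics=3))
```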

The data should be highly structured as a JSON object, with its schema tailored specifically for the “{figure_type}” syntax. For example, for a Class Diagram, the JSON should contain a list of ‘classes’ and a list of ‘relationships’.

The data should be realistic, and the contents should be named using real-world entities. Do not use placeholder names like xxA, xxB, etc.

The elements should be concise and directly map to a meaningful diagram. Do not provide too many elements; just the key information.

All elements must be in English, even if the topic is non-English.

{topic}”. I have a JSON object of structured data about “{scenario}

You can use the provided JSON templates to structure decision-based flows. If the topic is related to decision-making or conditional logic, please use or adapt the templates provided in the <templates> block.

Figure 12: The prompt template used for generating structured JSON data elements tailored to specific Mermaid diagram types and topics.

Prompts for...

The code must be a valid and complete Mermaid mindmap script adhering strictly to syntax rules

The code must start with “mindmap”. The “mindmap” keyword is reserved.

Indentation is the ONLY way to define the hierarchy. All child nodes MUST be indented more deeply than their parent. Consistent indentation (e.g., 2 or 4 spaces) is required.

Do not include any additional text outside of the Mermaid code block.

The code must be self-contained within the ```mermaid ... ``` block.

CRITICAL RULE: A mindmap can only have ONE SINGLE ROOT NODE.
• The first line of the mindmap defines the root.
• All other nodes MUST BE indented under this root node.
• Any line with zero indentation (nodes or styles) will be treated as a second root, causing a fatal error.

IMPORTANT: Styling...
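The structural rules in this prompt (the script starts with the `mindmap` keyword, the first node is the single root, and every later line is indented more deeply than that root) lend themselves to a quick pre-render check. The sketch below is a simplified, hypothetical linter written for illustration; it is not a Mermaid parser, it is not the authors' tooling, and it only inspects indentation.

```python
from typing import List


def check_mindmap(code: str) -> List[str]:
    """Return a list of rule violations for a Mermaid mindmap script.

    Hypothetical, indentation-only check based on the prompt rules:
    'mindmap' keyword first, one root, all other lines under the root.
    """
    # Treat tabs as 4 spaces and ignore blank lines.
    lines = [ln for ln in code.expandtabs(4).splitlines() if ln.strip()]
    if not lines or lines[0].strip() != "mindmap":
        return ["code must start with the 'mindmap' keyword"]
    body = lines[1:]
    if not body:
        return ["no root node found"]

    def indent(line: str) -> int:
        return len(line) - len(line.lstrip(" "))

    root_indent = indent(body[0])  # first line after the keyword is the root
    problems = []
    for position, line in enumerate(body[1:], start=2):
        # Anything at or above the root's level would act as a second root.
        if indent(line) <= root_indent:
            problems.append(
                f"line {position} after 'mindmap' ({line.strip()!r}) "
                "is not indented under the root"
            )
    return problems


if __name__ == "__main__":
    sample = "mindmap\n  root((Project))\n    Planning\n    Delivery\n"
    print(check_mindmap(sample))
```

An empty list means none of the indentation rules were violated; it does not guarantee that the script renders, since node syntax and styling are not examined.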
