Text2Arch: A Dataset for Generating Scientific Architecture Diagrams from Natural Language Descriptions

Manish Gupta; Sankalp Mittal; Shivank Garg

arxiv: 2604.14941 · v1 · submitted 2026-04-16 · 💻 cs.CL

Pith reviewed 2026-05-10 11:47 UTC · model grok-4.3

classification 💻 cs.CL

keywords text-to-diagram · architecture diagrams · DOT code · language models · dataset · semantic fidelity · diagram generation · fine-tuning

The pith

A new dataset of text-to-DOT pairs lets small language models generate scientific architecture diagrams at GPT-4o level.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper creates Text2Arch, a large open dataset that pairs natural language descriptions of scientific architectures with their corresponding diagrams and DOT code representations. This resource supports fine-tuning of smaller language models to translate text descriptions into code that renders accurate diagrams. Experiments demonstrate that the resulting models exceed the performance of prior baselines such as DiagramAgent while matching the output quality obtained from GPT-4o through in-context learning. The contribution fills a gap where no suitable large-scale public data previously existed for this text-to-diagram task.

Core claim

By releasing Text2Arch, which supplies aligned textual descriptions, visual architecture images, and DOT code for a wide range of scientific systems, the authors show that fine-tuned small language models can produce high-fidelity diagrams from text input, outperforming existing specialized baselines and reaching parity with in-context learning from GPT-4o.

What carries the argument

The Text2Arch dataset, consisting of scientific architecture images, their natural language descriptions, and associated DOT code representations, which supplies supervised training pairs for mapping semantics to diagram code.
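
To make the pairing concrete, here is a minimal Python sketch of what one text/image/DOT record and its DOT serialization could look like. The class name `Text2ArchRecord`, its field names, the helper `to_dot`, and the example path are all hypothetical and are not taken from the released dataset; only the overall `digraph{ ID [label="..."] ... A -> B; }` shape follows the example output quoted in the paper's prompts.

```python
from dataclasses import dataclass


@dataclass
class Text2ArchRecord:
    """Hypothetical record layout: one description, one image, one DOT string."""
    description: str
    image_path: str
    dot_code: str


def to_dot(labels: list[str], edges: list[tuple[int, int]]) -> str:
    """Serialize integer-ID nodes and directed edges as a DOT digraph."""
    lines = ["digraph{"]
    for node_id, label in enumerate(labels):
        # Escape backslashes and double quotes so the label stays a valid DOT string.
        escaped = label.replace("\\", "\\\\").replace('"', '\\"')
        lines.append(f'  {node_id} [label="{escaped}"]')
    for src, dst in edges:
        lines.append(f"  {src} -> {dst};")
    lines.append("}")
    return "\n".join(lines)


# Illustrative data only (assumed, not from the dataset).
record = Text2ArchRecord(
    description="Inputs are weighted, summed, and passed through an activation function.",
    image_path="figures/example.png",
    dot_code=to_dot(
        ["Inputs", "Summing junction", "Activation", "Output"],
        [(0, 1), (1, 2), (2, 3)],
    ),
)
print(record.dot_code)
```

A fine-tuned model in this setting would be trained to map `description` to `dot_code`; the image is then obtained by rendering the DOT string with a Graphviz-compatible tool.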

If this is right

Automated generation of clear visual aids for complex system designs in enterprise and research settings.
Reduced ambiguity when conveying scientific processes through combined text and diagram outputs.
Support for educational tools that convert descriptions into ready-to-use architecture visuals.
Public models and data that enable further development of text-to-diagram systems.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same paired text-and-code approach could be adapted to generate other diagram types such as flowcharts or network topologies.
Integration with documentation pipelines might allow automatic visual updates whenever system descriptions change.
The dataset could serve as a benchmark for testing how well future models preserve technical relationships in generated visuals.

Load-bearing premise

The collected dataset and chosen metrics accurately reflect real-world scientific descriptions and true improvements in diagram semantic fidelity without selection biases.

What would settle it

Evaluating the fine-tuned models on a fresh collection of independently sourced real scientific text descriptions and finding that generated diagrams show no measurable gain in accuracy or fidelity over the baselines.

Figures

Figures reproduced from arXiv: 2604.14941 by Manish Gupta, Sankalp Mittal, Shivank Garg.

**Figure 2.** TEXT2ARCH Dataset Curation. Images are stratified split into train and validation so as to maintain the same ratio of arch vs no-arch images in train and test. We train multiple models like CLIP (Radford et al., 2021), ViT (Dosovitskiy et al., 2020), BEiT (Bao et al., 2021), and ResNet (He et al., 2016), and report results in ...

**Figure 4.** Case Study 1: Comparison showing DeepSeek-7B inference significantly outperforming ...

**Figure 5.** Illustrations for generated DOTs using DiagramAgent (left), GPT (right top) and fewShot ...

**Figure 6.** Case Study 2: Comparison showing DeepSeek-7B inference significantly outperforming ...

**Figure 7.** Illustrations for generated DOTs using DiagramAgent (left), GPT (middle) and fewShot ...

**Figure 8.** Case Study 3: Comparison showing DeepSeek-7B inference significantly outperforming ...

**Figure 9.** Illustrations for generated DOTs using DiagramAgent (left), GPT (middle) and fewShot ...
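
The Figure 2 caption text mentions a stratified train/validation split that keeps the ratio of arch to no-arch images the same on both sides. A minimal sketch of that idea in plain Python follows. The function name `stratified_split`, the 20% validation fraction, the seed, and the toy file names are assumptions for illustration; the page does not state the ratio or tooling the authors used.

```python
import random
from collections import defaultdict


def stratified_split(items, labels, val_fraction=0.2, seed=0):
    """Split items so each label keeps roughly the same share in train and val."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for item, label in zip(items, labels):
        by_label[label].append(item)

    train, val = [], []
    for label, group in by_label.items():
        rng.shuffle(group)  # shuffle within each class before cutting
        n_val = round(len(group) * val_fraction)
        val += [(x, label) for x in group[:n_val]]
        train += [(x, label) for x in group[n_val:]]
    return train, val


# Toy data (assumed): 30 architecture images, 70 non-architecture images.
images = [f"img_{i}.png" for i in range(100)]
labels = ["arch"] * 30 + ["not arch"] * 70
train, val = stratified_split(images, labels, val_fraction=0.2)

arch_share_train = sum(1 for _, l in train if l == "arch") / len(train)
arch_share_val = sum(1 for _, l in val if l == "arch") / len(val)
print(arch_share_train, arch_share_val)
```

Splitting per class, rather than shuffling the whole pool once, is what keeps the class ratio stable when one class is much rarer than the other.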

Original abstract

Communicating complex system designs or scientific processes through text alone is inefficient and prone to ambiguity. A system that automatically generates scientific architecture diagrams from text with high semantic fidelity can be useful in multiple applications like enterprise architecture visualization, AI-driven software design, and educational content creation. Hence, in this paper, we focus on leveraging language models to perform semantic understanding of the input text description to generate intermediate code that can be processed to generate high-fidelity architecture diagrams. Unfortunately, no clean large-scale open-access dataset exists, implying lack of any effective open models for this task. Hence, we contribute a comprehensive dataset, Text2Arch, comprising scientific architecture images, their corresponding textual descriptions, and associated DOT code representations. Leveraging this resource, we fine-tune a suite of small language models, and also perform in-context learning using GPT-4o. Through extensive experimentation, we show that Text2Arch models significantly outperform existing baseline models like DiagramAgent and perform at par with in-context learning-based generations from GPT-4o. We make the code, data and models publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

This paper releases a practical new dataset pairing scientific architecture text with images and DOT code, plus some benchmarking of fine-tuned models against baselines and GPT-4o, but thin experimental details and possible metric mismatches make the performance claims hard to assess fully.

This paper releases a dataset called Text2Arch that links natural language descriptions of scientific architecture with corresponding images and DOT code, then benchmarks fine-tuned small language models and GPT-4o in-context learning on generating the code from text. The dataset fills a gap since no large open resource existed for this specific task, and they make everything public, which is helpful for follow-up work in diagram generation for education or software design. The experiments reportedly show their models outperforming baselines like DiagramAgent and matching GPT-4o, but without details on dataset size, construction method, train-test splits, or the precise evaluation metrics and error analysis, it's difficult to gauge how reliable those performance numbers are. A potential issue is that metrics based on DOT code similarity might not capture true semantic or visual fidelity, as multiple valid DOT representations can produce identical diagrams. If the paper relies on surface-level string matches rather than comparing rendered outputs or using human judgments, the superiority claims could be weaker than they appear. This work is mainly for people building tools for automatic diagram creation in technical domains or looking for data to train models on code generation from descriptions. It shows clear thinking in identifying the need and providing a resource, so it deserves peer review to sort out the experimental specifics.

Referee Report

2 major / 0 minor

Summary. The paper introduces the Text2Arch dataset, which pairs natural language descriptions of scientific architectures with corresponding diagram images and DOT code representations. It fine-tunes small language models on this resource and evaluates in-context learning with GPT-4o, claiming that the resulting models significantly outperform baselines such as DiagramAgent while matching the performance of GPT-4o ICL generations.

Significance. If the dataset construction and evaluations prove robust, the work supplies a much-needed open resource for text-to-diagram generation in scientific and architectural domains, where prior datasets were absent. The public release of data, code, and models strengthens reproducibility and enables follow-on research.

major comments (2)

[Abstract and experimental sections] The abstract asserts outperformance 'through extensive experimentation' yet provides no details on dataset size, construction process, train/test splits, chosen evaluation metrics, or statistical tests. These omissions are load-bearing for the central empirical claims and prevent verification that the reported gains are supported by the data.

[Evaluation methodology, likely Section 4 or 5] Generating and scoring DOT code raises a direct concern for semantic fidelity. Multiple syntactically distinct DOT strings can render identical diagrams (different edge orders, attribute placements, or subgraph groupings). If metrics are string-based (e.g., BLEU/ROUGE on raw DOT), they risk rewarding surface-form matches rather than visual or semantic equivalence; the same limitation applies to the DiagramAgent baseline, so relative gains may be metric artifacts.
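
The surface-form concern can be illustrated with a small sketch. The Python below reduces a DOT string to a set of labels and a set of label-to-label edges, so that statement order and node ID numbering no longer matter, and then scores edge overlap with F1. This is not the paper's metric: `canonical_graph` and `edge_f1` are hypothetical helpers, and the regex parser is an assumption that only covers the simple subset used in the paper's prompt examples (`ID [label="..."]` node statements and attribute-free `A -> B` edges, with no edge chains, no subgraphs, and unique labels).

```python
import re

# Matches node statements such as: 0 [label="Encoder"]
NODE_RE = re.compile(r'(\w+)\s*\[\s*label\s*=\s*"((?:[^"\\]|\\.)*)"\s*\]')
# Matches simple directed edges such as: 0 -> 1
EDGE_RE = re.compile(r'(\w+)\s*->\s*(\w+)')


def canonical_graph(dot: str):
    """Return (labels, edges) with node IDs replaced by their labels."""
    labels = dict(NODE_RE.findall(dot))
    # Drop node statements first so arrows inside labels are not read as edges.
    edge_text = NODE_RE.sub(" ", dot)
    edges = frozenset(
        (labels.get(src, src), labels.get(dst, dst))
        for src, dst in EDGE_RE.findall(edge_text)
    )
    return frozenset(labels.values()), edges


def edge_f1(pred_dot: str, gold_dot: str) -> float:
    """F1 over canonical edges; 0.0 when there is no overlap."""
    _, pred = canonical_graph(pred_dot)
    _, gold = canonical_graph(gold_dot)
    true_pos = len(pred & gold)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred)
    recall = true_pos / len(gold)
    return 2 * precision * recall / (precision + recall)


# Same graph, different IDs and statement order (toy example).
a = 'digraph{ 0 [label="Encoder"] 1 [label="Decoder"] 2 [label="Loss"] 0 -> 1; 1 -> 2; }'
b = 'digraph{\n  7 [label="Loss"]\n  3 [label="Decoder"]\n  5 [label="Encoder"]\n  3 -> 7;\n  5 -> 3;\n}'
# Same nodes, one edge wrong.
c = 'digraph{ 0 [label="Encoder"] 1 [label="Decoder"] 2 [label="Loss"] 0 -> 1; 0 -> 2; }'

print(a == b, canonical_graph(a) == canonical_graph(b))
print(edge_f1(a, b), edge_f1(c, a))
```

Strings `a` and `b` differ character by character yet describe the same graph, which is the case a string-overlap metric would penalize. A production evaluation would need a real DOT parser and some handling of paraphrased labels, neither of which this sketch attempts.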

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We appreciate the emphasis on transparency in the abstract and experimental design as well as the important methodological point about evaluating DOT code. We address each major comment below and outline the revisions we will make.

Referee: [Abstract and experimental sections] The abstract asserts outperformance 'through extensive experimentation' yet provides no details on dataset size, construction process, train/test splits, chosen evaluation metrics, or statistical tests. These omissions are load-bearing for the central empirical claims and prevent verification that the reported gains are supported by the data.

Authors: We agree that the abstract would be strengthened by including these key details to allow immediate verification of the claims. In the revised manuscript we will expand the abstract to report the total dataset size, a concise summary of the construction process, the train/test split ratios, the primary evaluation metrics, and any statistical significance tests performed. We will also review the experimental sections to ensure all of these elements are explicitly stated with clear cross-references and will add statistical tests where they are currently absent. These changes will make the empirical support for our results fully transparent.

Revision: yes

Referee: [Evaluation methodology, likely Section 4 or 5] Generating and scoring DOT code raises a direct concern for semantic fidelity. Multiple syntactically distinct DOT strings can render identical diagrams (different edge orders, attribute placements, or subgraph groupings). If metrics are string-based (e.g., BLEU/ROUGE on raw DOT), they risk rewarding surface-form matches rather than visual or semantic equivalence; the same limitation applies to the DiagramAgent baseline, so relative gains may be metric artifacts.

Authors: This concern is valid: string-based metrics on raw DOT code can indeed be sensitive to superficial syntactic differences that do not affect the rendered diagram. Because the DiagramAgent baseline is scored with the same metrics, the relative gains we report remain meaningful as a comparative measure, yet we acknowledge the limitation. In the revision we will add an explicit discussion of this issue in the evaluation section, describe any steps taken to mitigate surface-form variance (such as canonicalization where applied), and include qualitative examples of equivalent diagrams produced by different DOT strings. If space permits we will also report supplementary visual similarity results on rendered diagrams to further support semantic fidelity.

Revision: partial

Circularity Check

0 steps flagged

Empirical dataset release and benchmarking with no derivation chain

The paper contributes a new dataset (Text2Arch) of scientific architecture images, text descriptions, and DOT code, then fine-tunes small LMs and evaluates GPT-4o ICL against external baselines such as DiagramAgent. No equations, derivations, fitted parameters renamed as predictions, or self-citation chains appear in the provided abstract or described content. All claims rest on empirical comparisons to independent external models and metrics; the work is self-contained against external benchmarks with no load-bearing self-referential steps.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review reveals no mathematical axioms, free parameters, or invented entities; the work relies on standard language-model fine-tuning practices whose hyperparameters are not specified.

Reference graph

Works this paper leans on

38 extracted references · 38 canonical work pages

[1] Determine if the image is an architecture diagram (commonly used in research papers to depict the structure, components, or workflows of systems)

[2] If it is an architecture diagram, generate a concise, precise, and coherent description of 10-20 sentences explaining the main elements of the diagram. Description should include module names, short description of modules, and flow of information across modules

[3] The description you provide would be further used to train a model to generate such architecture images. Hence avoid any irrelevant information in the description. ## Requirements:

[4] Carefully analyze the provided paragraphs, focusing on extracting key elements that directly explain the architecture depicted in the image. Published as a conference paper at ICLR 2026. (a) Original figure (Fig. 3 from https://arxiv.org/pdf/1701.07543v1). Fig obtained using generated DOT code by our TEXT2ARCH's finetuned DeepSeek 7B model. x1 w1 x2 w2 xI wI ...

[5] Summing Junction (Σ): The weighted inputs are summed together at the summing junction, represented by the symbol Σ. The output of this summing junction is denoted as u. 4. Activation Function (Φ): The summed value u is then passed through an activation function, denoted as Φ(.). This function determines the output of the perceptron. 5. Output (y): The final...

[6] Exclude extraneous, redundant, or noisy details from the textual content and focus only on the architectural aspects

[7] Clearly indicate whether the image is an architecture diagram or not

[8] Provide your output in a structured format ## Inputs: IMAGE:#imageURL# Caption:#caption# Description:#Descriptions# ## Example Output: <results> <label>[arch|not arch]</label> <newDesc>Concise and precise description goes here.</newDesc> </results> Output results in a nested XML. Label output should be within <label></label> tags and could be "arch" or "not arch"...

[9] Analyze both the DOT code and the image to identify any incorrect node labels, incorrect connections, or incorrect ordering of nodes

[10] Refine the DOT code to ensure it accurately represents the structure and relationships depicted in the image

[11] Output the corrected DOT file in a structured XML format. ## Inputs: IMAGE:#imageURL# Initial DOT code: (which may contain errors or incomplete data) #dotCode# (a) Original figure (Fig. 16 from https://arxiv.org/pdf/1905.09481v1). Fig obtained using generated DOT code by our TEXT2ARCH's finetuned DeepSeek 7B mode...

[12] Conv 3x3: Another convolutional layer with a 3x3 filter size, further refining the features. 5. Conv 3x5: A second convolutional layer with a 3x5 filter size, continuing the feature extraction process. 6. Pool max: A max pooling layer that further reduces the spatial dimensions by taking the maximum value in each region. 7. Conv 3x3: A final convolutional...

[13] Ensure the refined DOT code fully represents the relationships in the image

[14] Maintain proper indentation and formatting in the DOT code

[15] Encapsulate the final DOT code within <results><![CDATA[ ]]></results> to prevent XML parsing issues. H. GPT PROMPT TO CONVERT TIKZ CODE TO DOT. Given the following LaTeX TikZ code:

[16] First, re-indent the TikZ part for readability

[17] Then, extract all \node text labels and assign each a unique integer ID (e.g., 0, 1, 2...)

[18] Use the format: ID [label="..."]; to define each node

[19] Infer reasonable directed edges based on layout or label semantics (e.g., data flow, left-to-right, top-to-bottom)

[20] Maintain proper indentation and formatting in the DOT code. Output the result as a DOT file using the below graph structure, starting directly with: <results> <![CDATA[ digraph{ 0 [label="Node 0 description"] 1 [label="Node 1 description"] 2 [label="Node 2 description"] 3 [label="Node 3 description"] 0 -> 1; 0 -> 2; 2 -> 3; } ]]> </results> Do not include any ra...

[21] Read and interpret the following textual description of a system, pipeline, or process

[22] Generate accurate DOT code that reflects the described structure, relationships, and flow

[23] Output the DOT code in a structured XML format for downstream usage. ## Input: #Cleaned-Description# ## Example Output: <results> <![CDATA[ digraph{ 0 [label="Node 0 description"] 1 [label="Node 1 description"] 2 [label="Node 2 description"] 3 [label="Node 3 description"] 0 -> 1; 0 -> 2; 2 -> 3; } ]]> </results> Instructions: - Identify all relev...

[24] Analyze the given image along with two candidate textual descriptions (marked as Description 1 and Description 2)

[25] Determine which description better matches the content and semantics of the image

[26] Return the index of the better matching description (either 1 or 2), followed by a short explanation justifying your choice. ## Inputs: Image: IMAGE:#Image URL# Description 1:#description 1# Description 2:#description 2# ## Output Format: Output all of these in a nested XML. <results> <index>1</index> <explanation>The explanation should briefly describe why...

[27] Analyze the image to understand the correct structure, node positions, labels, and connections

[28] Compare the generated DOT code with both the image and the ground-truth DOT code

[29] Determine if the structure, labels, node ordering, and relationships in the generated DOT code accurately reflect the image and ground-truth

[30] Assign a compatibility score between 0 and 5, where: - 5 = Perfect match. - 4 = Minor discrepancies that don't affect comprehension. - 3 = Some noticeable errors, but mostly accurate. - 2 = Multiple mismatches that affect comprehension. - 1 = Mostly incorrect. - 0 = Completely unrelated

[31] Provide a concise explanation (2-3 sentences) describing the key issues or strengths. ## Output Format: <results> <score>4</score> <explanation>The generated DOT code has correct node labels and most connections, but the order and direction of two edges differ from the image.</explanation> </results> Output all of these in a nested XML. ...

[32] Use 'digraph{' as the graph declaration

[33] Set appropriate rankdir (TB for top-bottom, LR for left-right) if needed

[34] Use appropriate node shapes (box is default)

[35] Create meaningful node labels

[36] Add edge labels where appropriate to describe relationships

[37] Keep the graph structure clear and readable

[38] Instruct. IMPORTANT: Respond with ONLY the DOT code, no explanations or additional text Here are examples of how to convert descriptions to DOT graphs:{few shot examples}. Convert the following description into DOT language code. Respond with ONLY the DOT code and nothing else:{description}". Few Shot Examples for the Small Language Models based evaluation Example...
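
Several of the prompt fragments above ask the model to return DOT code inside `<results><![CDATA[ ... ]]></results>` and to return judge scores as `<score>` and `<explanation>` elements. A minimal sketch of reading both response shapes with Python's standard `xml.etree.ElementTree` is shown here. The function names `extract_dot` and `extract_score` and the sample responses are assumptions for illustration; the authors' actual post-processing code is not shown on this page.

```python
import xml.etree.ElementTree as ET


def extract_dot(response: str) -> str:
    """Pull the DOT code out of a <results><![CDATA[...]]></results> wrapper."""
    root = ET.fromstring(response.strip())
    if root.tag != "results":
        raise ValueError(f"expected <results> root, got <{root.tag}>")
    # The parser exposes CDATA content as ordinary element text.
    return (root.text or "").strip()


def extract_score(response: str) -> tuple[int, str]:
    """Read the 0-5 compatibility score and its explanation from a judge reply."""
    root = ET.fromstring(response.strip())
    score_text = root.findtext("score")
    if score_text is None:
        raise ValueError("missing <score> element")
    score = int(score_text.strip())
    if not 0 <= score <= 5:
        raise ValueError(f"score {score} is outside the 0-5 range")
    explanation = (root.findtext("explanation") or "").strip()
    return score, explanation


# Sample responses (assumed), shaped like the example outputs in the prompts.
generation = '<results> <![CDATA[ digraph{ 0 [label="A"] 1 [label="B"] 0 -> 1; } ]]> </results>'
judgement = "<results> <score>4</score> <explanation>Minor edge mismatch.</explanation> </results>"

print(extract_dot(generation))
print(extract_score(judgement))
```

Wrapping the DOT code in CDATA, as the prompts request, lets characters such as `<`, `>`, and `&` inside node labels pass through the XML parser without escaping.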
