Behavioral Augmentation of UML Class Diagrams: An Empirical Study of Large Language Models for Method Generation
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
fields: cs.SE 2
verdicts: UNVERDICTED 2
roles: background 1
polarities: background 1
representative citing papers
Preference-based prompting raises LLM adherence to object-oriented design principles in UML generation but leaves substantial output variance and model-specific differences intact.
citing papers explorer
- MermaidSeqBench: An Evaluation Benchmark for NL-to-Mermaid Sequence Diagram Generation
MermaidSeqBench is a new human-verified benchmark for evaluating LLMs on natural language to Mermaid sequence diagram generation, revealing significant capability gaps across models.
- Reliability of Large Language Models for Design Synthesis: An Empirical Study of Variance, Prompt Sensitivity, and Method Scaffolding
Preference-based prompting raises LLM adherence to object-oriented design principles in UML generation but leaves substantial output variance and model-specific differences intact.