MCeT: Behavioral Model Correctness Evaluation using Large Language Models, 2025
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SE (1)
Years: 2025 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
MermaidSeqBench: An Evaluation Benchmark for NL-to-Mermaid Sequence Diagram Generation
MermaidSeqBench is a new human-verified benchmark for evaluating LLMs on natural language to Mermaid sequence diagram generation, revealing significant capability gaps across models.