Free and customizable code documentation with LLMs: A fine-tuning approach
2 Pith papers cite this work. Polarity classification is still indexing.
Verdicts: 2 unverdicted
Representative citing papers
The framework uses eight LLMs to generate code documentation and four LLMs as judges on nine criteria, showing a 42% performance gap on a medical physics library.
Citing papers explorer
- MermaidSeqBench: An Evaluation Benchmark for NL-to-Mermaid Sequence Diagram Generation
MermaidSeqBench is a new human-verified benchmark for evaluating LLMs on natural language to Mermaid sequence diagram generation, revealing significant capability gaps across models.
- LLM-Based Code Documentation Generation and Multi-Judge Evaluation
The framework uses eight LLMs to generate code documentation and four LLMs as judges on nine criteria, showing a 42% performance gap on a medical physics library.