Sakura is a multi-agent system that generates structurally complex tests from natural language (NL) test descriptions, achieving 50-78% higher compilability and 38-66% higher coverage overlap than baselines on 1,464 scenarios from 20 Apache Commons applications.
Yuxiang Wei, Zhe Wang, Jiawei Liu, Yifeng Ding, and Lingming Zhang
2 Pith papers cite this work. Polarity classification is still indexing.
fields: cs.SE 2
verdicts: UNVERDICTED 2
representative citing papers
CodeFlowBench is a new benchmark with 5000+ problems and GitHub-sourced repos that evaluates LLMs on multi-turn code reuse using dependency-tree structural metrics, revealing performance drops as complexity rises.
citing papers explorer
- Sakura: An Approach for Generating Complex Tests from Natural Language Test Descriptions