E2EDev is a benchmark grounded in behavior-driven development (BDD) for end-to-end software development (E2ESD). It combines fine-grained requirements, test scenarios, and an automated pipeline, and it shows that current E2ESD frameworks and LLMs struggle to produce software that meets user needs.
- Example: 'submit-button': a button used to submit a form
E2EDev: Benchmarking Large Language Models in End-to-End Software Development Task