Vibe Code Bench evaluates AI models on building complete web applications from specs, with the best of 16 models achieving 61.8% accuracy on the test split using autonomous browser evaluation.
md at project root in {{ workspace_path }}/ generated - app / , at most one page long , with
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.SE 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Vibe Code Bench: Evaluating AI Models on End-to-End Web Application Development
Vibe Code Bench evaluates AI models on building complete web applications from specs, with the best of 16 models achieving 61.8% accuracy on the test split using autonomous browser evaluation.