DevBench is a telemetry-driven benchmark with 1,800 instances across six languages and six task categories that evaluates LLMs on realistic code completion and finds the strongest model at only 43.5% Pass@1.
1 Pith paper cites this work (field: cs.LG; year: 2026; verdict: unverdicted).
DevBench: A Realistic, Developer-Informed Benchmark for Code Generation Models