Human-AI collaboration on CentaurEval's collaboration-necessary tasks reaches 31.11% success, far above standalone humans at 18.89% or LLMs at 0.67%.
10 minutes) to provide basic information, educational background, and technical experience
1 Pith paper cites this work. Polarity classification is still being indexed.
Fields: cs.SE (1); Years: 2025 (1); Verdicts: UNVERDICTED (1)
Representative citing papers:
CentaurEval: Benchmarking Human-in-the-Loop Value in Agentic Coding