Convcodeworld: Benchmarking conversational code generation in reproducible feedback environments,

· 2025 · arXiv 2502.19852

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

representative citing papers

CodeChat-Eval: Evaluating Large Language Models in Multi-Turn Code Refinement Dialogues

cs.SE · 2026-06-24 · unverdicted · novelty 7.0

CodeChat-Eval shows LLMs lose 19.2% to 69.2% functional correctness over multi-turn refinement dialogues, with largest drops on logic-level and additive changes.

citing papers explorer

Showing 1 of 1 citing paper.

CodeChat-Eval: Evaluating Large Language Models in Multi-Turn Code Refinement Dialogues cs.SE · 2026-06-24 · unverdicted · none · ref 17
CodeChat-Eval shows LLMs lose 19.2% to 69.2% functional correctness over multi-turn refinement dialogues, with largest drops on logic-level and additive changes.

Convcodeworld: Benchmarking conversational code generation in reproducible feedback environments,

fields

years

verdicts

representative citing papers

citing papers explorer