Review history

arxiv: 2606.25747 · 2 revisions

CodeChat-Eval: Evaluating Large Language Models in Multi-Turn Code Refinement Dialogues

  1. 2026-07-01 UNVERDICTED LOW v0.9.1-grok novelty 7.0
    53230 ms 5762 in 1218 out 2026-07-01T07:03:24.300710+00:00
  2. 2026-06-25 UNVERDICTED LOW v0.9.1-grok novelty 7.0
    21570 ms 5765 in 1079 out 2026-06-25T20:20:38.067563+00:00