HumanEvalComm: Benchmarking the communication competence of code generation for LLMs and LLM agent

Jie JW Wu, Fatemeh H Fard · 2024 · arXiv 2406.00215

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Assessing the Impact of Requirement Ambiguity on LLM-based Function-Level Code Generation

cs.SE · 2026-04-23 · conditional · novelty 7.0

Orchid benchmark shows requirement ambiguity degrades LLM code generation performance across all models, with advanced models hit hardest, and LLMs rarely detect or resolve the ambiguity themselves.

A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback

cs.SE · 2026-05-18 · unverdicted · novelty 5.0

A-ProS uses a hybrid multi-model feedback framework with stateful refinement to improve success rates on competitive programming problems, achieving over 2x gains compared to baseline agent loops.

citing papers explorer

Showing 2 of 2 citing papers.

Assessing the Impact of Requirement Ambiguity on LLM-based Function-Level Code Generation

cs.SE · 2026-04-23 · conditional · none · ref 40

A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback

cs.SE · 2026-05-18 · unverdicted · none · ref 82
