HumanEval on Latest GPT Models–2024

2024 · arXiv 2402.14852

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

AdverMCTS: Combating Pseudo-Correctness in Code Generation via Adversarial Monte Carlo Tree Search

cs.SE · 2026-04-12 · unverdicted · novelty 7.0

AdverMCTS frames code generation as a minimax game where an attacker evolves tests to expose flaws in solver-generated code, yielding more robust outputs than static-test baselines.

A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback

cs.SE · 2026-05-18 · unverdicted · novelty 5.0

A-ProS uses a hybrid multi-model feedback framework with stateful refinement to improve success rates on competitive programming problems, achieving over 2x gains compared to baseline agent loops.

From GPT-3 to GPT-5: Mapping their capabilities, scope, limitations, and consequences

cs.AI · 2026-04-11 · unverdicted · novelty 2.0

The GPT family has shifted from scaled text predictors to aligned, multimodal, tool-oriented systems, while limitations such as hallucination and prompt sensitivity persist.

citing papers explorer


AdverMCTS: Combating Pseudo-Correctness in Code Generation via Adversarial Monte Carlo Tree Search

cs.SE · 2026-04-12 · unverdicted · none · ref 23

A-ProS: Towards Reliable Autonomous Programming Through Multi-Model Feedback

cs.SE · 2026-05-18 · unverdicted · none · ref 37

From GPT-3 to GPT-5: Mapping their capabilities, scope, limitations, and consequences

cs.AI · 2026-04-11 · unverdicted · none · ref 19
