arXiv preprint arXiv:2310.02368

Reinforcement Learning from Automatic Feedback for High-Quality Unit Test Generation · arXiv 2310.02368

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

cs.SE · 2026-05-14 · unverdicted · novelty 7.0

SWE-Chain provides 155 chained version transitions and 1,660 requirements across 9 Python packages, where frontier agents resolve 44.8% of tasks on average and struggle to preserve functionality across releases.

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · novelty 6.0

LiveCodeBench collects 400 recent contest problems to create a contamination-free benchmark evaluating LLMs on code generation and related capabilities like self-repair and execution.

citing papers explorer

Showing 2 of 2 citing papers.

SWE-Chain: Benchmarking Coding Agents on Chained Release-Level Package Upgrades

cs.SE · 2026-05-14 · unverdicted · none · ref 17

LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code

cs.SE · 2024-03-12 · unverdicted · none · ref 230

