TestBench: Evaluating class-level test case generation capability of large language models

Quanjun Zhang, Ye Shang, Chunrong Fang, Siqi Gu, Jianyi Zhou, Zhenyu Chen · 2024 · arXiv 2409.17561

4 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

FeedbackLLM: Metadata driven Multi-Agentic Language Agnostic Test Case Generator with Evolving prompt and Coverage Feedback

cs.SE · 2026-05-02 · unverdicted · novelty 6.0

FeedbackLLM uses line and branch coverage feedback agents in an iterative multi-agent process with a redundancy cache to generate test cases achieving higher coverage than baselines on standard C and Python benchmarks while scaling linearly in time.

Call-Chain-Aware LLM-Based Test Generation for Java Projects

cs.SE · 2026-04-23 · unverdicted · novelty 6.0

CAT improves line coverage by 18% and branch coverage by 22% over prior LLM test generation methods by adding call-chain and dependency context from static analysis to prompts.

Mutation-Guided Unit Test Generation with a Large Language Model

cs.SE · 2025-06-03 · conditional · novelty 6.0

MUTGEN incorporates mutation feedback into LLM prompts and uses iteration to generate unit tests that achieve higher mutation scores than EvoSuite or vanilla LLM prompting on 204 benchmark subjects.

PPO guided Agentic Pipeline for Adaptive Prompt Selection and Test Case Generation

cs.SE · 2026-05-01 · unverdicted · novelty 5.0

PPO-LLM adaptively selects among eight prompting techniques using an 11-dimensional state vector to guide an LLM toward higher branch and line coverage than static baselines on 20 benchmark programs.

citing papers explorer

Showing 4 of 4 citing papers.

FeedbackLLM: Metadata driven Multi-Agentic Language Agnostic Test Case Generator with Evolving prompt and Coverage Feedback · cs.SE · 2026-05-02 · unverdicted · none · ref 32
Call-Chain-Aware LLM-Based Test Generation for Java Projects · cs.SE · 2026-04-23 · unverdicted · none · ref 44
Mutation-Guided Unit Test Generation with a Large Language Model · cs.SE · 2025-06-03 · conditional · none · ref 74
PPO guided Agentic Pipeline for Adaptive Prompt Selection and Test Case Generation · cs.SE · 2026-05-01 · unverdicted · none · ref 30
