YATE: The role of test repair in LLM-based unit test generation
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.SE (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Inferring Code Correctness from Specification
TRAILS infers code correctness by aggregating LLM judgments on input-output pairs from category-partitioned specification tests, improving MCC by up to 39% over zero-shot CoT on LiveCodeBench and CoCoClaNeL.
- PR-Aware Automated Unit Test Generation: Challenges and Opportunities
EvoSuite produced at least one fail-to-pass test for 36% of PRs versus 13% for GPT-4o, but both tools generated no meaningful change-capturing tests for 64% of the PRs evaluated.