Test smells in LLM-generated unit tests
2 Pith papers cite this work.
Fields: cs.SE (2). Years: 2026 (2). Verdicts: UNVERDICTED (2).
Citing papers

- Humanizing Automatically Generated Unit Test Suites with LLM-Based Refactoring
  TestHumanizer uses LLMs as a refactoring layer on top of EvoSuite-generated suites, reaching 88-98% compilation rates and better readability on 350 classes from Defects4J and SF110 while preserving coverage.
- LLM vs. Human Unit Tests: Fault Detection on Real Python Bugs
  LLM-generated unit tests with retrieval-augmented context detect faults in 69% of real Python bugs, versus 17.2% for general-purpose human-written tests, at similar coverage levels.