Knowledge-based Consistency Testing of Large Language Models
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.SE (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
MUCOCO: Automated Consistency Testing of Code LLMs
MUCOCO applies semantic-preserving mutation analysis to automatically expose inconsistent behaviors in code LLMs, detecting inconsistencies in about 15% of cases across 7 models and 4 tasks while outperforming the TURBULENCE baseline.