ACM Transactions on Software Engineering and Methodology
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
citing papers explorer
- MUCOCO: Automated Consistency Testing of Code LLMs
MUCOCO applies semantic-preserving mutation analysis to automatically expose inconsistent behaviors in code LLMs, detecting inconsistencies in about 15% of cases across 7 models and 4 tasks while outperforming the TURBULENCE baseline.
- When Planning Fails Despite Correct Execution: On Epistemic Calibration for LLM-Based Multi-Agent Systems
Introduces EPC-AW to mitigate epistemic miscalibration in LLM multi-agent planning via consistency-based selection and refinement, reporting 9.75% average success improvement.