{"paper":{"title":"Bidirectional Empowerment of Metamorphic Testing and Large Language Models: A Systematic Survey","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Metamorphic testing and large language models form a reciprocal relationship that addresses the oracle problem in AI systems.","cross_cats":[],"primary_cat":"cs.SE","authors_text":"Daixu Ren, Tsong Yueh Chen, Yinwang Xu, Zenghui Zhou, Zheng Zheng","submitted_at":"2026-05-12T13:47:26Z","abstract_excerpt":"Large language models (LLMs) have introduced substantial challenges to software quality assurance due to their generative, probabilistic, and open-ended nature, which intensifies the oracle problem and limits the applicability of traditional testing methods. Metamorphic testing (MT), which checks necessary relations among multiple related executions rather than relying on exact expected outputs, has emerged as a promising approach for testing LLMs and other oracle-deficient systems. At the same time, the strong semantic understanding, reasoning, and code generation capabilities of LLMs create "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"The reciprocal relationship is characterized as the bidirectional empowerment of MT and LLMs, with MT for LLMs addressing hallucination, fairness, robustness, code reliability, retrieval-augmented generation, dialogue, and autonomous agents, and LLMs for MT supporting metamorphic relation discovery, input transformation and synthesis, executable test implementation, and agentic closed-loop testing.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"The 93 primary studies selected for review are representative of the field and the proposed taxonomy accurately captures the key interactions without significant selection or categorization bias.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A systematic survey of 93 studies that maps the bidirectional relationship between metamorphic testing and LLMs, proposing a taxonomy for MT applied to LLMs and LLMs applied to MT.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Metamorphic testing and large language models form a reciprocal relationship that addresses the oracle problem in AI systems.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"499f905e2b3a8efd01028a69460d31bb001fc9b82a01b4db022dd9155a2b6816"},"source":{"id":"2605.13898","kind":"arxiv","version":1},"verdict":{"id":"1d5767be-d0e2-4720-81b5-523e247c35e3","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T05:12:08.225354Z","strongest_claim":"The reciprocal relationship is characterized as the bidirectional empowerment of MT and LLMs, with MT for LLMs addressing hallucination, fairness, robustness, code reliability, retrieval-augmented generation, dialogue, and autonomous agents, and LLMs for MT supporting metamorphic relation discovery, input transformation and synthesis, executable test implementation, and agentic closed-loop testing.","one_line_summary":"A systematic survey of 93 studies that maps the bidirectional relationship between metamorphic testing and LLMs, proposing a taxonomy for MT applied to LLMs and LLMs applied to MT.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"The 93 primary studies selected for review are representative of the field and the proposed taxonomy accurately captures the key interactions without significant selection or categorization bias.","pith_extraction_headline":"Metamorphic testing and large language models form a reciprocal relationship that addresses the oracle problem in AI systems."},"references":{"count":119,"sample":[{"doi":"","year":null,"title":"LLM assisted coding with metamorphic specification mutation agent,","work_id":"dbc6fafc-aebf-4388-8414-875ce1415fed","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2026,"title":"arXiv: 2511.18249. ACM Comput. Surv., Vol. 1, No. 1, Article . Publication date: May 2026. Bidirectional Empowerment of Metamorphic Testing and Large Language Models: A Systematic Survey 31","work_id":"e04a67a0-d229-45e9-a677-7370502c126d","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Metamorphic testing of deep code models: a systematic literature review.ACM Transactions on Software Engineering and Methodology, 2025","work_id":"4a213b1b-a913-4d2b-b1ff-1a727f021a47","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"Large language models for software testing: a research roadmap, 2025","work_id":"0e7c08fc-13e1-4863-8ae9-f266aebec8f6","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2025,"title":"A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?","work_id":"2ada7d34-016e-4f4e-87ec-c8ce5b60577f","ref_index":5,"cited_arxiv_id":"2511.05476","is_internal_anchor":true}],"resolved_work":119,"snapshot_sha256":"bd6ffe0b84354d406ca90a5b3ad242b48b4841d4cd6d833b243bad113fadbd77","internal_anchors":5},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}