Hallucination detection in large language models with metamorphic relations

Borui Yang, Md Afif Al Mamun, Jie M Zhang, Gias Uddin · 2025 · arXiv 2502.15844

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?

cs.SE · 2025-11-07 · unverdicted · novelty 6.0

Student models distilled from code language models often fail to deeply mimic teachers, showing up to 62% behavioral discrepancies and 285% worse drops under attacks that accuracy metrics miss.

Principled Detection of Hallucinations in Large Language Models via Multiple Testing

cs.CL · 2025-08-25 · unverdicted · novelty 6.0

The method aggregates multiple hallucination evaluation scores via conformal p-values to enable calibrated detection with controlled false alarm rates across LLMs and datasets.

citing papers explorer

A Metamorphic Testing Perspective on Knowledge Distillation for Language Models of Code: Does the Student Deeply Mimic the Teacher?

cs.SE · 2025-11-07 · unverdicted · none · ref 42

Student models distilled from code language models often fail to deeply mimic teachers, showing up to 62% behavioral discrepancies and 285% worse drops under attacks that accuracy metrics miss.

Principled Detection of Hallucinations in Large Language Models via Multiple Testing

cs.CL · 2025-08-25 · unverdicted · none · ref 24

The method aggregates multiple hallucination evaluation scores via conformal p-values to enable calibrated detection with controlled false alarm rates across LLMs and datasets.
