MedHal-Loc benchmark shows KG-triple hallucination detectors localize errors no better than chance on controlled medical statements due to entity extraction limits, while NLI and consistency methods succeed above chance, and real hallucinations are mostly diffuse conclusion changes.
arXiv preprint arXiv:2401.06855 (2024)
9 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
K-FinHallu is the first multi-turn Korean financial RAG hallucination benchmark; frontier LLMs struggle especially on justified abstention while an 8B fine-tuned model reaches competitive performance.
LDKE framework localizes fact-specific layers and disentangles inputs to improve generalization and locality in multimodal knowledge editing for MLLMs.
Token entropy distributions fingerprint hallucinations in generative models, enabling the Calibrated Entropy Score (CES) for single-pass black-box detection with calibration guarantees via a novel DKW inequality.
ReFACT benchmark reveals LLMs show a persistent salient distractor failure mode where 61% of incorrect error span predictions are semantically unrelated to actual errors, persisting across model sizes, and comparative judgment yields lower F1 than independent detection.
MultiHaluDet uses multi-layer hidden-state probing, multi-scale attention, and a calibrated classifier ensemble to detect multilingual hallucinations, reporting up to 98.55% AUROC on English benchmarks and strong cross-lingual transfer to French, Bangla, and Amharic.
Researchers create a human-labeled dataset of obvious and elusive multimodal hallucinations and use learned activation-space probes to control their verifiability in MLLMs.
Fine-tuned compact models achieve strong multilingual performance and large efficiency gains over LLMs on production data from 114 languages for claim detection and 28 for veracity prediction.
HalluScan benchmark evaluates hallucination detection in LLMs, reporting NLI Verification at AUROC 0.88 and introducing HalluScore (r=0.41 with humans) plus Adaptive Detection Routing for 2x cost savings.
citing papers explorer
-
MedHal-Loc: Are "Explainable-by-Architecture" Medical Hallucination Detectors Faithful Localizers? A Localization Benchmark
MedHal-Loc benchmark shows KG-triple hallucination detectors localize errors no better than chance on controlled medical statements due to entity extraction limits, while NLI and consistency methods succeed above chance, and real hallucinations are mostly diffuse conclusion changes.
-
K-FinHallu: A Hallucination Detection Benchmark for Multi-Turn RAG in Korean Finance
K-FinHallu is the first multi-turn Korean financial RAG hallucination benchmark; frontier LLMs struggle especially on justified abstention while an 8B fine-tuned model reaches competitive performance.
-
Towards Localized and Disentangled Knowledge Editing for Multimodal Large Language Models
LDKE framework localizes fact-specific layers and disentangles inputs to improve generalization and locality in multimodal knowledge editing for MLLMs.
-
Entropy Distribution as a Fingerprint for Hallucinations in Generative Models
Token entropy distributions fingerprint hallucinations in generative models, enabling the Calibrated Entropy Score (CES) for single-pass black-box detection with calibration guarantees via a novel DKW inequality.
-
ReFACT: A Benchmark for Scientific Confabulation Detection with Positional Error Annotations
ReFACT benchmark reveals LLMs show a persistent salient distractor failure mode where 61% of incorrect error span predictions are semantically unrelated to actual errors, persisting across model sizes, and comparative judgment yields lower F1 than independent detection.
-
MultiHaluDet: Multilingual Hallucination Detection via LLM Hidden State Probing
MultiHaluDet uses multi-layer hidden-state probing, multi-scale attention, and a calibrated classifier ensemble to detect multilingual hallucinations, reporting up to 98.55% AUROC on English benchmarks and strong cross-lingual transfer to French, Bangla, and Amharic.
-
Steering the Verifiability of Multimodal AI Hallucinations
Researchers create a human-labeled dataset of obvious and elusive multimodal hallucinations and use learned activation-space probes to control their verifiability in MLLMs.
-
Multilingual Fact-Checking at Scale: Fine-Tuned Compact Models vs LLMs
Fine-tuned compact models achieve strong multilingual performance and large efficiency gains over LLMs on production data from 114 languages for claim detection and 28 for veracity prediction.
-
HalluScan: A Systematic Benchmark for Detecting and Mitigating Hallucinations in Instruction-Following LLMs
HalluScan benchmark evaluates hallucination detection in LLMs, reporting NLI Verification at AUROC 0.88 and introducing HalluScore (r=0.41 with humans) plus Adaptive Detection Routing for 2x cost savings.