pith. machine review for the scientific record. sign in

arxiv: 2512.20182 · v4 · submitted 2025-12-23 · 💻 cs.CL · cs.AI

Recognition: unknown

FaithLens: Detecting and Explaining Faithfulness Hallucination

Authors on Pith no claims yet
classification 💻 cs.CL cs.AI
keywords datafaithlensexplanationsfaithfulnesshallucinationadvancedcorrectnessexplanation
0
0 comments X
read the original abstract

Recognizing whether outputs from large language models (LLMs) contain faithfulness hallucination is crucial for real-world applications, e.g., retrieval-augmented generation and summarization. In this paper, we introduce FaithLens, a cost-efficient and effective faithfulness hallucination detection model that can jointly provide binary predictions and corresponding explanations to improve trustworthiness. To achieve this, we first synthesize training data with explanations via advanced LLMs and apply a well-defined data filtering strategy to ensure label correctness, explanation quality, and data diversity. Subsequently, we fine-tune the model on these well-curated training data as a cold start and further optimize it with rule-based reinforcement learning, using rewards for both prediction correctness and explanation quality. Results on 12 diverse tasks show that the 8B-parameter FaithLens outperforms advanced models such as GPT-5.2 and o3. Also, FaithLens can produce high-quality explanations, delivering a distinctive balance of trustworthiness, efficiency, and effectiveness.

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. From Context to Skills: Can Language Models Learn from Context Skillfully?

    cs.AI 2026-04 unverdicted novelty 8.0

    Ctx2Skill lets language models autonomously evolve context-specific skills via multi-agent self-play, improving performance on context learning tasks without human supervision.