
arxiv: 2401.15449 · v1 · pith:V7IF3ZSUnew · submitted 2024-01-27 · 💻 cs.CL

Learning to Trust Your Feelings: Leveraging Self-awareness in LLMs for Hallucination Mitigation

classification 💻 cs.CL
keywords knowledge, llms, internal, factual, hallucination, learning, models, state

We evaluate the ability of Large Language Models (LLMs) to discern and express their internal knowledge state, a key factor in countering factual hallucination and ensuring reliable application of LLMs. We observe a robust self-awareness of internal knowledge state in LLMs, evidenced by over 85% accuracy in knowledge probing. However, LLMs often fail to express their internal knowledge during generation, leading to factual hallucinations. We develop an automated hallucination annotation tool, Dreamcatcher, which merges knowledge probing and consistency checking methods to rank factual preference data. Using knowledge preference as reward, we propose a Reinforcement Learning from Knowledge Feedback (RLKF) training framework, leveraging reinforcement learning to enhance the factuality and honesty of LLMs. Our experiments across multiple models show that RLKF training effectively enhances the ability of models to utilize their internal knowledge state, boosting performance in a variety of knowledge-based and honesty-related tasks.
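
The abstract says Dreamcatcher merges knowledge probing with consistency checking to rank factual preference data, but gives no implementation details. The following is only a minimal sketch of how two such signals could be combined, not the authors' tool. Every name, threshold, and the ranking order itself (`consistency_score`, `is_known`, `rank_candidates`, `probe_min`, `agree_min`) is a hypothetical assumption made for illustration.

```python
from collections import Counter


def consistency_score(samples):
    """Fraction of sampled answers that match the most common answer.

    Assumption: answers are compared after trimming and lowercasing;
    a real checker would likely use a softer similarity measure.
    """
    normalized = [s.strip().lower() for s in samples]
    if not normalized:
        raise ValueError("need at least one sampled answer")
    top_count = Counter(normalized).most_common(1)[0][1]
    return top_count / len(normalized)


def is_known(probe_prob, samples, probe_min=0.5, agree_min=0.6):
    """Treat a question as 'known' only if both signals agree.

    probe_prob is a hypothetical probe output in [0, 1]; both
    thresholds are illustrative, not values from the paper.
    """
    return probe_prob >= probe_min and consistency_score(samples) >= agree_min


def rank_candidates(known, candidates):
    """Order (text, kind) pairs from most to least preferred.

    Assumed preference order: when the model knows the answer, a
    correct answer beats a refusal; when it does not, a refusal
    beats a (possibly lucky) correct answer. Wrong answers are last.
    """
    if known:
        order = ["correct", "refusal", "wrong"]
    else:
        order = ["refusal", "correct", "wrong"]
    return sorted(candidates, key=lambda c: order.index(c[1]))
```

Ranked lists of this kind are the sort of preference data a reward model could be trained on; how RLKF actually constructs its reward is described in the paper itself, not here.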

This paper has not been read by Pith yet.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Scaling with Confidence: Calibrating Confidence of LLMs for Adaptive Test Time Scaling

    cs.AI 2026-07 unverdicted novelty 5.0

    C3RL is a new RL algorithm combining correctness, calibration, and reference accuracy rewards to improve LLM confidence calibration, enabling CAS to outperform majority voting with up to 12.33x lower inference cost.

  2. SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning

    cs.CV 2026-06 unverdicted novelty 5.0

    SingGuard presents a policy-adaptive multimodal LLM guardrail family with hybrid reasoning regimes and a new benchmark of 56,340 examples, claiming SOTA F1 across 35 datasets and improved policy adherence under runtim...

  3. SingGuard: A Policy-Adaptive Multimodal LLM Guardrail with Dynamic Reasoning

    cs.CV 2026-06 unverdicted novelty 5.0

    SingGuard introduces a policy-adaptive multimodal LLM guardrail with dynamic reasoning regimes and SingGuard-Bench, reporting SOTA F1 scores across 35 datasets and improved policy-following accuracy under runtime shifts.