LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, although their relative ranking of markers stays somewhat consistent.
arXiv preprint arXiv:2311.09677
7 Pith papers cite this work.
roles: background (1)
polarities: background (1)
representative citing papers
Model-adaptive tool necessity shows 26-54% mismatch with actual tool calls across LLMs, driven by nearly orthogonal hidden-state signals for cognition versus action.
LLMs predict outcomes of real scientific experiments at 14-26% accuracy, comparable to human experts, but lack calibration on prediction reliability while humans demonstrate strong calibration.
RLMF uses quality of model self-judgments to refine RL rankings and select training data, achieving SOTA faithful calibration while preserving accuracy and outperforming standard RL by up to 63%.
Clarification-seeking in LLM agents amplifies prompt injection attack success from ~2% to over 30% across ten frontier models in a new 728-scenario benchmark.
A stateful iterative RAG system converts retrieved documents into scored reasoning units, maintains supportive and non-supportive evidence, and performs deficiency-driven query refinement to achieve more robust QA performance.
A survey that compiles and taxonomizes more than 32 existing hallucination mitigation techniques for LLMs while analyzing their challenges and limitations.
citing papers explorer
- A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models