arXiv preprint arXiv:2311.09677 , volume=

URLhttps://arxiv · 2023 · arXiv 2311.09677

7 Pith papers cite this work. Polarity classification is still indexing.

7 Pith papers citing it

read on arXiv browse 7 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Can LLMs Use Linguistic Uncertainty Markers to Reliably Reflect Intrinsic Confidence?

cs.CL · 2026-05-27 · unverdicted · novelty 7.0

LLMs struggle to associate epistemic markers with stable internal confidence levels across distributions, even under model-centric interpretations, while maintaining somewhat consistent marker rankings.

Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use

cs.AI · 2026-05-13 · unverdicted · novelty 7.0 · 2 refs

Model-adaptive tool necessity shows 26-54% mismatch with actual tool calls across LLMs, driven by nearly orthogonal hidden-state signals for cognition versus action.

SciPredict: Can LLMs Predict the Outcomes of Scientific Experiments in Natural Sciences?

cs.AI · 2026-04-12 · unverdicted · novelty 7.0

LLMs predict outcomes of real scientific experiments at 14-26% accuracy, comparable to human experts, but lack calibration on prediction reliability while humans demonstrate strong calibration.

Reinforcement Learning with Metacognitive Feedback Elicits Faithful Uncertainty Expression in LLMs

cs.CL · 2026-06-30 · unverdicted · novelty 6.0

RLMF uses quality of model self-judgments to refine RL rankings and select training data, achieving SOTA faithful calibration while preserving accuracy and outperforming standard RL by up to 63%.

ASPI: Seeking Ambiguity Clarification Amplifies Prompt Injection Vulnerability in LLM Agents

cs.CR · 2026-05-17 · conditional · novelty 6.0

Clarification-seeking in LLM agents amplifies prompt injection attack success from ~2% to over 30% across ten frontier models in a new 728-scenario benchmark.

Stateful Evidence-Driven Retrieval-Augmented Generation with Iterative Reasoning

cs.CL · 2026-03-25 · unverdicted · novelty 5.0

A stateful iterative RAG system converts retrieved documents into scored reasoning units, maintains supportive and non-supportive evidence, and performs deficiency-driven query refinement to achieve more robust QA performance.

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models

cs.CL · 2024-01-02 · accept · novelty 4.0

A survey that compiles and taxonomizes more than 32 existing hallucination mitigation techniques for LLMs while analyzing their challenges and limitations.

citing papers explorer

Showing 1 of 1 citing paper after filters.

A Comprehensive Survey of Hallucination Mitigation Techniques in Large Language Models cs.CL · 2024-01-02 · accept · none · ref 42
A survey that compiles and taxonomizes more than 32 existing hallucination mitigation techniques for LLMs while analyzing their challenges and limitations.

arXiv preprint arXiv:2311.09677 , volume=

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer