Do Large Language Models Know What They Don
4 Pith papers cite this work.
Years: 2026 (4)
Verdicts: UNVERDICTED (4)
Representative citing papers:
Frontier LLMs struggle to distinguish data uncertainty from model uncertainty even when their answers are accurate, but a new benchmark and a lightweight reinforcement learning (RL) strategy improve attribution without sacrificing answer accuracy.
LaaB improves LLM hallucination detection by mapping self-judgment labels back into the neural feature space and applying mutual learning under logical-consistency constraints between responses and meta-judgments.
Citing papers explorer
- Model-Adaptive Tool Necessity Reveals the Knowing-Doing Gap in LLM Tool Use
Model-adaptive tool necessity shows a 26-54% mismatch with actual tool calls across LLMs, a gap driven by nearly orthogonal hidden-state signals for cognition versus action.
- BLINKG: A Benchmark for LLM-Integrated Knowledge Graph Generation
BLINKG is a benchmark for evaluating LLMs on mapping input data schemas to ontology concepts for knowledge graph construction; experiments show promising but limited performance in complex real-world scenarios.