Knowledge of Knowledge: Exploring Known-Unknowns Uncertainty with Large Language Models

Alfonso Amayuelas, Kyle Wong, Liangming Pan, Wenhu Chen, William Yang Wang · 2024 · DOI 10.18653/v1/2024.findings-acl.383

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

PhantomBench: Benchmarking the Non-existential Threat of Language Models

cs.CL · 2026-06-09 · unverdicted · novelty 7.0

PhantomBench is a new benchmark of 60K+ non-existent terms showing language models hallucinate at rates up to 86.7 percent even when inputs assume the concepts exist.

Beyond "I Don't Know": Evaluating LLM Self-Awareness in Discriminating Data and Model Uncertainty

cs.CL · 2026-04-19 · unverdicted · novelty 7.0

Frontier LLMs struggle to discriminate data uncertainty from model uncertainty even when accurate, but a new benchmark and lightweight RL strategy improve attribution without sacrificing answer accuracy.

Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information

cs.AI · 2026-05-27 · unverdicted · novelty 5.0

JTS trains reasoning models via supervised warm-up and missing-premise RL to make an explicit answerability commitment that triggers early termination on unanswerable inputs, raising Abstention@Detection near saturation.

citing papers explorer


PhantomBench: Benchmarking the Non-existential Threat of Language Models · cs.CL · 2026-06-09 · unverdicted · none · ref 42
Beyond "I Don't Know": Evaluating LLM Self-Awareness in Discriminating Data and Model Uncertainty · cs.CL · 2026-04-19 · unverdicted · none · ref 32
Bridging the Detection-to-Abstention Gap in Reasoning Models under Insufficient Information · cs.AI · 2026-05-27 · unverdicted · none · ref 1
