CLAMBER: A benchmark of identifying and clarifying ambiguous information needs in large language models

Tong Zhang, Peixin Qin, Yang Deng, Chen Huang, Wenqiang Lei, Junhong Liu, Dingnan Jin, Hongru Liang, Tat-Seng Chua · 2024 · DOI 10.18653/v1/2024.acl-long.578

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open at publisher browse 3 citing papers

representative citing papers

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science

cs.AI · 2026-05-18 · unverdicted · novelty 7.0

SCICONVBENCH is a new benchmark evaluating LLMs on multi-turn disambiguation and inconsistency resolution for task formulation in computational science, with frontier models reaching only 52.7% success on fluid mechanics disambiguation cases.

Beyond "I Don't Know": Evaluating LLM Self-Awareness in Discriminating Data and Model Uncertainty

cs.CL · 2026-04-19 · unverdicted · novelty 7.0

Frontier LLMs struggle to discriminate data uncertainty from model uncertainty even when accurate, but a new benchmark and lightweight RL strategy improve attribution without sacrificing answer accuracy.

Can Large Language Models Really Recognize Your Name?

cs.CR · 2025-05-20 · unverdicted · novelty 6.0

LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.

citing papers explorer

Showing 3 of 3 citing papers.

SCICONVBENCH: Benchmarking LLMs on Multi-Turn Clarification for Task Formulation in Computational Science cs.AI · 2026-05-18 · unverdicted · none · ref 81
SCICONVBENCH is a new benchmark evaluating LLMs on multi-turn disambiguation and inconsistency resolution for task formulation in computational science, with frontier models reaching only 52.7% success on fluid mechanics disambiguation cases.
Beyond "I Don't Know": Evaluating LLM Self-Awareness in Discriminating Data and Model Uncertainty cs.CL · 2026-04-19 · unverdicted · none · ref 28
Frontier LLMs struggle to discriminate data uncertainty from model uncertainty even when accurate, but a new benchmark and lightweight RL strategy improve attribution without sacrificing answer accuracy.
Can Large Language Models Really Recognize Your Name? cs.CR · 2025-05-20 · unverdicted · none · ref 80
LLMs exhibit 20-40% lower recall on ambiguous human names for PII detection, worsening under prompt injections, as shown via the new AmBench benchmark.

CLAMBER: A benchmark of identifying and clarifying ambiguous information needs in large language models

fields

years

verdicts

representative citing papers

citing papers explorer