Browsing Lost Unformed Recollections: A Benchmark for Tip-of-the-Tongue Search and Reasoning

Anand Kannappan; Darshan Deshpande; Rebecca Qian; Sky CH-Wang; Smaranda Muresan

arxiv: 2503.19193 · v1 · pith:LIMG75N3new · submitted 2025-03-24 · cs.AI · cs.CL · cs.IR · cs.MA

We introduce Browsing Lost Unformed Recollections (BLUR), a tip-of-the-tongue known-item search and reasoning benchmark for general AI assistants. BLUR comprises 573 real-world, validated questions; excelling on them demands searching and reasoning across multimodal and multilingual inputs, as well as proficient tool use. Humans easily ace these questions (scoring 98% on average), while the best-performing system scores around 56%. To facilitate progress on this challenging and aspirational use case for general AI assistants, we release 350 questions through a public leaderboard, retain the answers to 250 of them, and keep the rest as a private test set.

Forward citations

Cited by 1 Pith paper

Multilingual and Domain-Agnostic Tip-of-the-Tongue Query Generation for Simulated Evaluation
cs.IR · 2026-04

An LLM simulation framework generates multilingual tip-of-the-tongue queries, validated by rank correlation with real queries, producing the first large-scale ToT benchmarks for four languages.