Fact, fetch, and reason: A unified evaluation of retrieval-augmented generation

Satyapriya Krishna, Kalpesh Krishna, Anhad Mohananey, Steven Schwarcz, Adam Stambler, Shyam Upadhyay, Manaal Faruqui · 2025 · DOI 10.18653/v1/2025.naacl-long.243

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling

cs.CL · 2026-06-11 · unverdicted · novelty 8.0

LoHoSearch is a new benchmark of 544 KG-constructed questions across 11 domains where the strongest search agent scores 34.74% and context strategies add at most 6.8%.

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

cs.AI · 2026-05-27 · unverdicted · novelty 7.0

LiveBrowseComp shows search agents rely on intrinsic knowledge on standard benchmarks, with scores dropping 25-40 points and closed-book accuracy below 2% on questions about facts from the prior 90 days.

Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG

cs.CL · 2026-05-31 · unverdicted · novelty 6.0

CERA fine-tunes a dense retriever with triplet contrastive learning plus attention alignment to human rationales, claiming better retrieval effectiveness and faithfulness on clinical trial reports than Contriever and standard hard-negative baselines.

citing papers explorer

LoHoSearch: Benchmarking Long-Horizon Search Agents Beyond the Human Difficulty Ceiling

cs.CL · 2026-06-11 · unverdicted · none · ref 15

LiveBrowseComp: Are Search Agents Searching, or Just Verifying What They Already Know?

cs.AI · 2026-05-27 · unverdicted · none · ref 43

Beyond Topical Similarity: Contrastive Evidence Retrieval with Interpretable Attention Alignment in RAG

cs.CL · 2026-05-31 · unverdicted · none · ref 51
