LLM agents reach 90.9% retrieval recall at K=200 but recover at most 52.7% of ground-truth included studies because they cannot reliably apply PI/ECO eligibility criteria to topically similar distractors.
Elliott, Tari Turner, Ornella Clavisi, James Thomas, Julian P
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
Benchmarking LLM Agents on Meta-Analysis Articles from Nature Portfolio