The authors introduce and release two new benchmarks for maternal-health RAG evaluation. Both are built from expert sources, with graded labels and disclosed limitations, rather than relying on binary judgments or newly authored questions.
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Representative citing papers
Contrastive Reflection identifies error-anchored slices in agent traces, adds contrastive successes, and uses a Teacher LLM to generate prompt edits that are accepted only if they improve validation performance, raising HotpotQA exact-match from 51.4% to 60.4%.
mamabench and mamaretrieval: Benchmarks for Evaluating Medical Retrieval-Augmented Generation in Maternal, Neonatal, and Reproductive Health