Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
citing papers explorer
- mamabench and mamaretrieval: Benchmarks for Evaluating Medical Retrieval-Augmented Generation in Maternal, Neonatal, and Reproductive Health
  Releases mamabench (25,949 QA items from seven expert sources) and mamaretrieval (3,185 graded queries over 63,650 chunks) to evaluate RAG in maternal, neonatal, and reproductive health.
- Contrastive Reflection for Iterative Prompt Optimization
  Contrastive Reflection identifies error-anchored slices in agent traces, adds contrastive successes, and uses a Teacher LLM to generate prompt edits that are accepted only if they improve validation performance, raising HotpotQA exact-match from 51.4% to 60.4%.