CARE, a context-aware LLM judge, outperforms standard methods when evaluating multi-hop retrieval quality in RAG systems.
arXiv preprint arXiv:2410.12248 (2024)
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
citation-role summary
background 1
citation-polarity summary
years
2026 2verdicts
UNVERDICTED 2roles
background 1polarities
background 1representative citing papers
Deepchecks is a new multi-faceted evaluation framework for RAG that incorporates root cause analysis and production monitoring to assess reliability, relevance, and user satisfaction.
citing papers explorer
-
Evaluating Multi-Hop Reasoning in RAG Systems: A Comparison of LLM-Based Retriever Evaluation Strategies
CARE, a context-aware LLM judge, outperforms standard methods when evaluating multi-hop retrieval quality in RAG systems.
-
Deepchecks: Evaluating Retrieval-Augmented Generation (RAG)
Deepchecks is a new multi-faceted evaluation framework for RAG that incorporates root cause analysis and production monitoring to assess reliability, relevance, and user satisfaction.