Benchmarking large lan- guage models in retrieval-augmented generation

· 2023 · arXiv 2309.01431

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

VLegal-Bench: Cognitively Grounded Benchmark for Vietnamese Legal Reasoning of Large Language Models

cs.CL · 2025-12-16 · conditional · novelty 7.0

VLegal-Bench supplies 10,450 expert-validated samples for evaluating LLMs on Vietnamese legal questions, retrieval, multi-step reasoning, and scenario solving.

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries

cs.CL · 2024-01-27 · accept · novelty 7.0

MultiHop-RAG is a new benchmark dataset demonstrating that existing retrieval-augmented generation systems perform poorly on multi-hop queries requiring retrieval and reasoning over multiple evidence pieces.

An Annotation Scheme and Classifier for Personal Facts in Dialogue

cs.CL · 2026-05-11 · accept · novelty 6.0

An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 points with lower compute.

Beyond the Parameters: A Technical Survey of Contextual Enrichment in Large Language Models: From In-Context Prompting to Causal Retrieval-Augmented Generation

cs.CL · 2026-04-03 · unverdicted · novelty 4.0

The survey unifies LLM augmentation techniques along the single axis of structured context supplied at inference time and supplies a literature screening protocol plus deployment decision framework.

Retrieval-Augmented Generation for Large Language Models: A Survey

cs.CL · 2023-12-18 · unverdicted · novelty 3.0

A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.

citing papers explorer

Showing 2 of 2 citing papers after filters.

MultiHop-RAG: Benchmarking Retrieval-Augmented Generation for Multi-Hop Queries cs.CL · 2024-01-27 · accept · none · ref 5
MultiHop-RAG is a new benchmark dataset demonstrating that existing retrieval-augmented generation systems perform poorly on multi-hop queries requiring retrieval and reasoning over multiple evidence pieces.
An Annotation Scheme and Classifier for Personal Facts in Dialogue cs.CL · 2026-05-11 · accept · none · ref 2
An extended annotation scheme with new categories and attributes plus a Gemma-300M-based multi-head classifier achieves 81.6% macro F1 on personal fact classification, outperforming few-shot LLM baselines by nearly 9 points with lower compute.

Benchmarking large lan- guage models in retrieval-augmented generation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer