Auto-ARGUE: LLM-Based Report Generation Evaluation

2025 · cs.IR · arXiv 2509.26184

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Generation of citation-backed reports is a primary use case for retrieval-augmented generation (RAG) systems. While open-source evaluation tools exist for various RAG tasks, tools designed for report generation are lacking. Accordingly, we introduce Auto-ARGUE, a robust LLM-based implementation of the recently proposed ARGUE framework for report generation evaluation. We present analysis of Auto-ARGUE on the report generation pilot task from the TREC 2024 NeuCLIR track and on two tasks from the TREC 2024 RAG track, showing good system-level correlations with human judgments. Additionally, we release ARGUE-Viz, a web app for visualization and fine-grained analysis of Auto-ARGUE judgments and scores.

representative citing papers

DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation

cs.CL · 2026-05-06 · unverdicted · novelty 6.0

DoGMaTiQ automates QA-nugget creation via document-grounded generation, paraphrase clustering, and quality-based subselection, yielding strong rank correlations with human judgments on cross-lingual TREC tasks.

Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization

cs.DC · 2026-04-22 · unverdicted · novelty 5.0

BloomBee is a distributed LLM inference system that achieves up to 1.76x higher throughput and 43.2% lower latency than prior decentralized systems by optimizing communication across multiple dimensions in low-bandwidth internet settings.

Beyond Relevance: On the Relationship Between Retrieval and RAG Information Coverage

cs.IR · 2026-03-09 · unverdicted · novelty 5.0

Coverage-focused retrieval metrics correlate strongly with nugget coverage in RAG responses across text and multimodal benchmarks, supporting their use as performance proxies when retrieval and generation goals align.
