Auto-ARGUE: LLM-Based Report Generation Evaluation
3 Pith papers cite this work. Polarity classification is still indexing.
abstract
Generation of citation-backed reports is a primary use case for retrieval-augmented generation (RAG) systems. While open-source evaluation tools exist for various RAG tasks, tools designed for report generation are lacking. Accordingly, we introduce Auto-ARGUE, a robust LLM-based implementation of the recently proposed ARGUE framework for report generation evaluation. We present analysis of Auto-ARGUE on the report generation pilot task from the TREC 2024 NeuCLIR track and on two tasks from the TREC 2024 RAG track, showing good system-level correlations with human judgments. Additionally, we release ARGUE-Viz, a web app for visualization and fine-grained analysis of Auto-ARGUE judgments and scores.
citation summary
years: 2026 (3)
roles: background (1)
polarities: background (1)
citing papers explorer
- DoGMaTiQ: Automated Generation of Question-and-Answer Nuggets for Report Evaluation
DoGMaTiQ automates creation of QA-based nuggets via document-grounded generation, paraphrase clustering, and quality subselection, producing report evaluations that correlate strongly with human judgments on cross-lingual TREC tasks.
- Distributed Generative Inference of LLM at Internet Scales with Multi-Dimensional Communication Optimization
BloomBee is a distributed LLM inference system that achieves up to 1.76x higher throughput and 43.2% lower latency than prior decentralized systems by optimizing communication across multiple dimensions in low-bandwidth internet settings.