QMSum: A New Benchmark for Query-based Multi-domain Meeting Summarization

Ahmad Zaidi; Ahmed Hassan Awadallah; Asli Celikyilmaz; Da Yin; Dragomir Radev; Ming Zhong; Mutethia Mutuma; Rahul Jha; Tao Yu; Xipeng Qiu

arxiv: 2104.05938 · v1 · pith:QK6K3PMInew · submitted 2021-04-13 · 💻 cs.CL

Ming Zhong, Da Yin, Tao Yu, Ahmad Zaidi, Mutethia Mutuma, Rahul Jha, Ahmed Hassan Awadallah, Asli Celikyilmaz, Yang Liu, Xipeng Qiu, Dragomir Radev

classification: cs.CL

keywords: meeting, meetings, qmsum, summarization, task, benchmark, long, multi-domain

Abstract

Meetings are a key component of human collaboration. As increasing numbers of meetings are recorded and transcribed, meeting summaries have become essential for reminding people, whether or not they attended, of the key decisions made and the tasks to be completed. However, it is hard to create a single short summary that covers all the content of a long meeting involving multiple people and topics. To satisfy the needs of different types of users, we define a new query-based multi-domain meeting summarization task, in which models must select and summarize the spans of a meeting that are relevant to a query, and we introduce QMSum, a new benchmark for this task. QMSum consists of 1,808 query-summary pairs over 232 meetings in multiple domains. We also investigate a locate-then-summarize method and evaluate a set of strong summarization baselines on the task. Experimental results and manual analysis reveal that QMSum presents significant challenges in long meeting summarization for future research. The dataset is available at https://github.com/Yale-LILY/QMSum.

Forward citations

Cited by 8 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

IE as Cache: Information Extraction Enhanced Agentic Reasoning
cs.CL 2026-04 unverdicted novelty 7.0

IE-as-Cache framework repurposes information extraction as a dynamic cognitive cache to improve agentic reasoning accuracy in LLMs on challenging benchmarks.

Trustworthy AI: Ensuring Reliability and Accountability from Models to Agents
cs.LG 2026-05 unverdicted novelty 6.0

The thesis presents a kernel method for multiaccuracy across overlooked subpopulations, information-theoretic optimal watermarking for LLMs, and a simulator showing LLM agents outperforming humans in supply chains whi...

Ada-KV: Optimizing KV Cache Eviction by Adaptive Budget Allocation for Efficient LLM Inference
cs.CL 2024-07 accept novelty 6.0

Ada-KV is the first head-wise adaptive KV cache budget allocator for LLMs, using a theoretical loss upper bound to allocate eviction differently per attention head and yielding higher quality than uniform methods on l...

SnapKV: LLM Knows What You are Looking for Before Generation
cs.CL 2024-04 conditional novelty 6.0

SnapKV selects clustered important KV positions per attention head from an observation window at the prompt end, yielding 3.6x faster generation and 8.2x better memory efficiency on 16K-token inputs with comparable pe...

Retentive Network: A Successor to Transformer for Large Language Models
cs.CL 2023-07 unverdicted novelty 6.0

RetNet is a new sequence modeling architecture that delivers parallel training, constant-time inference, and competitive language modeling performance as a potential replacement for Transformers.

Coverage-Driven KV Cache Eviction for Efficient and Improved Inference of LLM
cs.CL 2026-06 unverdicted novelty 5.0

K-VEC is a coverage-aware KV-cache eviction strategy using cross-head and cross-layer modules that improves performance by up to 10.35 points over prior methods on LongBench subsets at fixed memory budget.

E2LLM: Encoder Elongated Large Language Models for Long-Context Understanding and Reasoning
cs.CL 2024-09 unverdicted novelty 5.0

E2LLM uses encoder-based soft prompt compression for long contexts to improve LLM reasoning on tasks like summarization and QA while maintaining efficiency.

Retrieval-Augmented Generation for Large Language Models: A Survey
cs.CL 2023-12 unverdicted novelty 3.0

A survey of RAG paradigms, components, benchmarks, and challenges for improving LLMs on knowledge-intensive tasks.