
arxiv: 2402.18479 · v2 · pith:IVKMN6YHnew · submitted 2024-02-28 · 💻 cs.CL

NewsQs: Multi-Source Question Generation for the Inquiring Mind

classification 💻 cs.CL
keywords model, dataset, news, newsqs, questions, human, multi-document, summarization

We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents. To create NewsQs, we augment a traditional multi-document summarization dataset with questions automatically generated by a T5-Large model fine-tuned on FAQ-style news articles from the News On the Web corpus. We show, through human evaluation, that fine-tuning the model with control codes produces questions that are judged acceptable more often than questions from the same model fine-tuned without them. To filter our data, we use a QNLI model whose judgments correlate highly with human annotations. We release our final dataset of high-quality questions, answers, and document clusters as a resource for future work in query-based multi-document summarization.
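
The two mechanisms the abstract names, prefixing model input with a control code and filtering question-answer pairs by a QNLI-style score, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the `<code>` prefix format, the `QAPair` type, the 0.5 threshold, and especially `overlap_scorer`, which is a toy word-overlap stand-in for a real QNLI model and not the scorer used to build NewsQs.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class QAPair:
    """Hypothetical container for one generated question and its answer."""
    question: str
    answer: str


def with_control_code(code: str, text: str) -> str:
    """Prefix the input text with a control code (the format is an assumption)."""
    return f"<{code}> {text}"


def _words(text: str) -> set:
    """Lowercase the text and strip basic punctuation from each token."""
    tokens = {w.strip("?.,!").lower() for w in text.split()}
    tokens.discard("")
    return tokens


def overlap_scorer(question: str, answer: str) -> float:
    """Toy stand-in for a QNLI model: share of question words found in the answer."""
    q_words = _words(question)
    if not q_words:
        return 0.0
    return len(q_words & _words(answer)) / len(q_words)


def filter_pairs(
    pairs: Iterable[QAPair],
    scorer: Callable[[str, str], float],
    threshold: float = 0.5,
) -> List[QAPair]:
    """Keep only the pairs whose score meets the (assumed) threshold."""
    return [p for p in pairs if scorer(p.question, p.answer) >= threshold]


candidates = [
    QAPair("What caused the flood?", "Heavy rain caused the flood in the valley."),
    QAPair("Who won the election?", "Heavy rain caused the flood."),
]
kept = filter_pairs(candidates, overlap_scorer)
```

In practice the `scorer` argument would wrap a trained QNLI classifier that returns the probability that the answer text addresses the question; keeping it as a plain callable lets the filtering logic stay unchanged when the scorer is swapped.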
