Data Augmentation for BERT Fine-Tuning in Open-Domain Question Answering

Jimmy Lin; Kun Xiong; Luchen Tan; Ming Li; Wei Yang; Yuqing Xie

arxiv: 1904.06652 · v1 · pith:RQTOO3JLnew · submitted 2019-04-14 · 💻 cs.CL · cs.IR

Wei Yang, Yuqing Xie, Luchen Tan, Kun Xiong, Ming Li, Jimmy Lin

keywords: data, BERT, datasets, answering, augmentation, large, previous, question

Recently, a simple combination of passage retrieval using off-the-shelf IR techniques and a BERT reader was found to be very effective for question answering directly on Wikipedia, yielding a large improvement over the previous state of the art on a standard benchmark dataset. In this paper, we present a data augmentation technique using distant supervision that exploits positive as well as negative examples. We apply a stage-wise approach to fine tuning BERT on multiple datasets, starting with data that is "furthest" from the test data and ending with the "closest". Experimental results show large gains in effectiveness over previous approaches on English QA datasets, and we establish new baselines on two recent Chinese QA datasets.
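The two ideas in the abstract, distant supervision with positive and negative examples and stage-wise fine-tuning from the "furthest" data to the "closest", can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' implementation: the string-match labeling rule, the function names `distant_labels` and `stagewise_order`, and the numeric distance scores are all hypothetical, since the abstract does not specify how examples are selected or how distance to the test data is measured.

```python
from typing import List, Tuple


def distant_labels(answer: str, passages: List[str]) -> List[Tuple[str, int]]:
    """Label retrieved passages by distant supervision.

    Assumption: a passage is a positive example (1) if it contains the
    answer string, case-insensitively, and a negative example (0) otherwise.
    """
    needle = answer.lower()
    return [(p, 1 if needle in p.lower() else 0) for p in passages]


def stagewise_order(datasets: List[Tuple[str, float]]) -> List[str]:
    """Order dataset names from furthest to closest to the test data.

    Each dataset is a (name, distance) pair. The distance values are
    hypothetical placeholders supplied by the caller.
    """
    ranked = sorted(datasets, key=lambda d: d[1], reverse=True)
    return [name for name, _ in ranked]


if __name__ == "__main__":
    passages = [
        "Paris is the capital of France.",
        "Berlin is the capital of Germany.",
    ]
    print(distant_labels("Paris", passages))

    # Fine-tuning would then visit the datasets in this order, one stage each.
    print(stagewise_order([("target", 0.0), ("distant", 2.0), ("source", 1.0)]))
```

In an actual pipeline, each stage would continue fine-tuning the same BERT checkpoint on the next dataset in the returned order; that training loop is omitted here because the abstract gives no details about it.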

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dense Passage Retrieval for Open-Domain Question Answering
cs.CL · 2020-04 · accept · novelty 8.0

Dense dual-encoder retrievers outperform BM25 by 9-19% absolute in top-20 passage retrieval accuracy across open-domain QA datasets and enable new state-of-the-art end-to-end QA results.