emrQA: A Large Corpus for Question Answering on Electronic Medical Records

Anusri Pampari; Jennifer Liang; Jian Peng; Preethi Raghavan

arXiv: 1809.00732 · v1 · pith:3PDBQCOG · submitted 2018-09-03 · 💻 cs.CL

Keywords: question, annotations, answering, corpus, dataset, datasets, electronic, emrQA

Abstract

We propose a novel methodology to generate domain-specific large-scale question answering (QA) datasets by re-purposing existing annotations for other NLP tasks. We demonstrate an instance of this methodology in generating a large-scale QA dataset for electronic medical records by leveraging existing expert annotations on clinical notes for various NLP tasks from the community shared i2b2 datasets. The resulting corpus (emrQA) has 1 million question-logical form and 400,000+ question-answer evidence pairs. We characterize the dataset and explore its learning potential by training baseline models for question to logical form and question to answer mapping.
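
The abstract describes the method only at a high level. As a rough illustration of what "re-purposing existing annotations" can look like, the Python sketch below turns entity-attribute annotations on clinical notes into (question, logical form, answer, evidence) examples by filling question templates. Everything in it is an assumption made for illustration: the `Annotation` and `QAExample` types, the `TEMPLATES` table, the `MedicationEvent(...)[dosage=x]` logical-form syntax, the `generate` function, and the toy note text are hypothetical, and the paper's actual templates, logical-form grammar, and i2b2 annotation schemas may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """Hypothetical entity annotation on a clinical note (i2b2-style, assumed schema)."""
    note_id: str
    entity_type: str  # e.g. "medication"
    text: str         # surface form in the note, e.g. "lisinopril"
    attribute: str    # annotated attribute, e.g. "dosage"
    value: str        # attribute value, used as the answer
    evidence: str     # note sentence supporting the value


@dataclass(frozen=True)
class QAExample:
    question: str
    logical_form: str
    answer: str
    evidence: str
    note_id: str


# Hypothetical templates keyed by (entity type, attribute). Each entry pairs a
# question pattern with a logical-form pattern; paraphrases share one logical form.
TEMPLATES = {
    ("medication", "dosage"): [
        ("What is the dosage of {entity}?", "MedicationEvent({entity})[dosage=x]"),
        ("How much {entity} does the patient take?", "MedicationEvent({entity})[dosage=x]"),
    ],
}


def generate(annotations):
    """Fill every matching template with each annotation's entity text.

    Annotations whose (entity_type, attribute) has no template yield no examples.
    """
    examples = []
    for ann in annotations:
        for q_tmpl, lf_tmpl in TEMPLATES.get((ann.entity_type, ann.attribute), []):
            examples.append(
                QAExample(
                    question=q_tmpl.format(entity=ann.text),
                    logical_form=lf_tmpl.format(entity=ann.text),
                    answer=ann.value,
                    evidence=ann.evidence,
                    note_id=ann.note_id,
                )
            )
    return examples


if __name__ == "__main__":
    # Toy, made-up annotation for demonstration only.
    notes = [
        Annotation("note-001", "medication", "lisinopril", "dosage",
                   "10 mg", "Continue lisinopril 10 mg daily."),
    ]
    for ex in generate(notes):
        print(ex.question, "->", ex.logical_form, "->", ex.answer)
```

In this sketch each annotation produces one example per matching template, so several question paraphrases map to the same logical form and answer evidence; annotations with no matching template are simply skipped.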

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PubMedQA: A Dataset for Biomedical Research Question Answering
cs.CL 2019-09 unverdicted novelty 7.0

PubMedQA supplies 273k+ biomedical QA instances that require reasoning over research abstracts to produce yes/no/maybe answers.

HypEHR: Hyperbolic Modeling of Electronic Health Records for Efficient Question Answering
cs.AI 2026-04 unverdicted novelty 6.0

HypEHR is a hyperbolic embedding model for EHR data that uses Lorentzian geometry and hierarchy-aware pretraining to answer clinical questions nearly as well as large language models, but with a much smaller model size.

Towards an AI co-scientist
cs.AI 2025-02 unverdicted novelty 6.0

A multi-agent AI system generates novel biomedical hypotheses that show promising experimental validation in drug repurposing for leukemia, new targets for liver fibrosis, and a bacterial gene transfer mechanism.

Towards Expert-Level Medical Question Answering with Large Language Models
cs.CL 2023-05 unverdicted novelty 6.0

Med-PaLM 2 achieves 86.5% accuracy on MedQA and approaches or exceeds prior state-of-the-art on other medical QA benchmarks while receiving higher physician preference ratings than human answers on consumer questions.