Misleading Failures of Partial-input Baselines

Eric Wallace; Jordan Boyd-Graber; Shi Feng

arxiv: 1905.05778 · v3 · pith:5ZU33PGJnew · submitted 2019-05-14 · 💻 cs.LG · cs.AI · cs.CL · stat.ML

Shi Feng, Eric Wallace, Jordan Boyd-Graber

keywords partial-input · dataset · artifacts · baselines · baseline · hypothesis-only · model · models

Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA). When a partial-input baseline gets high accuracy, a dataset is cheatable. However, the converse is not necessarily true: the failure of a partial-input baseline does not mean a dataset is free of artifacts. To illustrate this, we first design artificial datasets which contain trivial patterns in the full input that are undetectable by any partial-input model. Next, we identify such artifacts in the SNLI dataset: a hypothesis-only model augmented with trivial patterns in the premise can solve 15% of the examples that were previously considered "hard". Our work provides a caveat for the use of partial-input baselines for dataset verification and creation.
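
The central claim, that a pattern can be trivial in the full input yet invisible to every partial-input model, can be illustrated with a toy sketch. The construction below is an assumption made for illustration and is not taken from the paper: each example has a one-bit "premise" and a one-bit "hypothesis", and the label is their XOR. The names `make_xor_dataset` and `best_accuracy` are hypothetical helpers. `best_accuracy` computes the highest accuracy any classifier restricted to a given view of the input could reach on this data.

```python
from collections import Counter, defaultdict
from itertools import product


def make_xor_dataset(copies=25):
    """Toy dataset (illustrative assumption, not the paper's construction):
    input is (premise_bit, hypothesis_bit), label is their XOR."""
    return [
        ((p, h), p ^ h)
        for p, h in product((0, 1), repeat=2)
        for _ in range(copies)
    ]


def best_accuracy(dataset, view):
    """Best accuracy achievable by any classifier that only sees view(x).

    For each distinct observed view, the optimal prediction is the
    majority label among examples sharing that view.
    """
    counts = defaultdict(Counter)
    for x, y in dataset:
        counts[view(x)][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(dataset)


data = make_xor_dataset()
premise_only = best_accuracy(data, lambda x: x[0])     # sees only the premise bit
hypothesis_only = best_accuracy(data, lambda x: x[1])  # sees only the hypothesis bit
full_input = best_accuracy(data, lambda x: x)          # sees both bits

print(premise_only, hypothesis_only, full_input)
```

In this sketch both partial-input baselines are stuck at chance while the full input determines the label exactly, so a failing hypothesis-only baseline says nothing about whether the dataset contains a trivial pattern.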
