
arXiv: 1905.05778 · v3 · pith:5ZU33PGJnew · submitted 2019-05-14 · 💻 cs.LG · cs.AI · cs.CL · stat.ML

Misleading Failures of Partial-input Baselines

classification 💻 cs.LG cs.AI cs.CL stat.ML
keywords partial-input, dataset, artifacts, baselines, baseline, hypothesis-only, model, models

Recent work establishes dataset difficulty and removes annotation artifacts via partial-input baselines (e.g., hypothesis-only models for SNLI or question-only models for VQA). When a partial-input baseline gets high accuracy, a dataset is cheatable. However, the converse is not necessarily true: the failure of a partial-input baseline does not mean a dataset is free of artifacts. To illustrate this, we first design artificial datasets which contain trivial patterns in the full input that are undetectable by any partial-input model. Next, we identify such artifacts in the SNLI dataset: a hypothesis-only model augmented with trivial patterns in the premise can solve 15% of the examples that were previously considered "hard". Our work provides a caveat for the use of partial-input baselines for dataset verification and creation.
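To make the central claim concrete, here is a minimal sketch of one artificial dataset of the kind the abstract describes. It is an illustrative construction, not necessarily the one used in the paper: each example has a one-bit "premise" and a one-bit "hypothesis", and the label is their XOR. The names `make_xor_dataset` and `best_accuracy` are hypothetical helpers introduced for this example. The helper `best_accuracy` computes the accuracy of the best possible classifier restricted to a given view of the input, so the result does not depend on any particular model.

```python
import itertools
from collections import Counter, defaultdict


def make_xor_dataset(copies=25):
    """Toy dataset (an assumption for illustration, not the paper's data).

    Each example is (premise_bit, hypothesis_bit, label) with
    label = premise_bit XOR hypothesis_bit.
    """
    combos = itertools.product([0, 1], repeat=2)
    return [(p, h, p ^ h) for p, h in combos] * copies


def best_accuracy(dataset, view):
    """Accuracy of the best classifier that only sees view(premise, hypothesis).

    For each distinct view value, the optimal prediction is the majority
    label among examples sharing that value.
    """
    counts = defaultdict(Counter)
    for p, h, y in dataset:
        counts[view(p, h)][y] += 1
    correct = sum(c.most_common(1)[0][1] for c in counts.values())
    return correct / len(dataset)


data = make_xor_dataset()
premise_only = best_accuracy(data, lambda p, h: p)
hypothesis_only = best_accuracy(data, lambda p, h: h)
full_input = best_accuracy(data, lambda p, h: (p, h))

print(f"premise-only:    {premise_only:.2f}")
print(f"hypothesis-only: {hypothesis_only:.2f}")
print(f"full input:      {full_input:.2f}")
```

In this toy setting both partial-input views sit at chance (0.50) while the full input is solved perfectly (1.00) by a trivial rule, so a failing partial-input baseline says nothing about whether the full input contains an exploitable pattern.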
