Adversarial Examples for Evaluating Reading Comprehension Systems
Standard accuracy metrics indicate that reading comprehension systems are making rapid progress, but the extent to which these systems truly understand language remains unclear. To reward systems with real language understanding abilities, we propose an adversarial evaluation scheme for the Stanford Question Answering Dataset (SQuAD). Our method tests whether systems can answer questions about paragraphs that contain adversarially inserted sentences, which are automatically generated to distract computer systems without changing the correct answer or misleading humans. In this adversarial setting, the accuracy of sixteen published models drops from an average of $75\%$ F1 score to $36\%$; when the adversary is allowed to add ungrammatical sequences of words, average accuracy on four models decreases further to $7\%$. We hope our insights will motivate the development of new models that understand language more precisely.
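The F1 figures quoted in the abstract are token-overlap scores between a predicted answer span and the gold answer. The sketch below shows one common way such a score is computed and how an average can be compared between a clean and an adversarial setting. It is an illustration only: the normalization rules are a simplified assumption, the helper names (`normalize`, `token_f1`, `average_f1`) are hypothetical, and the toy answers are made up rather than taken from the paper.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> list[str]:
    """Lowercase, drop punctuation and English articles, split on whitespace.

    Simplified normalization; assumed here, not taken from the paper.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return text.split()


def token_f1(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction)
    gold_tokens = normalize(gold)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def average_f1(predictions: list[str], golds: list[str]) -> float:
    """Mean token-level F1 over paired predictions and gold answers."""
    return sum(token_f1(p, g) for p, g in zip(predictions, golds)) / len(golds)


# Made-up toy data: the same two questions answered on clean paragraphs
# and on paragraphs with a distracting sentence inserted.
golds = ["John Elway", "Denver Broncos"]
clean_predictions = ["John Elway", "the Denver Broncos"]
adversarial_predictions = ["Jeff Dean", "Broncos"]

clean_score = average_f1(clean_predictions, golds)
adversarial_score = average_f1(adversarial_predictions, golds)
print(f"clean F1: {clean_score:.2f}, adversarial F1: {adversarial_score:.2f}")
```

The gold answers are identical in both settings because the inserted sentences are meant to leave the correct answer unchanged; only the model's predictions shift, which is what lowers the average score.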
Forward citations
Cited by 7 Pith papers
-
Online Learning-to-Defer with Varying Experts
Presents the first online learning-to-defer algorithm with regret bounds O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.
-
Universal and Transferable Adversarial Attacks on Aligned Language Models
Gradient and greedy search over token suffixes produces universal, transferable adversarial prompts that elicit objectionable outputs from aligned models including black-box commercial systems.
-
Online Learning-to-Defer with Varying Experts
Presents the first online Learning-to-Defer algorithm achieving regret O((n + n_e) T^{2/3}) generally and O((n + n_e) sqrt(T)) under low noise for multiclass classification with varying experts.
-
Pushing the Boundaries of Multiple Choice Evaluation to One Hundred Options
Scaling multiple-choice questions to 100 options on a Korean error detection task shows that LLM performance on conventional benchmarks overstates true competence due to shortcut strategies.
-
Adversarial Robustness in One-Stage Learning-to-Defer
Develops the first adversarial robustness framework for one-stage learning-to-defer, including cost-sensitive surrogate losses and theoretical consistency guarantees for classification and regression.
-
ReDef: Do Code Language Models Truly Understand Code Changes for Just-in-Time Software Defect Prediction?
ReDef creates a revert-anchored dataset of 3,164 defective and 10,268 clean code modifications and shows that code language models perform better with diff encodings but maintain stable performance under counterfactua...
-
Machine Reading Comprehension: a Literature Review
A 2019 survey of machine reading comprehension corpora and methods.