Competency Problems: On Finding and Removing Artifacts in Language Data

Matt Gardner, William Merrill, Jesse Dodge, Matthew Peters, Alexis Ross, Sameer Singh, Noah A · 2021 · DOI 10.18653/v1/2021.emnlp-main.135

3 Pith papers cite this work. Polarity classification is still indexing.


representative citing papers

Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

cs.AI · 2026-05-11 · unverdicted · novelty 5.0

The authors propose creating data probes—synthetic sequences from defined random processes—to reveal how data properties drive LLM behavior across workflow stages.

Defending against Backdoor Attacks via Module Switching

cs.CR · 2025-04-08 · unverdicted · novelty 5.0

Module-switching defense disrupts backdoors more effectively than weight averaging with fewer models and remains robust even when some models share the same backdoors.

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

cs.LG · 2026-05-07 · unverdicted · novelty 4.0 · 2 refs

Generative AI evaluation must shift from static benchmark scores to measuring sustained improvements in human capabilities within specific deployment contexts.

citing papers explorer

Showing 3 of 3 citing papers.

Position: Let's Develop Data Probes to Fundamentally Understand How Data Affects LLM Performance

cs.AI · 2026-05-11 · unverdicted · none · ref 12

Defending against Backdoor Attacks via Module Switching

cs.CR · 2025-04-08 · unverdicted · none · ref 12

Benchmarked Yet Not Measured -- Generative AI Should be Evaluated Against Real-World Utility

cs.LG · 2026-05-07 · unverdicted · none · ref 50 · 2 links
