The Dataset Nutrition Label (2nd Gen): Leveraging Context to Mitigate Harms in Artificial Intelligence

Jessica Yurkofsky; Josh Joseph; Kasia S. Chmielinski; Kemi Thomas; Matt Taylor; Sarah Newman; Yue Chelsea Qiu

arxiv: 2201.03954 · v2 · pith:6JAKOKA7new · submitted 2022-01-10 · 💻 cs.LG · cs.AI

keywords: label, data, nutrition, dataset, datasets, design, launching, mitigate

As the production of and reliance on datasets to produce automated decision-making systems (ADS) increases, so does the need for processes for evaluating and interrogating the underlying data. After launching the Dataset Nutrition Label in 2018, the Data Nutrition Project has made significant updates to the design and purpose of the Label, and is launching an updated Label in late 2020, which is previewed in this paper. The new Label includes context-specific Use Cases & Alerts presented through an updated design and user interface targeted towards the data scientist profile. This paper discusses the harm and bias from underlying training data that the Label is intended to mitigate, the current state of the work including new datasets being labeled, new and existing challenges, and further directions of the work, as well as Figures previewing the new label.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Evaluating Structured Documentation as a Tool for Reflexivity in Dataset Development
cs.CY 2026-05 unverdicted novelty 5.0

Structured dataset documentation shows little engagement with major reflexivity themes from FAccT literature, leading to a new codebook and extended datasheet questions.

Evaluation of AI Ethics Tools in Language Models: A Developers' Perspective Case Study
cs.CY 2025-12 unverdicted novelty 3.0

Evaluation of Model Cards, ALTAI, FactSheets, and Harms Modeling on Portuguese language models shows they provide broad ethical guidance but overlook unique language features and negative impacts.