arxiv: 2108.04884 · v3 · pith:QU7RSHPFnew · submitted 2021-08-10 · 💻 cs.LG · stat.ML

Retiring Adult: New Datasets for Fair Machine Learning

keywords: data, adult, datasets, census, fairness, research, algorithmic, available

Although the fairness community has recognized the importance of data, researchers in the area primarily rely on UCI Adult when it comes to tabular data. Derived from a 1994 US Census survey, this dataset has appeared in hundreds of research papers where it served as the basis for the development and comparison of many algorithmic fairness interventions. We reconstruct a superset of the UCI Adult data from available US Census sources and reveal idiosyncrasies of the UCI Adult dataset that limit its external validity. Our primary contribution is a suite of new datasets derived from US Census surveys that extend the existing data ecosystem for research on fair machine learning. We create prediction tasks relating to income, employment, health, transportation, and housing. The data span multiple years and all states of the United States, allowing researchers to study temporal shift and geographic variation. We highlight a broad initial sweep of new empirical insights relating to trade-offs between fairness criteria, performance of algorithmic interventions, and the role of distribution shift based on our new datasets. Our findings inform ongoing debates, challenge some existing narratives, and point to future research directions. Our datasets are available at https://github.com/zykls/folktables.

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Toward Calibrated, Fair, and accurate Deepfake Detection

    cs.LG 2026-06 unverdicted novelty 7.0

    Face-Feature Tuning is a label-free logit remapping method that reduces FPR/TPR gaps across groups in deepfake detection while preserving overall accuracy.

  2. Linear Strategic Classification with Endogenous Improvements

    cs.LG 2026-05 unverdicted novelty 7.0

    Formalizes improvement-aware strategic classification for linear classifiers under single-index models, proves the strategic-optimal classifier is a parallel shift of the Bayes boundary, and supplies PAC guarantees wi...

  3. Anytime PAC-Bayes for Constrained Density-Ratio Networks under Covariate Shift

    cs.LG 2026-05 unverdicted novelty 7.0

    Framework combining constrained density-ratio networks with anytime PAC-Bayes for covariate shift.

  4. Rashomon Sets and Model Multiplicity in Federated Learning

    cs.LG 2026-02 unverdicted novelty 7.0

    The work provides the first formal definitions of Rashomon sets for federated learning and introduces a multiplicity-aware training pipeline evaluated on standard benchmarks.

  5. Anytime PAC-Bayes for Constrained Density-Ratio Networks under Covariate Shift

    cs.LG 2026-05 unverdicted novelty 6.0

    A constrained density-ratio network with augmented-Lagrangian enforcement and anytime PAC-Bayes delivers generalization certificates for importance-weighted learning under covariate shift.

  6. Anytime PAC-Bayes for Constrained Density-Ratio Networks under Covariate Shift

    cs.LG 2026-05 unverdicted novelty 6.0

    A constrained density-ratio network approximates the Radon-Nikodym derivative and feeds an anytime PAC-Bayes certificate for learning under covariate shift, validated via synthetic patch tests and real-data deployment.