pith:HJFXGWJT
How Hyper-Datafication Impacts the Sustainability Costs in Frontier AI
Hyper-datafication in frontier AI redistributes environmental burdens, labor risks, and representational harms toward the Global South and precarious workers.
arxiv:2602.00056 v4 · 2026-01-20 · cs.CY · cs.AI
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{HJFXGWJTFENV5K4MFZ2F33QKUQ}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Our analyses reveal that hyper-datafication does not merely increase resource consumption but systematically redistributes environmental burdens, labour risks, and representational harms toward the Global South, precarious data workers, and under-represented cultures.
The analysis assumes that the sample of approximately 550,000 Hugging Face Hub datasets, combined with qualitative responses from data workers in Kenya, sufficiently represents the global data practices and impacts of frontier AI models.
Hyper-datafication in frontier AI increases resource consumption and redistributes environmental burdens, labor risks, and representational harms toward the Global South, data workers, and under-represented cultures, based on analysis of 550,000 Hugging Face datasets and Kenyan worker responses.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-06-09T02:07:18.797464Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
3a4b735933291b5eab8c2e745dee0aa424cb2c65a295f95e2ba43422800b0b6a
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/HJFXGWJTFENV5K4MFZ2F33QKUQ \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 3a4b735933291b5eab8c2e745dee0aa424cb2c65a295f95e2ba43422800b0b6a
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "16393a5049da71e3fa936d22d6ac96346bcddddbf77ca377557fb2fe2c328910",
    "cross_cats_sorted": [
      "cs.AI"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CY",
    "submitted_at": "2026-01-20T00:54:37Z",
    "title_canon_sha256": "c3d8d7fb91b710b8fcbeff3670f4e0a684102c5de433390c3d2b9a2e3fc318b3"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2602.00056",
    "kind": "arxiv",
    "version": 4
  }
}
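
The same check can be run offline against the record printed above. The following is a minimal sketch, assuming the JSON shown on this page is identical to what the API returns in `.canonical_record`; the names `RECORD_TEXT`, `EXPECTED_HASH`, and `canonical_sha256` are illustrative and not part of any Pith tooling. It applies the serialization used by the one-liner above (sorted keys, compact separators, no ASCII escaping, UTF-8 bytes) and compares the SHA-256 digest with the published canonical hash.

```python
import hashlib
import json

# Record copied from the "Canonical record JSON" section of this page.
# Assumption: this matches the API's `.canonical_record` field exactly.
RECORD_TEXT = """
{
  "metadata": {
    "abstract_canon_sha256": "16393a5049da71e3fa936d22d6ac96346bcddddbf77ca377557fb2fe2c328910",
    "cross_cats_sorted": ["cs.AI"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CY",
    "submitted_at": "2026-01-20T00:54:37Z",
    "title_canon_sha256": "c3d8d7fb91b710b8fcbeff3670f4e0a684102c5de433390c3d2b9a2e3fc318b3"
  },
  "schema_version": "1.0",
  "source": {"id": "2602.00056", "kind": "arxiv", "version": 4}
}
"""

# Published canonical hash, copied from the "Canonical hash" section.
EXPECTED_HASH = "3a4b735933291b5eab8c2e745dee0aa424cb2c65a295f95e2ba43422800b0b6a"


def canonical_sha256(record: dict) -> str:
    """Serialize with sorted keys and compact separators, then hash the UTF-8 bytes."""
    canonical = json.dumps(
        record,
        sort_keys=True,
        separators=(",", ":"),
        ensure_ascii=False,
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


record = json.loads(RECORD_TEXT)
digest = canonical_sha256(record)
print("computed:", digest)
print("matches published hash:", digest == EXPECTED_HASH)
```

Because keys are sorted and whitespace is removed before hashing, the indentation and key order of the pasted JSON do not affect the digest; only the keys and values do.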